
Master of Science in Computer Science September 2020

Enhancing the JPEG Ghost Algorithm using Machine Learning

Siddharth Rao Gondlyala

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden


This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in Computer Science.

The thesis is equivalent to 20 weeks of full-time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:

Author(s):

Siddharth Rao Gondlyala
E-mail: sigo18@student.bth.se

University advisor:

Abbas Cheddad (Senior Lecturer/Associate Professor)
Department of Computer Science

Faculty of Computing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden

Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57


Abstract

Background: With the boom in the internet space and social media platforms, a large number of images are being shared. Alongside this rise, advancements in technology have produced many image editing tools, giving rise to digital image manipulation. Being able to recognize a forged image is vital to avoid misinformation or misrepresentation. This study focuses on splicing image forgery and localizes the forged region in the tampered image.

Objectives: The main purpose of the thesis is to extend the capability of the JPEG Ghost model by localizing the tampering in the image. This is done by analyzing the difference curves formed by compressions in the tampered image, and thereafter comparing the performance of the models.

Methods: The study is carried out using two research methods: a literature review, whose main goal is to gain insight into existing studies in terms of the approaches and techniques followed, and an experiment, whose main goal is to improve the JPEG Ghost algorithm by localizing the forged area in a tampered image and to compare three machine learning models based on performance metrics. The machine learning models compared are Random Forest, XGBoost and Support Vector Machine.

Results: The performance of the above-mentioned models was compared on the same dataset. Results from the experiment showed that XGBoost had the best overall performance, with a Jaccard Index value of 79.8%.

Conclusions: The research revolves around localization of the forged region in a tampered image using the concept of JPEG ghosts. We conclude that the XGBoost model performs best, followed by Random Forest and then Support Vector Machine.

Keywords: JPEG Ghost, Splicing forgery, Localization, Image Processing, Machine Learning


Acknowledgments

Being able to end this journey of my education is something that has been in my sights for the past 5 years. I am finally at the end of the generic question, "Where do you see yourself in 5 years' time?". Haha! Looking back, I believe I have learned, struggled and achieved in these past 5 years! I have been waiting for the moment to write the Acknowledgments in my thesis document for over 8 months now! Haha!

Here’s To Those Crazy Ones!

I would firstly like to thank my supervisor, Abbas Cheddad, for his guidance, his relentless support and patience. This study would not have been possible to complete without his guidance. Thank you for everything!

I would also take the opportunity to thank my parents and family for their constant support, unconditional trust and incredible motivation over the course of my education. I know that there have been phases in my education where I was not performing up to their expectations, but I was never burdened and was always given my own room for comfort and growth.

I would also like to thank my friends, with whom I have struggled day and night completing our respective theses, helping each other in any way possible when stuck, and motivating one another all along the way! Tum log best che!

And most importantly, I would like to thank Shiksha, who has been a strong and constant support for the past 3 years, and will be for the years to come! Thank you for everything!



Contents

Abstract
Acknowledgments
1 Introduction
  1.1 Problem Statement
  1.2 Aim and Objectives
  1.3 Research Questions
  1.4 Ethical Aspects
  1.5 Outline
2 Background
  2.1 Image Forgery
  2.2 Machine Learning
  2.3 JPEG Compressions
  2.4 Farid's Ghost Observation
3 Related Work
  3.1 Previous research advancements in detection of tampered image
  3.2 Previous research advancements in localization of the tampered area
4 Method
  4.1 Literature Review
    4.1.1 Search String Formulation
    4.1.2 Inclusion and Exclusion Criteria
  4.2 Experiment
    4.2.1 Experiment Set-up / Tools used
  4.3 Data Collection
    4.3.1 Dataset
    4.3.2 Data Preprocessing
    4.3.3 Feature Extraction
  4.4 Data Analysis
    4.4.1 Performance Metrics
    4.4.2 Dependent and Independent Variables
  4.5 Adapted Approaches
5 Results and Analysis
  5.1 Experiment Results
    5.1.1 Comparison of models
    5.1.2 Localization of forged area
6 Discussion
  6.1 Answering RQ1
  6.2 Answering RQ2
  6.3 Validity Threats Analysis
7 Conclusions and Future Work
References


List of Figures

2.1 Examples of forgery: (a) Original Image (b) Forged Image
2.2 Image forgery with post-processing technique (a) Original Image (b) Forged Image
2.3 JPEG Ghost Example [24]
4.1 Example of images chosen in dataset [62] (a) Original Image (b) Forged Image (c) Ground truth
4.2 Example of images chosen in dataset [62] (a) Original Image (b) Forged Image (c) Ground truth
5.1 (a) Forged Image (b) Ground truth corresponding to the image
5.2 Models Comparison
5.3 Localization of forgery (a) XGBoost (b) SVM (c) Random Forest
5.4 Localization of forgery (a) SVM (b) Random Forest (c) XGBoost
5.5 Localization of forgery in XGBoost model


List of Tables

4.1 Environment
5.1 Models Comparison


Nomenclature

BDCT Block Discrete Cosine Transform.

CASIA Chinese Academy of Sciences’ Institute of Automation.

CNN Convolutional Neural Network.

DCT Discrete Cosine Transform.

JPEG Joint Photographic Experts Group.

PNG Portable Network Graphics.

RBF Radial Basis Function.

RF Random Forest.

RGB Red Green Blue.

RoI Region of Interest.

SVM Support Vector Machine.

TIFF Tagged Image File Format.

UCID Uncompressed Color Image Database.

XGBoost eXtreme Gradient Boosting.



Chapter 1

Introduction

Over the past few years, with increased accessibility to the internet across the globe, there has been an exponential increase in active users on the internet. This is largely due to the social media revolution, spearheaded by the arrival of Facebook, Twitter, Instagram, Reddit, and so on. People shared articles, photos, videos, thoughts, posts, etc. with the world, which led to the use of basic tools to enhance image quality and appearance. These platforms came at the perfect time, and global news coverage, images and stories were suddenly at our fingertips. It is indeed said that "actions speak louder than words". However, with this rise also came the rise of digital image technology and digital image manipulation. An image is said to be tampered or forged if it has been enhanced, manipulated or changed to achieve a desired result.

Image manipulation, or image forgery, refers to any operation done either to enhance the image or to add incorrect elements to it, using software on a computer, mobile device or tablet. Image forgery specifically refers to images which are forged or tampered with in order to deceive the viewer. Tampered contents often leave no visual traces [85].

Tampering with or editing an image has become much more straightforward, as multiple photo-friendly editing software tools are available to help even an inexperienced forger [61], among them Affinity Photo, the GNU Image Manipulation Program (GIMP) [76], Snapseed, PaintShop, Adobe Photoshop [68] and Pixlr.

With the coming of user-friendly and advanced image processing tools, it is now easier to create false images and spread false propaganda. Due to this, no images or videos are entertained in a court of law without a reliable forensic analysis report: the information gathered from an image, however appealing or worthy it may look, is not authentic without one [5].

The usage of image manipulation techniques has always had its pros and cons. In times like these, we mostly recollect the negative aspects. However, image manipulation is also prominently used for commercial purposes in VFX studios and for adding special effects in movies like the Marvel series, the Harry Potter series and many more, in turn giving us an experience of a lifetime [5].

The process of proving if the input image is authentic or forged is called forgery detection. Finding the forged area or region in the forged image is called localization.

There has been ample research when it comes to the detection of spliced image forgery, but relatively few algorithms address the localization of the forged area. Localization of the forged area/region is considered more useful than mere detection of a forged image: localization gives the exact location of the forgery, whereas detection only states whether the image is authentic or fake, leaving the location of the forgery unknown [74]. Localization is determining the pixels in the forged image which have been spliced [64].

It is often time-consuming and, most importantly, challenging for a human to detect the forgery in an image, given the lengths a forger goes to in order to make the image look more and more convincing. It is of utmost importance to reduce or remove the human aspect as far as possible and to have automated algorithms that ensure a quick verification process and catch anything a human eye might miss [3].

JPEG Ghost [24] is a statistical algorithm which we hypothesize can be boosted using machine learning. We intend to eliminate the manual scanning of images and also localize the forged region in the input image.

1.1 Problem Statement

The JPEG Ghost algorithm was initially spearheaded by Farid [24], who proposed that by computing the differences between versions of an image saved at different quality factors, a darker region, or ghost region, is formed over the forged area. We discuss Farid's model in depth in Section 2.4. The main drawbacks were that the process was not automated, requiring human interaction, and that it lacked a mechanism to localize the forged area. There have been multiple studies using the concept of the JPEG Ghost; after reviewing and understanding the past literature, we find that many have addressed the above issues and automated the process, but have gone only as far as detecting whether the image is real or fake.

In this study, we attempt to automatically localize the area of forgery with the concept of the JPEG Ghost algorithm using machine learning.

1.2 Aim and Objectives

Aim: The thesis aims to extend the capability of the JPEG Ghost by automatically localizing the forged area in a digitally manipulated image using machine learning models and comparing the performance of these models.

Objectives:

• To understand the various existing techniques which localize the tampered area.

• To build a model which can localize the tampered area in tampered images.

• To compare the performance of the implemented models.


1.3 Research Questions

RQ1: What are the existing techniques in identifying the tampered area in a forged image?

Motivation: This research question is answered by conducting a Literature Review.

Through this research question, we gather knowledge and insights on the previous approaches and methods used in the localization of the forged region.

RQ2: Which machine learning model provides better results and performance in localizing the tampered area in the input image using the JPEG Ghost algorithm?

Motivation: This research question is answered by conducting an experiment. We consider three models, namely XGBoost, SVM and Random Forest, and compare them through performance metrics. The percentage of localization of the forged area is calculated using the Jaccard Index as the performance metric.

1.4 Ethical Aspects

The datasets selected for our study are publicly available and are used for research purposes. The information present in the datasets is in accordance with the GDPR regulations and does not contain any personal information. In addition, publicly available Python libraries have been used in the experiment.

1.5 Outline

Chapter 1 introduces the topic briefly and gives insights into the aim and objectives of the study, along with the research questions and a brief problem statement. Chapter 2 discusses the background of this study and describes all the important concepts related to it. Chapter 3 consists of related work, presenting previous research in the field of detection and localization of image forgery. Chapter 4 describes the research methodologies selected to answer the research questions formulated in this study. Chapter 5 presents the results and analysis of the methodology. Chapter 6 presents the discussion regarding the results and the study as a whole. Chapter 7 sums up the research in a conclusion and outlines future work in the area of study.


Chapter 2

Background

2.1 Image Forgery

The study of tools and methods for distinguishing authentic images from digitally manipulated ones is called Digital Image Forensics (DIF). With the advancements in image manipulation techniques, this has become an emerging research subject. DIF is divided into two main categories:

Active methods: These concern images that are marked with digital watermarking and digital signatures, in which the owners ideally embed their signatures in order to be seen and to mark the image as authentic [14]. However, a limitation of this method is that the quality of the image can degrade due to the addition of the watermarks or signatures, and prior information about the image is required [39] [2].

Passive methods (blind): Passive forgery detection is more challenging, since the algorithms are required to deal with unknown images with no prerequisites [75]. There is no prior information on any protective measures such as watermarks or signatures [5].

Fig 2.1 and Fig 2.2 are examples of forgery in images. An image can be tampered or forged in two ways: inter-image tampering and intra-image tampering. If the final image is the combination of two or more images, it is the former; if the image is tampered or enhanced without the use of any new images, it is the latter [20].

Copy-move: When a part of some image is copied and pasted on some other part of the same image, this phenomenon is called copy-move image forgery. It is considered relatively easy, and there is ample research in this field [5].

Splicing: When regions of two or more images are merged into the same or a different image, this phenomenon is called splicing forgery. It often causes differences or inconsistencies at the boundaries, which are then removed using post-processing techniques.


Figure 2.1: Examples of forgery: (a) Original Image (b) Forged Image

In Fig 2.1 (img src: Google), the image on the left is the original image, while the image on the right is the forged image. Looking closely, one can see the forgery on the eyebrows of the cat. In addition, the chest area of the cat has also been tampered with.

The structural changes that occur in a forged image due to inter- or intra-image tampering are hard to grasp with the human eye. According to the studies done by the authors of [56] [66], the ability of humans to distinguish between an authentic and a forged image is minimal. Hence, it is of utmost importance to understand whether the techniques proposed in the past are reliable and robust enough to examine an input image and state whether it is tampered or forged [5].

Forgers are using more and more post-processing operations on the tampered image in order to make the image look more realistic. This often erases the unique traces left behind on the tampered image, and makes it more difficult to detect the tampered region.

Some other common manipulation techniques are cut-paste, erase-fill, median filtering, resizing and re-sampling, compositing, seam carving, blending, retouching, matting, contrast enhancement, multiple JPEG compression and color adjustment.

Figure 2.2: Image forgery with post-processing technique (a) Original Image (b) Forged Image

Fig 2.2 (img src: Google) is an example of how post-processing methods can hide any visual tampering effects. The image on the left is the original, while the image on the right is the forged one. The cyclist has been forged into the image, which has also been enhanced in terms of the contrast, brightness, etc. of the background.

Looking at the image in Fig 2.2 (b) individually, we would not be able to judge its authenticity. Hence, it is of utmost importance to be able to differentiate between tampered and original images, and also to be able to localize the forgery.

In most existing approaches to image forgery detection and localization, data processing is initially carried out and relevant features are extracted from the image; these are then fed into a classifier for training, and the models are trained according to the objectives of the respective studies.

2.2 Machine Learning

Machine learning deals with the usage of computers to simulate human learning activities and with finding self-improvement methods that help computers obtain new knowledge and skills, identify existing knowledge, and continuously enhance their performance. Compared to human learning, machine learning is faster, and the accumulation of information makes the learning results easier and quicker to process. Thus, any progress in the field of machine learning enhances the capability of computers and thereby affects human society [71].

Supervised learning algorithms: In supervised machine learning, the annotated data is considered the "training" set, while the unannotated data is the testing set. When annotations are discrete and unordered, they are called class identifiers; when they are continuous numerical values, they are called continuous target (or output) variables. Training and testing sets are composed of instances (in our case, the data set). The instances are typically represented by a fixed-size set of numeric or nominal variables; each variable in this set is called a feature (or predictor) and represents an instance property [23].

Unsupervised learning algorithms: These algorithms learn features from the data itself. When new data is entered, the previously learned features are used to identify the class of the data. Such algorithms are used primarily in clustering and feature reduction [21].

2.3 JPEG Compressions

JPEG is the most popular and widely used format in which images are saved. When a forger decides to tamper with an image, the image is first compressed to their needs. As mentioned above, while forging, a part of an image is cropped either from the same image or from a different one and added at the desired location. In doing so, the forger compresses the cropped part in order to align it with the image, places it at the location, then recompresses the image and applies post-processing tools, if any, to make the result look more realistic. This leaves the image with different compression levels, which deceives the human eye but which, when input into a classifier after proper feature extraction, can be detected and often also localized. The forged area is doubly compressed, while the original area of the image is compressed once. This clue is the basis on which we have developed this study.

One of the major advantages of, and reasons for, considering JPEG-based forgery is that JPEG is the most widely used file format for saving images. Secondly, JPEG compression is lossy, meaning that certain actual contents of the image are lost. Forgers also try to remove such losses, applying the post-processing techniques mentioned above. This is an added help in determining whether the image has been tampered with, and thereafter in locating the forgery.

The addition of Gaussian white noise, Gaussian smoothing, edge smoothing and JPEG compression are the most common post-processing operations generally applied by the forger to make the image look more realistic. In-depth algorithms or techniques need to be applied in order to identify tampered images, as these post-processing operations make it more challenging to detect or localize the forgery [75].

In this research, we target the compressions of the image and thereby detect and localize the tampered area. We use the concept of the JPEG Ghost algorithm, introduced by Farid [24], and analyze the curves formed due to the compressions in the tampered image. JPEG Ghost is a clever and effective algorithm, but it lacks a mechanism to localize the forged areas.

2.4 Farid’s Ghost Observation

Farid [24] explores the compressions related to low-quality JPEG images. He proposed a method in which a part of the image compressed at a different quality level than the rest would be detected by the JPEG Ghost. The approach is based on the premise that the entire input image will be of higher quality than the forged part; according to the author, it is only effective if the inserted forged area is of lower quality than the image as a whole. An equation is devised which computes the difference directly from the pixel values, rather than from the quantized DCT coefficients.

A set of coefficients c is obtained, quantized by a value q, when the image is divided into blocks and each block is converted to frequency space using a two-dimensional DCT (Discrete Cosine Transform). Consider an amount q0 which quantizes a coefficient c0; similarly, an amount q1 quantizes coefficient c1, and q2 quantizes coefficient c2, where q1 < q0. According to Farid's evaluation, when q2 = q1, the difference in coefficients is minimal, and a second minimum is ideally found when q2 = q0.

This minimum, when viewed closely, forms a region darker than the rest, which was termed the JPEG ghost: the squared differences become smaller for this part of the image as q2 approaches q0.
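To see this numerically, here is a toy NumPy sketch (our own illustration; the coefficient values and the quantization step q1 = 17 are arbitrary, not values from the thesis) that re-quantizes synthetic DCT coefficients at increasing steps q2 and prints the mean squared difference, which drops to zero at q2 = q1:

import numpy as np

# Toy DCT coefficients; values and quantization steps are illustrative only.
rng = np.random.default_rng(0)
c = rng.integers(-128, 128, size=10_000).astype(float)

q1 = 17                      # first quantization step
c1 = q1 * np.round(c / q1)   # singly quantized coefficients

for q2 in range(1, 31):
    c2 = q2 * np.round(c1 / q2)   # re-quantization with step q2
    d = np.mean((c1 - c2) ** 2)   # squared difference between the two
    print(f"q2={q2:2d}  mean squared difference={d:9.2f}")
# The difference vanishes at q2 = q1 (and at divisors of q1), which is
# exactly the minimum that the JPEG Ghost method exploits.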

Instead of computing the differences between the DCT quantized coefficients, we can compute the differences in the curves that are formed after the said compressions.

The curves formed by the singly and doubly compressed areas are distinct. The regions of interest are computed, and a comparison can be made between singly and doubly compressed regions.

Using the equation proposed in Farid's paper,

D_{q_2}(x, y) = \frac{1}{3} \sum_{i \in \{R,G,B\}} \left( I_{q_1}(x, y, i) - I_{q_2}(x, y, i) \right)^2 \qquad (2.1)

where I_{q_1} is the image compressed with quality q_1, and I_{q_2} is the recompression of I_{q_1} with quality q_2. In layman's terms, I_{q_1} is the singly compressed image, while I_{q_2} is the doubly compressed image. D_{q_2} is the difference image, x and y denote the pixel coordinates, and i \in \{R, G, B\} indexes the red, green and blue color channels.

Figure 2.3: JPEG Ghost Example [24]

In Figure 2.3, the image in the top left corner is the original image, while the image in the top right corner is the fake image, with the yellow car in the air being the tampered area. Below them is a collection of difference images at various quality levels; looking carefully, a darkened region forms in the area of the tampering. This darkened region is termed the ghost area of the image; in other words, it shows the forged part of the image.


Chapter 3

Related Work

To develop a better understanding of the existing techniques, several scientific papers have been referred to. This has helped in providing an insightful view of various existing algorithms as well as of different aspects of the problem. An ample amount of research has been done by authors from across the globe.

Various techniques have been explored by authors for the detection of forgery/splicing in images, based on geometric properties [44], [19], [79], motion blur [36], chromatic aberration [35], [45], inconsistencies of re-sampling [58], [59] and extracted features [70], along with multiple survey papers [52], [48], [25] and reviews [49], [4].

3.1 Previous research advancements in detection of tampered image

While performing splicing forgery, the resultant image retains some artifacts and unwanted traces. Forgers eliminate these by adopting retouching operations or blurring to give a smoother effect, achieved by averaging the values of the neighbouring pixels. Liu et al. [43] proposed a method for detecting images that have been forged with artificially blurred boundaries, in which the Non-Subsampled Contourlet Transform (NSCT) is used to distinguish the edge types. The blurred edge points are then detected using three SVMs. Defocus is eliminated by a local definition and by filtering out the blurred edge points. Mathematical morphology was an approach proposed by Zhou et al. to detect blurred edges [86].

Asghar et al. [6] take a different approach and consider the structural changes in terms of contrast and texture in the forged image after forgery. To encode the structural changes, the authors employ Discriminative Robust Local Binary Patterns (DRLBP). Forged Real Images Throughout History (FRITH) is a new dataset explicitly designed by the authors, containing a compilation of real forged images, false captioning, real splicing and copy-move forgeries.

Wang et al. [72] considered the YCbCr format of the image and adopted the Gray Level Co-occurrence Matrix (GLCM) to detect spliced image forgery. In a succeeding paper [73], Wang used the Cb channel to extract the transition probability features from the image. The accuracy achieved was 95.6%, compared to 90.5% on the Cr channel in the former paper. The datasets used were CASIA v1.0 and CASIA v2.0, respectively.

In other research, Zhao et al. [79] used the Cr channel for feature extraction and proposed a technique based on a modified version of Run-Length Run-Number (RLRN) vectors, achieving a detection rate of 94%.

Muhammad et al. [55] took an approach involving an SVM classifier, for which features were extracted using Local Binary Patterns (LBP) from the sub-bands containing the Cb and Cr components, decomposed using the Steerable Pyramid Transform (SPT). The accuracies reported were 97.33%, 96.39% and 94.8% on the CASIA v2.0, Columbia Color DVMM and CASIA v1.0 datasets, respectively.

Hussain et al. [34] proposed that the chrominance components could show traces of tampering after passing through the Weber Local Descriptor (WLD); Local Binary Patterns (LBP) were then encoded as features and fed into an SVM classifier. CASIA v1.0, DVMM and CASIA v2.0 were the datasets used.

Zhao et al. [82] proposed a non-causal Markov model that captures the underlying statistical characteristics using a 2-D non-causal signal instead of the traditional 1-D causal signal. The model is applied to the block DCT domain and the Discrete Meyer Wavelet Transform (DMWT) domain for image splicing detection. The detection accuracy was 93.36%. He et al. [33] proposed image splicing detection by introducing a causal Markov model in the BDCT and DMWT domains, with the features extracted by cross-domain classification. The detection accuracy was 87.61%. Shi et al. [67] proposed image splicing detection by introducing a Markov model in the BDCT domain, in which the adjacent difference array of BDCT coefficients is treated as a 1-D signal. It is modelled as a causal Markov process, and the transition probability matrix is then fed into an SVM classifier as a discriminative feature vector. The detection accuracy was 90.15%.

Zhang et al. [80] proposed a method for image splicing detection based on multi-size block discrete cosine transform (MBDCT) and image quality metrics (IQMs). Regression analysis is performed for evaluation, and an ANN is used as the classifier. After experimental analysis, the authors acknowledged that the detection rates of the proposed algorithm were low.

Lu et al. [46] proposed that the higher-order statistical features be classified using a shallow Radial Basis Function (RBF) network for splicing detection.

Rao et al. [61] define a calculated strategic model that uses SRM and a basic high-pass filter in the layers of a pre-trained CNN. Dense features are extracted using the pre-trained CNN model, which also serves as a local patch descriptor, and these features are fed into an SVM classifier after a feature fusion technique is applied. The datasets used were CASIA v1.0, CASIA v2.0 and DVMM, and the method outperformed some state-of-the-art methods such as Muhammad [55], Ze [33] and Ho [83], with accuracies of 98.04%, 97.83% and 96.38%, respectively.

Chen et al. [15] proposed a method in which features are extracted from the second-order statistics of a Markov random process. The transition probability matrices are resized using a thresholding technique, and the features are fed into an SVM classifier, which helps distinguish between singly and doubly compressed JPEG images. The authors conducted the experiment with multiple quality factors on either image, and the proposed method outperformed Popescu et al. [58] with higher detection accuracy. Archana et al. [53] discuss JPEG-based forgery detection and briefly discuss Farid's algorithm [24]. Human interaction is eliminated by using a graph-based segmentation approach: the image is first divided into different segments, with weights considered on the edges between pixels. The suspected image is then recompressed across the RGB colour channels. Using Farid's approach, the difference image is converted to black and white and then undergoes another round of segmentation. Although this approach eliminates the manual scanning of images, the detection rate is a little low.

Zhao et al. [84] proposed a method which considered a re-saved compressed image at different quality levels alongside the tampered image and computed the average sum of absolute differences between them. These images were termed paint-doctored, and the image quality did not alter the working of the method. The proposed method processed the input image automatically, meaning it did not need to be examined manually. 1000 uncompressed color images were chosen at random from the UCID dataset to perform the detection, and the experimental accuracy was 95.9% for JPEG images with a quality factor of 90.

The authors of [38] [31] have proposed methods to reduce the compression artifacts caused by JPEG.

3.2 Previous research advancements in localization of the tampered area

Garcia et al. [29] proposed a method based on the difference in compression levels in the forged image, which is created when splicing a region. This is a blind tampering detection method, as there is no access to the original image. The authors calculated the error variations at different quality levels between the re-compressed images, and these features were fed into an SVM model. The datasets used were Columbia and CASIA v1.0, providing a very high hit-rate.

Bianchi et al. [10] proposed a method based on the hypothesis that Non-Aligned Double JPEG compression (NA-JPEG) artifacts are present when there is a tampered region in the given JPEG image. The authors propose a statistical model which can detect and localize the tampered area, based on the 8×8 DCT blocks present in the image, which are used as a reference during detection and localization of the forged area. The authors tested their model on a dataset consisting of 100 uncompressed TIFF images, which they put together themselves.

In machine learning based algorithms, the proposed model learns the difference between the authentic image and the forged image during the training phase. Large datasets of images are fed into the model, which helps in the detection of forged images. In deep learning models, higher accuracy is achieved, as the parameters used to classify the images are used in multiple layers [16].

Rota et al. [63] propose an approach based on a CNN which uses a patch-based processing strategy to perform binary classification for the detection of forgery. A weak labelling method, which can generate a segmentation layer from the tampered parts extracted from the forged image, was proposed for localization. The detection accuracy was 97.44% on the CASIA TIDE v2.0 dataset.

Wei et al. [75] proposed a method which can learn rich tampered-image features based on the combination of edge detection algorithms with Faster R-CNN. The LoG and Prewitt operators are applied to overcome the influence of noise, by performing Gaussian convolution filtering and removing false edges, respectively [69] [60]. In order to improve the input image features, ResNet101 [32] is chosen as the network model. The edge information of the forged images is effectively extracted in either direction by the two operators. Experimental results show that the proposed algorithm detects and localizes the tampered region better than [87] on the three datasets used: NIST 16, CASIA and Columbia.

Ghosh et al. [30] proposed a blind forgery localization method using SpliceRadar. A Gaussian mixture model is used for segmenting the features over the entire image, which in turn effectively helps in localizing the tampered region. Semantic contents are suppressed using a high-pass rich filter. The authors tested their approach on three datasets, namely DSO-1, NC16 and NC17-dev1, and compared the results with three other state-of-the-art methods. The authors also noticed some instances in which all the tested algorithms failed to identify the spliced region correctly, which prompted them to re-examine their algorithm and conduct further investigation.

Mazaheri et al. [51] proposed a method with a DCNN-based architecture to improve forgery detection by exploiting the tampering artifacts near the boundary, with a Laplacian filter and the Radon transform used to extract resampling features, which are then fed into a Long Short-Term Memory (LSTM) network. JPEG quality loss, downscaling, upscaling, shearing and rotation are the resampling features extracted. An encoder-decoder layer is proposed with the blurring concept in mind, in order to localize the region effectively at the pixel level. According to the experiments conducted by the authors, the proposed method localized the tampered regions with improved performance compared to other LSTM network based methods. The datasets used were NIST '16 and CASIA v2.0 for training, and CASIA v1.0 for testing.

Marra et al. [50] propose a CNN-based network comprising three blocks in cascade: patch-wise feature extraction, image-wise feature aggregation and a global decision block. The parameters are jointly optimized through gradient checkpointing, with Xception used as the feature extractor. This allows the proposed model to localize any forgeries in the tampered image irrespective of the tampered size and the image size. The authors also conducted an RoI-based analysis for the localization of forgery. [81] [8] [42] are further papers which perform patch classification to localize the forged region using machine learning techniques.

Wang et al. [73] proposed a method which employs the compression properties of JPEG and utilizes a JPEG compressor to check whether the output is the unchanged region of a losslessly compressed image. For effective tampered-region localization, high-frequency quantization noise is extracted by Principal Component Analysis (PCA). While conducting their experiments, the authors also noted that their algorithm might fail to localize the region if the source image has a slightly higher quality than the quality used by them. The dataset chosen by the authors was CASIA TIDE v2.0.


Chapter 4

Method

In this thesis, we first focus on understanding the concepts around splicing forgery and thereafter the localization of the tampered region. A literature review is performed in Section 4.1 in order to gather knowledge and insights on the previous approaches and methods used to tackle this problem. As an overview, we also double-check whether there has been any published research on the localization of the forged area using the concept of JPEG ghosts.

We then proceed to the experiment, which is explained in detail in Section 4.2.

4.1 Literature Review

A literature review is the research method chosen to answer RQ1. The main focus is to gain knowledge of existing popular techniques that detect whether an image is forged and, most importantly, that identify or localize the forged area in the input image. The information obtained from the literature review is presented in detail below.

Identifying the past literature provides knowledge, helps gain better insights into what has been done, and gives ideas about unexplored areas. It is of utmost importance to understand and stay updated on the existing literature in the field of research to ensure that the research is not being redone.

4.1.1 Search String Formulation

One of the most critical aspects of conducting a literature review is understanding the classifications correctly, as they play a significant role in formulating the search string. Before finalizing the search string, specific keywords were identified from the papers and their classifications; from these, the keywords most related to the domain were considered and selected. Databases like IEEE Xplore, Scopus and ScienceDirect were used as the primary sources for obtaining the literature.

While reviewing the literature, truncation was used to enhance the search results by providing variations of a word, which yielded a wider variety of papers. Papers were then filtered extensively using the inclusion and exclusion criteria.

The search strings went through multiple revisions in order to determine which produced the most effective results.

Search string 1: (JPEG OR image) AND (detect* OR local* OR tamper*) AND ("machine learning") AND (splicing) AND (forgery) NOT ("deepfake") NOT ("audio")

Search string 2: (JPEG OR image) AND (detect* OR local* OR tamper*) AND ("machine learning") AND (splicing) AND (forgery)

These were the two versions of the search string used to search the databases for papers. Different databases had different responses to the string; hence the results were filtered again using the exclusion criteria.

To identify the relevant literature, an in-depth reading of the abstract and the final paragraph of the introduction was done. If the understanding remained vague, the conclusion and discussion were given a brief read.

Additional papers can be identified by using the references of the current paper, and this helped in obtaining valuable papers that were missed during the database search. This method is known as snowballing [77]. The papers obtained were again filtered using the inclusion/exclusion criteria.

4.1.2 Inclusion and Exclusion Criteria

Inclusion Criteria:

1. Articles in the English language are selected.

2. Articles that have been published between the years 2009 - 2019.

3. Articles which are published in books, magazines, conferences and journals.

4. Articles matching the problem domain.

Exclusion Criteria:

1. Papers regarding facial forgery, deepfakes, and audio are not considered.

2. Papers only talking about copy-move forgeries are not considered.

3. Full text is not available.

4. Abstracts and PPTs are excluded.

5. Articles not published in English.

After applying the inclusion and exclusion criteria, the papers were narrowed down; the following is a summary of the papers reviewed:

Lin et al. [40] proposed a method which, according to the authors, is the first forgery detection method that can detect forgeries related to both copy-move and splicing. The authors focus on JPEG double compression and compute a DCT coefficient analysis, with features extracted using SURF descriptors, which also locate copies of the same object in the forged image. According to the authors' experimental results, the proposed method can localize the detected forgery, and the average execution time while testing ISCAS images was about 200 seconds. Chen et al. [17] proposed an image blur detection method which performs feature-based classification and localization based on no-reference quality assessment [54]. The features, extracted from MSCN coefficients, are mapped using Support Vector Regression and thereby fed into an SVM classifier.

Lyu et al. [47] proposed a method based on the well-documented difference in local noise characteristics between the spliced and original regions of an image. These differences are generally introduced by post-processing steps applied to the tampered image. The authors use these inconsistencies in the local noise levels to detect the spliced region, drawing on the observed projection kurtosis concentration phenomenon and the statistical relationship between noise characteristics and kurtosis. According to the authors, the main drawback of the method is that if the difference in the local noise is not significant enough, the method might not detect the forged region.

The authors of [13] experimentally evaluated the performance of Lin et al.'s [41] algorithm. The performance was assessed on the CASIA TIDE v2.0 dataset, a public dataset with more variety and higher-quality images than the proprietary dataset used in the original performance comparison of Lin et al.'s algorithm. While computing its performance, the authors noticed an anomaly: a difference in the luminance quality factor between the authentic and the forged images. Using this, the authors of [13] developed a variant of the algorithm by adding three new features to be fed into an SVM classifier, which gave a better performance on the same dataset.

Pham et al. [57] used Markov features in the DCT domain for splicing forgery detection, and these features were fed into an SVM classifier. The detection accuracy was 96.90% on the CASIA v1.0 and CASIA v2.0 datasets.

Zhou et al. [87] propose a two-stream network: one stream to extract features in order to find forgery evidence, and another to discover noise inconsistencies between the forged and authentic images. The streams are an RGB stream and a noise stream, respectively, within a Faster R-CNN network; for the second stream, the local noise features from the RGB image are used as inputs. The authors also compared the performance of the two-stream network with the individual streams and noted that the two-stream network outperformed them. NIST Nimble 2016, COVER, CASIA and the Columbia dataset were used for evaluating the proposed method.

Salloum et al. [64] proposed that tampering localization could be made more accurate with a framework that learns the boundaries of the spliced region and the ground truth mask in order to predict the tampered area in the input image. The authors use a Multi-task Fully Convolutional Network for their experiment, which gave them satisfactory results with modest accuracy.

As part of the IEEE IFS-TC Phase 1 challenge, Cozzolino et al. [18] used a powerful descriptor-based forgery detection technique inspired by Fridrich et al.'s [26] approach. A high-pass filtering method with local descriptors was proposed, and it gave good results in the detection of splicing forgery. The PatchMatch search algorithm was proposed for detection of copy-move forgery.

For Phase 2 of the challenge, which was explicitly devoted to the localization of the forgery, Cozzolino et al. [18] designed a sliding-window version of their Phase 1 detection algorithm. It localized the region by detecting the splicing traces at the boundaries. However, after careful inspection, the authors saw some irregular artifacts being localized and hypothesized that these may have been camera residue.

Verdoliva et al. [70] then proposed a technique that takes features characterized by a dense local descriptor from the extracted blocks. Feature training is done with a multi-dimensional model. For tampering localization, the log-likelihood of each feature is considered and a histogram is generated, thereby creating a smooth decision map on which thresholding is performed.

For the detection and localization of the forged region, the authors of [12] proposed two methods. In the first method, a Random Walker segmentation method is used to localize the forged region: a heatmap is built by extracting the resampling features using the Radon transform, which are fed into a deep learning classifier and a Gaussian conditional random field model. In the second method, the resampling features are extracted and localized using an LSTM network as a classifier, with the Expectation Maximization (EM) algorithm used to estimate the weights. The patches of raw uncompressed images were extracted from the UCID and RAISE datasets, and the evaluation was done on the NIST Nimble 2016 dataset. During image forgery, the forged image is often compressed and resized as part of post-processing and then resaved in JPEG format. This leaves behind some compression artifacts, which can be detected by estimating the quality factor of the compression in every region.

One of the biggest advantages of the JPEG Ghost method is that it works for tamper detection even in low-quality images. However, as initially proposed by Hany Farid [24], the method lacked automation. Another disadvantage was with non-aligned Discrete Cosine Transform (DCT) blocks.

Azarian et al. [7] modified the method by proposing the SE-MinCut segmentation algorithm [22] to extract the ghost borders; the Bhattacharyya distance between the original and the tampered regions is then computed and fed into the classifier with a specific threshold. Although the automation of the Ghost method was solved, the authors could not get around the drawback of Hany Farid's [24] method, which assumes that the JPEG image is inserted into a higher-quality JPEG image.

Alherbawi et al. [3] proposed a method which can identify forgery effectively and quickly by classifying the images into three different classes. This is done by extracting features, with the images converted into blocks, and running a DCT coefficient analysis. The authors draw inspiration from the models proposed in [11] and [9]. They compared the performance of the proposed model by first overcoming the overfitting issue through 10-fold cross-validation, and then comparing five algorithms, namely SVM learner, Tree, Random Forest learner, KNN and CN2 rule inducer, on the CASIA v2.0 dataset. According to the results of the experiments conducted, the model worked well in classifying forgeries when the first compression of the original image had a lower quality factor than that of the forged area, and it also detected whether the image had any errors. The authors also mentioned that the model sometimes failed to identify a couple of forged images.

After understanding the methods and techniques from the literature review, we felt that machine learning models were more apt for this research than deep learning models, as the features we wanted to extract were not from the original image but rather from the output of Farid's algorithm. Hence, we wanted to have leverage over the feature extraction process.

System: HP Pavilion x360 Convertible
CPU: Intel(R) Core(TM) i5-8250U @ 1.60GHz
OS: Windows 10 Home
Installed RAM: 8 GB

Table 4.1: Environment

4.2 Experiment

The experiment phase begins by ensuring that all the gathered literature has been sufficiently analyzed and that a proper mindset and understanding have been developed based on past methods and approaches. The dataset, which is publicly available, is selected; preprocessing is done, structuring the data in accordance with our needs; and the experiment set-up is finalized. The different models to be analyzed are developed, and the phase concludes with observations based on comparisons of the performance metrics. The main goal of this experiment is to find an accurate and robust technique and to compare the predictive performance of various algorithms, giving a closer insight into the localization of the forged area in the tampered image.

4.2.1 Experiment Set-up / Tools used

Software Environment:

Python was chosen as the programming language in which this study would be performed, as it is very efficient and widely used for implementing machine learning models.

To reduce the training time on the dataset, we used Google Colab, as it provides a cloud server and an in-built GPU. We found it faster and more efficient than training on a CPU.

Hardware Environment:

The hardware environment used in this study is mentioned in Table 4.1.

Python libraries used:

1. scikit-learn: A free machine learning library for the Python programming language, which features various classification algorithms and also supports Python numerical and scientific libraries like NumPy and SciPy.

2. Python Imaging Library: PIL is a free open-source library which adds support for opening, manipulating and saving different image file formats in the Python programming language.

3. NumPy: A library for the Python programming language which supports a large collection of mathematical functions. In addition, it adds support for large, multi-dimensional arrays.

4. SciPy: A free open-source library for scientific computing in the Python programming language. It contains modules for linear algebra, image processing, etc.

5. Matplotlib: A plotting library for the Python programming language and its numerical mathematics extension, NumPy.

6. Joblib: A set of tools that provide lightweight pipelining in Python and allow simple parallel computing.

4.3 Data Collection

4.3.1 Dataset

For this study, we have considered a combination of publicly available benchmark datasets. We have used images from UCID, an uncompressed color image database [65], and the Image Manipulation Dataset [62], plus a few images which have been manually forged.

We have chosen multiple databases and randomly extracted images from each of them in order to provide some variety. While manually forging images, we kept the compression levels similar to those of the datasets and in accordance with our study, so that there are no discrepancies. The images considered are in a combination of formats such as JPEG, PNG and TIFF.

For localization of the forged area, the ground truth of the mask (the forged part) is necessary. The ground truth mask can be generated by subtracting the original image from the tampered image.
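A sketch of this subtraction with OpenCV, using hypothetical file names and an assumed difference threshold (the thesis does not state one):

import cv2
import numpy as np

# int16 avoids overflow when subtracting uint8 images.
original = cv2.imread("original.png").astype(np.int16)
tampered = cv2.imread("tampered.png").astype(np.int16)

# Pixels that differ between the two images form the forged region.
diff = np.abs(tampered - original).sum(axis=2)
mask = (diff > 10).astype(np.uint8) * 255   # binary ground-truth mask
cv2.imwrite("ground_truth.png", mask)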

Of the 1338 images present in the UCID dataset [65], we chose 200 at random and stored them separately in a folder.

The Image Manipulation Dataset [62] contains the original images, the manipulated images and the ground truth images. There are a total of 48 sets of images in multiple folders, each with a different post-processing technique applied: scaling, rotation, JPEG artifacts, and a folder with combined effects. For this study, we have chosen 192 original images along with their respective forged images and corresponding ground truths.

For the images that were forged manually, the ground truths are already available. We forged 18 images on our own; these were added to the test set in order to judge the performance of the model on self-forged images as well.

Fig 4.1 and Fig 4.2 are two sets of examples, each having the original image, the corresponding forged image and its ground truth. These images belong to the datasets chosen for our study.


Figure 4.1: Example of images chosen in dataset [62] (a) Original Image (b) Forged Image (c) Ground truth

Figure 4.2: Example of images chosen in dataset [62] (a) Original Image (b) Forged Image (c) Ground truth

4.3.2 Data Preprocessing

Pre-processing of the data needs to be effective and is an important step, as it allows proper extraction of features, which in turn ensures efficient classifier training on the dataset.

The images considered come in a combination of formats such as JPEG, PNG and TIFF. The images are first converted into JPEG format, and then the color channels are extracted. The RGB color channels are converted to YCbCr, as images in YCbCr channels perform better during feature extraction.

JPEG-based algorithms are sometimes sensitive to non-JPEG features like noise, and it is hence essential to remove noise features present in the images, as they might interfere with the accuracy and the results. A 3×3 Gaussian blur filter is used for this purpose. In addition, the images undergo a morphological operation (erosion) to remove any imperfections which could hinder the performance of the classifiers; erosion removes pixels on object boundaries. We have ensured that all the considered images have similar properties at the end of the pre-processing phase.
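A sketch of this preprocessing chain in OpenCV might look as follows; the JPEG save quality and the 3×3 erosion kernel are our assumptions, since the thesis fixes only the blur kernel size:

import cv2
import numpy as np

img = cv2.imread("input.png")   # PNG/TIFF input; placeholder file name

# Re-save as JPEG so every image enters the pipeline in the same format
# (quality 90 is an assumed value).
cv2.imwrite("input.jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 90])
img = cv2.imread("input.jpg")

# Convert BGR to YCbCr (OpenCV names this conversion YCrCb).
ycbcr = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)

# 3x3 Gaussian blur to suppress non-JPEG noise features.
blurred = cv2.GaussianBlur(ycbcr, (3, 3), 0)

# Morphological erosion to remove small imperfections on object boundaries.
kernel = np.ones((3, 3), np.uint8)
preprocessed = cv2.erode(blurred, kernel, iterations=1)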

Images which are not compressed are often misjudged by the Ghost algorithm, as it ends up operating like a noise filtering process, and the various noise contents present in different parts of the image are localized as specific image edges [78].

We do not consider images with a quality factor below 40, as there is an increased chance of the algorithm confusing a darker region (lower intensity values) with the ghost region and hence giving incorrect results. It is of utmost importance to ensure that the quality factors of the authentic image and the tampered image used for training and testing are the same.

4.3.3 Feature Extraction

Image processing is a vast field, and endless features can be extracted from images depending on the need. It is vital to first understand, and then select, the features that are most suitable for extraction in order to produce the desired result. Extracting the right set of features for the problem being addressed is of utmost importance, as the classifier would otherwise not be able to give the desired result [37].

After studying and analyzing which features had been used in the past studies gathered during the literature review, we based our extraction process on the analysis of the difference curves.

The features extracted are well suited to this analysis, as they capture the behaviour of the curves we are interested in. Feature extraction in this study relies on OpenCV and scikit-image. We extract features such as the slope, the weighted mean of the curve, the standard deviation, the median, and the y-intercept. Keeping the JPEG qualities in mind, we assign high and low weighting functions to high and low JPEG qualities respectively. The models are trained on features extracted from the output of Farid's algorithm; an example of this output is given in Fig 2.3. A sketch of the extraction follows.
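A sketch of the curve analysis under stated assumptions: the difference curve is computed per image (rather than per block) for brevity, and the quality-dependent weighting shown is an illustrative assumption, not the exact scheme used in the thesis.

```python
import cv2
import numpy as np

def difference_curve(img, qualities):
    """Mean squared difference between the image and its recompressions."""
    curve = []
    for q in qualities:
        ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, int(q)])
        recompressed = cv2.imdecode(buf, cv2.IMREAD_COLOR)
        curve.append(np.mean((img.astype(np.float64) - recompressed) ** 2))
    return np.array(curve)

def curve_features(qualities, curve):
    """Slope, y-intercept, weighted mean, standard deviation and median."""
    slope, intercept = np.polyfit(qualities, curve, 1)
    weights = qualities / qualities.sum()  # higher qualities weighted more
    weighted_mean = float(np.sum(weights * curve))
    return np.array([slope, intercept, weighted_mean,
                     np.std(curve), np.median(curve)])

qualities = np.arange(40, 100, 5, dtype=float)  # quality factors >= 40
img = cv2.imread("tampered.jpg")
features = curve_features(qualities, difference_curve(img, qualities))
```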

The image features are stored in .npy format, since this file format is highly speed-optimized thanks to the parallelism of NumPy operations. The primary packages used include os.path to read image paths and matplotlib.pyplot to plot performance curves. The dataset is split into training and testing sets using the train_test_split method.

Implementations from the scikit-learn library are used for Random Forest and Support Vector Machine, while XGBoost is used through its scikit-learn-compatible Python package; this covers the train-test split and the tuning of hyper-parameters. The dataset was split 80-20, and the models were applied, as sketched below.
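A minimal sketch of loading the stored features and performing the 80-20 split (the .npy file names are hypothetical):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.load("features.npy")  # one feature vector per image
y = np.load("labels.npy")    # 1 = tampered, 0 = authentic

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```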

4.4 Data Analysis

4.4.1 Performance Metrics

True Positive (TP): This depicts the model correctly predicting the positive cases as positive.

True Negative (TN): This depicts the model correctly predicting the negative cases as negative.

False Positive (FP): This depicts the model incorrectly predicting the negative cases as positive.


False Negative (FN): This depicts the model incorrectly predicting the positive cases as negative.

In our setting, tampered is the positive class: a True Positive is a tampered image identified as tampered, a True Negative is an authentic image identified as authentic, a False Positive is an authentic image identified as tampered, and a False Negative is a tampered image identified as authentic.

Jaccard Index / Intersection over Union: Intersection over Union (IoU), also referred to as the Jaccard Index, is an evaluation metric used to measure the accuracy of an object detector on a particular dataset. It is an essential metric here, since we are considering localization of the forged region. The IoU score ranges from 0 to 1 and quantifies the overlap between the predicted and ground-truth bounding boxes: the higher the similarity between the two areas, the higher the Jaccard Index.

$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}} \quad (4.1)$$

Precision: The ratio of correctly predicted tampered images to all images predicted as tampered.

$$\text{Precision} = \frac{TP}{TP + FP} \quad (4.2)$$

Recall: It is also called true positive rate or sensitivity. It is defined as the ratio of correctly predicted tampered images to the number of tampered images in the test set.

$$\text{Recall} = \frac{TP}{TP + FN} \quad (4.3)$$

F1 Score: The harmonic mean of Precision and Recall.

$$\text{F1 Score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \quad (4.4)$$
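A sketch of these metrics under the assumption that predictions and ground truths are binary masks of equal shape (file names hypothetical):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

def jaccard_index(pred_mask, gt_mask):
    """IoU between two binary masks, per Eq. 4.1."""
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return intersection / union if union > 0 else 0.0

gt_mask = np.load("gt_mask.npy")      # hypothetical paths; values in {0, 1}
pred_mask = np.load("pred_mask.npy")

iou = jaccard_index(pred_mask, gt_mask)
precision = precision_score(gt_mask.ravel(), pred_mask.ravel())
recall = recall_score(gt_mask.ravel(), pred_mask.ravel())
f1 = f1_score(gt_mask.ravel(), pred_mask.ravel())
```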

4.4.2 Dependent and Independent Variables

Dependent variables: The performance metrics used in our experiment to compare the implemented models are the dependent variables: the Jaccard Index, F1 score, Precision and Recall.

Independent variables: The models chosen are the independent variables: Support Vector Machine, Random Forest and XGBoost.

4.5 Adapted Approaches

In our research study, we have chosen RF, SVM and XGBoost as the most appropriate models. During the literature study and review, we found that RF and SVM have been pivotal model choices in the domain of image processing and image segmentation (image forgery in particular). XGBoost is a relatively newer model, theoretically more efficient than RF, and as far as we could establish from the past literature within the given timeframe, few studies have employed it. Keeping this in mind, together with our feature extraction process (which is centred on the analysis of the difference curves caused by compression), we consider these models the most appropriate for our research.

Picking the right algorithm is not the only vital part of any research; allowing the chosen model to function at its best and exploit each of its strengths is equally important. Depending on the dataset and the aim of the study, the parameters of the model need to be tweaked. This procedure, called hyper-parameter tuning, has a significant impact on the performance of the algorithm. We arrived at the hyper-parameters on a trial-and-error basis: we tried different sets of parameters and chose those that gave the highest performance, as sketched below. All the models are tuned to similar values so that there are no discrepancies in their comparison.
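A sketch of this trial-and-error procedure, shown here for Random Forest with illustrative candidate values (not the full set we tried), reusing the split from the earlier sketch:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Illustrative candidate parameter sets.
candidates = [
    {"n_estimators": 50, "max_depth": 5, "min_samples_split": 2},
    {"n_estimators": 100, "max_depth": 8, "min_samples_split": 4},
]

best_params, best_f1 = None, -1.0
for params in candidates:
    model = RandomForestClassifier(**params, random_state=42)
    model.fit(X_train, y_train)
    score = f1_score(y_test, model.predict(X_test))
    if score > best_f1:
        best_params, best_f1 = params, score
```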

Support Vector Machine: SVMs were initially designed for binary classification and are supervised classifiers based on statistical theory. They belong to the family of linear classifiers. SVM is a very popular algorithm and has been used extensively in image processing, computer vision and natural language processing. In this algorithm, each data item is plotted in an n-dimensional space (n being the number of features), with the value of each feature being the value of a coordinate. Classification is performed by computing the hyperplane that separates the classes. The accuracy depends on the distance between the data points closest to the hyperplane (the support vectors) and the hyperplane itself. The parameters used in this study are max_iter = -1 and gamma = 'auto'; max_iter provides a hard limit on the number of iterations (a minimal instantiation sketch follows the lists below).

Some of the advantages are:

1. Outliers present in the dataset have a lesser impact on the output.

2. It is known to perform well on higher dimensions.

Some of the limitations of SVM are:

1. Selection of the correct set of hyper-parameters is vital for the algorithm to perform well.

2. Kernel selection is very tricky and can be time-consuming.
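A minimal instantiation sketch with the parameters stated above, reusing the split from the earlier sketch:

```python
from sklearn.svm import SVC

svm = SVC(max_iter=-1, gamma="auto")  # max_iter=-1 means no hard limit
svm.fit(X_train, y_train)
svm_pred = svm.predict(X_test)
```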

Random Forest: Random Forest is a very popular algorithm: it is fast, relatively robust against outliers and noise and, most importantly, highly accurate. It is easy to understand and implement, and it performs an implicit selection of features. It was developed from the bagging approach and is an ensemble learning method.

In this method, a group of weak models combines to form a robust model. By randomly choosing sub-samples for each iteration of tree growing, the algorithm bootstraps the data. Random forests are bagged decision-tree models that split on a subset of features at each split. During training, a number of decision trees (as defined by the programmer) are constructed and then used for class prediction: the classes voted for by all trees are collected, and the class with the most votes is taken as the output [1]. Random Forest is known to perform well because it considers a randomised subset of predictors at each node split. Parameter tuning: n_estimators = 50, max_depth = 5, min_samples_split = 2 (a minimal instantiation sketch follows the lists below). Some of the advantages are:

1. Performance is not affected by a huge number of missing values.

2. It is useful for extracting feature importance.

3. Training is fast, and the method is easy to parallelize.

Some of the limitations of Random Forest are:

1. Predictions of the trees need to be uncorrelated.

2. For data in which attributes take many distinct values, such attributes have a greater impact on the forest, so the attribute weights generated on such data are not credible.
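A minimal instantiation sketch with the tuned parameters stated above (random_state is added for reproducibility):

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=50, max_depth=5,
                            min_samples_split=2, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)
```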

XGBoost: eXtreme Gradient Boosting utilizes the concept of gradient tree boosting. It is an efficient and scalable implementation of the gradient boosting framework developed by [27] [28]. It is a special case of boosting algorithms in which errors are minimized by a gradient descent algorithm. It is a relatively new algorithm, developed to increase speed and performance while reducing overfitting through regularization parameters. It handles missing values efficiently out of the box and supports tree pruning. Like random forests, it leverages subsampling, and it adds a learning rate (shrinkage), which increases its computational speed during training. One of the most interesting aspects of this algorithm is its built-in cross-validation at each iteration, which improves performance. Parameter tuning: max_depth = 5, min_child_weight = 6, subsample = 0.8, learning_rate = 0.2 (a minimal instantiation sketch follows the lists below). Some of the advantages are:

1. No need for scaling and normalization of data.

2. It is less prone to overfitting, and outliers in the dataset have less impact on its performance.

Some of the limitations of XGBoost are:

1. Parameter tuning is vital, as there is otherwise a higher probability of the model overfitting.

2. The model has many parameters, and it can be difficult to find the right set.
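A minimal instantiation sketch with the tuned parameters stated above, using the scikit-learn-compatible API of the xgboost package:

```python
from xgboost import XGBClassifier

xgb = XGBClassifier(max_depth=5, min_child_weight=6,
                    subsample=0.8, learning_rate=0.2)
xgb.fit(X_train, y_train)
xgb_pred = xgb.predict(X_test)
```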


Chapter 5

Results and Analysis

5.1 Experiment Results

It is common to compare multiple metrics when evaluating the performance of two or more models; this helps us understand how effective each model is and how useful the study is to the scientific community. The most commonly sought-after metrics are accuracy, F1 score, time taken (to run the model, and for data cleaning/pre-processing), precision, recall, and so on. However, it is of utmost importance to select metrics that add value to the research and, most importantly, give a correct analysis of model performance.

The performance of the models was measured quantitatively, and different performance metrics were used to analyze them. In order to compare the efficiency of the selected methods, these metrics are calculated at the image level. The performance measures discussed in the previous chapter are applied to the predictions of all three models.

5.1.1 Comparison of models

To localize the tampered region in the image, we used a contour approximation method on the ground truth mask to obtain bounding boxes. As the original image and the ground truth are of the same size, the masks of the extracted image can be generated easily, and the model was then further trained with the newly acquired ground truths. Fig 5.1 shows an example of a fake image with its corresponding ground truth image; a sketch of the bounding-box extraction is given after the figure.

Figure 5.1: (a) Forged Image (b) Ground truth corresponding to the image
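A sketch of the contour-based bounding-box extraction, assuming OpenCV 4 and a binary uint8 mask (file name hypothetical):

```python
import cv2

mask = cv2.imread("ground_truth_mask.png", cv2.IMREAD_GRAYSCALE)

# OpenCV 4 returns (contours, hierarchy); external contours suffice here.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
boxes = [cv2.boundingRect(c) for c in contours]  # (x, y, w, h) per region
```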

The performances of the models are analyzed and depicted in Fig 5.2.



Jaccard Index: As mentioned before, the Jaccard metric ranges from 0 to 1. Values below 0.5 indicate poor performance, while scores from 0.5 to 0.95 indicate good performance. In this research, we use the Jaccard Index (IoU) with a validation threshold of 0.5: a localization is considered valid if its score exceeds 0.5. It is a very valuable metric, as it measures the overlap between two bounding boxes or masks. Since our study focuses on localizing the tampered region, we want to know how much of the actually tampered region is recognized by the model; the greater the overlap between the actual tampered region and the predicted region, the better the model performs, and the closer the score is to 1.

Precision is the probability that a detected forgery is exact, while recall is the probability that a forged image is detected. The F1 score is obtained by considering both precision and recall.

XGBoost is a relatively newer algorithm and an optimized version of gradient boosting, while Random Forest selects a random subset of features on each iteration to build a collection of trees. XGBoost analyzes the distribution of features across all data points and reduces the search space of possible feature splits. Neither algorithm requires normalization.

From Table 5.1 and Fig 5.2, we can conclude that Random Forest and XGBoost have nearly similar Recall and Precision scores. The high precision and high recall show that both models produced a large number of correct detections.

Support Vector Machine, by contrast, has high precision but a low recall score, implying that its detections were usually correct but that many tampered images were missed (many false negatives). Adjusting the parameters and modifying the model did not improve this result.

The effect of hyper-parameter tuning was substantial for Random Forest and XGBoost, with a noticeable increase in both Recall and Precision.

Table 5.1: Models Comparison

                 Random Forest   XGBoost   Support Vector Machine
Jaccard Index    76.7%           79.8%     48.04%
Precision        85.8%           88.7%     86.2%
Recall           79.4%           91.32%    54.8%
F1 Score         86.8%           93.5%     67.9%

All the models returned high precision with nearly similar values, from which we can conclude that the models classify tampered images efficiently. XGBoost outperforms the other models on every metric.

References
