Investigation on how presentation attack detection can be used to increase security for face recognition as biometric identification


Improvements on traditional locking system

Fredrik Öberg

Independent degree project – second cycle — Master thesis

Main field of study: Department of Information Systems and Technology
Credits: 30 hp

Semester, year: 10, 2021

Supervisors: Sebastian Försth (Dewire), Luca Beltramelli (Mid Sweden University)
Examiner: Mikael Gidlund, mikael.gidlund@miun.se

Degree programme: Master of Science in Engineering, Computer Science, 300 credits


Abstract

Biometric identification is already part of everyday life: today's mobile phones use fingerprints as well as other methods such as iris and face recognition. With the growth of technologies like computer vision, the Internet of Things and artificial intelligence, the use of face recognition as biometric identification on ordinary doors has become increasingly common. This thesis looks into the possibility of replacing regular door locks with face recognition, or supplementing the locks to increase security, by using a pre-trained state-of-the-art face recognition method based on a convolutional neural network. A subsequent investigation concluded that network-based face recognition is highly vulnerable to presentation attacks. This study therefore investigates protection mechanisms against these forms of attack by developing a presentation attack detection (PAD) and analyzing its performance. The results obtained from the proof of concept showed that local binary pattern histograms (LBPH) used as a presentation attack detection could help the state-of-the-art face recognition avoid up to 88% of the attacks that the convolutional neural network approved without the presentation attack detection. However, to replace traditional locks, more work must be done to detect more attacks, both in terms of a higher percentage of blocked attacks and of the types of attack covered. Nevertheless, as a supplement, face recognition represents a promising technology for traditional door locks, enhancing their security by complementing the authorization with biometric authentication. The main contribution is to show that a simple, older method such as LBPH can, according to the test results, help modern state-of-the-art face recognition detect presentation attacks. The study also adapted this PAD to be suitable for low-end edge devices, which LBPH lends itself to, so that it can be used in environments where modern solutions are deployed.

Keywords: Face Recognition, Presentation Attacks, Convolutional Neural Network


Acknowledgements

First, I want to thank Dewire by Knightec, who gave me the opportunity to do this thesis with them, and my supervisor there, Sebastian Försth.

Secondly, this thesis could never have been completed without the help of my supervisor Luca Beltramelli at Mid Sweden University, who helped me whenever I needed it and gave excellent feedback that improved the thesis.


List of Figures v

List of Tables vi

Terminology / Notation vii

1 Introduction 1

1.1 Background and problem motivation . . . 1

1.2 Overall aim . . . 3

1.3 Scope . . . 4

1.4 Research question . . . 4

1.5 Concrete and verifiable goals . . . 5

1.6 Outline . . . 5

1.7 Contributions . . . 5

2 Theory 6

2.1 Face Detection . . . 6

2.1.1 Haar-like cascade . . . 6

2.1.2 Histogram of Oriented Gradients . . . 7

2.2 Face Recognition . . . 7

2.3 Spoofing and Presentation Attack . . . 9

2.4 Face classification . . . 10

2.5 Methods . . . 11

2.5.1 Convolutional Neural Network . . . 11

2.5.2 Local Binary Pattern . . . 12

2.5.3 Principal component analysis . . . 13

2.6 Databases . . . 14

2.6.1 Face recognition . . . 14

2.6.2 Spoofing attacks databases . . . 14

2.7 Related work . . . 15

3 Methodology 17

3.1 Research area and strategy . . . 17

3.2 Proposed solution . . . 18

3.3 Dataset structure . . . 18

3.4 Choice of algorithms . . . 19

3.4.1 Face detection . . . 19

3.4.2 Face recognition. . . 19

3.4.3 Image classification . . . 20


3.5 Evaluation . . . 20

4 Implementation 22

4.1 Testing framework . . . 22

4.1.1 Presentation attacks . . . 23

4.2 Face recognition system . . . 24

4.2.1 Face detection . . . 24

4.2.2 Face recognition with CNN . . . 25

4.2.3 Image classification . . . 26

4.3 Presentation attack detection with LBPH . . . 26

4.3.1 Face detection . . . 26

4.3.2 LBPH training . . . 27

4.3.3 Image classification . . . 27

5 Result 28

5.1 Investigation of methods . . . 28

5.1.1 Face recognition . . . 28

5.1.2 Presentation attacks . . . 28

5.2 Implementation of systems . . . 29

5.3 Evaluation against the database . . . 30

5.3.1 Case one FR. . . 31

5.3.2 Case two PAD . . . 31

5.3.3 Case three PAD + FR. . . 32

6 Discussion 33

6.1 Development of system . . . 33

6.1.1 CNN face recognition . . . 33

6.1.2 Presentation attack detection . . . 33

6.2 Framework discussion . . . 33

6.3 Evaluation of results . . . 34

6.4 Ethical aspects . . . 34

6.5 Future work . . . 35

7 Conclusions 36

7.1 Concrete and verifiable goals . . . 36

7.2 Conclusion research question . . . 37

7.3 Overall conclusion and lessons learned . . . 38

7.4 Main contributions . . . 38

References 40

List of Figures

4 Standardization of weak point in ISO/IEC DIS 30107-1, 2016 . . . 9

5 Convolutional neural network . . . 11

6 Max-pooling . . . 12

7 Local Binary Pattern . . . 13

8 Framework . . . 22

9 Folder structure . . . 25

10 CNN detection of a face . . . 29

11 Attack and real histogram distribution . . . 30

List of Tables

4 PAD confusion matrix . . . 31

5 PAD baseline results obtained from matrix . . . 32


Terminology


CNN Convolutional Neural Network

CRISP-DM CRoss Industry Standard Process for Data Mining

DNN Deep Neural Network

FR Face Recognition

IoT Internet of Things

KNN k-nearest neighbor

LBP Local Binary Pattern

PA Presentation Attack

PAD Presentation Attack Detection

SVM Support Vector Machine


1 Introduction

This chapter gives the background of this study, which focuses on examining the security of facial recognition as biometric identification used to replace or supplement a key system. It then presents the concrete and verifiable goals needed to achieve the overall goal of the project. Finally, the reader can see how the report is structured, what limitations it had, and the author's contribution.

1.1 Background and problem motivation

In recent years, biometric identification has become the preferred solution to a wide range of identity-checking problems because of its ability to provide more secure identification and verification, as stated in [1]. One very common method for biometric identification today is face recognition. Focusing on biometric identification as an alternative to traditional locks, we can see that it is already part of society: today's mobile phones use fingerprints, and other methods exist as well, such as the iris and the face itself. Of these three, face recognition is the least intrusive in daily use because of how easy it is to analyze images containing faces. A recent survey [2] published in 2019 identified and categorized over 330 contributions to deep learning-based face recognition, a testament to the significant interest surrounding this area in academia. One large part of what this survey discussed was the identification process, which is simply the process of someone claiming to be a specific person. After this process, an authentication process is needed to verify or prove the claimed identity. Today this happens in the form of a traditional locking system that uses a key or a password. Such traditional locks lead to many accounts, passwords, and more, and keeping track of these becomes increasingly complex, especially for systems that require high security. To solve these problems, traditional systems can use biometric identification, a process where parts of a person's body are analyzed to identify the person. Looking at how biometric identification is used, we can see that its use is increasing: it appears in places like smartphones, laptops, and tablets to secure data and other sensitive information because of the uniqueness of biometric characteristics in the security system. [3]

However, face recognition, one of the identification methods mentioned earlier, has several challenges that need to be addressed: low resolution, pose variation, complex illumination, and motion blur. Face recognition methods based on more traditional algorithms like support vector machines (SVM), Eigenfaces, Fisherfaces, Metaface, and Bayesian faces do not handle these problems well. Furthermore, none of these methods can handle unconstrained face matching, such as having different lighting and background every time. All of the problems mentioned above are described in [2]. One interesting aspect of the survey is the Convolutional Neural Network (CNN): out of the 330 contributions in the survey, 61% were based on CNNs to solve different face recognition problems. These methods showed good results, with verification accuracy of up to 96%. [2]

By combining these biometric identification methods with technologies such as computer vision, the Internet of Things, artificial intelligence, and cloud solutions, an ideal system utilizing these technologies has been sketched for this study in figure 1 as a reference picture, which will be explained in detail later. Based on that picture, and on the fact that phones and computers already use face recognition to identify users, questions appear, such as: is it possible to use FR to solve the access problem of the traditional key system?

This question has already been tackled for face recognition, as research exists on how ordinary doors with face recognition would work, for example [4] [5] [6], where intelligent doors with face recognition are realized. However, with face recognition other problems also appear, such as presentation attacks, where face recognition wrongfully gives access when attacked. The article [7] states that if a deep neural network (DNN) face recognition method is used, it is highly vulnerable to presentation attacks once the model reaches more than 90% accuracy. Since security is always a hot topic, this study focuses on security with regard to facial recognition.

To explain why, consider that many people today use key systems or codes to access places. However, this does not confirm whether it is the rightful owner of the key who accesses the site; with keys and codes there is no guarantee that people without authorization will not enter.

One way to solve this problem is to develop a face recognition system that can unlock doors with people's faces. Nevertheless, many open questions remain. Would this system be more secure than regular locks, and would it be safe for everyday use? How resilient is it to presentation attacks like replay attacks? What pros and cons does a system like this have? Alternatively, can face recognition be used as a supplement to the already existing key system? This study tries to answer all of these questions.


Figure 1: Cloud-based face recognition system

1.2 Overall aim

This study aims to examine the security of facial recognition as biometric identification used to replace or supplement key systems, or similarly restricted areas where not everyone is authorized to have access. This is done by making a proof of concept that uses LBPH from the Python library OpenCV, which was originally made for face recognition but in this case acts as a PAD. This is compared with a CNN FR system to see how it handles PAs. Furthermore, the aim proceeds from the assumption that the developed system will be used as in the reference picture in figure 1, in line with popular technologies such as cloud computing and IoT that industries are trying to implement. The figure can be described as follows: a full-fledged door locking system with an edge device that captures and detects faces to see if the picture is a presentation attack. Because it runs on an edge device, the developed PAD must be able to run on lower-end edge devices.

The edge device then sends the picture to the cloud, where a face recognition process begins and a decision is made on whether the person is allowed to enter the door. To do this, a face recognition model with high accuracy will be used. With the architecture in figure 1 as a reference when developing the FR and the presentation attack detection (PAD), this study investigates how facial recognition as biometric identification can replace or supplement traditional systems. More practically, the aim is to develop a PAD that protects against PAs, based on related work, and to use a pre-trained CNN model and evaluate its vulnerability against PAs. This results in two systems, one for the PAD and one for the FR, which work together to classify whether an input is a PA and whether access is allowed. All of this is evaluated on accuracy and on how well it can restrain PAs from the Replay-Attack database. [8]

1.3 Scope

This thesis has several limitations in scope. One follows from the many ways of implementing a face recognition system: this thesis focuses on convolutional neural networks as the face recognition method because they are considered state of the art. To investigate and evaluate whether it is possible to create a face recognition system that replaces a traditional lock, or supplements it, this thesis focuses on how to protect against presentation attacks and on how this affects the baseline protection of the state-of-the-art face recognition. Furthermore, the study is based on the architecture in figure 1, which means that the PAD must be able to run on a low-end edge device and must be suitable for this. Due to the time span of the thesis, this study only looks at high-resolution replay attacks, because of how simple it is for a regular user to carry out this type of attack.

1.4 Research question

The main research questions in this thesis are as follows:

• How can a PAD be used to increase the security of a state-of-the-art face recognition model, such as a CNN model with high accuracy, in a locking system?

• Can traditional locking systems be replaced with face recognition, or can face recognition be used as a supplement to increase the security of an existing locking system?


1.5 Concrete and verifiable goals

The concrete goals of the project are as follows:

• Investigate what methods there are for face recognition and what methods exist to protect these face recognition methods against presentation attacks.

• Implement a face recognition system using a CNN model and a PAD suitable for running on a low-end edge device.

• Implement a test environment to evaluate the PAD and the FR.

• Evaluate the CNN model and the PAD against the presentation attack database.

• Evaluate the PAD and the CNN model together against the presenta- tion attack database.

• From the results, evaluate the possibility of increasing security by replacing or supplementing a traditional locking system with face recognition, and identify the strengths and weaknesses of facial recognition.

1.6 Outline

Chapter 1 describes the general background and the purpose of this project. Chapter 2 explains the necessary theory and presents the related work. Chapter 3 explains the method used to carry out this project and to test and validate the created systems. Chapter 4 describes the implementation of the system. Chapter 5 presents the results. Finally, Chapters 6 and 7 contain the discussion and conclusions.

1.7 Contributions

The thesis has been performed by Fredrik Öberg under the supervision of Sebastian Försth (Dewire by Knightec) and Luca Beltramelli (Mid Sweden University). The work was carried out at Dewire by Knightec.


2 Theory

This chapter describes the theoretical elements of this thesis. The reader will get the theory needed to understand what the coming chapters present. Some significant parts are deep learning and facial recognition.

Another part will be previous work in the areas of this study.

2.1 Face Detection

Face detection is a technology in computer science that aims to detect and locate faces in an image or a video stream. There are different methods to accomplish this task. One of these is the CNN; developing such networks from scratch requires vast amounts of data and can be complex, so a pre-trained model trained on millions of faces can be used to make it easier. There are also other commonly used methods, such as the Haar-like cascade, the Histogram of Oriented Gradients (HOG) with an SVM, or an LBP cascade. A comparison of these methods was made by [9], which showed that the HOG+SVM approach is more robust and accurate than the LBP and Haar approaches, with an average detection rate of 92.68%.

2.1.1 Haar-like cascade

The Viola-Jones algorithm, developed in 2001 by Paul Viola and Michael Jones [10], is the first step towards a Haar-like cascade. It is an object-detection framework that allows image features to be detected in real time. Despite being an older framework, Viola-Jones is quite powerful, and its application has proven exceptionally notable in real-time face detection.

Figure 2: Illustration Haar-like features

Detection works by outlining a box on the image and then iterating through the image with this box. While the box moves over the image, it searches for Haar-like features, which are features the system can see in pictures based on the distribution of black and white intensities. By combining these features, a face can be described. What these features look like is shown in figure 2. A minimal detection sketch using OpenCV is given below.
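To make the sliding-window idea above concrete, here is a minimal, hedged sketch of Haar-cascade face detection with OpenCV; the image path is a hypothetical placeholder and the cascade file is the frontal-face cascade bundled with opencv-python.

```python
# Minimal sketch: Haar-cascade face detection with OpenCV.
# Assumes opencv-python is installed; "person.jpg" is a placeholder test image.
import cv2

# Load the frontal-face Haar cascade bundled with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("person.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Slide the detection window over the image at several scales.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
print(f"Detected {len(faces)} face(s)")
```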

2.1.2 Histogram of Oriented Gradients

A feature descriptor is an algorithm that takes an image and outputs feature descriptors or feature vectors. It encodes the image information into a series of numbers that acts as a numerical "fingerprint", which can differentiate one feature from another. This is the basis of how the Histogram of Oriented Gradients works. The method aims to represent an image with as little data as possible while still capturing what the picture shows.

HOG focuses on the structure or shape of an object: it captures edge directions by extracting the gradient and orientation of the edges. The image is divided into small regions, and for each region the gradients and orientations are calculated. Finally, HOG generates a histogram for each region separately, built from the pixel values using the gradients and orientations. A short sketch of computing such a descriptor follows.
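As an illustration, the sketch below computes a HOG feature vector with scikit-image. The choice of library and the image path are assumptions for illustration only; the thesis itself uses HOG indirectly through its face detection libraries.

```python
# Minimal sketch: computing a HOG descriptor with scikit-image (an assumption;
# not the thesis's own code). "person.jpg" is a placeholder image.
from skimage import color, io
from skimage.feature import hog

image = color.rgb2gray(io.imread("person.jpg"))

# Gradient orientations are binned per 8x8-pixel cell and normalized per block,
# producing the numerical "fingerprint" described above.
features = hog(
    image,
    orientations=9,
    pixels_per_cell=(8, 8),
    cells_per_block=(2, 2),
    feature_vector=True,
)
print(features.shape)  # one long feature vector for the whole image
```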

2.2 Face Recognition

Face recognition is a digital technology that began to be developed in the 1970s [11] and has since developed at a tremendous rate, essentially because computers have become more powerful. Face recognition identifies or verifies a person based on a digital image or video frame by comparing a given image against the images in a database, from which a model is generated. This model knows all images in the database; figure 3 illustrates a typical system.

Figure 3: Face recognition process

Face recognition has to deal with several challenges:

• Occlusion and Partial occlusion are among the significant challenges of face recognition: part of the face is hidden, and it is difficult to recognize a face if some part of it is missing.

• Low Resolution for example, pictures taken from surveillance video cameras contain tiny faces.

• Digital Noise images are subject to several types of noise, which leads to poor detection and recognition accuracy.

• Illumination variations in illumination can drastically degrade the performance of a face recognition system; reasons for these variations include background light, shadow, brightness, and contrast.

• Pose Variation frontal face reconstruction is required to match the image face with the face in the database.

• Expressions facial expressions used to express feelings can affect the FR.

• Aging is one of the natural factors that changes the face over time.

• Plastic Surgery a face changed by plastic surgery will be unknown to the existing face recognition framework.

Based on these factors, there are several categories of methods for conducting facial recognition. These can be summarized as geometry-based methods, holistic methods, feature-based methods, hybrid methods, and deep learning methods.

• Geometry-based Methods This method is one of the first proposed methods for face recognition. The method works by finding a set of facial landmarks to measure the position and distance between them.

• Holistic Methods Represent faces using the entire face region. Many of these methods work by projecting face images onto a low-dimensional space.

• Feature-based Methods refer to methods that leverage local features extracted at different locations in a face image.

• Hybrid Methods combine techniques from holistic and feature-based methods. Solutions like a holistic and feature-based method were state of the art before deep learning became widespread.

• Deep Learning Methods CNNs are the most common type of deep learning method for face recognition, because of their ability to handle an unconstrained environment. One drawback of CNNs is the amount of training data they need and how long training takes.


2.3 Spoofing and Presentation Attack

Face recognition as a method for biometric identification has been used in consumer devices since at least 2009, by big companies like Lenovo, Asus, and Toshiba. Moreover, as the paper [12] concludes, it was possible to bypass the face recognition of all three of these big companies.

As mentioned in chapter 1, state-of-the-art face recognition like CNNs tends to have very high accuracy. That alone does not make it ideal to use, because the system should also protect itself against attacks, and there are many types of attacks. [13] tackles this problem by developing a secure framework that protects the privacy of the data by offloading the data from the edge to the cloud.

Figure 4: Standardization of weak points in ISO/IEC DIS 30107-1, 2016

A more general view is what figure 4 shows (ISO/IEC DIS 30107-1, 2016).

These are the weak points of a biometric system: attacks on the biometric sensor (point 1) are called direct attacks or PAs, while attacks at points 2 to 9 are called indirect attacks. A presentation attack thus uses biometric data to attack the system.

The attacker presents biometric data to create events that wrongfully pass the system. The data can be obtained directly from the person, online, or from existing databases, which makes it possible to create these types of attacks.

Protection against PAs consists of developing countermeasures that identify whether the presented biometric sample is a false presentation. Such a system is called a PAD (presentation attack detection). Some variations of PADs are frame-based, which use only a single image to classify face samples and can quickly output a decision; video-based, which require a video recording of a certain length to classify the samples; and methods that require human interaction, like challenge-response.

When it comes to PAs, there are multiple ways to perform them. Morphed face attacks are one way of attacking a system. [14] investigated the vulnerability of biometric systems to such morphed face attacks; the work resulted in two new databases, created by printing and scanning digitally morphed images using two different scanners, and in an evaluation of the techniques proposed to detect morphed face images. Furthermore, other databases have been created to test and train PAD and FR systems against this type of attack; one of them, named REPLAY-ATTACK, will be used in this thesis. Another paper, 6313548, studied the effectiveness of Local Binary Patterns in face anti-spoofing and used REPLAY-ATTACK for evaluation. It was also used for image-based object spoofing detection in [15], which improves the spoofing detection ability by concatenating multiple color schemes to train the model and shows promising results against other PADs.

The state-of-the-art approach to developing PADs is to build CNN models. A problem with CNN-based PADs is that they need a lot of data to train correctly; as [16] mentions, the numerous parameters in these deep learning-based detection methods cannot perform as well as they could due to limited data.

2.4 Face classification

Face classification is the step of classifying the features extracted from a person once the recognition stage has obtained the facial features, comparing them to the database to decide on an identity. More precisely, person A has certain features and person B has other features, and these must be classified to decide which person it is. To achieve this, a classification algorithm can be applied; some classification algorithms are SVM, k-NN, and Gaussian Naïve Bayes, listed below (a short k-NN sketch follows the list).

• SVM the objective of the support vector machine algorithm is to find a hyperplane in N-dimensional space that distinctly classifies the data points.

• k-NN The k-nearest neighbors (KNN) algorithm is a simple, easy-to- implement supervised machine learning algorithm that can solve clas- sification and regression problems.

• Gaussian Naïve Bayes Based on Bayesian classification methods, Naive Bayes classifiers rely on Bayes’s theorem, an equation describing the relationship of conditional probabilities of statistical quantities. In Bayesian classification, we are interested in finding the probability of a label given some observed features.
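As a small illustration of how such a classifier could be applied, the sketch below fits a k-NN classifier with scikit-learn on randomly generated 128-d vectors standing in for face embeddings; the data, labels, and library choice are assumptions for illustration.

```python
# Minimal sketch: classifying face embeddings with k-NN using scikit-learn.
# The 128-d embeddings and labels are hypothetical placeholders.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(20, 128))          # e.g. 128-d face encodings
labels = ["person_a"] * 10 + ["person_b"] * 10   # corresponding identities

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(embeddings, labels)

query = rng.normal(size=(1, 128))                # embedding of an unknown face
print(clf.predict(query))                        # predicted identity
```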


2.5 Methods

There are many ways to do face recognition, so this section presents and explains a couple of methods.

2.5.1 Convolutional Neural Network

A convolutional neural network is a deep learning method commonly used for image classification, object classification, and face classification. A CNN takes an input image and runs it through a number of different layers; an example of this setup is shown in figure 5. What the layers do is explained below.

Figure 5: Convolutional neural network

• Input Layer This layer takes an image, which has a basic two-dimensional structure; if we include the colors, the image can be represented in three dimensions. Images are encoded into color channels, so the image data is represented as the intensity of each color channel, typically RGB, over the width and height of the image, which makes it three-dimensional. To be able to feed the image into the network, it is reshaped into a single column; for example, a 28x28 image (784 pixels) is converted into a 784x1 vector, so if there are n training images the input becomes (784, n).

• Convolution Layer The main focus of this layer is to extract features. It takes the input image and performs a convolution operation, which means it cycles through the image with a filter of a set size. For example, if the image is 4x4 and the filter is 3x3, the filter visits four positions and produces a 2x2 matrix. Equation (1) is the general formula for this operation, where N is the image size and F is the filter size. If the size of the output needs to be controlled, padding can be added; equation (2) shows the version with padding p. A short sketch after this list illustrates equations (1) and (2).

(N x N) ∗ (F x F) = (N − F + 1) x (N − F + 1)   (1)

(N + 2p − F + 1) x (N + 2p − F + 1)   (2)

• Pooling Layer This layer reduces the volume of the image to a more compact spatial form and is usually placed between two convolution layers. One of the more popular pooling layers is max pooling, which means the maximum value in each batch is chosen in the reduction; figure 6 shows a 2x2 max-pooling process. This reduces the computational cost that would otherwise be incurred.

Figure 6: Max-pooling

• Fully Connected Layer A fully connected layer involves weights, biases, and neurons. It connects neurons in one layer to neurons in another layer and is trained to classify images into different categories. In place of fully connected layers, conventional classifiers like SVM can be used as well, but a fully connected layer is generally added to make the model end-to-end trainable.
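A small sketch of equations (1) and (2), assuming stride 1 and square inputs and filters; the example values are illustrative only.

```python
# Minimal sketch: output size of a convolution per equations (1) and (2),
# assuming stride 1 and a square N x N input with a square F x F filter.
def conv_output_size(n: int, f: int, p: int = 0) -> int:
    """Side length of the output feature map: N + 2p - F + 1."""
    return n + 2 * p - f + 1

print(conv_output_size(4, 3))        # 2  -> a 4x4 image and 3x3 filter give 2x2
print(conv_output_size(28, 5, p=2))  # 28 -> padding of 2 keeps the size
```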

2.5.2 Local Binary Pattern

Local Binary Pattern is a simple yet very efficient texture operator that labels the pixels of an image by thresholding the neighborhood of each pixel and treating the result as a binary number, as shown in figure 7. The general way to describe this process is equation (3), where g_c is the gray value of the center pixel, g_p are the values of its P neighbors, and s is defined in (4). The obtained values can then be used to create a histogram of the feature, which is combined with the other feature histograms; this histogram then serves as the descriptor used for classification.

LBP(x_c, y_c) = Σ_{p=0}^{P−1} s(g_p − g_c) · 2^p   (3)

s(x) = 1 if x ≥ 0, 0 if x < 0   (4)

Due to its discriminative power and computational simplicity, the LBP texture operator has become a popular approach in various applications. It is the unifying approach to the more traditionally divergent statistical and structural models of texture analysis. Perhaps the most essential property of the LBP operator in real-world applications is its robustness to monotonic gray-scale changes caused, for example, by illumination variations. Another important property is its computational simplicity, making it possible to analyze images in challenging real-time settings.

Figure 7: Local Binary Pattern
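The sketch below implements the basic 3x3 LBP operator of equations (3) and (4) with NumPy for a single pixel. It is an illustrative assumption, not the thesis's code; a real application would use an optimized implementation such as those in OpenCV or scikit-image, and the sample patch is made up.

```python
# Minimal sketch: the basic 3x3 LBP operator from equations (3) and (4).
import numpy as np

def lbp_code(patch: np.ndarray) -> int:
    """LBP code of the centre pixel of a 3x3 grayscale patch."""
    center = patch[1, 1]
    # Neighbours visited clockwise from the top-left corner.
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for p, g in enumerate(neighbours):
        s = 1 if g >= center else 0   # thresholding step, equation (4)
        code += s * (2 ** p)          # weighted sum, equation (3)
    return code

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_code(patch))  # binary pattern of the centre pixel as an integer
```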

2.5.3 Principal component analysis

Eigenfaces is an appearance-based approach, and principal component analysis is the basis of the method. It seeks to capture the variation in a collection of face images and use this information to encode and holistically compare images of individual faces. The method works by calculating eigenvectors and eigenvalues.

Step 1 transform each image to a matrix; all training images are converted to an N x N matrix, so in the end you have the set in eq. (5), where n equals the number of images and i_k is the matrix of image k.

(i_1, i_2, i_3, ..., i_n)   (5)

Step 2 flatten each matrix into a vector.
Step 3 calculate the average of the vectors.
Step 4 subtract the average from each vector.
Step 5 compute the covariance matrix.
Step 6 calculate the eigenvectors with their related eigenvalues.
Step 7 keep the K eigenvectors with the largest eigenvalues.

A sketch of these steps is given below.
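A minimal NumPy sketch of the steps above, under the assumption that the training images are random placeholders and that the number of kept eigenvectors K is an arbitrary choice.

```python
# Minimal sketch of the eigenface steps with NumPy; the "images" are random
# placeholders standing in for grayscale face images.
import numpy as np

rng = np.random.default_rng(0)
n_images, side = 10, 32
images = rng.random((n_images, side, side))      # step 1: training images

vectors = images.reshape(n_images, -1)           # step 2: one row vector per image
mean_face = vectors.mean(axis=0)                 # step 3: average face
centered = vectors - mean_face                   # step 4: subtract the average
cov = np.cov(centered, rowvar=False)             # step 5: covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)           # step 6: eigenvalues/eigenvectors

k = 5                                            # step 7: keep the top K eigenfaces
top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
projection = centered @ top_k                    # faces encoded in eigenface space
print(projection.shape)                          # (10, 5)
```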

For face detection and recognition, the Eigenface approach is considered by many to be the first working facial recognition technology. It served as the basis for one of the top commercial face recognition products. Since its initial development and publication, there have been many extensions of the original method and many new automatic face recognition systems. Eigenfaces are often used as a baseline comparison method to demonstrate the minimum expected performance of a system.

2.6 Databases

A face recognition algorithm needs a database of pictures to train a model. This chapter discusses the different types of databases that exist for creating FR models such as CNNs. Different kinds of data are needed in the database depending on the situation; for example, age, lighting, and environment can significantly impact the result. Another essential part is the problem of presentation attacks, which also requires training data depending on the PAD type. The chapter therefore also addresses some of the databases for different kinds of attacks on systems.

2.6.1 Face recognition

To develop a successful FR system, one must consider what kinds of problems the system has to deal with, and this requires a database to train the model. The choice of database should fit the purpose of the model; for example, if the purpose is an FR system for children, then the database must contain images, and variations of images, that mimic children. Other factors like occlusion, low resolution, noise, plastic surgery, aging, illumination, pose, and expressions can also affect the result.

2.6.2 Spoofing attacks databases

To handle presentation attacks, databases must be available to test whether the system can handle several types of attacks. This section lists some of these databases and their different purposes: some target latex masks, some print attacks, and others replay attacks. These databases can later be used to test the created system against the face recognition metrics.

• MOBIO This database consists of bi-modal data (audio and video) from 152 people, 100 males and 52 females. It was collected from 2008 to 2010 at six different sites in five different countries. [17]

• Replay-Attack A database from 2012 made up of 1300 video clips of photo and video attack attempts on 50 clients. It has four different groups: training data ("train") to be used for training the anti-spoofing classifier, development data ("devel") to be used for threshold estimation, test data ("test") with which to report error figures, and enrollment data ("enroll") that can be used to verify the spoofing sensitivity of the face detection algorithms. [8]

• Replay-Mobile A database similar to Replay-Attack. It consists of 1190 video clips of photo and video attack attempts on 40 clients, under different lighting conditions, and has the same groups as Replay-Attack. [18]

• SWAN The SWAN-Idiap dataset comprises 150 subjects captured in six different sessions reflecting real-life scenarios of smartphone-assisted authentication. One of the unique features of this dataset is that it was collected in four different geographic locations, representing a diverse population and ethnicity. Additionally, it contains a multi-modal Presentation Attack (PA) or spoofing dataset using low-cost Presentation Attack Instruments (PAI) such as print and electronic display attacks. [19]

• WMCA The Wide Multi-Channel Presentation Attack (WMCA) database consists of 1941 short video recordings of both bonafide and presen- tation attacks from 72 different identities. The data is recorded from several channels, including color, depth, infra-red, and thermal [20].

2.7 Related work

As mentioned in the introduction, much work has been done on face recognition in recent years. Of the roughly 330 contributions analyzed in the 2019 survey, 61% were based on CNNs to solve different face recognition problems, showing good verification results of up to 96% accuracy. One big part of this survey was identifying which problems FR must overcome to achieve good face recognition; some of them play a big role depending on the purpose of the model. One of these problems was still-image-based face recognition, where considerable progress has been made in constrained environments in recent years; more recently, researchers have focused on unconstrained face recognition, where varying poses, illumination, expressions, blur, age, and occlusion are the problems. [2] However, FR models with high accuracy expose other problems, as researched in the article "Deeply vulnerable: a study of the robustness of face recognition to presentation attacks". That article investigates the vulnerability of DNN FR models to PAs, because DNN FR such as CNNs has recently outperformed other methods by a significant margin. Nevertheless, maximizing recognition performance alone is not sufficient; the system should also be capable of resisting various kinds of attacks, including PAs. The study shows that DNN-based FR with accuracy of 90% or more is highly correlated with vulnerability to PAs [7]. This is also shown in [21], which summarizes the lessons learned about spoofing and anti-spoofing in face biometrics and highlights open issues and future directions. As they put it: "Without spoofing counter-measures, most of the state-of-the-art facial biometric systems are indeed vulnerable to attacks since they try to maximize the discriminability between identities without regards to whether the presented trait originates from a legitimate living client or not."

As for system development specifically for door access, some articles have focused on developing a low-cost embedded facial recognition system for door access control using deep learning, including an edge device and so on. However, one vulnerability found there is, as mentioned earlier, the ability to use a phone showing the face to access the door. [4] Other papers have also developed smart door systems, such as [5] [6].


3 Methodology

This chapter describes the methods used to fulfill the concrete and verifiable goals described in chapter 1.5. It first explains the work strategy used to achieve the goal of this study, which was followed during the thesis period until the project was completed. The last part describes the testing and validation of the system.

3.1 Research area and strategy

During the work, conclusive research with experimental data was conducted, using a mixture of the two agile work strategies Scrum and XP. Scrum was chosen as it is well suited for development projects where the requirements often change during the work; with Scrum it is easy in these cases to change the requirements set at the beginning of the project. The method is also suitable when there is uncertainty about which parts the project will contain. XP was used together with Scrum to enable backlog changes during an ongoing sprint, as rapid changes and varying requirements could occur. Scrum has a sprint length of at least two weeks, while XP has a length of one to two weeks.

Furthermore, the mix between Scrum and XP meant that the work was focused on a product backlog. This backlog constantly changed based on need, and these changes could also take place during an ongoing sprint, something that Scrum as the only strategy would not have allowed.

In the initial stage of the work, a feasibility study was carried out. This helped produce the information needed to create a solid foundation to work from. In meetings with Knightec Dewire and Mid Sweden University, the scope and area of the project were clarified. This information has since been of great importance for the collection of requirements on which the work is based.

During the feasibility study, information was obtained from similar existing theses, to investigate how those solutions work and whether this could affect the project's direction. The feasibility study showed that similar systems and software exist today, but with some differences.


3.2 Proposed solution

The proposed approach to investigating the aim of this study is to make a proof of concept that uses LBPH from the library OpenCV, which was originally made for face recognition but in this case acts as a PAD. The proof of concept is based on the question of whether it is possible to replace regular locking systems with state-of-the-art systems like a CNN.

The work follows the CRoss Industry Standard Process for Data Mining (CRISP-DM): the first focus is on finding the purpose of the project through business understanding. The second step is to understand what type of data will be needed, in this case which database to use and what types of attack to consider. This then leads to preparing the data to be used, and in the end modeling and testing are done so the result can be evaluated. To be able to do this, an investigation was first carried out to complete one of the concrete and verifiable goals. This investigation showed that CNN face recognition has very high accuracy, which makes it a good candidate for the study. Through further research of the related topic, the concept of presentation attacks was introduced, i.e. attacks on the FR system, and a couple of articles show a high correlation between high-accuracy FR systems and vulnerability to PAs. Based on that information, the study looks into the special case shown in chapter 1: an IoT and cloud-based solution for a locking system.

With this in mind, the proof of concept includes an FR and a PAD with a testing framework, to see whether the high-accuracy CNN model gets efficient protection from PAs with this PAD, which must be suitable to run on low-end edge devices, and to determine what type of data will be tested on the system. The choices of algorithms are explained further in this chapter: first the dataset, then the choice of algorithms for face detection, face recognition, image classification, and presentation attack detection.

3.3 Dataset structure

The database chosen for PAs was the REPLAY-ATTACK database, because the related article [15], which uses LBP similarly to this study, obtained good results with its PAD on it, and because of how easy it is to perform a replay attack on a system. The database is constructed with four different types of data, train, dev, test, and enrollment, which give the user a comprehensive ability to construct a PAD. Furthermore, the dataset comes with protocols for the different types of attacks listed in Table 1, and the PAD and FR will train on these protocols. As seen in the table, a good variation of training and testing data is available. The PAD and the CNN model each have their own collection of training data, but both of them use the same protocols.

Table 1: Database protocols

Protocol     | Hand-Attack     | Fixed-Support   | All Supports
             | train dev test  | train dev test  | train dev test
Print        |   30   30   40  |   30   30   40  |   60   60   80
Mobile       |   60   60   80  |   60   60   80  |  120  120  160
Highdef      |   60   60   80  |   60   60   80  |  120  120  160
Digitalphoto |   60   60   80  |   60   60   80  |  120  120  160
Photo        |   90   90  120  |   90   90  120  |  180  180  240
Video        |   60   60   80  |   60   60   80  |  120  120  160
Grandtest    |  150  150  200  |  150  150  200  |  300  300  400

3.4 Choice of algorithms

In this study, a couple of choices had to be made because of how broad the options are. This section addresses the choice of the critical parts of the system: which face detection, face recognition, and classification methods will be used, and what method is used to detect presentation attacks.

3.4.1 Face detection

This study focuses on face recognition and the detection of PAs, not on face detection itself, so face detection has not been a focus. The choice of face detection method is based on the architecture presented in chapter 1: it must be able to run on a lower-end edge device. To make it more accessible, the system uses OpenCV and Python's face recognition library, since this study does not focus on face detection. The CNN pipeline uses the Histogram of Oriented Gradients (HOG) to detect the face, and the PAD uses OpenCV's CascadeClassifier.

3.4.2 Face recognition

For face recognition, there are two choices to make: one for the CNN face recognition and one for the PAD. For the CNN model there are many different pre-trained models that can be used; as mentioned in section 2.5.1, researchers have developed different kinds of CNN architectures. In this study, dlib's face recognition, available in Python, will be used. Dlib's network is a version of the ResNet-34 developed by [22], but with fewer layers and the number of filters reduced by half. This version was made by Davis King and was trained on several different datasets, including images scraped from the internet, the face scrub dataset [23], the VGG dataset [24], and the Labeled Faces in the Wild (LFW) dataset [25]. The network compares well to other state-of-the-art methods, reaching 99.38% accuracy. [26]

3.4.3 Image classification

The last part is image classification. The CNN model uses regular Euclidean distance with a given tolerance, and the results of the Euclidean distance comparisons end up in a voting system that decides which known face has the most support. The LBPH uses the histogram generated for each face, compares it to the input, and with a calculated confidence decides how close the face is to the real one.

3.4.4 Presentation attack detection

The PAD uses LBPH because of the promising results in [15], which worked with LBP over different color schemes. Another reason for using an LBP-based method is that it is not computationally heavy, which makes it excellent for edge devices. More state-of-the-art PADs that train a CNN are problematic because of the amount of data they need; the article [16] mentions that the available PA databases are not large enough for what a CNN needs. That article also states that CNN and LBP have a similar structure, which can make the combination a good choice: they used LBP to reduce the CNN so that it did not need as much data, which, as mentioned earlier, is a problem.

3.5 Evaluation

To understand how good or bad the created system is, it can be evaluated against performance metrics. This section explains some evaluation metrics for biometric recognition systems. The generic way of evaluating this kind of system is with metrics for binary classification, where the idea is to identify whether a sample is positive or negative.

eq. (6) defines a label positive or negative depending on the function M(x) which returns the score of the face model, which then can be compared against a certain threshold r.

label = positive if M(x) ≥ r, negative if M(x) < r   (6)

These metrics for binary classification systems have four possible outcomes, listed below.

• true positive (TP) when x is a positive sample and is labeled as a positive sample.

• true negative (TN) when x is a negative sample and is labeled as a negative sample.

• false positive (FP) when x is a negative sample and is labeled as a positive sample.

• false negative (FN) when x is a positive sample and is labeled as a negative sample.

Furthermore, based on these values, a calculation can be done to obtain the following computed score.

• sensitivity, recall, hit rate, or true positive rate (TPR): TPR = TP / (TP + FN)

• specificity, selectivity or true negative rate (TNR): TNR = TN / (TN + FP)

• precision or positive predictive value (PPV): PPV = TP / (TP + FP)

• negative predictive value (NPV): NPV = TN / (TN + FN)

• False Rejection Rate (FRR): FRR = FN / (FN + TP)

• False Acceptance Rate (FAR): FAR = FP / (FP + TN)

• half total error rate (HTER): HTER = (FAR + FRR) / 2

To test a spoofing detection system, we must handle two types of errors: either a genuine access is rejected (false rejection), or an attack is accepted (false acceptance). In order to measure the performance of a spoofing detection system, the Half Total Error Rate (HTER) is used, which combines the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) and is defined in (7).

HTER(%) = (FAR + FRR) / 2 × 100   (7)

FAR = FP / (FP + TN)   (8)

FRR = FN / (FN + TP)   (9)

In an ideal spoofing detection system, both FAR and FRR should be 0. Another metric commonly used to evaluate a biometric system is the EER (Equal Error Rate); this error rate is obtained at the threshold that gives the same FAR and FRR. A small sketch of these computations follows.
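The sketch below computes FAR, FRR, and HTER from confusion-matrix counts as defined in equations (7)-(9); the counts in the example are hypothetical and only serve to show the arithmetic.

```python
# Minimal sketch: error rates from confusion-matrix counts, where "positive"
# means a genuine (real) access attempt and "negative" means an attack.
def far(fp: int, tn: int) -> float:
    """False Acceptance Rate: accepted attacks / all attack attempts."""
    return fp / (fp + tn)

def frr(fn: int, tp: int) -> float:
    """False Rejection Rate: rejected genuine attempts / all genuine attempts."""
    return fn / (fn + tp)

def hter(fp: int, tn: int, fn: int, tp: int) -> float:
    """Half Total Error Rate in percent, equation (7)."""
    return (far(fp, tn) + frr(fn, tp)) / 2 * 100

# Hypothetical counts: 5 attacks accepted, 95 rejected, 2 genuine rejected, 98 accepted.
print(hter(fp=5, tn=95, fn=2, tp=98))  # 3.5
```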


4 Implementation

This chapter covers the CNN face recognition implementation in Python, which uses dlib, a C++ toolkit containing machine learning algorithms. As mentioned earlier, a pre-trained and modified version of ResNet-34 is used for the FR. The developed PAD uses LBPH trained on the faces in the databases. The chapter also covers how these two models, PAD and FR, are used to evaluate whether a face was real or an attack. Furthermore, the evaluation of the system is done by developing a framework that can attack the system with the specific protocols that the database provides.

4.1 Testing framework

The framework created to test the CNN and the PAD is based on testing three different cases in the system, illustrated in figure 8. The first case represents the FR result without the PAD and the second one is the result of the PAD alone; this is to evaluate the two systems separately. The third case applies the PAD before the FR, to see how the PAD affects the FR when faces labeled as PAs are removed before reaching the FR.

Figure 8: Framework

Furthermore, the created framework is a terminal application that works with arguments; the arguments are listed below, and a hypothetical sketch of such an argument parser follows the list. A trained CNN model and PAD have been created with the corresponding data, which is the Replay-Attack database. The training of the CNN is done on the actual videos in the dataset. The PAD's training can happen in several ways depending on which system is being tested, because of the several types of attack that can be performed on the system, which is explained in more detail later.

• Testing-specific arguments

– test-method: which test to run (CNN only, PAD only, or both together)
– protocol: which protocol to run

• CNN-specific arguments

– detection-method: face detection model to use, either 'hog' or 'cnn'
– encodings: path to the serialized database of facial encodings for the CNN
– dataset: path to the input directory of faces and images
– image: path to the input image
– inputvideo: path to the input video
– display: whether or not to display the output frame on screen
– outputvideo: path to the output video

• LBPH-specific arguments

– lbphcascade: path to the face detection cascade for the LBPH
– lbphyml: path to the .yml file which contains the trained data for the PAD
– lbphlabels: path to the pickle file which contains the associated names
– lbphsavecapture: save path for the PAD model
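The exact implementation of the framework's command-line interface is not shown in the thesis; the sketch below is only a hypothetical illustration, using Python's argparse, of how a subset of the arguments listed above could be parsed. Flag names, choices, and defaults are assumptions.

```python
# Hypothetical sketch of an argument parser for the test framework;
# not the thesis's exact implementation.
import argparse

parser = argparse.ArgumentParser(description="FR/PAD test framework")
parser.add_argument("--test-method", choices=["cnn", "pad", "both"], required=True,
                    help="which test to run: CNN only, PAD only, or both together")
parser.add_argument("--protocol", default="grandtest",
                    help="which Replay-Attack protocol to run")
parser.add_argument("--detection-method", choices=["hog", "cnn"], default="hog",
                    help="face detection model for the FR system")
parser.add_argument("--encodings", help="path to the serialized facial encodings")
parser.add_argument("--lbphyml", help="path to the trained LBPH model (.yml)")

args = parser.parse_args()
print(args.test_method, args.protocol)
```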

4.1.1 Presentation attacks

As mentioned earlier in this chapter, the system will be evaluated against a number of protocols. These are created by the developers of the database explained in section 3.3 and describe what types of attack can be performed on the system. The protocols act as attacks on the system in the three cases explained earlier: first, the FR is run without the PAD to establish the baseline protection of the CNN FR; then the same protocol is run through the PAD to detect as many PAs as possible; and finally case three is run, which combines cases 1 and 2. In these three cases, the attacks are evaluated against the HTER value explained in section 3.5. After the system has obtained results from the CNN and the PAD separately, and also together, the conclusions and discussion are presented in the later chapters.


4.2 Face recognition system

One of the concrete and verifiable goals was to implement a face recognition system using a CNN model. The Python face recognition library was used to accomplish this goal; the library uses the modified ResNet-34 CNN model mentioned above, trained on over 3 million faces. The rest of section 4.2 explains the implementation steps of the FR system created with this CNN model.

4.2.1 Face detection

The first step in training the CNN model is to detect the faces in the pictures. The face detection part of the CNN pipeline uses the Histogram of Oriented Gradients, explained in detail in section 2.1.2, to speed up the process; this method was chosen because testing showed a noticeable increase in execution time when, for example, CNN-based detection was used instead of HOG.

Furthermore, when developing the CNN FR, several use cases were implemented: one where the system recognizes faces in a single image, one that recognizes faces in a live video stream from the webcam and outputs a video, and one that recognizes faces in a video file on disk and writes the processed video back to disk. Below, a step-by-step description explains how the detection works; a sketch of these steps follows figure 9.

• Step 1 Regardless of what type of media the user is using, the detection of the face is the same. The idea is to store the known encodings and the known names in two lists, which contain the face encodings and corresponding names for each person in the dataset.

• Step 2 Depending on how many people the user wants to train on, the system iterates through them and detects the faces, following the structure illustrated in figure 9: the process iterates N times if N people are in the dataset. For each image, the system extracts the name of the person from the image path. An important step is converting the images to RGB, because dlib expects it, so a channel swap is needed before proceeding.

• Step 3 In each iteration, the Python library face recognition is used; its face locations method takes the RGB image and the type of detection method, which as mentioned earlier is HOG.

• Step 4 In this last step, the face encodings module of the library converts the image to a numerical encoding, and the name and the encoding are appended to the known encodings and known names lists.


Figure 9: Folder structure
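A minimal sketch of steps 1-4 above, assuming the face_recognition and OpenCV libraries and the dataset/<person_name>/<image> folder structure of figure 9; paths and file names are placeholders, not the thesis's exact code.

```python
# Minimal sketch: building known encodings and names from the dataset folder.
import os
import pickle
import cv2
import face_recognition

known_encodings, known_names = [], []

for person in os.listdir("dataset"):
    person_dir = os.path.join("dataset", person)
    for filename in os.listdir(person_dir):
        image = cv2.imread(os.path.join(person_dir, filename))
        rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)      # dlib expects RGB
        boxes = face_recognition.face_locations(rgb, model="hog")
        for encoding in face_recognition.face_encodings(rgb, boxes):
            known_encodings.append(encoding)               # 128-d embedding
            known_names.append(person)                     # name from the path

# Serialize the encodings and names for later use (section 4.2.2).
with open("encodings.pickle", "wb") as f:
    pickle.dump({"encodings": known_encodings, "names": known_names}, f)
```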

4.2.2 Face recognition with CNN

To be able to use the encodings from section 4.2.1 in other scripts, and not only in one place, an encode script converts the two lists into a pickle file. In Python, pickling is how a program serializes and de-serializes an object structure: "pickling" converts a Python object hierarchy into a byte stream, and "unpickling" is the inverse operation, where a byte stream (from a binary file or bytes-like object) is converted back into an object. This creates 128-d face embeddings for each face in our dataset.

The created pickle file can then be used to recognize the faces in an image using OpenCV, an open-source computer vision and machine learning library available in Python. Below is a step-by-step explanation of how this process works.

• Step 1 Load the pre-computed encodings and face names, and then construct the 128-d face encodings for the input image. This includes loading the input image and converting it to RGB, then detecting all faces in the input image and computing their 128-d encodings.

• Step 2 Initialize a list of names for each detected face, then iterate over the found faces to match each face in the input image (its encoding) against our known encodings dataset.

• Step 3 In this step, the trained dlib model is used to see if the face matches. This returns True/False, and the process ends up with an N-dimensional list of True/False values over all the persons in our dataset.


4.2.3 Image classification

To classify the result of the features extracted by the CNN, the trained network returns a 128-d embedding for each of the faces in our dataset. Below is a step-by-step explanation of the classification part of the CNN.

• Step 1 load the known faces and embeddings.

• Step 2 load the input image and detect the face in the picture.

• Step 3 compute the facial embeddings for each face.

• Step 4 match each face in the input image against our known encodings, which returns true or false for each entry in our dataset. Internally, this computes the Euclidean distance between the candidate embedding and all faces in our dataset.

• Step 5 Given the match list, we compute the number of "votes" for each name, i.e. the number of True values associated with each name. After summing the votes, the face with the most votes is the selected person. A sketch of steps 4 and 5 is given below.
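A minimal sketch of the matching and voting described in steps 4 and 5, assuming the face_recognition library and the pickle file produced earlier; the input image path is a placeholder and the tolerance is the library default.

```python
# Minimal sketch: distance-based matching followed by a voting decision.
import pickle
from collections import Counter
import face_recognition

with open("encodings.pickle", "rb") as f:
    data = pickle.load(f)

image = face_recognition.load_image_file("unknown.jpg")
boxes = face_recognition.face_locations(image, model="hog")

for encoding in face_recognition.face_encodings(image, boxes):
    # True/False per known encoding, based on Euclidean distance and a tolerance.
    matches = face_recognition.compare_faces(data["encodings"], encoding)
    votes = Counter(name for name, hit in zip(data["names"], matches) if hit)
    name = votes.most_common(1)[0][0] if votes else "Unknown"
    print(name)
```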

4.3 Presentation attack detection with LBPH

The process of implementing the PAD is similar to the CNN one. First, it has a training phase that consists of extracting features and ids from images to train a model for later use. A capture module has also been created to capture faces in pictures, which are used in the detection module implemented to decide whether the person is fake or real. The rest of this section explains in more detail how this works.

4.3.1 Face detection

The first step in training the PAD is to detect and save the face of each person the system should be able to judge as fake or real. These pictures are based on how the pictures used to train the CNN look. The detection phase saves each person's face with the help of the OpenCV CascadeClassifier module. The detected faces are stored as grayscale pictures using the same folder structure as in figure 9. After the pictures have been placed in the folders, an extraction and labelling process is started. This process works in the same way as the CNN training preparation: ids and the corresponding faces are saved in two lists, which are converted to a pickle file for later use.
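
A minimal sketch of the capture step is shown below, using the Haar cascade bundled with OpenCV; the folder and file names are placeholders.

# capture_faces.py -- sketch of the PAD capture step; paths and file names are assumptions.
import os
import cv2

# Haar cascade shipped with OpenCV for frontal face detection
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("dataset/person_1/sample.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect faces and save each one as a grayscale crop in the same folder structure
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
os.makedirs("pad_dataset/person_1", exist_ok=True)
for i, (x, y, w, h) in enumerate(faces):
    cv2.imwrite(f"pad_dataset/person_1/face_{i}.png", gray[y:y + h, x:x + w])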


4.3.2 LBPH training

After the faces and their ids have been generated, the process uses LBPH, which in OpenCV takes two parameters: the faces and the ids corresponding to the faces. This generates the extracted features for each face and creates a .yml file in which the information is stored in the form of histogram values.
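
Below is a minimal sketch of this training step, assuming the faces and ids were stored in a pickle file as described in section 4.3.1; the file names and the pickle layout are assumptions, and the LBPH recognizer requires the opencv-contrib-python package.

# train_lbph.py -- sketch of the LBPH training step; file names are assumptions.
import pickle
import numpy as np
import cv2

with open("pad_faces.pickle", "rb") as f:
    data = pickle.load(f)          # assumed layout: {"faces": [gray crops], "ids": [int labels]}

recognizer = cv2.face.LBPHFaceRecognizer_create()
# train() takes the list of grayscale face images and their numeric labels
recognizer.train(data["faces"], np.array(data["ids"]))
# The histogram values of the trained model are written to a .yml file
recognizer.write("lbph_model.yml")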

4.3.3 Image classification

After the training has been done, the last step is to detect whether a presented face is real or fake. The key in this step is the confidence value describing how close the presented picture is to the trained pictures. The system takes a picture as input, finds the face in it, and converts it to grayscale.

The LBPH function in OpenCV is run on the grayscale picture to extract the features. These are then passed through the recognizer object created by the library, which has taken the .yml file as a constructor argument. Using the predict function that LBPH provides, the system then checks how close the extracted features are to the trained model. In the end, we obtain the id and the confidence of how close the picture is to the real one. The confidence works as a threshold value to decide whether the face is real or fake.
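
A minimal sketch of this classification step follows; the file names and the threshold value are assumptions, not the exact configuration used in the thesis (the threshold is derived experimentally, as described in section 5.2).

# detect_pad.py -- sketch of the PAD classification step; names and threshold are assumptions.
import cv2

recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read("lbph_model.yml")         # load the trained histograms

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("candidate.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

THRESHOLD = 60  # placeholder value; the thesis derives its threshold from the experiments
for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
    face = gray[y:y + h, x:x + w]
    # predict() returns the closest id and a confidence value (lower = closer match)
    label, confidence = recognizer.predict(face)
    verdict = "real" if confidence < THRESHOLD else "fake"
    print(f"id={label} confidence={confidence:.1f} -> {verdict}")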


5 Result

This chapter presents the results that meet the concrete and verifiable goals in section 1.5. First, the investigation of which face recognition methods exist and which methods protect against presentation attacks is explained. After that, the implementation of the systems and the testing framework are presented, followed by the face recognition system and the PAD system. Finally, the testing results from the three cases are reported.

5.1 Investigation of methods

A pre-study was conducted to investigate which types of face recognition methods exist and how to protect against attacks in the form of presentation attacks. This chapter presents what the pre-study found regarding face recognition methods and presentation attacks.

5.1.1 Face recognition

In the pre-study of this project, the first step was to look for state-of-the-art face recognition methods. It concluded that a neural network, and more precisely a convolutional neural network, was the way to go. A more interesting finding, however, was that although these CNN face recognition papers report high accuracy, above 90%, testing them against attacks showed that they are weak against such attacks. From this, the concept of presentation attacks was identified, where false data is presented to the system to gain access to it.

5.1.2 Presentation attacks

As mentioned earlier, a state-of-the-art face recognition system based on a CNN has a significant vulnerability to presentation attacks. An investigation into how PAs work and how to protect against them was therefore carried out, revealing a number of presentation attack detection systems. Some of these work by looking at motion, others at the colours of the picture, and some train a CNN on data to detect the attacks. Others state that the amount of data needed for a good CNN-based PAD was not available and therefore combine LBP with a CNN. Based on one article that built a PAD using only LBP with promising results, this study investigates the capability of LBPH to work as a PAD, following the assumptions made in chapter 1.


5.2 Implementation of systems

Here the results of the systems implemented to accomplish the goals in section 1.5 are presented. This covers a face recognition system developed using a pre-trained CNN model and the corresponding PAD for the system, suitable for running on low-end edge devices. In addition, the created testing framework, which runs both systems and tests the three cases, is shown.

The developed testing framework is a console application. The application takes arguments that specify the training of the PAD and the CNN. After the models have been trained, the framework iterates over a specified text file containing the locations of the videos and prints the results in the console.
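
As an illustration, a console application of this kind could be structured as sketched below; all flag names and the text-file format are assumptions about the implementation described above, not its actual interface.

# run_tests.py -- sketch of a console test framework; flags and file format are assumed.
import argparse

parser = argparse.ArgumentParser(description="Train models and run the test cases")
parser.add_argument("--train-cnn", action="store_true", help="re-encode the CNN dataset")
parser.add_argument("--train-pad", action="store_true", help="re-train the LBPH PAD")
parser.add_argument("--videos", default="videos.txt",
                    help="text file with one video path per line")
args = parser.parse_args()

if args.train_cnn:
    print("training CNN encodings ...")   # would call the encoding script
if args.train_pad:
    print("training LBPH PAD ...")        # would call the LBPH training script

with open(args.videos) as f:
    for line in f:
        path = line.strip()
        if path:
            print(f"running test on {path}")  # run PAD + FR and print the result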

The result from the CNN is the ability to choose how face recognition is performed on different media. It can take the web camera, a regular picture, or a video file as input, and then either generate a new video with the detections drawn into it or just show the video live without saving it to file. Figure 10 shows how a detected picture can look.

Figure 10: CNN detection of a face

The PAD result is an iterative process of extracting the confidence value in order to find a specific threshold value. This then decides whether a face is real or fake, and the value is output in the terminal as a real or fake label together with the corresponding confidence.

5.3 Evaluation against the database

In this section, the results from the attacks on the face recognition system, based on three cases, are presented. These cases examine the HTER value.

To better visualize the differences between a real picture and an attack, histogram plots of two clients in the dataset have been created. These histograms show the intensity of the pictures, as figure 11 shows. Figure 11b is a picture taken from the database, i.e. a real picture, and a large part of the trained model is based on this type of image, which has a constrained environment with the same camera and lighting. Figure 11a shows an attack picture, and the differences between the two images can be seen in figures 11c and 11d.

Figure 11: Attack and real histogram distribution. (a) Attack picture; (b) real picture; (c) attack picture histogram; (d) real picture histogram.
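
The intensity histograms in figure 11 could be produced as sketched below; the exact plotting code used in the thesis is not given, so the file names and the use of grayscale intensity are assumptions.

# plot_histograms.py -- sketch of producing intensity histograms; file names are placeholders.
import cv2
from matplotlib import pyplot as plt

for title, path in [("Attack picture", "attack.jpg"), ("Real picture", "real.jpg")]:
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256])  # pixel-intensity histogram
    plt.plot(hist, label=title)

plt.xlabel("Pixel intensity")
plt.ylabel("Count")
plt.legend()
plt.show()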


Table 2: CNN baseline confusion matrix (120 pictures in total)

                    Attack predicted   No attack predicted   Total
Actual attack               0                   60             60
Actual no attack            1                   59             60
Total                       1                  119            120

Table 3: CNN baseline results obtained from the confusion matrix (case one)

Method            HTER (%)   FAR   FRR      TPR    TNR   PPV   NPV
CNN baseline FR   50,8       1     0,0164   0,98   0     0,5   0

5.3.1 Case one FR

This first case covers the baseline protection against presentation attacks that the CNN provides. The evaluation used 60 non-attack pictures and 60 attack pictures, where the attacks were high-definition photos. The result is shown in table 2 in the form of a confusion matrix over the 120 tested pictures: in this first case, the CNN detected zero attacks, and one image was misclassified as an attack. Table 3 shows the calculated values of the evaluation metrics explained in section 3.4. The interesting values to look out for in this table are the half total error rate (HTER), the false rejection rate (FRR), and the false acceptance rate (FAR); for an ideal face recognition system these should be as close to zero as possible. The values obtained in case one give an HTER of 50,8%, which means that more than half of the dataset was misclassified. The confusion matrix shows the same result: 60 of 60 attacks were classified as no attack.
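
As a check, the reported HTER follows directly from the FAR and FRR values in table 3, assuming the standard definition of HTER as the average of FAR and FRR (as in section 3.4):

HTER = (FAR + FRR) / 2 = (1 + 0,0164) / 2 ≈ 0,508 = 50,8 %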

5.3.2 Case two PAD

The second case covers the protection the created PAD provides. It runs the same setup as case one to achieve more comparable results. As table 4 shows, there is a big difference in the confusion matrix when the PAD runs: in total, 12 images were misclassified, of which 5 were not attacks and 7 were attacks classified as no attack. The HTER obtained for the PAD, shown in table 5, is 9,97%.

Table 4: PAD confusion matrix (120 pictures in total)

                    Attack predicted   No attack predicted   Total
Actual attack              54                    7             61
Actual no attack            5                   54             59
Total                      59                   61            120


Table 5: PAD results obtained from the confusion matrix (case two)

Method   HTER (%)   FAR     FRR      TPR    TNR    PPV    NPV
PAD      9,97       0,115   0,0164   0,92   0,88   0,11   0,92

5.3.3 Case three PAD + FR

For the third case, the results from the first confusion matrix and the second one with the PAD are combined to see how a setup in which both are running would behave. As stated earlier, out of the 120 images, 60 were attacks, and in the first case all 60 were classified as "No attack predicted" when they were labelled "Actual attack", i.e. 0% of the attacks were correctly classified. When the PAD runs on the same 120 faces, seven faces were classified as "No attack predicted" when they were labelled "Actual attack", which is 12% of the attacks. The PAD also classified five images as "Attack predicted" when they were "Actual no attack", which is 8% of the real pictures.

So, in the end, the PAD + FR wrongly classified 8% of the no-attack pictures and 12% of the attack pictures. In total, 10% of the 120 pictures were wrongly classified in this last case.


6 Discussion

This chapter discusses the results presented in chapter 5.

6.1 Development of system

The development of the system is divided into two parts, one for the CNN face recognition and one for the PAD. In this section, the two systems are discussed. First of all, the feasibility study showed that LBP and CNN face recognition share similar properties for the feature extraction. In practice, this meant that the same method for saving and loading pictures could be used in both systems: saving two lists with the names of the persons and the corresponding faces, and generating a pickle file. This made the implementation a lot smoother.

6.1.1 CNN face recognition

The created CNN face recognition system, which was tested against the presentation attack database, involved some notable choices. First of all, this study did not focus on face detection and only had the requirement to run on low-end edge devices, so HOG and Haar-like cascades were used, as they are the most straightforward and uncomplicated face detection methods. Furthermore, the chosen CNN model, DLIB, was selected because the requirements on the CNN were high accuracy and the simplicity of running in a native Python environment.

6.1.2 Presentation attack detection

The related work on PAD showed that there are many different types of PAD, but the chosen one, LBPH, has a simple way of training the model and obtaining the confidence value. Such a system does not take long to develop or test and is a good choice for protection against the simplest of attacks. Based on the obtained HTER value, the effort invested in developing this LBPH-based PAD gives an acceptable return.

6.2 Framework discussion

The framework created to test the system ended up being complicated because the two systems were developed individually. The third case, which was supposed to test the PAD + FR with the same protocol as cases one and two by running the PAD and only sending accepted pictures on to the CNN, was not finished in time due to how the modules were structured. Instead, a combination of the results from cases one and two was used, which is discussed later.
