
Automatic Facial Occlusion Detection and Removal

Naeem Ashfaq Chaudhry

November 11, 2012

Master's Thesis in Computing Science, 30 credits
Supervisor at CS-UmU: Niclas Börlin

Examiner: Frank Drewes

Umeå University
Department of Computing Science
SE-901 87 UMEÅ
SWEDEN


Abstract

In daily life, we encounter many occluded faces. The occlusion may be caused by objects such as sunglasses, mufflers, masks, and scarves. Sometimes, such occlusion is used by criminals to hide their identity. In this thesis, a technique for detecting facial occlusion automatically is presented. After the occluded areas are detected, an image reconstruction method called aPCA (asymmetrical Principal Component Analysis) is used to reconstruct the faces, where the entire face is reconstructed from the non-occluded area of the face. A database of images of different persons is organized and used in the reconstruction of the occluded images. Experiments were performed to examine the effect of the granularity of the occlusion on the aPCA reconstruction process: the input mask image is divided into parts of different sizes, the occlusion for each part is marked, and aPCA is applied to reconstruct the faces. Since this reconstruction process requires considerable processing time, pre-defined eigenspaces are introduced that require much less processing time with very little loss of quality in the reconstructed faces.


Contents

1 Introduction
  1.1 Background
  1.2 Goals of the thesis
  1.3 Related work
    1.3.1 Occluded face reconstruction
    1.3.2 Facial occlusion detection

2 Theory
  2.1 Principal Component Analysis (PCA)
    2.1.1 PCA method/model
    2.1.2 PCA for images
    2.1.3 Eigen faces
  2.2 Asymmetrical PCA (aPCA)
    2.2.1 Description of aPCA
    2.2.2 aPCA calculation
    2.2.3 aPCA for reconstruction of occluded facial region
  2.3 Skin color detection
  2.4 Image registration
    2.4.1 Translation
    2.4.2 Rotation
    2.4.3 Scaling
    2.4.4 Affine transformation
  2.5 Peak signal-to-noise ratio (PSNR)

3 Method
  3.1 The AR face database
  3.2 Automatic occlusion detection
    3.2.1 Replace white color with black color
    3.2.2 Image cropping
    3.2.3 Image division
    3.2.4 Occlusion detection for each block
  3.3 Occluded face reconstruction
    3.3.1 PSNR calculation

4 Experiment
  4.1 Granularity effect
    4.1.1 Metric
    4.1.2 Sunglasses scenario
    4.1.3 Scarf scenario
    4.1.4 Cap and sunglasses occlusion
  4.2 Pre-defined eigenspaces
    4.2.1 Metric
    4.2.2 Experiment description

5 Results
  5.1 Occlusion detection results
  5.2 Reconstruction quality results
  5.3 Reconstruction results using pre-defined eigenspaces

6 Conclusions
  6.1 Discussion about granularity effect and reconstruction quality
  6.2 Discussion about pre-defined eigenspaces
  6.3 Limitations
  6.4 Future work

7 Acknowledgements

References


List of Figures

1.1 Different types of occlusion. (a) Sunglasses occlusion. (b) Mask occlusion.
2.1 The first vector Z1 is in the direction of maximum variance and the second vector Z2 is in the direction of residual maximum variance.
2.2 Eigenfaces. (a) First eigenface. (b) Second eigenface. (c) Third eigenface.
2.3 The blue part represents the eigenspace of non-occluded regions whereas the green part represents the pseudo eigenspace of the complete image.
2.4 (a) and (b) represent the original images while (c) and (d) represent the registered images.
3.1 (a) An occluded facial image. (b) Image division into 6 parts. (c) Image division into 54 smaller parts. (d) Image division into 486 parts.
3.2 (a) An occluded facial image. (b) Image division into blocks. (c) Each black block represents an occluded block.
4.1 (a) Non-occluded facial image. (b) An occluded image. (c) Eigenspaces.
4.2 (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.
4.3 An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.2 (c). (c) Reconstructed image. (d) Non-occluded image.
4.4 (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.
4.5 An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.4 (c). (c) Reconstructed image. (d) Non-occluded image.
4.6 (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.
4.7 An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.6 (c). (c) Reconstructed image. (d) Non-occluded image.
4.8 (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.
4.9 An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.8 (d). (c) Reconstructed image. (d) Non-occluded image.
4.10 (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.
4.11 An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.10 (c). (c) Reconstructed image. (d) Non-occluded image.
4.12 (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.
4.13 An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.12 (c). (c) Reconstructed image. (d) Non-occluded image.
4.14 (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.
4.15 An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.14 (c). (c) Reconstructed image. (d) Non-occluded image.
4.16 (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.
4.17 An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.16 (d). (c) Reconstructed image. (d) Non-occluded image.
4.18 (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.
4.19 An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.18 (c). (c) Reconstructed image. (d) Non-occluded image.
4.20 (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.
4.21 An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.20 (c). (c) Reconstructed image. (d) Non-occluded image.
4.22 (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.
4.23 An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.22 (c). (c) Reconstructed image. (d) Non-occluded image.
4.24 (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.
4.25 An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.24 (d). (c) Reconstructed image. (d) Non-occluded image.
4.26 Occluded facial images used for construction of 6 eigenspaces.
4.27 (a) An occluded image. (b) Detected occlusion by level 3b image division. (c) Pre-defined eigenspace most similar to the detected occlusion in (b). (d) Reconstructed image using the eigenspace in (c).
5.1 Occlusion detection by different image division methods. (a) Occluded image. (b) Occlusion detection by level 1 image division. (c) Occlusion detection by level 2 image division. (d) Occlusion detection by level 3a image division. (e) Occlusion detection by level 3b image division.
5.2 Reconstructed image by different image division methods. (a) An occluded image. (b) Reconstructed image by level 1 image division. (c) Reconstructed image by level 2 image division. (d) Reconstructed image by level 3a image division. (e) Reconstructed image by level 3b image division. (f) Non-occluded image.
5.3 Reconstructed image by different image division methods. (a) An occluded image. (b) Reconstructed image by level 1 image division. (c) Reconstructed image by level 2 image division. (d) Reconstructed image by level 3a image division. (e) Reconstructed image by level 3b image division. (f) Non-occluded image.


List of Tables

5.1 Reconstruction quality of the complete image (PSNR) [dB] for granularity effect
5.2 Reconstruction quality of the occluded reconstructed parts (PSNR) [dB] for granularity effect
5.3 Number of pixels used in reconstruction
5.4 Processing time (sec) for granularity effect


Chapter 1

Introduction

1.1 Background

Face recognition has been one of the most challenging and active research topics in computer vision for the last several years (Zhao et al., 2003). The goal of face recognition is to recognize a person even if the face is occluded by some object. A face recognition system should recognize a face as independently and robustly as possible with respect to image variations such as illumination, pose, occlusion, and expression (Kim et al., 2007). A face is occluded if some area of the face is hidden behind an object such as sunglasses, a hand, or a mask, as seen in Figure 1.1. Face occlusions can degrade the performance of face recognition systems, including human recognition.

Recent research projects, e.g. Al-Naser and Söderström (2011), have used pre-determined occluded areas in standardized positions. After occlusion detection, aPCA (asymmetrical Principal Component Analysis) (Söderström and Li, 2011) was used for reconstruction of the entire face. aPCA is used to estimate an entire image based on a subset of the image, e.g. to reconstruct a partially occluded facial image using the non-occluded parts of the image. The experiments used a small database (n = 116) of facial images with no classification (Martinez and Benavente, 1998). A property of the reconstructed images in Al-Naser and Söderström (2011) is that they have sharp edges between the original and reconstructed regions.

This application can be used by law enforcement agencies, in access control systems, and for surveillance at public places such as ATMs and airports.

1.2 Goals of the thesis

The overall goal of this thesis is to improve the performance of aPCA for reconstruction of occluded regions of facial images.

The primary goal is to develop an algorithm for automatic detection and reconstruction of facial occlusions. The algorithm should be automatic and detect smaller occlusions than previous work. Furthermore, arbitrary occlusions should be handled, i.e. occlusions of any part of the face.

A secondary goal is to develop an algorithm for smoothing the reconstructed images to reduce the edges between the original and reconstructed regions.


Figure 1.1: Different types of occlusion. (a) Sunglasses occlusion. (b) Mask occlusion.

A tertiary goal is to extend the AR database with more images and to classify the images individually according to gender, ethnicity etc.

1.3 Related work

1.3.1 Occluded face reconstruction

Al-Naser and Söderström (2011) reconstructed the occluded regions using asymmetrical principal component analysis (aPCA). The occluded facial regions were estimated based on non-occluded facial regions. They did not detect the occlusion automatically; rather, the occlusion was marked manually on the facial images. Jabbar and Hadi (2010) detected the face area using a combination of skin color segmentation and eye template matching. They used a fuzzy c-means clustering algorithm for detection of occluded facial regions. When the occluded region was one of the symmetric facial features, such as an eye, the corresponding non-occluded feature was used to recover the occluded area. When the occluded area was not a symmetric facial feature, the most similar mean face from the database was used.

1.3.2 Facial occlusion detection

Min et al. (2011) performed facial occlusion detection for occlusions caused by sunglasses and scarves using the Gabor wavelet. The face image was divided into an upper and a lower half. The upper half was used to detect sunglass occlusions while the lower half was used for scarf occlusion detection. Kim et al. (2010) proposed a method to determine if a face is occluded by measuring the skin color area ratio (SCAR). Oh et al. (2006) found the occlusion by first dividing the facial images into a finite number of local disjoint patches and then examining each patch separately.


Chapter 2

Theory

2.1 Principal Component Analysis (PCA)

PCA (Jolliffe, 2002) is a mathematical procedure that is used to transform potentially correlated variables into uncorrelated variables. Suppose we have a data matrix of observations of N correlated variables X_1, X_2, ..., X_N. PCA will transform the variables X_i into N new variables Y_i that are uncorrelated. The variables Y_i are called principal components. The first principal component is in the direction of the largest variance of the data. The other principal components are orthogonal to each other and represent the largest residual variance, see Figure 2.1.

PCA can be used as a dimension reduction method to represent multidimensional, highly correlated data, with fewer variables. PCA is used for, e.g. information extraction, image compression, image reconstruction and image recognition.

2.1.1 PCA method/model

Image-to-vector conversion

A 2-dimensional image is transformed into a 1-dimensional vector by placing the rows side by side, i.e.

x = [p_1, p_2, \ldots, p_r]^T,   (2.1)

where p_i is the i-th row of the image p and r is the total number of rows. Each image is stored in a vector and each vector is stored column-wise in a matrix.
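As an illustration of this vectorization step, the sketch below (a minimal example, assuming the images are given as NumPy arrays; the variable names are illustrative and not from the thesis) stacks a set of equally sized RGB images into a data matrix with one image per column:

import numpy as np

def images_to_matrix(images):
    """Flatten each image (H x W x 3 array) into a column vector and
    stack the vectors column-wise into a data matrix."""
    vectors = [img.reshape(-1).astype(float) for img in images]
    return np.column_stack(vectors)  # shape: (H*W*3, number of images)

# Example with random stand-in images of size 171 x 144.
images = [np.random.rand(171, 144, 3) for _ in range(5)]
X = images_to_matrix(images)
print(X.shape)  # (73872, 5)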

Subtract the Mean

The mean image is calculated and subtracted from each image vector to produce vectors with zero mean. Let I_0 represent the mean; it is calculated as

I_0 = \frac{1}{N} \sum_{j=1}^{N} I_j,   (2.2)

where N is the number of image vectors I_j.


Figure 2.1: The first vector Z 1 is in direction of maximum variance and second vector Z 2 is in direction of residual maximum variance.

Calculate the covariance matrix

The covariance matrix of the mean-centred data is calculated as

Cov = W^T W,   (2.3)

where W is an r-by-c matrix whose columns are the vectors (I_i − I_0), with r the number of pixels per image and c the number of images. Cov is then a square matrix of size c-by-c.

Calculate the eigenvectors and eigenvalues of the covariance matrix

The singular value decomposition (SVD) (Strang, 2003) of an r-by-c matrix E decomposes it as

E_{r \times c} = U_{r \times r} \Sigma_{r \times c} V^T_{c \times c} = [u_1, u_2, \ldots, u_r] \, \operatorname{diag}(\sigma_1, \sigma_2, \ldots) \, [v_1, v_2, \ldots, v_c]^T,   (2.4)

where U is an r-by-r unitary matrix, Σ is an r-by-c rectangular diagonal matrix and V is a c-by-c unitary matrix. The columns of U and V are the left and right singular vectors, respectively, and the singular values σ_i ≥ 0 are sorted in descending order. If E is symmetric positive definite, U = V and contains the eigenvectors, and the σ_i are the eigenvalues.

Choosing components and forming a feature vector

The eigenvector associated with the highest eigenvalue represents the greatest variance in the data, whereas the eigenvector associated with the lowest eigenvalue represents the least variance. The eigenvalues decrease in an exponential pattern (Kim, 1996); it is estimated that 90% of the total variance is contained in the first 5% to 10% of the dimensions. The eigenvectors associated with low eigenvalues are less significant and can be ignored. A feature vector b is constructed by selecting the M eigenvectors associated with the highest eigenvalues, from a total of N eigenvectors, i.e.

b = (e_1, e_2, \ldots, e_M).   (2.5)
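As a small illustration of this truncation step (a sketch, not the thesis implementation; the 90% threshold and function names are chosen here for demonstration), the cumulative sum of the squared singular values can be used to pick M:

import numpy as np

def choose_num_components(singular_values, variance_fraction=0.90):
    """Return the smallest M such that the first M components explain at
    least the requested fraction of the total variance."""
    variances = singular_values ** 2          # eigenvalues of the covariance matrix
    cumulative = np.cumsum(variances) / np.sum(variances)
    return int(np.searchsorted(cumulative, variance_fraction) + 1)

# Example: exponentially decaying spectrum, as described in the text.
sv = np.exp(-0.1 * np.arange(100))
print(choose_num_components(sv))  # M needed to reach 90% of the variance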

Deriving the new dataset

The final dataset Φ is obtained by multiplying the transpose of the feature vector b with W:

\Phi = b^T W.   (2.6)

2.1.2 PCA for images

The PCA is computed as the SVD of the covariance matrix Cov of the facial images. An eigenspace Φ is created using the equation

\phi_j = \sum_i b_{ij} (I_i − I_0),   (2.7)

where b_{ij} are the eigenvector values of the covariance matrix {(I_i − I_0)^T (I_j − I_0)}. Eqs. 2.6 and 2.7 are equivalent.

The projection coefficients {α_j} for each facial image are calculated as

\alpha_j = \phi_j (I − I_0)^T.   (2.8)

Each facial image is represented as the sum of the mean of all pixels and the weighted principal components. The representation becomes error free if all N principal components are used:

I = I_0 + \sum_{j=1}^{N} \alpha_j \phi_j.   (2.9)

The final facial image is constructed by

I = I_0 + \sum_{j=1}^{M} \alpha_j \phi_j,   (2.10)

where M is the number of selected principal components that are used for reconstruction of the facial image. An image with negligible quality loss can be represented by a few principal components, because the first 5-10% of the eigenvectors can represent more than 90% of the variance in the data (Kim, 1996).

PCA achieves compression since fewer dimensions (M) than the original (N) are used to represent the images. A PCA model also allows images to be represented with only a few values (the α_j), and this is how PCA works for image representation.
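The following NumPy sketch illustrates Eqs. 2.2 and 2.7-2.10 (a minimal illustration, not the thesis implementation): it builds the eigenspace via the SVD of the mean-centred data matrix, which is equivalent to diagonalizing the covariance matrix, and then reconstructs an image from M components.

import numpy as np

def build_eigenspace(X, M):
    """X: data matrix with one flattened image per column (Eqs. 2.1-2.2).
    Returns the mean image and the first M eigenfaces (Eq. 2.7)."""
    mean = X.mean(axis=1, keepdims=True)          # I_0
    W = X - mean                                  # mean-centred data
    # SVD of W gives the eigenvectors of the covariance as columns of U.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return mean, U[:, :M]                         # eigenspace Phi (M eigenfaces)

def reconstruct(image, mean, phi):
    """Project an image onto the eigenspace (Eq. 2.8) and reconstruct it
    from M principal components (Eq. 2.10)."""
    alpha = phi.T @ (image - mean.ravel())        # projection coefficients
    return mean.ravel() + phi @ alpha

# Tiny synthetic example: 10 random "images" of 6 pixels, 3 components.
X = np.random.rand(6, 10)
mean, phi = build_eigenspace(X, M=3)
approx = reconstruct(X[:, 0], mean, phi)
print(np.linalg.norm(X[:, 0] - approx))           # reconstruction error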

2.1.3 Eigen faces

The eigenvectors, or principal components, of the distribution of faces are called eigenfaces. Eigenfaces look like ghostly faces. The first 3 eigenfaces obtained from the AR database described in Section 3.1 can be seen in Figure 2.2. Each individual face can be represented by a linear combination of eigenfaces. Each face is approximated using the best eigenfaces, i.e. those that capture the most variance within the set of face images. The best M eigenfaces span an M-dimensional subspace, the "face space", of all possible images (Turk and Pentland, 1991).


Figure 2.2: Eigenfaces (a) First eigenface. (b) Second eigenface. (c) Third eigenface.

Figure 2.3: The blue part represents the eigenspace of non-occluded regions whereas the green part represents the pseudo eigenspace of the complete image.

2.2 Asymmetrical PCA (aPCA)

aPCA is a method for estimating the entire space based on a subspace of this space. This method finds the correspondence between pixels in non-occluded regions and pixels behind occluded regions.

2.2.1 Description of aPCA

aPCA is an extension of PCA (Principal Component Analysis). With aPCA, entire faces are reconstructed by estimating the occluded regions from the non-occluded regions of the images. The intensity (appearance) of the non-occluded pixels is used to estimate the intensity of the occluded pixels. In aPCA, two spaces are constructed: an eigenspace built from the non-occluded areas of the occluded images, in which the eigenvectors are orthogonal to each other, and a pseudo eigenspace for the complete images, constructed using the eigenvectors of the non-occluded image regions. In the pseudo eigenspace, the eigenvectors are not orthogonal, as seen in Figure 2.3.

2.2.2 aPCA calculation

In aPCA, a pseudo eigenspace is created. It models the correspondence between the pixels in the images, but only the non-occluded parts are orthogonal. Let I^no represent the non-occluded parts of an image I. I^no is modelled in an eigenspace Φ^no = φ^no_1, φ^no_2, ..., φ^no_N using the formula

\phi^{no}_j = \sum_i b^{no}_{ij} (I^{no}_i − I^{no}_0),   (2.11)

where b^{no}_{ij} are the eigenvector values of the covariance matrix {(I^{no}_i − I^{no}_0)^T (I^{no}_j − I^{no}_0)} and I^{no}_0 is the mean of the non-occluded regions,

I^{no}_0 = \frac{1}{N} \sum_{j=1}^{N} I^{no}_j.   (2.12)

The eigenvectors of the non-occluded parts are used to make them orthogonal, while the occluded parts are modelled according to their correspondence with the non-occluded parts. The pseudo eigenspace Φ^p is calculated as

\Phi^p_j = \sum_i b^{no}_{ij} (I_i − I_0),   (2.13)

where I_i is the original image and I_0 is the mean of the original images.

Projection is used to extract the coefficients {α^no_j} from the eigenspace Φ^no:

\alpha^{no}_j = \Phi^{no}_j (I^{no} − I^{no}_0)^T.   (2.14)

The complete facial image \hat{I} is reconstructed as

\hat{I} = I_0 + \sum_{j=1}^{M} \alpha^{no}_j \Phi^p_j,   (2.15)

where M is the selected number of pseudo components that are used for the reconstruction.

By using the above calculated projection coefficients, a complete image can be reconstructed from only non-occluded parts of the image.

2.2.3 aPCA for reconstruction of occluded facial region

With the eigenspace modelling the non-occluded facial regions and the pseudo eigenspace modelling the entire face, it is possible to use aPCA to estimate how a face looks behind the occlusions. When the spaces are created, the entire face needs to be visible so that the correspondence between the spaces can be modelled with aPCA.

The eigenspace is created according to Eq. 2.11 and the pseudo eigenspace is constructed according to Eq. 2.13. The correspondence between the facial regions is captured in these two spaces. The non-occluded regions can then be used to extract the projection coefficients α (Eq. 2.14), meaning that only non-occluded pixels affect the representation. When the pseudo eigenspace is used with these coefficients to recreate an image of the entire face (Eq. 2.15), the content of the previously occluded pixels is calculated based on their relationship with the non-occluded pixels.
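A minimal NumPy sketch of this procedure is given below. It illustrates Eqs. 2.11-2.15 under simplifying assumptions (the occlusion is described by a single boolean pixel mask shared by all training images, the eigenvectors are obtained via the SVD of the masked data, and the visible-part eigenvectors are normalised so the projection behaves like an orthonormal one); it is not the thesis implementation, and all names are illustrative.

import numpy as np

def apca_train(X, visible, M):
    """X: data matrix with one flattened training image per column.
    visible: boolean vector marking the non-occluded pixels.
    Builds the eigenspace of the visible part (Eq. 2.11) and the
    pseudo eigenspace of the full image (Eq. 2.13)."""
    mean_full = X.mean(axis=1, keepdims=True)            # I_0
    mean_vis = mean_full[visible]                        # I_0^no
    W_full = X - mean_full
    W_vis = W_full[visible, :]
    # Eigenvectors b^no of the covariance W_vis^T W_vis via the SVD of W_vis.
    _, _, Vt = np.linalg.svd(W_vis, full_matrices=False)
    B = Vt[:M, :].T                                      # b^no_ij, one column per component
    phi_vis = W_vis @ B                                  # eigenspace (Eq. 2.11)
    phi_pseudo = W_full @ B                              # pseudo eigenspace (Eq. 2.13)
    # Normalise the visible-part eigenvectors (and scale the pseudo space
    # consistently) so the projection in Eq. 2.14 acts like an orthonormal one.
    norms = np.linalg.norm(phi_vis, axis=0)
    return mean_full.ravel(), mean_vis.ravel(), phi_vis / norms, phi_pseudo / norms

def apca_reconstruct(image, visible, mean_full, mean_vis, phi_vis, phi_pseudo):
    """Project the visible pixels (Eq. 2.14) and rebuild the full face (Eq. 2.15)."""
    alpha = phi_vis.T @ (image[visible] - mean_vis)
    return mean_full + phi_pseudo @ alpha

# Synthetic example: 20 training "faces" of 50 pixels, the first 30 visible.
rng = np.random.default_rng(0)
X = rng.random((50, 20))
visible = np.zeros(50, dtype=bool)
visible[:30] = True
params = apca_train(X, visible, M=5)
print(apca_reconstruct(X[:, 0], visible, *params).shape)  # (50,)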

2.3 Skin color detection

This section follows Cheddad et al. (2009), who use two approximations, l and l̂, for skin color detection. l is calculated as

l(x) = [r(x), g(x), b(x)] * \alpha,   (2.16)

where * represents matrix multiplication and the transformation matrix is

\alpha = [0.298, 0.587, 0.140]^T.   (2.17)


Figure 2.4: (a) and (b) represent the original images while (c) and (d) represent the registered images.

The approximation l̂ is calculated as

\hat{l}(x) = \max(G(x), R(x)).   (2.18)

An error signal for each pixel is calculated as

e(x) = l(x) − \hat{l}(x),   (2.19)

and each pixel is classified as skin or not skin by

f_{skin}(x) = \begin{cases} 1, & \text{if } 0.02511 \le e(x) \le 0.1177 \\ 0, & \text{otherwise.} \end{cases}   (2.20)
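The sketch below implements this classification rule with NumPy on an RGB image scaled to [0, 1]. It follows Eqs. 2.16-2.20 as stated above; the assumed value range and the function name are illustrative choices of this example, not part of the thesis.

import numpy as np

def skin_mask(rgb):
    """rgb: H x W x 3 float array with values in [0, 1].
    Returns a boolean mask that is True for pixels classified as skin."""
    alpha = np.array([0.298, 0.587, 0.140])
    r, g = rgb[..., 0], rgb[..., 1]
    l = rgb @ alpha                        # Eq. 2.16
    l_hat = np.maximum(g, r)               # Eq. 2.18
    e = l - l_hat                          # Eq. 2.19
    return (e >= 0.02511) & (e <= 0.1177)  # Eq. 2.20

# Example on a random image.
img = np.random.rand(171, 144, 3)
print(skin_mask(img).mean())  # fraction of pixels classified as skin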

2.4 Image registration

Image registration is the process of transforming a set of images into one coordinate system without changing the shape of the images. In this process, one image is selected as the base image and spatial transformations are applied to the other images so that they align with the base image. Image registration is performed as a preliminary step so that different image processing operations can be applied to a dataset that shares the same coordinate system. If facial images are aligned, then after alignment all the images will have their facial features, such as mouth, eyes and nose, in the same positions.

2.4.1 Translation

Translation is a geometric transformation in which an image element located at position (x_1, y_1) is shifted to a new position (x_2, y_2) in the transformed image. The translation operation is defined as

\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix},   (2.21)

where t_x and t_y are the horizontal and vertical pixel displacements, respectively.


2.4.2 Rotation

Rotation is a geometric transformation in which the image elements are rotated by a specified rotation angle θ. The rotation operation is defined as

\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}.   (2.22)

2.4.3 Scaling

Scaling is a geometric transformation that can be used to reduce or increase the size of the image coordinates. The scaling operation is defined as

\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} c_x & 0 \\ 0 & c_y \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}.   (2.23)

2.4.4 Affine transformation

Affine transformation is a linear 2-D geometric transformation that combines rotation, scaling and translation operations. It maps a point located at position (x_1, y_1) in an input image to a point located at (x_2, y_2) in an output image by applying a linear combination of translation, rotation, scaling and/or shearing (non-uniform scaling in some direction) operations. The affine transformation takes the form

\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}.   (2.24)

The facial images used in this thesis are aligned using affine transformations.
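As a small illustration of Eq. 2.24 (a sketch with arbitrary example parameters, not the alignment code used in the thesis), the following applies an affine transformation built from a rotation, a scaling and a translation to a set of 2-D points:

import numpy as np

def affine_transform(points, A, t):
    """Apply Eq. 2.24 to an array of 2-D points (one point per row)."""
    return points @ A.T + t

# Example: rotate by 10 degrees, scale by 1.2, translate by (5, -3).
theta = np.deg2rad(10)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S = np.diag([1.2, 1.2])
A = R @ S                      # combined linear part a_ij
t = np.array([5.0, -3.0])      # translation (t_x, t_y)

corners = np.array([[0, 0], [143, 0], [143, 170], [0, 170]], dtype=float)
print(affine_transform(corners, A, t))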

2.5 Peak signal-to-noise ratio (PSNR)

PSNR is the ratio between the maximum possible value of a signal and the power of the distorting noise that affects the quality of its representation. It is often used as a benchmark of the similarity between a reconstructed image and the original image (Santoso et al., 2011). PSNR compares the original image with the coded/decoded image to quantify the quality of the data obtained by decompressing the encoded data. A higher PSNR value means that the reconstructed data is of better quality. The mathematical definition of PSNR is

PSNR = 10 \log_{10} \left( \frac{max^2}{MSE} \right),   (2.25)

where max is the maximum possible value of the image pixels and MSE is the mean squared difference between the reconstructed and the original data,

MSE = \frac{1}{XY} \sum_{m=1}^{X} \sum_{n=1}^{Y} [I_1(m, n) − I_2(m, n)]^2,   (2.26)

where I_1 is the original image, I_2 is the reconstructed image, and X and Y are the numbers of rows and columns, respectively.
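A direct NumPy translation of Eqs. 2.25-2.26 is sketched below (assuming 8-bit images with a maximum pixel value of 255; the function names are illustrative):

import numpy as np

def mse(original, reconstructed):
    """Mean squared error between two images of equal size (Eq. 2.26)."""
    diff = original.astype(float) - reconstructed.astype(float)
    return np.mean(diff ** 2)

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB (Eq. 2.25)."""
    error = mse(original, reconstructed)
    if error == 0:
        return float('inf')   # identical images
    return 10.0 * np.log10(max_value ** 2 / error)

# Example: a slightly noisy copy of a random 8-bit image.
img = np.random.randint(0, 256, (171, 144), dtype=np.uint8)
noisy = np.clip(img + np.random.normal(0, 5, img.shape), 0, 255)
print(round(psnr(img, noisy), 2), "dB")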


Chapter 3

Method

3.1 The AR face database

To perform the experiments, the AR face database (Martinez and Benavente, 1998) was used. This database contains more than 4000 facial images of 126 persons, both male and female (70 men and 56 women). The database contains images with scarf and sunglasses occlusions as well as non-occluded images with different facial expressions. The original size of the images is 768×576 pixels. The images were taken under controlled conditions, with no restrictions imposed on wear or style.

3.2 Automatic occlusion detection

3.2.1 Replace white color with black color

The skin color detection method of Section 2.3 classifies white pixels as skin pixels. However, white is not a skin color; a white region should be treated as occlusion. Therefore, white pixels are always replaced by black pixels before skin color detection. A pixel is classified as white if its R, G and B values are all greater than 190, where 255 is the maximum value.
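A minimal sketch of this pre-processing step is shown below (assuming an 8-bit RGB NumPy image; the threshold 190 is taken from the text, the function name is illustrative):

import numpy as np

def replace_white_with_black(rgb, threshold=190):
    """Set pixels whose R, G and B values all exceed the threshold to black."""
    out = rgb.copy()
    white = np.all(rgb > threshold, axis=-1)   # boolean mask of "white" pixels
    out[white] = 0
    return out

img = np.random.randint(0, 256, (171, 144, 3), dtype=np.uint8)
cleaned = replace_white_with_black(img)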

3.2.2 Image cropping

The original size of the images is 768×576 pixels. These images contain a lot of background area, which affects the quality of the reconstructed images. Therefore, the images are cropped to a size of 171×144 pixels.

3.2.3 Image division

The image (171×144 pixels) is divided into 6 parts: 2 head parts, 2 eye parts and 2 mouth parts, see Figure 3.1 (b). The size of each head part is 45 × 72 pixels, the size of each eye part is 54 × 72 pixels and the size of each mouth part is 72 × 72 pixels.

In the second step, each part is further divided into 9 sub-parts, see Figure 3.1 (c). By doing this, smaller facial occlusions can also be detected. In the third step, each part of the second step is further divided into 9 sub-parts, see Figure 3.1 (d).


Figure 3.1: (a) An occluded facial image. (b) Image division into 6 parts. (c) Image division into 54 smaller parts. (d) Image division into 486 parts.

Figure 3.2: (a) An occluded facial image. (b) Image division into blocks. (c) Each black block represents an occluded block.

3.2.4 Occlusion detection for each block

To detect the occlusion for each block, the skin color information is used. If a pixel is not a skin pixel, it is marked as an occluded pixel. If at least 25% of the pixels in a block are non-skin pixels, the block is marked as an occluded block.
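The following sketch ties the skin mask from Section 2.3 to the block rule above. It is a simplified illustration: the image is split into a regular grid rather than the specific head/eye/mouth layout of Figure 3.1, and the names are illustrative.

import numpy as np

def occluded_blocks(skin, rows, cols, threshold=0.25):
    """skin: boolean H x W mask (True = skin pixel).
    Splits the mask into rows x cols blocks and returns a boolean
    rows x cols array that is True where a block is occluded."""
    H, W = skin.shape
    bh, bw = H // rows, W // cols
    occluded = np.zeros((rows, cols), dtype=bool)
    for i in range(rows):
        for j in range(cols):
            block = skin[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            non_skin_fraction = 1.0 - block.mean()
            occluded[i, j] = non_skin_fraction >= threshold
    return occluded

# Example: random skin mask for a 171 x 144 image divided into 3 x 2 parts.
skin = np.random.rand(171, 144) > 0.3
print(occluded_blocks(skin, rows=3, cols=2))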

3.3 Occluded face reconstruction

After facial occlusion detection, a column vector is created that contains only the non-occluded parts of each image. These column vectors are stored in a matrix that contains the corresponding non-occluded parts of the facial images in the database. Each image in the database is also converted into a vector and stored in a matrix; if there are 100 images in the database, this matrix contains 100 vectors. The mean of the non-occluded vectors is calculated and subtracted from each vector, and similarly the mean of the original facial image vectors is calculated and subtracted from each vector. This produces datasets with zero mean. The covariance matrix Cov of the non-occluded facial matrix is calculated as described in Section 2.1.1, and its eigenvectors and eigenvalues are computed using the SVD. An eigenspace is constructed from the non-occluded parts of the images and, similarly, a pseudo eigenspace is constructed from all parts of the images in the database. Projection is used to extract the coefficients from the eigenspace; these coefficients are then used for the reconstruction of the facial images.

A specific number M = 50 of eigenvectors is used for the reconstruction of the images. The choice of M = 50 was found by initial experiments. The final facial image data is constructed using Eq. 2.15. In the last step, each vector of the matrix is reshaped to recover the R, G and B values for each image and thereby reconstruct the facial images.

3.3.1 PSNR calculation

The PSNR between the input image and the reconstructed image is calculated to check the quality of the reconstructed image. If the PSNR is above 30 dB, the reconstructed image is normally considered to be of good quality (Wikipedia, 2012).


Chapter 4

Experiment

4.1 Granularity effect

This experiment examines the effect of the granularity of the occlusion on the aPCA reconstruction process. In the first step, the image is divided into 6 parts and the occlusion for each facial part is determined. The non-occluded parts of the image are used to construct the eigenspace, whereas the entire image is used to construct the pseudo eigenspace.

In the second step, the image is first divided into 6 parts and the occlusion is determined for each block. If a part is occluded, it is further divided into 9 sub-parts and the occlusion detection is repeated.

In the third step, the image is first divided into 6 parts and then, based on occlusion detection, each part is divided into 9 sub-parts. The occlusion for each of these sub-parts is determined. If a block is occluded, it is further divided into 9 sub-parts and the occlusion is determined for these parts. These small parts are used to construct the eigenspace and the entire image is used to construct the pseudo eigenspace.

4.1.1 Metric

PSNR is used as a metric to evaluate the granularity effect. PSNR is calculated both for the entire image and for the reconstructed part of the image only. The number of non-occluded pixels used for encoding in each experiment is also recorded.

4.1.2 Sunglasses scenario

In this scenario, the mask input image is occluded by sunglasses. The image is divided into sub-parts, the occlusion is detected for each of these parts individually, and the full faces are reconstructed using the aPCA image reconstruction method. The average PSNR of all the reconstructed faces is calculated to determine the quality of the reconstructed facial images, and the average PSNR of all the reconstructed occluded parts is also calculated. Furthermore, the number of pixels used in the reconstruction process and the time taken by each division method are recorded.

In Figure 4.1, image (a) is the original image, (b) is the input mask image occluded with sunglasses and (c) represents the two eigenspaces. The occluded input mask image (b) is used in the test cases below. The green ellipse represents the pseudo eigenspace that is constructed from the non-occluded images, as given in image (a),


Figure 4.1: (a) Non-occluded facial image. (b) An occluded image. (c) Eigenspaces.

and non-occluded parts of the occluded images. The blue ellipse represents the eigenspace constructed from the non-occluded parts of the occluded images.

Level 1 image division

In the level 1 image division method, the mask input image is divided into 2 head parts, 2 eye parts and 2 mouth parts, see Figure 4.2 (b). The occlusion for each part is detected separately, and the full faces are reconstructed as described in Section 3.3. In Figure 4.2, image (a) is the mask input image occluded with sunglasses, image (b) shows the division of the image into 6 parts, and in image (c) the areas marked with black represent the detected occlusion in the eye parts. Note that dividing the image into 6 parts does not detect all of the occlusion, and some non-occluded regions are marked as occluded. The background regions in the 2 mouth parts are not detected by the level 1 image division method.

The reconstruction results of level 1 image division can be seen in Figure 4.3. The reconstructed image has some circles around the eyes. This is because some images in the database contain eyeglasses, so the corresponding eigenvectors leave imprints on the reconstructed images.

After reconstruction, the average PSNR is calculated for the complete reconstructed faces and for the occluded reconstructed regions only. Furthermore, the number of pixels used in the reconstruction process is recorded. If more pixels are used in the reconstruction process, the reconstructed images are expected to be better, with a higher average PSNR value.

Level 2 image division

In the level 2 image division method, the 6 parts of level 1 are further divided into 9 sub-parts each, see Figure 4.4 (b). Each of these parts undergoes the occlusion detection process and aPCA is applied to reconstruct the facial images.

In Figure 4.4, image (a) is the mask input image occluded with sunglasses, image (b) shows the division of the image into 54 sub-parts, and in image (c) the black blocks represent the detected occlusions. The white background area that is not part of the mouth is considered an occlusion; this background occlusion is also detected by dividing the image into smaller parts. The level 2 image division method also marks some occluded area as non-occluded, see Figure 4.4 (c), where some parts of the sunglasses are marked as non-occluded. Figure 4.5 is an example of image reconstruction using level 2 image division. Note that there are prominent circles around the eyes, and that the black background areas near the cheeks are not reconstructed well.

Level 3a image division

In the level 3a image division method, the 54 parts of level 2 are further divided into 9 sub-parts each, see Figure 4.6 (b). The complete image is thus divided into 486 very small parts and the occlusion is detected for each part separately. After occlusion detection, aPCA is applied to reconstruct the faces. Due to the very small size of each part, very small occlusions can also be detected.

In Figure 4.6, image (a) is the mask input image occluded with sunglasses, image (b) shows the division of the image into 486 sub-parts, and in image (c) the black blocks represent the detected occlusions. Figure 4.6 (c) shows that almost all of the facial occlusion is detected, but some non-occluded area is also marked as occluded: hair and eyebrows are marked as occluded. Figure 4.7 shows the face reconstructed by level 3a image division. The quality of the reconstructed image is better than for level 1 and level 2, with fewer imprints of eyeglasses around the eyes.

Level 3b image division

In the level 3b image division method, the 6 parts of level 1 are further divided into 9 sub-parts each and the occlusion is detected for each of these parts separately. If a part is occluded, it is further divided into 9 sub-parts, see Figure 4.8 (c). The occlusion is detected for these very small parts and aPCA is applied to reconstruct the faces.

In Figure 4.8, image (a) is the mask input image occluded with sunglasses, image (b) shows the occlusions detected by level 2 image division, marked in black, image (c) shows how the occluded area detected by level 2 image division is further divided into sub-parts, for which the occlusion is detected again, and image (d) shows the occlusion detection by the level 3b image division method.

Note that the background and sunglasses occlusions are detected, while very little non-occluded area is marked as occluded. From Figure 4.8 (d), we can note that the nose and cheek areas near the sunglasses that were marked as occluded in Figure 4.8 (b) are now marked as non-occluded. Figure 4.9 is an example of image reconstruction using this method.

4.1.3 Scarf scenario

In this scenario, the input image is occluded by a scarf so that the entire mouth area is occluded. The image is divided into sub-parts, the occlusion is detected for each of these parts individually, and the full faces are reconstructed using the aPCA method. The average PSNR of all the reconstructed faces is calculated to determine the quality of the reconstructed facial images, and the average PSNR of all the reconstructed occluded parts is also calculated. Furthermore, the number of pixels used in the reconstruction process and the time taken by each division method are recorded.

Figures 4.10 to 4.17 show the four image division methods applied to the mask input image occluded with a scarf, the occlusion detected by each of these methods, and the faces reconstructed using the four image division methods with aPCA.


Figure 4.2: (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.

Figure 4.3: An example of the reconstructed face by level 1 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.2 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.4: (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.

Figure 4.5: An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.4 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.6: (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.


Figure 4.7: An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.6 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.8: (a) An occluded image. (b) Occlusion detection by level 2 image division. (c) Level 3b image division. (d) Occlusion detection by level 3b image division.

Figure 4.9: An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.8 (d). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.10: (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.

Figure 4.11: An example of the reconstructed face by level 1 image division. (a) An occluded

image. (b) The occluded image masked by the mask from Figure 4.10 (c). (c) Reconstructed

image. (d) Non-occluded image.


Figure 4.12: (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.

Figure 4.13: An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.12 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.14: (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.

Figure 4.15: An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.14 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.16: (a) An occluded image. (b) Occlusion detection by level 2 image division. (c)

Level 3b image division. (d) Occlusion detection by level 3b image division.


Figure 4.17: An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.16 (d). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.18: (a) An occluded image. (b) Level 1 image division. (c) Detected occlusions.

4.1.4 Cap and sunglasses occlusion

In this scenario, the head is covered by a cap and the eyes are covered by sunglasses. The mouth parts contain some background occlusion, so some or all areas of all 6 parts of the mask input image are occluded. The input image is divided into different parts, the occlusion is detected for each part, and aPCA is applied to reconstruct the faces. The average PSNR of the complete reconstructed images, and of only the occluded reconstructed parts, is calculated to determine the quality of the reconstructed images. The number of pixels used in the reconstruction process is recorded to determine the effect of the number of non-occluded pixels on the quality of the reconstructed faces. The processing time of the aPCA process is also recorded.

Figures 4.18 to 4.25 show the four image division methods applied to the mask input image occluded with cap and sunglasses, the occlusion detected by each of these methods, and the faces reconstructed using these image division methods with aPCA.

4.2 Pre-defined eigenspaces

In this experiment, 6 different pre-defined eigenspaces are created, and a pseudo eigenspace is constructed for each of them using all 116 images. The pre-defined eigenspaces correspond to different kinds of sunglasses occlusions.

Figure 4.19: An example of the reconstructed face by level 1 image division. (a) An occluded

image. (b) The occluded image masked by the mask from Figure 4.18 (c). (c) Reconstructed

image. (d) Non-occluded image.


Figure 4.20: (a) An occluded image. (b) Level 2 image division. (c) Detected occlusions.

Figure 4.21: An example of the reconstructed face by level 2 image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.20 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.22: (a) An occluded image. (b) Level 3a image division. (c) Detected occlusions.

Figure 4.23: An example of the reconstructed face by level 3a image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.22 (c). (c) Reconstructed image. (d) Non-occluded image.

Figure 4.24: (a) An occluded image. (b) Occlusion detection by level 2 image division. (c)

Level 3b image division. (d) Occlusion detection by level 3b image division.


Figure 4.25: An example of the reconstructed face by level 3b image division. (a) An occluded image. (b) The occluded image masked by the mask from Figure 4.24 (d). (c) Reconstructed image. (d) Non-occluded image.

When an occlusion is detected, the pre-defined eigenspace with the smallest difference between the detected occlusion and the occlusion used to build the eigenspace is selected. This eigenspace is used to reconstruct the image with aPCA. The closest eigenspace is selected based on the positions of the occlusion in the eigenspace and in the detected occlusion. If the occlusion of a pixel is the same in both versions, its score is 0; if it differs, its score is 1. The eigenspace with the lowest total score is selected.
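A small sketch of this selection rule is given below (assuming, as in Section 4.2.2, that each occlusion is summarized as a binary vector with one entry per image part; the data and names here are illustrative):

import numpy as np

def closest_eigenspace(detected, predefined):
    """detected: binary occlusion vector of the input image (1 = occluded part).
    predefined: list of binary occlusion vectors, one per pre-defined eigenspace.
    Returns the index of the eigenspace whose occlusion pattern differs from
    the detected occlusion in the fewest parts (lowest score)."""
    detected = np.asarray(detected)
    scores = [np.sum(detected != np.asarray(p)) for p in predefined]
    return int(np.argmin(scores))

# Example with 6 parts and 3 pre-defined occlusion patterns.
detected = [0, 1, 1, 0, 0, 0]
predefined = [[1, 1, 0, 0, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 0]]
print(closest_eigenspace(detected, predefined))  # 1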

4.2.1 Metric

PSNR is used as a metric in two ways: it is calculated for the entire image and for the reconstructed parts only. This only needs to be done for the 6 different pre-defined occlusions. The number of non-occluded pixels used for encoding in each experiment is also recorded.

4.2.2 Experiment description

The pre-defined eigenspaces are constructed and saved to storage. These eigenspaces are created by dividing the occluded images in Figure 4.26 as described in Section 4.1.2. A pseudo eigenspace and an eigenspace for each of the 6 images in Figure 4.26 are constructed and saved. A vector containing the occlusion information about each part is also created and saved for later use: if a part is occluded, 1 is stored in the respective vector element, otherwise 0 is stored. The occlusion of the mask input image is detected following Section 4.1.2 and a vector is created that contains the occlusion information about each of its parts. This vector is compared to each vector of the pre-defined eigenspaces to count the number of occluded parts that have the same position in both the input mask image and the image used to construct the pre-defined eigenspace.

The eigenspace with the maximum number of matching occlusion positions is selected for the reconstruction of the facial images. The average PSNR of the complete reconstructed facial images and of the occluded reconstructed areas is calculated to determine the quality of the reconstructed facial images. The time taken to perform the aPCA operation is recorded to determine the efficiency of the pre-defined eigenspaces.

The 6 faces with sunglasses occlusion that are used in the construction of the 6 pre-defined eigenspaces can be seen in Figure 4.26. In Figure 4.27, image (a) is the mask input image, image (b) shows the occlusion detected by level 3b image division, image (c) shows the pre-defined eigenspace selected based on the detected occlusion in image (b), and image (d) shows the reconstructed image using that pre-defined eigenspace.


Figure 4.26: Occluded facial images used for construction of 6 eigenspaces.

Figure 4.27: (a) An occluded image. (b) Detected occlusion by level 3b image division. (c) Pre-defined eigenspace most similar to the detected occlusion in (b). (d) Reconstructed image using the eigenspace in (c).


Chapter 5

Results

In this chapter, the results of the experiments are described. The chapter is divided into three parts. In the first part, the results of the 4 image division methods for automatic occlusion detection are discussed and images showing the output of these methods are displayed. In the second part, the reconstruction results based on the 4 methods of occlusion detection are discussed; tables containing the average PSNR values for the reconstructed faces and for the reconstructed areas only, as well as a table containing the processing time for each image division method, are displayed. In the third part, the pre-defined eigenspaces are discussed to determine their efficiency and reconstruction quality; the processing times to reconstruct the faces with and without pre-defined eigenspaces and the average PSNR values of the reconstructed faces are displayed and discussed.

5.1 Occlusion detection results

Figure 5.1 shows the occlusion detection by the different image division methods: image (a) is the mask input image occluded with sunglasses, image (b) shows the occlusion detection by level 1 image division, (c) by level 2, (d) by level 3a and (e) by level 3b. The grey blocks represent the marked occluded areas.

In the level 1 image division method, the complete image is divided into 6 large parts. The size of each part is large and, to be marked as occluded, at least 25% of its area must be occluded. Due to the large size of each part, less occlusion is detected. Image (b) shows that the occlusion in both eye parts is detected. The white background in the mouth parts is also an occlusion, but it is not detected because it covers less than 25% of the corresponding parts. Image (b) also shows that some non-occluded area in both eye parts is marked as occluded.

In the level 2 image division method, the size of each part is smaller, so smaller occlusions can be detected. Image (c) shows that the eye occlusions and the background occlusions in the mouth parts are detected, and less occluded area is marked as non-occluded. However, the size of each part is still fairly large, so some non-occluded area is also marked as occluded and fewer pixels are available for the reconstruction process. Over many experiments, the level 3a image division showed the best occlusion detection results compared to all other methods.


Figure 5.1: Occlusion detection by different image division methods. (a) Occluded image.

(b) Occlusion detection by level 1 image division. (c) Occlusion detection by level 2 image division. (d) Occlusion detection by level 3a image division. (e) Occlusion detection by level 3b image division.

In the level 3a image division method, the size of each part is very small, so very small occlusions can be detected. Image (d) shows that almost all of the occlusion is detected, while some non-occluded area is marked as occluded: the eye and background occlusions are marked correctly, but eyebrows and hair are also marked as occlusion.

In occlusion detection by level 3b image division, the process is divided into two steps. In the first step, the image is divided as described in Section 4.1.2 and the occlusion is detected for each part. This detects the small occlusions, but some non-occluded area is marked as occluded. In the second step, the occluded area marked in the first step is further divided into sub-parts and the occlusion is detected for each sub-part. By doing this, non-occluded areas that were marked as occluded in the first step are now marked as non-occluded, and more pixels become available for the reconstruction of the faces, see Figure 5.1 (e). Level 3b is thus also a good occlusion detection method.

5.2 Reconstruction quality results

The quality of the reconstructed faces is determined by PSNR. The average PSNR is calculated both for the complete reconstructed faces and for the reconstructed occluded parts only. Table 5.1 shows the average PSNR of the complete reconstructed faces and Table 5.2 shows the PSNR for the reconstructed occluded parts. In Tables 5.1, 5.2, 5.3 and 5.4, Level 1 refers to reconstruction of faces by level 1 image division, Level 2 by level 2 image division, Level 3a by level 3a image division and Level 3b by level 3b image division. The number of pixels used in the reconstruction of the faces is recorded to determine the impact of the number of non-occluded pixels on the quality of the reconstructed faces. Furthermore, the processing time taken by each image division method is also recorded.

Table 5.1 contains the average PSNR values of all 116 reconstructed faces for the 3 different types of occlusion. The level 1 image division has the maximum average PSNR value for the sunglasses occlusion, whereas the level 3a image division has the maximum average PSNR value for the scarf and cap & sunglasses occlusions.

Table 5.2 contains the average PSNR values of the reconstructed occluded parts only for the 3 types of occlusion. The level 1 image division has the maximum average PSNR value for the sunglasses and cap & sunglasses occlusions, while level 3a has the maximum average PSNR value for the scarf occlusion.

Table 5.3 contains the number of non-occluded pixels that are used in the reconstruction of the facial images. The quality of the reconstructed faces generally increases with the number of non-occluded pixels.


Table 5.1: Reconstruction quality of the complete image (PSNR) [dB] for granularity effect

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses            23.46     23.22     23.19      23.33
Scarf                 19.85     19.85     20.01      19.87
Cap and sunglasses    19.95     20.32     20.38      20.34

Table 5.2: Reconstruction quality of the occluded reconstructed parts (PSNR) [dB] for granularity effect

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses            23.30     20.99     20.89      20.78
Scarf                 18.46     18.55     18.77      18.54
Cap and sunglasses    21.66     19.09     18.88      18.99

Table 5.3: Number of pixels used in reconstruction

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses            50544     49464     47640      53496
Scarf                 42768     42768     43512      45264
Cap and sunglasses    31104     42768     44736      46656

Table 5.4: Processing time (sec) for granularity effect

Occlusion type        Level 1   Level 2   Level 3a   Level 3b
Sunglasses            24.04     37.77     40.33      43.60
Scarf                 22.53     36.81     38.80      41.48
Cap and sunglasses    25.93     38.71     40.58      41.06


Figure 5.2: Reconstructed image by different image division methods. (a) An occluded image. (b) Reconstructed image by level 1 image division. (c) Reconstructed image by level 2 image division. (d) Reconstructed image by level 3a image division. (e) Reconstructed image by level 3b image division. (f) Non-occluded image.

Figure 5.3: Reconstructed image by different image division methods. (a) An occluded image. (b) Reconstructed image by level 1 image division. (c) Reconstructed image by level 2 image division. (d) Reconstructed image by level 3a image division. (e) Reconstructed image by level 3b image division. (f) Non-occluded image.

Table 5.4 contains the processing times for the 4 image division methods applied to the 3 types of occlusion. The results show that the level 1 image division takes the least processing time, whereas the level 3b method takes the most. The processing time thus depends on the image division: when the image parts are large, less processing time is needed, and when the parts are small, more processing time is needed.

Figure 5.2 shows a single image reconstructed using the different image division methods. Image (a) is an occluded image. Image (b) shows that the quality of the reconstructed image is good, except for some circles around the eyes, although these circles are not very prominent. Image (c) shows that the quality of the reconstructed image is not as good: prominent circles around the eyes can be noticed and the white background area is not reconstructed well. Images (d) and (e) show images reconstructed with good quality, with faint circles around the eyes, and image (f) is the non-occluded image. The visual evaluation and the average PSNR values of the reconstructed images show that the level 3a image division generates the images with the highest quality compared to all other image division methods.

5.3 Reconstruction results using pre-defined eigenspaces

Six pre-defined eigenspaces were constructed from six sunglasses occlusion masks, where the occlusion mask vector was created by level 3a image division. The occlusion of the input mask image is detected and, based on the detected occlusion, the closest eigenspace is selected for the reconstruction process. The average PSNR of the reconstructed faces is calculated to determine their quality, and the processing time is recorded to determine the efficiency of the pre-defined eigenspaces. The experiments showed a remarkable decrease in processing time with negligible quality loss in the reconstructed faces.
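A minimal sketch of the selection step is given below (Python/NumPy). The occlusion detected in the input image is encoded as a binary vector over the blocks and compared with the stored masks; the eigenspace belonging to the closest mask is used for reconstruction. The data structure, the Hamming distance as similarity measure and the reconstruct call are assumptions made for illustration, not the exact implementation of this thesis.

import numpy as np

def select_eigenspace(detected_mask, predefined_spaces):
    """Return the pre-defined eigenspace whose occlusion mask is closest
    to the detected occlusion.

    detected_mask     : 1-D binary array, 1 = block marked as occluded
    predefined_spaces : list of (mask, eigenspace) pairs prepared offline
    """
    best_space, best_dist = None, np.inf
    for mask, space in predefined_spaces:
        dist = np.count_nonzero(mask != detected_mask)  # Hamming distance between masks
        if dist < best_dist:
            best_space, best_dist = space, dist
    return best_space

# Hypothetical usage with the six sunglasses masks prepared offline:
#   space = select_eigenspace(detected_mask, predefined_spaces)
#   reconstruction = space.reconstruct(visible_pixels)   # aPCA reconstruction, assumed API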

The processing time with a pre-defined eigenspace was 6.2 seconds, compared to 40.3 seconds when the eigenspace was created at run time. The average PSNR of the reconstructed faces was 23.0 dB using the pre-defined eigenspaces and 23.1 dB for the run-time level 3a eigenspaces. The pre-defined eigenspaces are thus much faster than run-time eigenspace creation with level 3a image division, at the cost of only a minimal decrease in the quality of the reconstructed faces.


Chapter 6

Conclusions

6.1 Discussion about granularity effect and reconstruction quality

The occlusion is detected with four image division methods: level 1, level 2, level 3a and level 3b.

In the level 1 image division method, the image is divided into 6 large parts and the occlusion is detected for each part. Because each part is large, small occlusions are not detected: the method can mark large non-occluded areas as occluded and some occluded areas as non-occluded. Nevertheless, it generated the best reconstruction results in the sunglasses scenario because the detected occlusion exactly matches the two blocks. Another advantage of this method is that it takes less processing time than all the other image division methods.
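The per-block decision described above can be illustrated with a short Python/NumPy sketch. It is a simplified illustration, not the thesis implementation: it assumes that a skin classifier has already produced a binary skin map, that a block is considered occluded when its fraction of skin pixels falls below a threshold, and that the grid size corresponds to the image division level (e.g. a 2 x 3 grid for a level-1 style division into 6 parts).

import numpy as np

def detect_occluded_blocks(skin_map, rows, cols, min_skin_ratio=0.3):
    """Mark grid blocks as occluded based on their skin-pixel ratio.

    skin_map       : 2-D boolean array, True where a pixel is classified as skin
    rows, cols     : grid size for the chosen image division level
    min_skin_ratio : assumed threshold; blocks with a lower skin fraction are marked occluded
    """
    h, w = skin_map.shape
    bh, bw = h // rows, w // cols           # block size (any remainder is ignored in this sketch)
    occluded = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            block = skin_map[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            occluded[r, c] = block.mean() < min_skin_ratio   # fraction of skin pixels
    return occluded

With the coarse level 1 grid a single misjudged block covers a large area, whereas a finer level 3 grid follows the occlusion boundary more closely at the cost of more block decisions and longer processing time.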

The level 3a image division method can detect very small occlusions because each part is very small. Its main advantage is that it can mark almost all occlusions, although it also marks the hair around the head and the eyebrows as occlusion. In the experiments, the facial occlusion was detected, the faces were reconstructed and the average PSNR values were calculated; the results showed that level 3a is the best occlusion detection method. Moreover, it generated the best reconstruction results for the scarf and the cap-and-sunglasses occlusions.

The level 2 and level 3b methods produced the worst reconstruction results.

The pixel positions in the non-occluded images and in the occluded images are not the same, so the quality of the reconstructed images is not as good as it could be. The quality can be improved by marking the occluded area directly on the non-occluded images and then using these images for the reconstruction process; doing so increases the average PSNR value by 4 to 5 dB.

6.2 Discussion about pre-defined eigenspaces

The experiments showed that level 3a is the method that yields the best reconstruction results, but it also takes more processing time. This limitation can be overcome by using pre-defined eigenspaces. For the experiments, six eigenspaces based on sunglasses occlusions were defined and saved on storage media for later use. Using the pre-defined eigenspaces gave a remarkable decrease in processing time with only a small decrease in the quality of the reconstructed faces. This small quality loss should be acceptable where processing time is critical.
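The design choice is to move the expensive eigenspace computation offline: each eigenspace is computed once, stored, and only loaded at run time. A minimal sketch of such storage with NumPy archives follows; the file name and the names of the stored model components (basis, mean face) are placeholders, not the storage format used in this work.

import numpy as np

def save_eigenspace(path, basis, mean_face):
    """Store a pre-computed eigenspace (offline step, done once per occlusion mask)."""
    np.savez(path, basis=basis, mean_face=mean_face)

def load_eigenspace(path):
    """Load a stored eigenspace at run time instead of rebuilding it from the database."""
    data = np.load(path)
    return data["basis"], data["mean_face"]

# Offline:  save_eigenspace("sunglasses_mask_1.npz", basis, mean_face)
# Run time: basis, mean_face = load_eigenspace("sunglasses_mask_1.npz")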

6.3 Limitations

This occlusion detection method is based on skin color detection, so it cannot detect a face occluded by objects of skin color. It works well for Caucasian and Asian people but fails for people with dark skin, because dark pixels are marked as occlusion. The hair covering the head, the eyebrows and a beard are also not detected as face parts when the input mask image is divided into very small parts. Furthermore, the images are not registered properly; if the images were registered correctly, i.e. if facial points such as the eyes, nose and lips were at the same positions in all images, the quality of the reconstructed faces could be improved.
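To illustrate why both failure cases arise, a classic explicit RGB skin rule from the literature is sketched below in Python/NumPy. This particular rule is only an example, not the skin-tone detector used in this thesis, but any rule of this kind rejects very dark pixels and accepts any skin-colored surface, which explains why dark faces are marked as occluded while skin-colored occluding objects are not.

import numpy as np

def simple_skin_mask(rgb):
    """Classify pixels as skin with an explicit RGB rule (illustrative example only).

    rgb : H x W x 3 uint8 image; returns a boolean mask where True means skin.
    """
    r = rgb[..., 0].astype(np.int16)
    g = rgb[..., 1].astype(np.int16)
    b = rgb[..., 2].astype(np.int16)
    mx = np.maximum(np.maximum(r, g), b)
    mn = np.minimum(np.minimum(r, g), b)
    return ((r > 95) & (g > 40) & (b > 20) &      # dark pixels are rejected here
            (mx - mn > 15) &                      # sufficient color spread
            (np.abs(r - g) > 15) & (r > g) & (r > b))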

6.4 Future work

The occlusion detection algorithm can be generalized so that it handles occluding objects of skin color and people with dark skin. More images can be added to the database so that a more extensive study of aPCA can be made. Furthermore, if the database is large, it can be divided into different groups based on gender, ethnicity, etc.


Chapter 7

Acknowledgements

By the blessings of Almighty Allah and the prayers of my parents, I have accomplished this work. First of all, I want to thank my external supervisor Dr. Ulrik Söderström, who is ever ready to support and give his time to the students. I would also like to thank my internal supervisor Dr. Niclas Börlin for arranging proper meetings and providing guidance in writing the thesis report, due to which I was able to finish my thesis in time. I am also grateful to my parents, family and friends, especially A. Mushtaq, for their moral support.


