
Examensarbete LITH-ITN-MT-EX--05/052--SE

Context-based algorithm for face detection

Helene Wall

2005-09-07

Department of Science and Technology, Linköpings Universitet, SE-601 74 Norrköping, Sweden
(Institutionen för teknik och naturvetenskap, Linköpings Universitet, 601 74 Norrköping)

LITH-ITN-MT-EX--05/052--SE

Context-based algorithm for face detection

Thesis carried out in Media Technology at Linköpings Tekniska Högskola, Campus Norrköping

Helene Wall

Supervisor: Nikola Kasabov
Examiner: Ivan Rankin

Norrköping, 2005-09-07

Division, Department: Department of Science and Technology (Institutionen för teknik och naturvetenskap)
Date: 2005-09-07
Language: English
Report category: Examensarbete (D-uppsats)
ISRN: LITH-ITN-MT-EX--05/052--SE

Title: Context-based algorithm for face detection
Author: Helene Wall

Keywords: face detection, image processing, neural networks, feature extraction

Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Helene Wall

Abstract

Face detection has been a research area for more than ten years. It is a complex problem due to the high variability in faces and amongst faces; it is therefore not possible to extract a general pattern to be used for detection. This is what makes face detection a challenge. This thesis gives the reader a background to the face detection problem and describes its two main approaches. A face detection algorithm is implemented using a context-based method in combination with an evolving neural network. The algorithm consists of two major steps: detecting possible face areas, and detecting faces within these areas. This method makes it possible to reduce the search space. Several parameters affect the performance: the feature extraction method, the classifier and the images used. This work resulted in a face detection algorithm whose performance is evaluated and analysed. The analysis of the problems that occurred has provided a deeper understanding of the complexity of the face detection problem.

Preface

This thesis is the final part of a Master of Science in Media Technology and Engineering at the Department of Science and Technology at Linköping University, Sweden. The project was carried out under the supervision of Prof. Nikola Kasabov at the Knowledge Engineering and Discovery Research Institute (KEDRI) at Auckland University of Technology, New Zealand.

A number of people have helped me in various ways to make this project possible. First of all I would like to thank Prof. Stanley Miklavcic for establishing the contact with Prof. Nikola Kasabov and KEDRI. I also want to express my gratitude to Prof. Nikola Kasabov for inviting me to KEDRI, and for his guidance and support during my time there. I would like to thank everybody at KEDRI, especially my second supervisor David Zhang, for all the help and support during my project. A special thanks also to Simei Wysoski for valuable ideas and support in the final stage of this project. Thanks to Charlotte Perhammar at Linköping University for giving me access to the images used in this project. Many thanks to my examiner Ivan Rankin for valuable feedback and support. Finally, I would like to thank Anders for his support and everlasting patience during this project.

Auckland, April 2005
Helene Wall

Contents

1 Introduction
  1.1 Motivation
  1.2 Background
  1.3 Objective
  1.4 Methodology
  1.5 Target group
  1.6 Outline

2 Overview
  2.1 Previous work
    2.1.1 Feature-based approach
    2.1.2 Image-based approach
  2.2 Context-based approach
    2.2.1 Gaussian derivatives
    2.2.2 Wavelets
  2.3 Plan of Implementation
    2.3.1 Sub-tasks
    2.3.2 Training and testing

3 Implementation
  3.1 Algorithm description
  3.2 Extracting facial contexts
  3.3 Classifier of context
    3.3.1 Training context classifier
    3.3.2 Testing context classifier
  3.4 Extracting faces
  3.5 Classifier of face
    3.5.1 Training face classifier
    3.5.2 Testing face classifier

4 Problem analysis
  4.1 Context classifier
  4.2 Face classifier
  4.3 Suggestions for improvement

5 Results and conclusions
  5.1 Spatial context and evolving neural network
  5.2 Face detection performance
  5.3 Conclusions
  5.4 Future work

References

Glossary

List of Tables

3.1 Confusion table for context classifier
3.2 Confusion table for face classifier
3.3 Confusion table for face classifier using the 64 most important features

List of Figures

1.1 Project timeline
2.1 Face detection approaches
3.1 Flowchart of face detection algorithm
3.2 Sliding window
3.3 Principle of EGCC
3.4 Wavelet decomposition
4.1 Data distribution of two classes
4.2 Positive and negative context windows
5.1 Face detected and two miss-detections
5.2 Miss-detections
5.3 Face detected and one miss-detection
5.4 Face detected wrongly
5.5 Face detected

1 Introduction

This chapter contains an introduction to face detection and the thesis in general. The motivation, background and objective are presented. The methodology is described and the outline gives the reader a guide to each of the following chapters.

1.1 Motivation

Face detection is the necessary first step in creating an automatic face recognition system; the aim is to localize and extract face areas from the background. One definition of face detection, by Yang et al. [16], is:

"Given an arbitrary image, the goal of face detection is to determine whether or not there are any faces in the image and, if present, return the image location and extent of each face."

In the development of face recognition systems it has in many cases been assumed that there is a face in the image to be searched. If face detection is included in a face recognition system, then the system can be used in cases where this cannot be assumed. Papageorgiou and Poggio [11] give a good description of face detection and of why it is useful in combination with face recognition:

"A face detection system knows how to differentiate faces from 'everything else', while a face recognition system knows the difference between my face and other faces."

There are many areas where face detection can be very useful, for example surveillance systems, intelligent human-computer interaction and video conferencing. When developing a face detection algorithm there are several aspects of the problem to consider. What application is it going to be used in? What result is expected? What assumptions can be made, and why? So far no face detection algorithm has been developed that is general enough to be used in an arbitrary application. The face detection algorithm has to be tailor-made for a specific case; this is due to the variability in faces and the environments where they occur. Faces can for example be viewed at different angles, have different facial expressions, and some parts of the face may be occluded. Other factors that have to be considered are for instance background, lighting and movement. All these factors depend on the area of application for the algorithm.

1.2 Background

Detecting a face is an easy task for a human, but still a very difficult task for a computer. Why is it difficult for a computer? The main reason is the high variability in faces and amongst faces.

There is no obvious pattern that can be extracted and used to detect the faces.

KEDRI has developed a platform for video, image and speech processing, ECoVIS. A face detection algorithm can contribute to this platform, particularly in the face recognition module, where the constraint that there is a face in the image used for recognition can then be removed. Face detection can be applied first to establish whether there is a face in the image; if a face is detected, then the face recognition starts.

Face detection is an interesting problem for several reasons:

• After more than ten years of research the problem is not solved. Perhaps researchers have been missing something important when trying to get computers to perform such a complex task. The capacity and complexity of the human brain might be underestimated when expecting a computer to manage the same task, face detection.

• It has many applications, for example intelligent human-computer interaction and surveillance systems. If a surveillance system can independently detect faces and recognize whether the person in question is allowed in an area, then no one needs to watch the surveillance videos.

• It is an open question which approach to the problem to use; many different approaches have been suggested. Perhaps the solution will be found by combining different approaches.

This summarizes some of the reasons why I chose the topic of face detection for my M.Sc. project. Many researchers have attempted to solve this problem, and algorithms have been developed that perform well for their intended applications. Unfortunately such an algorithm cannot be used for another application, due to the assumptions and limitations that need to be made when implementing it. The face detection problem is very complex, and it takes a long time to understand all the limits set on the problem by the chosen approach. This will be further explained in chapter 2.

In this project an approach that makes use of color information will be used in combination with an evolving neural network as a classifier. The main focus will be on evaluating the performance of the classifier.

1.3 Objective

The aim of this M.Sc. project is to explore the possibilities of using an evolving neural network in a face detection algorithm. The performance of an evolving neural network in a face detection algorithm is evaluated to find out whether it increases the performance of the algorithm or not. It would be beneficial to have a classifier that evolves and adapts to changed conditions. This would make it possible to develop a face detection algorithm that has a more general area of application.

The face detection algorithm has been developed using an existing feature extraction method and an evolving classifier from KEDRI. The algorithm developed is based on the work done by Bergboer et al. in [1]. The chosen approach uses the spatial context of the face and, in this case, an evolving neural network as a classifier; previously an SVM has been used. A successful implementation of an evolving neural network for face detection can be integrated in a system for face recognition, where it would perform the first task, i.e. detecting the face. However, the integration is not the aim of this project; the focus of attention is the face detection algorithm.

1.4 Methodology

This project is based on a literature study where research results published as journal papers, conference papers and books were reviewed. There are many different approaches to the face detection problem according to the literature. The approach chosen for this project is mainly based on the work done by Bergboer et al. [1], combined with an evolving neural network. After the literature review a project proposal was written, outlining the project. Further studies were done in the areas of Gaussian derivatives and Wavelets to get a better understanding of how to implement the approach. The literature review was presented as a poster at the NCEI '04 conference organized by KEDRI in December 2004. The poster was based on the work presented in the project proposal.

The preparation for the implementation followed the literature study. Images were collected to use for training and testing of the algorithm. The images used were collected from a database at Linköping University and from the University College Dublin Colour Face Image Database [15].

The implementation can be divided into three steps. The first step is to implement the part of the algorithm that extracts possible face areas using the spatial context of the face; this step includes preprocessing and preparation of images to be used for training and testing of the classifier. The second step is to implement the part of the algorithm that extracts the actual face out of the possible face areas extracted by the first classifier. The third step is to evaluate and analyse the output from the two classifiers.

The final step of the project is to evaluate the performance of the face detection algorithm, analyse whether it is suitable or not to use an evolving neural network as a classifier, and write the thesis. The time frame for accomplishing this task, see figure 1.1, was from mid-September 2004 until the end of April 2005.

Figure 1.1: Project timeline, September 2004 to April 2005 (literature studies; Gaussian derivatives and Haar wavelets; project proposal, further literature studies and preparation for conference; implementing the algorithm; training, testing and evaluation of the algorithm; writing the thesis and completing the project).

1.5 Target group

This report is intended for an audience with a technical background and a basic knowledge of image processing and neural networks. The reader is not presumed to be familiar with the specific methods presented in this thesis.

1.6 Outline

This section contains an outline of the report and gives the reader an introduction to each chapter.

• Chapter 1 contains an introduction to face detection and the thesis in general. The motivation, background and objective are presented and the methodology is described.

• Chapter 2 introduces previous work that has been done in this research area, to provide a background and give an understanding of the complexity of the face detection problem.

• Chapter 3 describes the process of implementing the face detection algorithm. It starts with a brief description of what the algorithm is expected to perform and proceeds into more detail of the implementation.

• Chapter 4 discusses some of the difficulties that appeared when using an evolving neural network in a face detection task. Suggestions for how to improve the performance of the face detection algorithm are also presented.

• Chapter 5 presents the results and conclusions of the project, and ideas for future work are suggested.

• The Glossary provides a short explanation of terms and abbreviations used in the thesis.

2 Overview

This chapter introduces some previous work that has been done in this research area, to provide a background and give an understanding of the complexity of the face detection problem.

2.1 Previous work

During the last ten years researchers have been giving more attention to the face detection problem, which has resulted in a vast number of new methods. Earlier the research focus was mostly on face recognition, and many of those algorithms assumed that the face was already segmented from the image [9]. To create an automatic face recognition system the first problem to solve is face detection.

Face detection is a difficult problem, whose difficulty can be split into three main categories [17]:

1. View dependence: the image of the face will vary with the viewing direction.

2. Nonrigidity: from the same viewpoint, different facial expressions will result in different images.

3. Lighting: with the same viewpoint and the same facial expression, the image can be different due to diverse lighting environments.

Several algorithms have been developed to solve the face detection problem, but most of them only manage to consider one or two of the three main categories. Each algorithm is usually a combination of methods. These methods can be divided into two different approaches, feature-based and image-based, according to [6], as shown in figure 2.1.

2.1.1 Feature-based approach

The feature-based approach uses the fact that information about the structure of the face is known; such properties are, for example, skin color and the geometry of the face. Within the feature-based approach the focus for this work will be on methods using color, edges and feature searching. Color and edges are relatively easy to extract from the rest of the image. Skin color gives rise to a small cluster in the color space that can be used to extract face areas from the image. Edges can be extracted by filtering the image with an edge-enhancing filter, such as the Sobel operator.

Figure 2.1: Face detection divided into approaches (after [6]). Feature-based approaches comprise low-level analysis (edges, gray-levels, color, motion, generalized measure), feature analysis (feature searching, constellation analysis) and active shape models (snakes, deformable templates, point distribution models); image-based approaches comprise linear subspace methods, neural networks and statistical approaches.

There are also methods that use first- and second-order derivatives to extract edges. Feature searching can be done sequentially, i.e. the strategy is based on the relative positions of facial features such as the eyes, nose and mouth. The geometry of the face is used in feature searching; if the eyes, for example, are found, then the mouth is expected to be found within a certain distance and at a certain angle.

Face detection algorithms using color information [1] [8] [9] [19] manage the viewpoint and nonrigidity problems much better, but are more sensitive to changes in illumination. Algorithms based on color can be more or less sensitive to illumination changes depending on which color space is used; the most frequently used color spaces are RGB, normalized RGB, YCbCr and HSI. The color information is mainly used to reduce the search space within which the face is expected to be found. The search space is defined as the area that needs to be searched to find the faces, i.e. the image. When the search space is reduced, only smaller parts of the original search space will be scanned to find the faces.

In the YCbCr color space the Y component contains the illumination properties.

When this color space is used for skin color detection, the Y component is excluded to eliminate the effect of illumination changes as far as possible. Therefore in [8], [9] the CbCr plane of the YCbCr color space is used for detecting the skin color region. When the possible face areas have been detected using color information from the CbCr plane, different methods are applied to these areas to decide whether they contain a face or not. In [8] connected components, non-linear filters, are used to make the decision. They eliminate parts of the image and preserve the contours of the remaining parts. To the remaining parts, shape- and geometry-based connected components are applied; parts that do not have the shape and geometry of a face are eliminated, so that finally only connected components containing faces are left. In [9] an elliptic shape is fitted to the skin tone areas to further verify that an area is a face; this is fed into a neural network that makes the final decision.

In [19] a skin-color model in the YUV color space is used to provide potential face areas. A neural network is used to detect the approximate angle of rotation of the face, and the iris location is used to give an accurate rotation angle. When the rotation angle has been determined, a template matching method is used to search for facial features; this method also eliminates non-face regions. Detection of faces is achieved by measuring the distance between the input window and its projection onto the eigenspace. The eigenspace is the subspace formed by the set of all eigenvectors corresponding to eigenvalues. Instead of having a coordinate system with x and y axes, a new coordinate system can be created using the eigenvectors; the eigenvectors corresponding to the highest eigenvalues best represent how the data is spread in the space. Projection onto the eigenspace means that the data is transformed and expressed in terms of the patterns between them.

In [1] the spatial context of the face is taken into account by using the color information. As a result, only areas with a color range where a face is likely to be found are searched; for example, green areas are not considered. In this method the image is transferred from RGB to three color receptive fields R1, R2 and R3, where R1 is the intensity field, R2 the yellow-blue opponent field and R3 the red-green opponent field. The color receptive fields are used to calculate the color differential structure for G and W: yellow-blue transitions and red-green transitions in the image. G and W are combined into a feature vector used for training the system on the context. The classifier labels the areas as either positive or negative context. Positive context refers to areas with a color range similar to the color range of skin; negative context refers to areas containing all other color ranges. This is explained more thoroughly in section 3.2. A method developed by [11] is used for detecting faces in the context areas, and for classification of context and face a support vector machine is used.

2.1.2 Image-based approach

In the image-based approach face detection is treated as a pattern recognition problem. The system is taught to recognize a face pattern, and the use of previous knowledge of the face structure is thereby avoided. This approach to the face detection problem can cope better with the unpredictability of face appearance and environmental conditions.

The neural network algorithms in the image-based approach are the most useful for this work. In [13] a neural-network-based algorithm is developed for detecting faces in an upright frontal position or with little change in viewpoint. This algorithm only manages to handle two of the three categories of the face detection problem, see section 2.1. Due to the training set, which only consists of images of faces in a frontal position with no parts excluded, the use of this method is limited. Rowley et al. have developed another neural network algorithm that can detect faces at different angles [14]; this algorithm is trained on images of faces at different angles. The use of this algorithm is much less limited, but it still cannot completely manage the three categories of the face detection problem. In both algorithms the background has been very simple; this makes it easier to distinguish the face from the background than if the background were complex.

2.2 Context-based approach

The approach chosen for this project is the context-based approach, which is related to the image-based approach to the face detection problem: no previous knowledge about the face and its geometry is used. The approach is also related to the feature-based approach, since it uses knowledge of where, in terms of color, a face is likely to be found. The approach is thus related to both approaches described in sections 2.1.1 and 2.1.2, but the relation is closest to the image-based approach.

This project uses the spatial context of faces and an evolving neural network as a classifier. The approach can roughly be divided into three main steps. Firstly, color information is used to reduce the search space by using the spatial context of the face. Secondly, an overcomplete representation of Wavelets is used for detection of the face. Thirdly, an evolving neural network [7] is used for classification. The aim is to evaluate the performance of this evolving neural network in a face detection task. A method developed by Bergboer et al. [1] is adopted for the first two steps, and for the third step an evolving neural network method developed at KEDRI by Zhang and Kasabov [18] is used. This is described more thoroughly in chapter 3.

The approach adopted from [1] involves certain assumptions: the face has to be in a frontal position, no parts of the face may be occluded, and changes in illumination must be limited. How much variation in illumination the algorithm can manage is a subject to be explored through experiments, as is whether the system is able to handle faces at a slight rotation angle.

2.2.1 Gaussian derivatives

The derivative of an image is calculated, for example, to enhance edges. Taking the derivative of an image can be replaced by a convolution with the derivative of the Gaussian kernel [4]; this is called the Gaussian derivative. The Gaussian kernel has the shape of a Gaussian distribution. The Gaussian derivative therefore processes the image based on local properties, with no consideration of global properties. This is well suited for image applications since it is closely related to the early response of the human visual system [3].
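To make this concrete, a Gaussian derivative is simply a Gaussian smoothing filter whose kernel has been differentiated along one axis. The following is a minimal sketch using SciPy, whose gaussian_filter takes a per-axis derivative order; the function name and interface around it are an illustration added here, not code from the thesis, with σ = 8 taken from section 3.2.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_derivatives(channel: np.ndarray, sigma: float = 8.0):
    """Partial derivatives of an image channel, obtained by convolving
    with the derivative of a Gaussian kernel at scale sigma."""
    dy = gaussian_filter(channel, sigma=sigma, order=(1, 0))  # d/dy (rows)
    dx = gaussian_filter(channel, sigma=sigma, order=(0, 1))  # d/dx (columns)
    return dx, dy
```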

2.2.2 Wavelets

The wavelet transform has similarities to the Fourier transform, i.e. it gives a description of the frequency content in the image. But unlike the Fourier transform, whose basis functions are sinusoids, the wavelet transform is based on small waves, Wavelets, of varying frequency and duration. This makes it possible for the wavelet transform to reveal not only frequency information but also time information about when the frequencies occur. Wavelets form the foundation of a relatively new approach to signal processing and analysis called multiresolution theory, presented in 1987 by [10]. Multiresolution theory makes it possible to examine the image at different scales; for example, if there are both small and large objects, or both low and high resolution present in the same image, then it can be an advantage to study the image at several resolutions.

2.3 Plan of Implementation

2.3.1 Sub-tasks

When this project was about to be implemented, the actual task had to be divided into the following sub-tasks:

– Reduce the search space by using the spatial context of the face.

– Detect the face using an overcomplete representation of Wavelets.

– Use an evolving neural network for classification.

The implementation of the first sub-task requires particular attention, since the performance of the whole algorithm depends on the first sub-task's performance. If the search space is not reduced enough, then the computation of the following tasks is more complex and the expected result might not be achieved. On the other hand, if the search space is reduced too much, then possible face areas are excluded and, as a result, faces are not detected.

When the first two sub-tasks are solved, the training of the evolving neural networks starts. Two evolving neural networks, also called classifiers, will be used: one for the first sub-task and one for the second. In the first case the extracted features are fed into the classifier, which decides whether the window is a face candidate area. In the second case the features extracted from the face candidate areas are fed into the evolving neural network, which decides whether there is a face or not. The classification results will be studied and analyzed to evaluate the performance of the face detection algorithm.

2.3.2 Training and testing

A large set of images is needed for training the evolving neural network. There are databases of training images available on the Internet, but several of the databases previously used for training face detection algorithms are in grey scale, and in this project color images are needed. It is also important to find images that agree with the assumptions that had to be made while developing the algorithm. The database should contain color images containing one or many faces taken under different conditions. In this work images from two different databases are used: the University College Dublin Colour Face Image Database [15] and images collected from a database at Linköping University.

At UCD a database was developed specifically for the evaluation of face detection algorithms [15]; the UCD Colour Face Image Database is therefore well suited for evaluating this particular algorithm. Due to the assumptions made while developing the algorithm, it will not manage to detect all faces in the UCD database. The tests will show whether the algorithm, using an evolving neural network as a classifier, actually manages to fulfill what is expected.

3 Implementation

This chapter describes the process of implementing the face detection algorithm. First a brief description is presented to give the reader an idea of what task the algorithm is expected to perform.

3.1 Algorithm description

An RGB image is read into the system and converted into three color receptive fields R1, R2 and R3. The image is then searched using a sliding window. The size of the window is set to 40 x 40 pixels and it slides along the image, moving one pixel at a time. For every window a feature vector, θc, is calculated and fed into the context classifier. The task of the context classifier is to decide whether the feature vector, θc, contains positive or negative context. All 40 x 40 windows that are classified as positive context are searched once more, now using a 19 x 19 sliding window looking for faces. This is done using an overcomplete representation of Wavelets; the information extracted is stored in a feature vector, θf. This feature vector, θf, is fed into the face classifier, which makes the decision of whether there is a face or not.

Figure 3.1: Flowchart of the face detection algorithm: extracting facial contexts, classifier of context, extracting faces, classifier of faces.
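To make the two-stage flow concrete, here is a minimal sketch of the search loop. The feature extractors, the classifier predict methods and the class labels are hypothetical placeholder names, not interfaces from the thesis; the window sizes follow the text above, and the step sizes are parameters (this section uses a step of one pixel, while chapter 4 later increases the steps to 10 and 4 pixels).

```python
import numpy as np

POSITIVE_CONTEXT, FACE = 1, 1  # hypothetical class labels

def detect_faces(image_rgb, context_clf, face_clf,
                 ctx_win=40, face_win=19, ctx_step=1, face_step=1):
    """Two-stage search: a 40 x 40 context window scans the image, and
    every positive-context window is rescanned with a 19 x 19 face window."""
    detections = []
    h, w = image_rgb.shape[:2]
    for y in range(0, h - ctx_win + 1, ctx_step):
        for x in range(0, w - ctx_win + 1, ctx_step):
            window = image_rgb[y:y + ctx_win, x:x + ctx_win]
            theta_c = extract_context_features(window)      # placeholder, 1 x 200
            if context_clf.predict(theta_c) != POSITIVE_CONTEXT:
                continue                                    # negative context: skip
            for fy in range(0, ctx_win - face_win + 1, face_step):
                for fx in range(0, ctx_win - face_win + 1, face_step):
                    patch = window[fy:fy + face_win, fx:fx + face_win]
                    theta_f = extract_wavelet_features(patch)  # placeholder, 1 x 1,734
                    if face_clf.predict(theta_f) == FACE:
                        detections.append((x + fx, y + fy, face_win))
    return detections
```

Sketches of the two feature extractors, under the same caveats, appear after sections 3.2 and 3.4 below.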

3.2 Extracting facial contexts

The visual features used for extracting the facial context are color edges. It is convenient to extract the color edges from a color differential structure using the scale-space theory by Geusebroek et al. [5]. The input image is decomposed from RGB color space into three color receptive fields R1, R2 and R3. The transformation from RGB values to color receptive fields is adopted from [12]:

$$\begin{pmatrix} R_1 \\ R_2 \\ R_3 \end{pmatrix} = \begin{pmatrix} 0.002358 & 0.025174 & 0.010821 \\ 0.011943 & 0.001715 & -0.013994 \\ 0.013743 & -0.023965 & 0.00657 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix} \qquad (3.1)$$

The color receptive fields R1, R2 and R3 are used to calculate the color differential structure G, see equation 3.2, for yellow-blue transitions and W, see equation 3.3, for red-green transitions. The partial derivatives with respect to x and y in the following expressions are Gaussian derivatives at scale σ = 8 according to [1]. The scale decides the width of the Gaussian kernel. This gives a scale that is neither too large nor too small: too large a scale would provide information that is too coarse to distinguish between context and non-context, while too small a scale would capture information that concerns intra-class variance instead of inter-class variance. In this method the normalization is omitted in G and W to prevent amplification of artifacts [1]. The non-normalized quantities are denoted $\tilde{G}$ and $\tilde{W}$.

$$G = \frac{1}{R_1^2}\left[\left(R_1\frac{\partial R_2}{\partial x} - R_2\frac{\partial R_1}{\partial x}\right)^2 + \left(R_1\frac{\partial R_2}{\partial y} - R_2\frac{\partial R_1}{\partial y}\right)^2\right]^{1/2} \qquad (3.2)$$

$$W = \frac{1}{|R_1|^3}\left[\left(2R_2^2\frac{\partial R_1}{\partial x} - 2R_1R_2\frac{\partial R_2}{\partial x} + R_1\left(-R_3\frac{\partial R_1}{\partial x} + R_1\frac{\partial R_3}{\partial x}\right)\right)^2 + \left(2R_2^2\frac{\partial R_1}{\partial y} - 2R_1R_2\frac{\partial R_2}{\partial y} + R_1\left(-R_3\frac{\partial R_1}{\partial y} + R_1\frac{\partial R_3}{\partial y}\right)\right)^2\right]^{1/2} \qquad (3.3)$$

When the image is searched for facial context, a sliding window of size 40 x 40 pixels is used. The size of the window is adopted from [1] and is roughly four times the size of a face in the images used; all images used are preprocessed to comply with this. In this case the window only slides across the image once, see figure 3.2; this assumes that the face is smaller than the window extracting the context. The reason for not searching the image at different scales at this stage is to keep the search as simple as possible.

Figure 3.2: The sliding window of size 40 x 40 pixels moving across the image.

The sliding window moves along the image one pixel at a time, and for each window $\tilde{G}$ and $\tilde{W}$ are calculated; this generates 1,600 features for each of them. For practical reasons $\tilde{G}$ and $\tilde{W}$ are down-sampled to 10 x 10 matrices $\tilde{G}_D$ and $\tilde{W}_D$, and linearly normalized and vectorized to 1 x 100 vectors Gn and Wn. For each window a feature vector, θc, of size 1 x 200 is extracted; the feature vector θc consists of Gn and Wn concatenated into one vector [1].
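As a concrete illustration of equation 3.1, the conversion is a single matrix product applied pixel-wise. The function name below is invented for illustration, but the coefficients are exactly those of equation 3.1.

```python
import numpy as np

# Coefficients of equation 3.1: rows map RGB to the receptive fields
# R1 (intensity), R2 (yellow-blue opponent), R3 (red-green opponent).
RGB_TO_RECEPTIVE = np.array([
    [0.002358,  0.025174,  0.010821],
    [0.011943,  0.001715, -0.013994],
    [0.013743, -0.023965,  0.00657],
])

def to_receptive_fields(image_rgb: np.ndarray) -> np.ndarray:
    """Apply equation 3.1 pixel-wise to an H x W x 3 RGB image."""
    return image_rgb @ RGB_TO_RECEPTIVE.T  # H x W x 3 array of (R1, R2, R3)
```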

3.3 Classifier of context

The feature vectors, θc, are used to train and test the classifier. For the classification of facial context the EGCC classifier, developed by Zhang and Kasabov, is used [18]. This classification task has two classes: positive context and negative context. Positive context refers to areas with a color range that is similar to the color range of skin; negative context refers to areas containing all other color ranges.

EGCC uses the following principle for classification. The classification data is mapped into an n-dimensional space, where n is the number of features. For each sample, i.e. feature vector, a node is created in the space. Due to the variability amongst the samples belonging to each class, different groups, clusters, of nodes are created. When all nodes have been created in the space, the centers of the clusters are calculated. All cluster centers then grow gradually until the influence field reaches the allowed maximum or an influence field of another class, see figure 3.3.
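The growing-influence-field idea can be sketched as follows. This is a toy illustration, not the actual EGCC implementation: for brevity it keeps a single cluster per class, whereas EGCC forms several clusters per class, and the class interface is invented.

```python
import numpy as np

class InfluenceFieldClassifier:
    """Toy sketch of the EGCC principle: nodes are placed per sample,
    cluster centers are computed, and each center's influence field grows
    until it reaches a maximum radius or a node of another class.
    Assumes at least two classes; one cluster per class for brevity."""

    def __init__(self, max_field=1.0):
        self.max_field = max_field
        self.centers, self.radii, self.labels = [], [], []

    def fit(self, X, y):
        for cls in np.unique(y):
            center = X[y == cls].mean(axis=0)              # cluster center
            other = X[y != cls]
            # the field stops growing at the nearest other-class node
            limit = np.linalg.norm(other - center, axis=1).min()
            self.centers.append(center)
            self.radii.append(min(self.max_field, limit))
            self.labels.append(cls)

    def predict(self, x):
        # assign the sample to the nearest cluster center
        d = [np.linalg.norm(x - c) for c in self.centers]
        return self.labels[int(np.argmin(d))]
```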

Figure 3.3: EGCC growth of cluster centers in a simplified case with only two dimensions. In (a) the cluster centers are found in the two classes. In (b) the influence fields grow gradually, and in (c) an influence field has reached its maximum, forming two nodes.

3.3.1 Training context classifier

For training and testing of the context classifier, the samples of positive and negative context are marked by hand in the 108 different images. The dataset consists of 370 samples; a sample is in this case the feature vector, θc, consisting of 200 values. The samples have the following distribution:

• Number of positive samples: 157

• Number of negative samples: 213

Supervised learning is used to train the classifier, i.e. it is known what class the samples belong to and the expected output is also known. Ten-fold cross validation is used for training and testing when evaluating the performance of the classifier. The data set of 370 samples is divided into ten continuous folds, each containing 37 samples. The training and testing are run ten times, and each time nine folds are used for training and one for testing; for example, the first nine folds are used for training and the tenth fold for testing.

When the classifier is used in the algorithm, the labeled dataset is used for training and whole images of unlabeled data are used for testing. Information about how the windows from the images are classified is stored; this is used in the next step of the algorithm, when these windows are searched further to decide whether there is a face in the image or not.
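The ten-fold procedure with continuous folds can be sketched as follows; train_fn and test_fn are placeholder callables standing in for EGCC training and evaluation, not interfaces from the thesis.

```python
import numpy as np

def ten_fold_accuracy(X, y, train_fn, test_fn, k=10):
    """k-fold cross validation with continuous folds, as described above:
    train on k-1 folds, test on the held-out fold, rotate k times."""
    folds = np.array_split(np.arange(len(X)), k)   # 370 samples -> 10 x 37
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = train_fn(X[train_idx], y[train_idx])
        scores.append(test_fn(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))
```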

3.3.2 Testing context classifier

The average detection result of the classifier when ten-fold cross validation is used is 75.4 %. The test result is not useful unless it is known how many of the samples belonging to class one are classified as class one, and vice versa for class two. The confusion table explains how the classification result is distributed; in this table, context is positive context and non-context is negative context, see table 3.1.

Table 3.1: Confusion table for context classifier.

                      Detected Context   Detected Non-Context
  Actual Context           78.8%                21.2%
  Actual Non-Context       25.8%                74.2%

The content of the confusion table needs to be analyzed. The percent of actual context classified as detected context and of actual non-context classified as detected non-context should be maximized for the best detection result. This is not always possible, and in this case the percent of actual context classified as detected non-context and of actual non-context classified as detected context needs to be evaluated. The number of actual contexts classified as detected non-context should be minimized in this application, since this is the percent of possible face areas not proceeding to the next step of the algorithm. The number of actual non-contexts classified as detected context is not as significant for the performance. Some miss-detection is accepted in this case, since no faces are missed; it is considered better to have a marginally bigger search space than to miss too many possible face areas.

The number of cluster centers also has to be considered when evaluating the performance of the classifier. In this case the number of cluster centers is almost the same as the number of nodes, which shows that no generalization has taken place. There are two main reasons why this happens: the maximum influence field might not be big enough, see section 3.3, or it depends on the distribution of the data. This will be more thoroughly discussed in chapter 4.
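The row-normalized percentages in these confusion tables can be computed as in the small illustration below; this is an aid added here, not code from the thesis.

```python
import numpy as np

def confusion_rates(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Row-normalized confusion table: entry (i, j) is the fraction of
    samples of actual class i assigned to detected class j."""
    classes = np.unique(y_true)
    table = np.zeros((classes.size, classes.size))
    for i, actual in enumerate(classes):
        mask = y_true == actual
        for j, detected in enumerate(classes):
            table[i, j] = np.mean(y_pred[mask] == detected)
    return table
```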

3.4 Extracting faces

The facial features are extracted using a grey-scale wavelet method. This method is adopted from Papageorgiou and Poggio, as it gives the most compact representation of the object class, faces [11]. The method is applied to the 40 x 40 windows that are classified as positive context. The image windows are converted from RGB into grey-scale images. A new sliding window of size 19 x 19 is applied to the 40 x 40 image windows, sliding one pixel at a time.

The features extracted are wavelet coefficients for the square-support Haar wavelets of size 2 x 2 and 4 x 4. This means that a multiresolution-based decomposition is performed at the two scales 2 and 4, see figure 3.4. For both scales the low-pass filtered data, $W_\varphi$, is discarded and only the horizontal ($W_\psi^H$), vertical ($W_\psi^V$) and diagonal ($W_\psi^D$) detail coefficients are retained. This results in six components, three from each scale. The wavelet representation is overcomplete, resulting in 17 x 17 coefficients for a given scale and orientation. The overcomplete representation gives a denser set of basis functions that provide a richer model and finer spatial resolution. To get an overcomplete representation the distance between the wavelets is reduced; in this case quadruple density is used and the distance is $\frac{1}{4}2^n$, whereas the distance in the general case is $2^n$ [11].

Figure 3.4: Wavelet decomposition up to scale four in the general case, showing the low-pass component $W_\varphi$ and the horizontal, vertical and diagonal detail components $W_\psi^{H,n}$, $W_\psi^{V,n}$ and $W_\psi^{D,n}$ for n = 0, 1, 2.

Each of the six 17 x 17 components is linearly normalized and vectorized to 1 x 289. The feature vector, θf, consists of all six components concatenated into one vector and is of size 1 x 1,734.
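A simplified dense version of this feature extraction can be sketched as follows: 2 x 2 and 4 x 4 Haar responses are computed at every pixel of a 19 x 19 grey-scale patch, and six 17 x 17 detail maps are kept. This approximates, but is not identical to, the quadruple-density scheme of [11], and all names are illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

def haar_kernels(size: int):
    """2D Haar kernels (horizontal, vertical, diagonal) of a given size."""
    half = size // 2
    h = np.ones((size, size)); h[half:, :] = -1        # horizontal edges
    v = np.ones((size, size)); v[:, half:] = -1        # vertical edges
    d = np.ones((size, size)); d[:half, half:] = -1; d[half:, :half] = -1
    return h / size**2, v / size**2, d / size**2       # diagonal edges last

def wavelet_features(gray_patch: np.ndarray) -> np.ndarray:
    """Dense Haar responses at scales 2 and 4 for a 19 x 19 patch; each
    response map is cropped to 17 x 17, linearly normalized and flattened."""
    feats = []
    for size in (2, 4):
        for kernel in haar_kernels(size):
            resp = convolve2d(gray_patch, kernel, mode="same")
            crop = resp[1:18, 1:18]                    # 17 x 17 region
            rng = crop.max() - crop.min()
            crop = (crop - crop.min()) / rng if rng else crop * 0
            feats.append(crop.ravel())                 # 1 x 289 each
    return np.concatenate(feats)                       # 1 x 1,734
```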

3.5 Classifier of face

The feature vectors, θf, are used to train and test the classifier. The EGCC classifier, developed by Zhang and Kasabov [18], is also used for classification of the face. This classification task likewise has two classes: face and non-face. Face is the area including eyes, nose and mouth; non-face is everything else that could appear in the image. The principle of EGCC is explained in section 3.3.

3.5.1 Training face classifier

For training and testing the face classifier, face and non-face samples are marked by hand in each of the 40 x 40 windows that were used when training the context classifier. The dataset consists of 370 samples in this case as well; a sample is the feature vector, θf, consisting of 1,734 values. The samples are distributed as in the previous case:

• Number of face samples: 157

• Number of non-face samples: 213

Supervised learning is used to train the classifier, i.e. it is known what class the samples belong to and the expected output is also known. Here too, ten-fold cross validation is used for training and testing when evaluating the performance of the classifier. The data set is divided into ten continuous folds, each containing 37 samples. The training and testing are run ten times, and each time nine folds are used for training and one for testing; for example, the first nine folds are used for training and the tenth fold for testing.

When the classifier is used in the algorithm, the labeled data is used as the training dataset and all the 40 x 40 windows classified as positive context are used for testing the classifier. In the windows previously classified as positive context, a square marks all areas classified as face. This is the final output of the face detection algorithm.

3.5.2 Testing face classifier

The average result of the face classifier when ten-fold cross validation is used is 83.0 %. The confusion table shows how the classification result is distributed; in this table, face is the face area and non-face is the background, see table 3.2. The distribution of the classification result shows that only about 50 % of the faces were detected. Therefore reducing the number of features in the feature vector was tried. To reduce the number of features, the Signal-to-Noise ratio in Siftware is used. The 1,734 features are ranked by SNR in order of importance, and the 64 most important features were used when evaluating the performance of the classifier with ten-fold cross validation. The result is shown in table 3.3. When the 64 most important features are used, the average accuracy is 83.2 %.

Here too the content of the confusion table needs to be analyzed. The percent of actual face classified as detected face and of actual non-face classified as detected non-face should be maximized for the best detection result. This is not always possible, and in such cases focus needs to be on the percent of actual face classified as detected non-face and of actual non-face classified as detected face.

Table 3.2: Confusion table for face classifier.

                      Detected Face   Detected Non-Face
  Actual Face             51.4%              48.6%
  Actual Non-Face          0.4%              99.6%

Table 3.3: Confusion table for face classifier using the 64 most important features.

                      Detected Face   Detected Non-Face
  Actual Face             77.4%              22.6%
  Actual Non-Face         12.2%              87.8%

These numbers need to be optimized to give the best result for the intended application. The percent of actual face detected as non-face should preferably be low, since this percent affects the overall detection result. Table 3.3 shows that the percent of non-face classified as detected face is low; a higher percent could be accepted here depending on the application. No optimization of the result from the confusion table is done in this application.

When the number of cluster centers is considered, it turns out that in this case too the number of cluster centers is almost the same as the number of nodes, which shows that no generalization has taken place. The number of cluster centers is the same when using 1,734 features and 64 features. The two main reasons why this happens are the same as in the previous case: the maximum influence field might not be big enough, or it depends on the distribution of the data. This will also be more thoroughly discussed in chapter 4.
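The SNR ranking used above can be sketched with the common two-class signal-to-noise statistic (difference of class means over the sum of class standard deviations); the exact formula used in Siftware may differ, and the function name is illustrative.

```python
import numpy as np

def snr_rank(X: np.ndarray, y: np.ndarray, top_k: int = 64) -> np.ndarray:
    """Rank features by a two-class signal-to-noise ratio and return the
    indices of the top_k most important features."""
    pos, neg = X[y == 1], X[y == 0]
    snr = np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / (
        pos.std(axis=0) + neg.std(axis=0) + 1e-12)   # avoid division by zero
    return np.argsort(snr)[::-1][:top_k]
```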

4 Problem analysis

In this chapter some of the difficulties that arise when using an evolving neural network in a face detection task will be analyzed, together with suggestions for how to improve the performance of the face detection algorithm.

4.1 Context classifier

When the performance of the context classifier is evaluated, the number of cluster centers is considered. The number of cluster centers is the same as the number of training samples, which implies that no generalization has taken place. As previously mentioned, there are two main reasons for this: the maximum influence field is not big enough, or it depends on the distribution of the data. The first attempt to find the cause of this problem is to change the maximum size of the influence field. Unfortunately this does not affect the result, i.e. no generalization takes place. If changes in the size of the maximum influence field do not change the result, then the reason must be the distribution of the data. It is not possible to plot the distribution of the dataset, since each sample consists of 200 values, which is equal to 200 dimensions. Class two in this case is the background, with no restrictions on what can appear in it. The background can thus be anything, and the data is therefore assumed to have the distribution shown in figure 4.1.

Figure 4.1: The assumed data distribution of the two classes. Class two, background, is everything within the biggest circle. Class one, faces, is a small subset of class two marked by the dashed circle.

As a result of this, the cluster centers cannot grow big enough to include more than one node, the reason being that there are nodes of the other class that stop the cluster centers from growing.

Recall from section 3.3 that the influence field grows until it reaches its maximum size or the influence field of another class.

The distribution of the data depends on the features extracted. In this case the samples of the two classes appear to be very similar, the reason being the feature extraction method chosen. There are two classes and both classes include background, see figure 4.2. The images chosen for this application have a complex background; there are no restrictions on what can appear in the background. It is therefore difficult to find a good description of the two classes. The background class is the most difficult, since it could be anything.

Figure 4.2: In (a), the positive context sample, the face area (from eyebrows to mouth) covers one fourth of the 40 x 40 window; the rest of the window is background. In (b), the negative context sample, the whole window is background.

When the general performance of the classifier was evaluated using ten-fold cross validation, the detection percentage was 75.4 %. When the whole image was used as test data a new problem occurred, caused by the sliding window. The sliding window moves one pixel at a time, which results in many overlapping windows classified as the same class, and hence, for example, too many windows detected as the face context class. The purpose of the two-step algorithm is that the first step reduces the search space for the second step. To minimize the effect of the overlapping problem, the step of the sliding window was increased to 10 pixels. This solution is acceptable for this application, but not a good one. A better solution would be if the classifier could choose, from all the overlapping windows, the window that best represents the class; the classifier would then be able to optimize its output. This is difficult to optimize since, for example, the number of face context areas is unknown. It is therefore not enough to find the global minimum; the local minima have to be found, which is much more complicated.

4.2 Face classifier

With the face classifier the same problems occur as with the context classifier, i.e. no generalization takes place and the sliding windows overlap. The number of cluster centers is the same as the number of training samples in this case as well. Here too the problem is caused by the distribution of the data, see figure 4.1.

To extract the face features Wavelets are used. This method uses an overcomplete representation, and as a result the number of extracted features is high. This creates problems: the EGCC classifier, like other classifiers, has difficulties in managing this many input dimensions. This problem is known as the "Curse of Dimensionality" [2].

In this part of the application only the 40 x 40 windows classified as class one are searched further to decide whether they contain a face or not. The reason for this is to reduce the search space, i.e. not to have to search the whole image again. However, according to the problem mentioned in section 4.1, too many of the 40 x 40 windows are classified as class one, and as a result the search space is not reduced. The overlapping problem also occurs in this part of the algorithm: a 19 x 19 window slides across the 40 x 40 window moving one pixel at a time, which results in too many overlapping windows. Therefore the step of the 19 x 19 window is increased to 4 pixels. This reduces the effect of the overlapping problem, but it is not the ideal solution, see section 4.1.

To increase the performance of the classifier, the number of features was reduced. A number of features that is neither too high nor too low is wanted [2]. The Signal-to-Noise ratio was used to rank the features according to importance; extracting the 64 most important features gave the best performance. For feature reduction, for example, SNR, PCA and LDA are widely used. In this case SNR was considered the easiest and fastest way to get an idea of which features actually contributed to the description of the two classes.

4.3 Suggestions for improvement

In the approach chosen for the face detection algorithm several problems occurred, caused by the number of features extracted and by what the extracted features represent. Suggestions for improvement:

• Rank and evaluate the features before using the classifier. Further feature reduction should be explored; SNR, PCA and LDA, for example, are widely used.

• Instead of considering face detection as a two-class problem, consider it as a one-class problem. The class is then the face class, the positive class. The classifier is trained only on this class, which removes the problem of the two classes being too similar; a sketch of this idea is given at the end of the section.

• Collect more data, and improve the performance by using a bigger data set for training and for testing.

These are only a few suggestions of what can be done to improve the performance. The main focus should be on feature extraction and on what the features represent, since this is a fundamental part of the algorithm. The performance of the classifier can be improved if the features are good.
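As an illustration of the one-class suggestion, the following sketch uses scikit-learn's OneClassSVM as a stand-in classifier. This is an assumption for illustration only; the thesis implementation uses the EGCC classifier, and the data here is a random placeholder.

    from sklearn.svm import OneClassSVM
    import numpy as np

    # Illustrative only: X_faces stands in for the face-class feature
    # vectors; random data is used here as a placeholder.
    X_faces = np.random.rand(200, 64)

    # Train on the positive (face) class only; the background class is
    # never modelled explicitly.
    one_class = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale")
    one_class.fit(X_faces)

    # predict() returns +1 for samples resembling the training class
    # and -1 for outliers, i.e. background.
    labels = one_class.predict(np.random.rand(5, 64))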

5 Results and conclusions

This chapter presents the outcome of this project. The aim was to explore the possibilities of using an evolving neural network in a face detection algorithm and to evaluate whether or not it increased the performance of the algorithm. Areas for future work are also suggested.

5.1 Spatial context and evolving neural network

The implemented algorithm uses a context-based feature extraction method in combination with an evolving classifier. When these two methods are combined some problems appear, caused by the number of features extracted and the quality of those features. Before two methods are chosen for combination, the requirements that have to be fulfilled for each method to perform well should be examined carefully. Face detection is a complex problem, and it is not as simple as combining two methods and expecting them to perform well together.

The context-based method was chosen because it uses images with a complex background and therefore has a more general area of application. The combination with the evolving classifier was chosen because it was expected to improve the performance of the algorithm. The evolving nature of this algorithm can be an advantage when a new data set is introduced: training can then simply continue, and there is no need to re-train the whole algorithm. A toy sketch of the evolving idea is given at the end of this section. The combination of the two methods needs further evaluation to improve the performance; it has the potential of achieving good detection results, but not in the current implementation. The core of face detection is to find a feature extraction method that produces few features and provides a good representation of the class.
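A toy sketch of the evolving idea follows, assuming only the behaviour described in the glossary entry for EGCC (a new cluster center is created when a sample falls outside all existing influence fields, so training can continue on new data without retraining from scratch). The class, its thresholds and its update rule are simplified illustrations, not the actual EGCC algorithm.

    import numpy as np

    class EvolvingClusters:
        """Toy evolving classifier: one cluster center per novel region."""

        def __init__(self, max_radius=0.5):
            self.centers, self.labels, self.radii = [], [], []
            self.max_radius = max_radius

        def partial_fit(self, x, label):
            # Samples inside an existing same-class influence field update
            # that center; all other samples create a new center.
            for i, (c, lab) in enumerate(zip(self.centers, self.labels)):
                d = np.linalg.norm(x - c)
                if d <= self.radii[i] and lab == label:
                    self.centers[i] = (c + x) / 2  # nudge center toward sample
                    self.radii[i] = min(max(self.radii[i], d), self.max_radius)
                    return
            self.centers.append(np.asarray(x, dtype=float))
            self.labels.append(label)
            self.radii.append(0.1)  # initial influence field size (arbitrary)

        def predict(self, x):
            # Assign the label of the nearest cluster center.
            d = [np.linalg.norm(x - c) for c in self.centers]
            return self.labels[int(np.argmin(d))]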

5.2 Face detection performance

The performance of the implemented face detection algorithm was not as expected. The reason is that the search space was not reduced; in some cases the actual search space in the second part of the implementation was therefore bigger than the original image. There are two reasons for this problem: the features extracted to describe the two classes, and the overlapping of the sliding windows, as described in chapter 4.

The detection result of the algorithm varied. 108 images from the database at Linköping University and 12 images from the UCD Colour Face Image Database were used. The training data was extracted from the 108 LiU images: the best samples of the two classes were marked by hand in these images; not all samples of each class were marked as training data. The 12 UCD images were used only for testing.

• In the LiU images the number of faces is 167.

• In the UCD images the number of faces is 29.

This gives a total of 196 faces in all images. For a face to be considered correctly detected, the eyes, nose and mouth have to be within the marked area. 51.5 % of the faces were correctly detected (101 of the 196 faces), and for every correctly detected face there were approximately ten false detections; the bookkeeping is sketched below. The context-based approach involves certain assumptions: the face has to be in a frontal position, no parts may be occluded, and only limited changes of illumination are allowed [1]. In some cases faces were detected even though they did not comply with the assumptions, and in other cases faces were not detected even though they did comply. A few examples of the detection results are shown in figures 5.1 – 5.5.

Figure 5.1: The face detected and two false detections, all in the neighbourhood of the face.

The images are a good representation of all the images in the data set: the background, the illumination and the viewing angle all vary.
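A minimal sketch of the evaluation bookkeeping, using the counts reported above; the function itself is illustrative.

    def detection_stats(n_faces_total, n_detected, n_false):
        """Summarise detection performance.

        n_faces_total -- faces present in the test images (196 here)
        n_detected    -- correctly detected faces (101 here)
        n_false       -- false detections (roughly ten per detected face)
        """
        rate = n_detected / n_faces_total
        false_per_hit = n_false / n_detected if n_detected else float("inf")
        return rate, false_per_hit

    rate, false_per_hit = detection_stats(196, 101, 101 * 10)
    print(f"detection rate: {rate:.1%}")                      # -> 51.5%
    print(f"false detections per detected face: {false_per_hit:.0f}")  # -> 10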

Figure 5.2: Only false detections, none even close to the face, although the face complied with the assumptions.

Figure 5.3: The face detected and one false detection in the neighbourhood of the face.

Figure 5.4: The face detected even though it did not comply with the assumptions.

Figure 5.5: The face detected.

5.3 Conclusions

A face detection algorithm has been developed using a context-based feature extraction method combined with an evolving classifier. The complexity of the face detection problem, and the extent of the difficulties in implementing a face detection algorithm, are now well understood. It is difficult to implement a face detection algorithm for a more general area of application, which was the aim of this project. It would have been easier to consider only one aspect of the face detection problem to begin with, for example either illumination, viewing angle or background.

The outcome of the project is not exactly as expected from the beginning, but it has provided a good understanding of the face detection problem and of all the factors that have to be considered when developing a face detection algorithm. The outcome of a project is not only a good result; it is also an understanding of the problem. The difficulty of the face detection problem is now well understood, as is the reason why researchers have not yet managed to give a general solution to it.

5.4 Future work

This chapter has presented the result of the implemented face detection algorithm and the conclusions drawn. There is still research to be done in this area. A few suggestions, from the point of view of this project, of areas worth looking at more closely:

• Consider only one of the aspects of the face detection problem to begin with. When the algorithm is performing well, extend it to include one more aspect, until all the aspects (illumination, viewing angle and background) are considered.

• Evaluate how to reduce the number of features in an existing method and try to improve the representation of the features.

• Evaluate different feature extraction methods, aiming for a method that extracts few features that represent the class well.

These are only a few suggestions of what can be done in the area of face detection. Even though it has been a research area for more than ten years, no general solution to the problem has yet been proposed. When a solution is presented that manages to fulfill all the requirements of a general solution, many things are going to change; for example, human-computer interaction and surveillance systems will be improved.

Bibliography

[1] N. Bergboer, E. Postma, and H. van den Herik. Context-enhanced object detection in natural images. In Proceedings of the Belgian-Netherlands AI Conference (BNAIC) 2003, pages 27–34, Nijmegen, The Netherlands, October 2003.

[2] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1996.

[3] J. A. Bloom and T. R. Reed. A Gaussian derivative-based transform. IEEE Transactions on Image Processing, 5(3):551–553, March 1996.

[4] R. van den Boomgaard. Algorithms for linear scale-space. Lecture notes for the PhD course Front-End Vision at the University of Amsterdam, The Netherlands, January 2004.

[5] J. M. Geusebroek, A. Dev, R. van den Boomgaard, A. W. M. Smeulders, F. Cornelissen, and H. Geerts. Color invariant edge detection. In Scale-Space Theories in Computer Vision, volume 1252 of Lecture Notes in Computer Science, pages 459–464. Springer-Verlag, 1999.

[6] E. Hjelmås and B. K. Low. Face detection: A survey. Computer Vision and Image Understanding, 83(3):236–274, September 2001.

[7] N. Kasabov. Evolving Connectionist Systems. Springer-Verlag, 2003.

[8] P. Kuchi, P. Gabbur, P. S. Bhat, and S. S. David. Human face detection and tracking using skin color modeling and connected component operators. IETE Journal of Research, 38(3–4):289–293, May–August 2002.

[9] B.-H. Lee, K.-H. Kim, and J. Nam. Efficient and automatic faces detection based on skin-tone and neural network model. In Developments in Applied Artificial Intelligence: 15th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, volume 2358, pages 57–66, Cairns, Australia, June 2002.

[10] S. Mallat. A theory for multiresolution signal decomposition: The wavelet model. In Proc. IEEE Computer Society Workshop on Computer Vision, pages 2–7, Washington D.C., 1987. IEEE Computer Society Press.

[11] C. Papageorgiou and T. Poggio. A trainable system for object detection. International Journal of Computer Vision, 38(1):15–33, 2000.

[12] B. M. ter Haar Romeny. Front-End Vision and Multi-Scale Image Analysis. Kluwer Academic Publishers, 2002.

[13] H. A. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(1):23–38, January 1998.

[14] H. A. Rowley, S. Baluja, and T. Kanade. Rotation invariant neural network-based face detection. In Computer Vision and Pattern Recognition, pages 38–44, 1998.

[15] P. Sharma and R. Reilly. A colour face image database for benchmarking of automatic face detection algorithms. In 4th EURASIP Conference focused on Video/Image Processing and Multimedia Communications, volume 1, pages 423–428, Zagreb, Croatia, July 2003.

[16] M.-H. Yang, D. J. Kriegman, and N. Ahuja. Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1):34–58, January 2002.

[17] K. C. Yow. Automatic Human Face Detection and Localization. PhD thesis, University of Cambridge, Cambridge, United Kingdom, 1998.

[18] D. Zhang and N. Kasabov. A novel evolving connectionist based system for classification tasks with growing cluster centres. In Conference on Neuro-Computing and Evolving Intelligence, pages 46–47, Auckland, New Zealand, December 2004.

[19] H. Zhang, D. Zhao, and X. Chen. Combining skin color and neural network for rotation invariant face detection. In Advances in Multimodal Interfaces – ICMI 2000: Third International Conference, volume 1948, pages 237–244, Beijing, China, October 2000.

Glossary

A

Auckland University of Technology (AUT): University in Auckland, New Zealand, p. ii.

C

classifier: Takes a feature representation of an object and maps it to a classification label; in this case a feature vector that is mapped to class one or two, p. 2.

color differential structure: Contains the transitions of yellow-blue and red-green in the image, from which the color edges can be extracted, p. 8.

color receptive fields: A biologically inspired decomposition of color data, i.e. decomposition into opponent fields in the same way as color data is decomposed in the early stages of the human visual system, p. 8.

color space: The range of colors that can be described by a color model, e.g. RGB. The number of colors that can be described by the model depends on its primary colors; in RGB the primaries are red, green and blue, p. 7.

D

Department of Science and Technology (ITN): Department at Linköping University, Campus Norrköping, Sweden, p. ii.

E

Evolving Connectionist Video Image and Speech Processing (ECoVIS): An environment for developing adaptive speech and image recognition systems, developed at KEDRI, p. 2.

Evolving Growing Clusters Classifier (EGCC): A knowledge-based neural network model for classification of existing knowledge and future incoming data, improved from the normal Radial Basis Function (RBF) and Evolving Classifier Function (ECF) networks. The main feature of the EGCC training algorithm is the gradual growth of the cluster centers, p. 14.

evolving neural network: A neural network that operates continuously in time and adapts its structure to changes in the environment, p. 2.

F

face recognition: A computer algorithm for identifying a person's face by matching it against a database of known faces, p. 6.

G

Gaussian derivative: In computational vision the mathematical derivative is replaced by the Gaussian derivative, i.e. by a convolution of the image area with the derivative of the Gaussian kernel. This is used to capture the local image structure, p. 3.

H

Haar: The first and simplest wavelet transform; this wavelet resembles a step function, p. 16.

HSI: Color space: H – hue, S – saturation and I – intensity. This system gives a closer description than, for example, RGB of the way humans experience color, p. 7.

I

influence field: The region surrounding the node, p. 14.

inter-class: Differences between classes, p. 13.

intra-class: Differences within a class, p. 13.

K

Knowledge Engineering and Discovery Research Institute (KEDRI): Research institute at AUT, p. ii.

L

Linear Discriminant Analysis (LDA): A common technique for data classification and dimensionality reduction. LDA maximizes the ratio of between-class variance to within-class variance in a data set, thereby guaranteeing maximal separability, p. 23.

Linköping University (LiU): University in Linköping, Sweden, p. ii.

N

NCEI '04: 3rd Conference on Neuro-Computing and Evolving Intelligence, organized by KEDRI in 2004, p. 3.

neural network: A mathematical or computational model for information processing based on a connectionist approach to computation, i.e. a network of relatively simple processing elements whose global behaviour is determined by the connections between them, p. 8.

normalized RGB (nRGB): Color space: r = R/(R+G+B), g = G/(R+G+B) and b = B/(R+G+B). This color space is derived from the original RGB components, p. 7.

P

pattern recognition: A pattern consists of the features extracted to describe a class, e.g. faces; pattern recognition is the ability to recognize and distinguish this pattern from all other patterns in, for example, an image, p. 8.

Principal Component Analysis (PCA): A technique to reduce the dimensionality of a data set while keeping the characteristics that contribute most to its variance. It is a way of ranking the data in the data set in order of importance, p. 23.

R

RGB: Color space: R – red, G – green and B – blue. This color space is used in computers, p. 7.

S

search space: The area that needs to be searched in order to find the faces, p. 7.

Siftware: A software package for visualizing, analysing and modelling data, developed by KEDRI, p. 18.

Signal-to-Noise ratio (SNR): The ratio between the magnitude of a signal and the magnitude of background noise, p. 18.

sliding window: A window moving across the image one pixel at a time, p. 12.

Sobel operator: A simple edge-detection algorithm that uses the first-order derivative of the intensity information. Where the intensity changes abruptly the gradient is given a high value, which is likely to represent an edge, p. 6.

spatial context: The environment (here: the color range) in which the target (here: the face) is expected to be found, p. 3.

support vector machine (SVM): Maps the data into a space of higher dimension than the original data space in order to improve the separability of the data, which simplifies classification. A hyperplane is used to separate the classes from each other in the higher-dimensional space, p. 3.

T

template matching method: A technique used in digital image processing for recognizing parts of an image. The parts of the image are compared to the pattern of the class to be identified, the template; e.g. if a face is to be recognized, all parts of the image are compared to a model of a face, p. 8.

U

University College Dublin (UCD): University in Dublin, Ireland, p. 3.

W

Wavelets: A wavelet is a waveform of effectively limited duration that has an average value of zero. Wavelets are mathematical functions that divide data into different frequency components, so that each component can be studied with a resolution matched to its scale, p. 3.

Y

YCbCr: Color space: Y – luminance, Cb and Cr – chrominance (color). This color space is widely used in digital video, p. 7.

YUV: Color space: Y – luminance, U and V – chrominance (color). This color space is used in the PAL system for television, p. 8.
