
Road and Traffic Signs Recognition using Support Vector Machines

Min Shi

Master thesis

2006 Nr: E3395D


DEGREE PROJECT
Computer Engineering

Programme: Master of Science in Computer Engineering
Reg number: E3395D
Extent: 30 ECTS
Name of student: Min Shi
Year-Month-Day: 2006-07-18
Supervisor: Hasan Fleyeh
Examiner: Mark Dougherty
Company/Department: Department of Computer Engineering, Dalarna University
Supervisor at the Company/Department: Hasan Fleyeh

Title

Road and Traffic Signs Recognition using Support Vector Machines

Keywords

Road Sign Recognition, Support Vector Machines (SVM), Zernike Moments, Fuzzy ARTMAP, Intelligent Transportation System

Abstract

Intelligent Transportation System (ITS) is a system that builds a safe, effective and integrated transportation environment based on advanced technologies. Road sign detection and recognition is an important part of ITS, offering a way to collect real-time traffic data for processing at a central facility.

This project implements a road sign recognition model based on AI and image analysis technologies, applying a machine learning method, Support Vector Machines (SVM), to recognize road signs. We focus on recognizing seven categories of road sign shapes and five categories of speed limit signs. Two kinds of features, binary images and Zernike moments, are used to represent the data to the SVM for training and testing. We compare and analyze the performance of the SVM recognition model using different features and different kernels. Moreover, the performance of two different recognition models, SVM and Fuzzy ARTMAP, is compared.


Acknowledgement

This work was supported by the Department of Computer Engineering at Dalarna University. I gratefully acknowledge my supervisor, Mr Hasan Fleyeh, for motivating this work and providing much advice and help. During my Master’s studies, he taught me a great deal about Computer Graphics and Digital Image Processing, enabling me to lay a solid foundation for this work.

I am grateful to Professor Mark Dougherty, Dr. Pascal Rebreyend and all the other teachers at Dalarna University for their instruction. I am also grateful to all my classmates, especially Xiao Hu and Jia Cai, for their help during my studies, and to all my friends in Sweden for the shared experiences and memories.

Last but not least, I would like to give my special thanks to my husband and my family for their love and support.


Contents

CHAPTER ONE INTRODUCTION
1.1 Background
1.2 System Structure
1.3 Swedish Road Signs
1.3.1 Properties of Road Signs
1.4 Shape-based Recognition
1.5 Potential Difficulties
CHAPTER TWO SUPPORT VECTOR MACHINES
2.1 Introduction
2.1.1 Machine Learning
2.1.2 Supervised Learning
2.1.3 Learning and Generalization
2.1.4 Support Vector Machines for Learning
2.2 Linear Classification
2.2.1 Perceptron
2.2.2 Dual Representation
2.3 Non-linear Classification
2.3.1 Learning in Feature Space
2.3.2 Implicit Mapping to Feature Space
2.4 Kernel
2.4.1 Kernel Matrix
2.4.2 Properties of Kernels
2.4.3 Examples of Kernels
2.4.4 Kernel Selection
2.5 Generalization
2.5.1 VC Dimension
2.5.2 ERM vs SRM
2.6 Maximum Margin Classifier
2.7 Soft Margin Classifier
2.8 Multi-class Classifier
2.9 Types of SVM
2.9.1 C-SVM Classification
2.9.2 ν-SVM Classification
CHAPTER THREE METHODOLOGY
3.1 Introduction
3.2 LIBSVM
3.3 Data Representation
3.4 Data Normalization
3.5 Binary Representation
3.6 Zernike Moments
3.6.1 Definition of Zernike Moments
3.6.2 Image Normalization
3.6.3 Moments Representation
CHAPTER FOUR EXPERIMENTS AND ANALYSIS
4.1 Introduction
4.2 Classification with Different Features
4.2.1 Binary Representation
4.2.2 Zernike Moments Representation
4.3 Classification with Different Kernels and SVM Types
4.4 Comparison of Different Recognition Models
CHAPTER FIVE CONCLUSION AND FUTURE WORK
APPENDIX A: USER MANUAL
A.1 Convert Image Data to SVM Data
A.2 Convert Zernike Data to SVM Data
A.3 Train and Test SVM with Default Values
A.4 Train and Test SVM Manually
A.5 Train and Test SVM from a Parameter File
A.6 Train and Test SVM with Grid Search
A.7 Train and Test SVM with SA Search
A.8 Predict an Image File
REFERENCES


List of Figures

Figure 1.1 A structure of the road sign detection and recognition system
Figure 1.2 Examples of warning signs
Figure 1.3 Examples of prohibitory signs
Figure 1.4 Examples of mandatory signs
Figure 1.5 Examples of signs giving information
Figure 1.6 Shapes for recognition
Figure 1.7 Speed limits for recognition
Figure 1.8 Road sign recognition model
Figure 1.9 Examples of difficulties [10]
Figure 2.1 Block diagram of supervised learning
Figure 2.2 An overfitting classifier and a better classifier
Figure 2.3 An overview of the SVM process
Figure 2.4 Two ways to separate data with two categories
Figure 2.5 Linear classification for two-dimensional input vectors
Figure 2.6 A mapping from a two-dimensional input space to a two-dimensional feature space
Figure 2.7 Two examples of VC dimension
Figure 2.8 The bound on the actual risk of a classifier
Figure 2.9 An example of a soft margin classifier
Figure 3.1 A block diagram of road sign classification using SVM
Figure 3.2 An example of the binary representation of a road sign
Figure 4.1 Some binary images and their corresponding categories of speed limit signs for recognition
Figure 4.2 Some binary images and their corresponding categories of road sign shapes for recognition
Figure 4.3 Some instances of speed limit sign 70 with pepper noise
Figure 4.4 Reconstructed images with different orders of Zernike moments
Figure 4.5 The performance of the SVM model using the linear kernel and different parameter C for speed limit recognition with Zernike moments representation
Figure 4.6 The support vectors of the SVM model using the linear kernel and different parameter ν for speed limit recognition with Zernike moments representation
Figure 4.7 The performance of the SVM model using the linear kernel and different parameter ν for speed limit recognition with Zernike moments representation
Figure 4.8 The performance of the SVM model using the RBF kernel and C-SVM with different parameter γ for speed limit recognition with Zernike moments representation
Figure 4.9 The performance of the SVM model using the sigmoid kernel and C-SVM with different parameter r for speed limit recognition with Zernike moments representation
Figure 4.10 The performance of the SVM model using the polynomial kernel and C-SVM with different parameter d for speed limit recognition with Zernike moments representation
Figure 4.11 The performance of SA search with different cooling ratios
Figure A.1 A use case of converting image data to SVM data
Figure A.2 Display of the binary images selected for test and the binary images selected for training
Figure A.3 A use case of converting Zernike moments data to SVM data
Figure A.4 Display of the total number of training data and the total number of test data for converting Zernike moments to SVM
Figure A.5 The steps of training and testing the SVM model with default values
Figure A.6 Display of the training and test results
Figure A.7 An illustration of the process and the created files at every step
Figure A.8 An illustration of the process and the created files at every step
Figure A.9 A use case of training and testing SVM from a parameter file
Figure A.10 A use case of training and testing the SVM model with grid search
Figure A.11 A use case of training and testing the SVM model with SA search
Figure A.12 A use case of predicting an image file
Figure A.13 The process of predicting with probability


List of Tables

Table 1.1 Some characteristics of two ways for road sign detection and recognition
Table 1.2 Main colors and shapes in Swedish road signs
Table 2.1 The primal form of the perceptron algorithm
Table 2.2 The dual form of the perceptron algorithm
Table 3.1 Road sign shapes group
Table 3.2 Speed limit signs group
Table 3.3 The Zernike moments representation of a road sign
Table 4.1 Desired outputs of road sign shapes with binary representation
Table 4.2 Desired outputs of speed limit signs with binary representation
Table 4.3 The correct classification rate of road sign shapes on the ten pairs of training/test data sets with binary representation
Table 4.4 Confusion matrix of road sign shapes classification with binary representation on the training set
Table 4.5 Confusion matrix of road sign shapes classification with binary representation on the test set
Table 4.6 The correct classification rate of speed limit signs on the ten pairs of training/test data sets with binary representation
Table 4.7 Confusion matrix of one worst pair of results for speed limit signs classification with binary representation on the training set
Table 4.8 Confusion matrix of one worst pair of results for speed limit signs classification with binary representation on the test set
Table 4.9 Two examples of road sign images that were classified incorrectly with binary representation
Table 4.10 Desired outputs of road sign shapes with Zernike moments representation
Table 4.11 Desired outputs of speed limit signs with Zernike moments representation
Table 4.12 The correct classification rate of road sign shapes on the ten pairs of training/test data sets with Zernike moments representation
Table 4.13 Confusion matrix of the worst pair of results for road sign shapes classification with Zernike moments representation on the training set
Table 4.14 Confusion matrix of the worst pair of results for road sign shapes classification with Zernike moments representation on the test set
Table 4.15 The correct classification rate of speed limit signs on the ten pairs of training/test data sets with Zernike moments representation
Table 4.16 Confusion matrix of the worst training result for speed limit signs classification with Zernike moments representation
Table 4.17 Confusion matrix of the worst test result for speed limit signs classification with Zernike moments representation
Table 4.18 Some instances of road signs that were classified incorrectly with Zernike moments representation
Table 4.19 The correct classification rate of road sign shapes using different kernels and SVM types with binary representation
Table 4.20 The correct classification rate of speed limit signs using different kernels and SVM types with binary representation
Table 4.21 The correct classification rate of road sign shapes using different kernels and SVM types with Zernike moments representation
Table 4.22 The correct classification rate of speed limit signs using different kernels and SVM types with Zernike moments representation
Table 4.23 The contrast of with and without grid search for road sign shapes classification using the RBF kernel with Zernike moments representation
Table 4.24 The contrast of with and without grid search for speed limit signs classification using the RBF kernel with Zernike moments representation
Table 4.25 Probability of acceptance for T = 0.1 and eval(v_c) = 0.5
Table 4.26 The contrast of grid search and SA search for road sign shapes classification using the RBF kernel with Zernike moments representation
Table 4.27 The contrast of grid search and SA search for speed limit signs classification using the RBF kernel with Zernike moments representation
Table 4.28 Best classification results from different recognition models
Table A.1 The format of the image list file
Table A.2 The format of the Zernike moments data file
Table A.3 The format of the parameter file


Chapter One

Introduction


1.1 Background

Advances in technology have made it possible to drive a robotic vehicle autonomously on roads. An intelligent transportation system (ITS) is a general term for a wide range of technologies incorporated into traditional transportation infrastructure and vehicles. These systems can include roadway sensors, in-vehicle navigation services, electronic message signs, and traffic management and monitoring.

ITS technologies are being widely deployed to maximize transportation safety and efficiency[1]. They aim to manage factors that are typically at odds with each other, such as vehicles, loads, and routes, to improve safety and reduce vehicle wear, transportation times and fuel costs[2].

Intelligent Transportation Systems vary in technologies applied, from basic management systems such as car navigation, traffic light control systems, container management systems, variable message signs or speed cameras to monitoring applications such as security CCTV systems, and then to more advanced applications which integrate live data and feedback from a number of other sources, such as real-time weather, bridge deicing systems, and the like. Additionally, predictive techniques are being developed, to allow advanced modeling and comparison with historical baseline data[2].

Road sign detection and recognition is an important part of ITS, offering a way to collect real-time traffic data for processing at a central facility. It can be realized in two different ways:

Information communication technologies based on distributed and pervasive applications.

Intelligent detection and recognition based on artificial intelligence and image analysis.

Some characteristics of both methods for road sign detection and recognition are shown in Table 1.1.


Table 1.1: Some characteristics of two ways for road sign detection and recognition

Property         Information Communication                  Intelligent Detection and Recognition
Data type        Digital signal                             Physical signal to digital signal
Data receiver    Sensor and software interface              Digital camera
Environment      Soft environment                           Hard environment
Technologies     Distributed and pervasive applications     AI and image analysis
                 based on wireless communication
Basic services   Utilize network-based services             None
Accuracy         Refined classification                     Coarse classification
Feasibility      Immature                                   Feasible

Although information communication methods can implement refined recognition of road signs, their implementation depends on related facilities and services, for example network services, road condition data, road sign sensors and so forth. By contrast, implementing detection and recognition with AI and image analysis technologies only requires mounting a digital camera on each vehicle. In a complex real-time environment, neither of these two methods can be completely replaced by the other. In fact, combining both methods makes the system more stable and reliable, providing higher security.

This project implements road sign recognition based on AI and image analysis technologies, applying a machine learning method, Support Vector Machines, to recognize road signs using two kinds of features: binary images and Zernike moments.

1.2 System Structure

When vehicles are driven on public roads, the rules of the road must be obeyed. Some of these rules are conveyed through road signs, so an autonomous vehicle must be capable of detecting and recognizing road signs and adjusting its behavior accordingly.


Figure 1.1 A structure of road sign detection and recognition system

Figure 1.1 shows the structure of a road sign detection and recognition system based on AI and image analysis technologies. A digital camera mounted on the front of a vehicle takes pictures of the road, which are transferred into the system. Color thresholding is used to segment the image for road sign detection, followed by shape analysis. If a road sign is detected in an image, only the road sign part of the image is kept. Before a road sign image is input into a trained learning machine for recognition, the feature values of the image are calculated. The learning machine then outputs a value that identifies the road sign.

In this project, a recognition model is implemented that constructs the learning machine using a newer pattern recognition technology, Support Vector Machines. To compare and analyze the effects of training the SVM with different feature values, two kinds of features, binary images and Zernike moments, are used to train and test the SVM.

1.3 Swedish Road Signs

Road signs are designed to be easily recognized by human drivers, mainly because their colors and shapes are very different from natural environments[3]. The Swedish Road Administration is in charge of defining the appearance of all signs and road markings in Sweden. It divides road signs into four different classes.



Warning signs. A traffic warning sign indicates a hazard ahead on the road[4]. In Sweden it is an equilateral triangle with a thick red border and a yellow background, because a red/yellow sign is easier to see in snowy weather. Some other signs, such as the distance to level crossing signs and track level crossing signs, also belong to this class.

Prohibitory signs. Prohibitory traffic signs are used to prohibit certain types of maneuvers or some types of traffic, for example the no entry sign, the no parking sign and speed limit signs. Normally they are circular with a thick red border and a yellow background. There are a few exceptions: the international standard stop sign is an octagon with a red background and a white border, and the NO PARKING and NO STANDING signs have a blue background instead of yellow. The signs ending restrictions are marked with black bars.

Mandatory signs. They are always round blue signs with a white border. They control the actions of drivers and road users. Signs ending an obligation have a red slash.

Signs giving information. The Swedish diamond-shaped and rectangular signs inform about priority roads, including the services along the road. These signs normally have a green, yellow, white, blue or black background.

Figure 1.2 Examples of warning signs


Figure 1.3 Examples of prohibitory signs

Figure 1.4 Examples of mandatory signs


Figure 1.5 Examples of signs giving information

1.3.1 Properties of Road Signs

The examples above (figures 1.2-1.5) show that colors and shapes are the basic characteristics of road signs. Road signs are designed, manufactured and installed according to tight regulations. They are designed in fixed 2-D shapes such as triangles, circles, octagons and rectangles. The colors of the signs are chosen to stand out from the environment, which makes them easily recognizable by drivers[5]. There are seven main shapes and seven main colors in Swedish road signs, shown in Table 1.2. A road sign detection and recognition system can be implemented using color information, shape information, or both.

Combining color information and shape information may give better results[5].

Table 1.2 Main colors and shapes in Swedish road signs

Colors    Shapes
White     Rectangle
Yellow    Diamond
Orange    Circle
Red       Octagon
Green     Cross Buck
Blue      Upward Triangle
Black     Downward Triangle


1.4 Shape-based Recognition

In recent years there has been a surge of papers describing road sign recognition methods. One point supporting the use of shape information for road sign recognition is the lack of standard colors among countries: systems that rely on colors need to be re-tuned when moving from one country to another. The other point in this argument is the fact that colors vary as daylight and reflectance properties vary. In situations in which it is difficult to extract color information, such as twilight and nighttime, shape detection is a good alternative[5].

This project focuses on recognizing seven categories of road sign shapes (figure 1.6) and five speed limit signs (figure 1.7), since, compared with other categories, these road signs are more important and more difficult for computers to classify.

Figure 1.6 Shapes for recognition

Figure 1.7 Speed limits for recognition

All of these road signs are recognized by shape information only; in other words, the color properties of the road signs are ignored during the classification process.

A large number of road sign samples are used for training the learning machine, while new road sign samples are used to verify it.


Figure 1.8 Road sign recognition model

The binary image of the road sign is extracted as one of its features to train and test the learning machine. The feature model is optional in the recognition model; in this project, Zernike moments are used to extract the features of a binary image (figure 1.8).

The detailed recognition process using SVM is discussed in Chapter Three.

1.5 Potential Difficulties

Identifying traffic signs correctly at the right time and place is very important for car drivers to ensure a safe journey for themselves and their passengers. Sometimes, however, due to changing weather conditions or viewing angles, traffic signs are difficult to see until it is too late[6]. These factors create the following difficulties, among others, for road sign detection and recognition.

The color of the sign fades with time as a result of long exposure to sunlight[7], and signs may be damaged (figure 1.9 a-b).

Obstacles, such as trees, poles, buildings, and even vehicles and pedestrians, may occlude or partially occlude road signs[8] (figure 1.9 c).

Video images of road signs often suffer from motion blur because the camera is mounted on a moving vehicle[8] (figure 1.9 d).

Lighting conditions are changeable and not controllable. Lighting differs according to the time of day, the season, cloudiness and other weather conditions[9] (figure 1.9 e-g).

Models of all the possible appearances of a sign cannot be generated off-line, because there are too many degrees of freedom[9] (figure 1.9 h).



Objects similar in color and/or shape to road signs, such as buildings or vehicles, may be present in the scene under consideration[7] (figure 1.9 i).

Signs that do not belong to the vehicle's own road may be recognized by mistake (figure 1.9 j).

Figure 1.9 Examples of difficulties [10]: (a) faded sign, (b) damaged sign, (c) partial occlusions, (d) blurry image, (e) bad light, (f) shadows, (g) bad weather, (h) shape deformation, (i) object with similar color, (j) identification of wrong signs

Many methods have been proposed to overcome some of these difficulties. Escalera et al.[11] deal with road sign recognition in environments where lighting conditions cannot be controlled or predicted, objects can be partially occluded, and their position and orientation are not known a priori. A genetic algorithm is used for the detection step, providing localization that is invariant to changes in position, scale, rotation, weather conditions, partial occlusion and the presence of other objects of the same color. A neural network performs the classification.

In [5, 12-16] Hasan Fleyeh proposed some other methods. For example, in [16] he used a new algorithm for traffic sign color detection and segmentation in poor light conditions. The RGB channels of the road images are enhanced separately by histogram equalization, and the true colors of the sign are then extracted by a color constancy method. The resulting image is converted into HSV color space and segmented to extract the colors of the road signs.

In [15], Hasan Fleyeh developed a new color detection and segmentation algorithm for road signs in which the effects of shadows and highlights are neutralized to obtain better color segmentation results. The RGB images of road signs are converted into HSV color space and a shadow-highlight invariant method is applied to extract the colors of the road signs under shadow and highlight conditions.


Chapter Two

Support Vector Machines


2.1 Introduction

Support Vector Machines (SVM) is a machine learning method based on the mathematical foundations of statistical learning theory, first proposed by Vapnik in 1992. In the last few years there have been very significant developments in the theoretical understanding of SVM, in algorithmic strategies for implementing it, and in applications of the approach to practical problems[17].

2.1.1 Machine Learning

Machine learning is the study of computer algorithms that allow computers to learn automatically from experience[18]. This experience includes data observation, statistics, analysis and other means, giving a system the capacity for self-improvement so that it acquires knowledge in certain fields.

Learning is an intelligent process of acquiring knowledge. There are several parallels between animal learning and machine learning. In intelligent systems, some techniques derive from biological research, using computational models to make theories of animal learning more precise.

In general there are two kinds of learning: inductive learning and deductive learning. Inductive machine learning methods extract rules and patterns from massive data sets[19]. These rules can be applied to new data sets, but there is no guarantee that they will be correct[20]. Normally, the correct classification rate is calculated as one criterion to evaluate the efficiency of the rules. For example, given ten features of a human, an inductive learning system might infer that any animal with some of those features is a human. Deductive learning methods learn from a set of known facts and rules to produce additional rules that are guaranteed to be true[20]. Given the rules "All men are mortal" and "Jones is a man", a deductive learning system can deduce that "Jones is mortal".

Machine learning usually performs tasks associated with Artificial Intelligence (AI). Such tasks involve recognition, diagnosis, planning, robot control, prediction, etc.[21]. As an important field of Artificial Intelligence, machine learning has been increasingly successful in real-world applications such as data mining, medical diagnosis, search engines, and speech and image recognition.

This thesis presents a machine learning method to perform an image recognition task: predicting which sign image corresponds to which road sign. This method, called Support Vector Machines, is an inductive learning method, more precisely a supervised learning method, in which one group of data samples is input into the system for training the machine and another group of data is used to test and verify the system.

2.1.2 Supervised Learning

Supervised learning is a machine learning technique for creating a function from training data. The training data consist of pairs of input objects (typically vectors) and desired outputs. The task of the supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples (i.e. pairs of input and target output). To achieve this, the learner has to generalize from the presented data to unseen situations in a "reasonable" way[22].

Figure 2.1 shows a block diagram of supervised learning. The pairs of input/output describe the state of the environment. The teacher and the learning system both draw a training vector from the environment, and the teacher provides the learning system with the desired response for that training vector. The parameters of the learning system are adjusted according to the error signal. This adjustment is carried out iteratively until the learning system can emulate the teacher and is able to deal with the environment without the teacher.

Figure 2.1 Block diagram of supervised learning

2.1.3 Learning and Generalization

Early machine learning algorithms aimed to learn representations of simple symbolic functions that could be understood and verified by experts[23]; the goal was to find an accurate fit to the training data. However, an overfitted objective function makes essentially uncorrelated predictions on unseen data.

Generalization is the ability to correctly classify data not in the training set. It allows the learning system to cope with examples it has not seen before and hence to be more flexible and useful. Figure 2.2 shows a binary classification problem that illustrates the difference between an overfitting classifier and a better classifier. Filled squares and triangles are the training data, while hollow squares and triangles are the test data. The test accuracy of the classifier in figures 2.2 (a) and (b) is poor because it overfits the training data, while the better classifier in figures 2.2 (c) and (d), built using generalization theory, gives better test accuracy. More details about generalization are discussed in section 2.5.

Figure 2.2 An overfitting classifier and a better classifier


2.1.4 Support Vector Machines for Learning

Support vector machines are learning systems that use a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory (Cristianini and Shawe-Taylor, 2000). SVM can perform pattern recognition (also known as classification) tasks by building decision boundaries that optimally separate the data into categories, and real-valued function approximation (also known as regression) tasks by constructing a function that best interpolates a given data set.

Figure 2.3 An overview of the SVM process

An overview of the SVM classification process is shown in figure 2.3. A set of data with predictor variables called attributes is presented in an input space. Since these data cannot be separated by a linear function in the input space, the attributes of the data are transformed into a feature space; in other words, each data point in the input space has a mapping in the feature space. The goal of the transformation is to make the separation easier. The set of features that describes one case is called a vector. In the feature space a hyperplane that divides the clusters of vectors can be found, so that data with one category of the target variable are on one side of the plane and data with the other category are on the other side.


Figure 2.4 Two ways to separate data with two categories

Besides separating the data into different categories, the objective of SVM is to find an optimal hyperplane that classifies the data as correctly as possible and separates the data as widely as possible. Figure 2.4 shows two ways of separating data with two categories, one represented by filled circles and the other by hollow circles. The broken lines mark the boundaries that run parallel to the separating line through the vectors closest to it. The distance between the two lines is called the margin, and the vectors marked with hollow squares are the support vectors; they constrain the width of the margin. To find the optimal hyperplane, the SVM looks for the hyperplane that maximizes the margin.

Because of the nature of the feature space in which these boundaries are found, Support Vector Machines can exhibit a large degree of flexibility in handling classification tasks of varied complexity. Several types of SVM models, including linear, polynomial, radial basis function and sigmoid, are introduced in this thesis.

SVM models work very similarly to classical neural networks. In fact, an SVM model using a sigmoid kernel function is equivalent to a two-layer feed-forward neural network. However, compared with traditional neural network approaches, the generalization theory behind SVM enables the models to avoid overfitting the data.
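As an illustration of the process just described, the sketch below fits a maximum-margin classifier with scikit-learn's SVC class, which wraps LIBSVM, the library used in Chapter Three; the toy data and parameter values are this example's assumptions, not the thesis's experimental setup.

```python
# A minimal sketch of SVM classification, assuming scikit-learn is available.
import numpy as np
from sklearn.svm import SVC

# Toy two-class data (invented for illustration): two slightly
# overlapping point clouds in a two-dimensional input space.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2],     # positive class
               rng.randn(20, 2) + [-2, -2]])  # negative class
y = np.array([1] * 20 + [-1] * 20)

# A linear kernel finds the maximum-margin separating hyperplane.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The support vectors are the training points that constrain the margin.
print("number of support vectors per class:", clf.n_support_)
print("decision values of first 3 points:", clf.decision_function(X[:3]))
```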

2.2 Linear Classification

Linear functions are the best understood and simplest methods applied in machine learning. This section introduces the basic ideas used to solve linear classification problems. The goal of introducing linear classification here is to prepare for the non-linear classification presented in the next section.

Linear classification is normally performed by using a linear function of its input vectors. This function can be written as


$$f(\mathbf{x}) = \langle \mathbf{w} \cdot \mathbf{x} \rangle + b = \sum_{i=1}^{n} w_i x_i + b \qquad (2.1)$$

where $x_i$ is the $i$-th attribute value of an input vector $\mathbf{x}$, $w_i$ is the weight value for the attribute $x_i$, and $b$ is the bias. For binary classification, the decision rule is given by $\operatorname{sgn}(f(\mathbf{x}))$: the input vector $\mathbf{x} = (x_1, \ldots, x_n)'$ is assigned to the positive class if $f(\mathbf{x}) \geq 0$, and otherwise to the negative class.

Figure 2.5 Linear classification for two-dimensional input vectors

Figure 2.5 plots the interpretation of linear classification for two-dimensional input vectors. The input space is divided into two parts by a bold line called the hyperplane. The hyperplane is defined by the equation

$$f(\mathbf{x}) = \sum_{i=1}^{n} w_i x_i + b = 0;$$

the region above the hyperplane belongs to the positive class, while the region below the hyperplane belongs to the negative class. The weight vector $\mathbf{w}$ defines the slope of the hyperplane and $b$ defines the offset between the hyperplane and the origin of the input space. Therefore an $(n-1)$-dimensional hyperplane can separate an $n$-dimensional space, and $(n+1)$ parameters are used for adjusting the hyperplane.

2.2.1 Perceptron

The first iterative algorithm for linear classification was proposed by Frank Rosenblatt in 1956. The algorithm is shown in Table 2.1.


Table 2.1 The primal form of the perceptron algorithm

Given a linearly separable training set S and learning rate η ∈ ℝ⁺
(l is the number of training samples)

w_0 ← 0; b_0 ← 0; k ← 0
R ← max_{1≤i≤l} ‖x_i‖
repeat
    for i = 1 to l
        if y_i(⟨w_k · x_i⟩ + b_k) ≤ 0 then
            w_{k+1} ← w_k + η y_i x_i
            b_{k+1} ← b_k + η y_i R²
            k ← k + 1
        end if
    end for
until no mistakes are made within the for loop
return (w_k, b_k), where k is the number of mistakes

This algorithm starts with an initial weight vector w_0; the weight vector and bias are updated whenever a training point is misclassified by the current weights. The procedure converges if a hyperplane exists that correctly classifies the training data. In that case the data are said to be linearly separable; otherwise they are non-linearly separable.
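A minimal runnable sketch of the primal perceptron of Table 2.1, assuming NumPy; the learning rate and the toy data are invented for the example.

```python
import numpy as np

def perceptron_primal(X, y, eta=1.0, max_epochs=100):
    """Primal perceptron of Table 2.1. X: (l, n) inputs, y: labels in {-1, +1}."""
    l, n = X.shape
    w = np.zeros(n)                       # w_0 <- 0
    b = 0.0                               # b_0 <- 0
    k = 0                                 # number of mistakes
    R = np.max(np.linalg.norm(X, axis=1))
    for _ in range(max_epochs):
        mistakes = False
        for i in range(l):
            if y[i] * (np.dot(w, X[i]) + b) <= 0:   # misclassified point
                w = w + eta * y[i] * X[i]
                b = b + eta * y[i] * R ** 2
                k += 1
                mistakes = True
        if not mistakes:                  # until no mistakes are made
            break
    return w, b, k

# Linearly separable toy data (an assumption of this example).
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b, k = perceptron_primal(X, y)
print(w, b, k)   # converges because the data are linearly separable
```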

2.2.2 Dual Representation

Dual representation is an important and particularly useful form in machine learning.

Assuming that the initial weight vector is the zero vector, the final hypothesis will be a linear combination of the training points:

$$\mathbf{w} = \sum_{i=1}^{l} \alpha_i y_i \mathbf{x}_i \qquad (2.2)$$

where the coefficient of $\mathbf{x}_i$ is given by its classification $y_i$, and the $\alpha_i$ are positive values proportional to the number of times misclassification of $\mathbf{x}_i$ has caused the weight to be updated.


The decision function can be rewritten as follows:

$$f(\mathbf{x}) = \langle \mathbf{w} \cdot \mathbf{x} \rangle + b = \sum_{j=1}^{l} \alpha_j y_j \langle \mathbf{x}_j \cdot \mathbf{x} \rangle + b \qquad (2.3)$$

Table 2.2 shows a dual form of the perceptron algorithm.

Table 2.2 The dual form of the perceptron algorithm

Given a linearly separable training set S

α ← 0; b ← 0
R ← max_{1≤i≤l} ‖x_i‖
repeat
    for i = 1 to l
        if y_i (Σ_{j=1}^{l} α_j y_j ⟨x_j · x_i⟩ + b) ≤ 0 then
            α_i ← α_i + 1
            b ← b + y_i R²
        end if
    end for
until no mistakes are made within the for loop
return (α, b)

An important property of the dual representation is that the data only appear through entries in the Gram matrix and never through their individual attributes[24].
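A matching sketch of the dual form of Table 2.2, with toy data again invented for the example; note that the data enter only through the Gram matrix of inner products, and that equation (2.2) recovers the primal weight vector.

```python
import numpy as np

def perceptron_dual(X, y, max_epochs=100):
    """Dual perceptron of Table 2.2. alpha[i] counts the updates caused by x_i."""
    l = X.shape[0]
    alpha = np.zeros(l)
    b = 0.0
    R = np.max(np.linalg.norm(X, axis=1))
    G = X @ X.T                           # Gram matrix of inner products <x_j, x_i>
    for _ in range(max_epochs):
        mistakes = False
        for i in range(l):
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1
                b += y[i] * R ** 2
                mistakes = True
        if not mistakes:
            break
    return alpha, b

X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])  # toy data
y = np.array([1, 1, -1, -1])
alpha, b = perceptron_dual(X, y)
# Recover the primal weight vector via equation (2.2): w = sum_i alpha_i y_i x_i
w = (alpha * y) @ X
print(alpha, b, w)
```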

2.3 Non-linear Classification

Linear learning solves problems with linear functions; however, a simple linear function defined by the given attributes cannot achieve every target task flexibly. There are two main limitations of linear learning:

First, the function to be learned may not have a simple representation and may not be easily verified in this way.

Second, the training data are normally noisy, so there is no guarantee that there is an underlying function that correctly classifies the training data.

Therefore, complex real-world problems require more expressive hypothesis spaces than linear functions. This section discusses a method that constructs a non-linear machine to classify the data more flexibly.

2.3.1 Learning in Feature Space

The complexity of the target function to be learned depends on the way it is represented, and the difficulty of the learning task can vary accordingly[25]. Kernel representations offer a solution by constructing a mapping from the input space to a high dimensional feature space to increase the power of linear learning for complex applications.

Figure 2.6 shows an example of a mapping from a two-dimensional input space to a two-dimensional feature space. In the input space the data cannot be separated by a linear function, but the feature mapping simplifies the classification task since the data in the feature space are linearly separable.

Figure 2.6 A mapping from a two-dimensional input space to a two-dimensional feature space

The quantities introduced to describe the data are usually called features, while the original quantities are sometimes called attributes. The task of choosing the most suitable representation is known as feature selection. The space $X$ is referred to as the input space, while $F = \{\phi(\mathbf{x}) : \mathbf{x} \in X\}$ is called the feature space[25].

By selecting an appropriate kernel function, a non-linear mapping is performed from the input space to a high-dimensional feature space; each input vector in the input space corresponds to a feature vector in the feature space. However, this mapping does not increase the number of tunable parameters, and the technique also overcomes the curse of dimensionality in both computation and generalization.
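As a concrete sketch of such a mapping (the particular degree-2 monomial map and the toy data are this example's assumptions, not the mapping of Figure 2.6), data separated by a circle in the input space become linearly separable in the feature space:

```python
import numpy as np

# Data not linearly separable in the input space: the positive class
# lies inside a circle, the negative class outside (invented example).
rng = np.random.RandomState(1)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 1.5, 1, -1)

def phi(x):
    """Degree-2 monomial feature map: (x1, x2) -> (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

F = np.apply_along_axis(phi, 1, X)
# In the feature space the classes are separated by the linear rule
# f1 + f3 < 1.5, i.e. a hyperplane; in the input space this is a circle.
pred = np.where(F[:, 0] + F[:, 2] < 1.5, 1, -1)
print("separable in feature space:", np.all(pred == y))
```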


2.3.2 Implicit Mapping to Feature Space

The implicit mapping from the input space into the feature space gives the data a new representation, in which a decision function that is nonlinear in the input space is linear in the feature space; in this way a linear learning machine can be used. The function (2.1) presented in the linear learning section is modified as:

$$f(\mathbf{x}) = \langle \mathbf{w} \cdot \phi(\mathbf{x}) \rangle + b = \sum_{i=1}^{n} w_i \phi_i(\mathbf{x}) + b \qquad (2.4)$$

where $\phi(\mathbf{x})$ is the mapping function. Therefore there are two steps to construct a non-linear machine: first, a fixed non-linear mapping transforms the data from the input space into a feature space; second, a linear machine is used to classify the data in the feature space.

In the dual representation, the decision function is as follows:

$$f(\mathbf{x}) = \sum_{j=1}^{l} \alpha_j y_j \langle \phi(\mathbf{x}_j) \cdot \phi(\mathbf{x}) \rangle + b \qquad (2.5)$$

and the decision rule can be evaluated using the inner products between the training points and the test points. In this way, the dimension of the feature space does not affect the computation.
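A small numerical check of this point, using the same degree-2 map as the sketch above: the inner product of the images under φ equals a kernel evaluated directly on the original inputs, so φ never has to be computed explicitly.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map: (x1, x2) -> (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k_poly2(x, z):
    """Kernel computing <phi(x), phi(z)> directly in the input space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)))   # 1.0, via the explicit feature map
print(k_poly2(x, z))            # identical value, without forming phi
```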

2.4 Kernel

Kernel functions provide methods that compute the inner product $\langle \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}) \rangle$ in the feature space directly from the original input points.

Definition 1. A kernel is a function $K$ such that for all $\mathbf{x}, \mathbf{z} \in X$

$$K(\mathbf{x}, \mathbf{z}) = \langle \phi(\mathbf{x}) \cdot \phi(\mathbf{z}) \rangle,$$

where $\phi$ is a mapping from $X$ to an (inner product) feature space $F$[25].

A kernel constructs an implicit mapping from the input space into a feature space, and a linear machine is trained in the feature space. The Gram matrix, also called the kernel matrix, describes the information of the training data in the feature space. The key to this approach is to find a kernel function that can be evaluated efficiently. The decision rule can then be evaluated with at most $l$ kernel evaluations:

$$f(\mathbf{x}) = \sum_{j=1}^{l} \alpha_j y_j K(\mathbf{x}_j, \mathbf{x}) + b \qquad (2.6)$$

2.4.1 Kernel Matrix

The training data enter the algorithm through the entries of the Gram matrix, which is also called the kernel matrix. The Gram matrix expresses what kernel methods learn: each entry represents a measure of similarity between two objects. Equation (2.7) shows the form of the Gram matrix:

$$\mathbf{K} = \begin{pmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & \cdots & K(\mathbf{x}_1, \mathbf{x}_j) & \cdots & K(\mathbf{x}_1, \mathbf{x}_n) \\ \vdots & \ddots & & & \vdots \\ K(\mathbf{x}_i, \mathbf{x}_1) & \cdots & K(\mathbf{x}_i, \mathbf{x}_j) & \cdots & K(\mathbf{x}_i, \mathbf{x}_n) \\ \vdots & & & \ddots & \vdots \\ K(\mathbf{x}_n, \mathbf{x}_1) & \cdots & K(\mathbf{x}_n, \mathbf{x}_j) & \cdots & K(\mathbf{x}_n, \mathbf{x}_n) \end{pmatrix} \qquad (2.7)$$

The Gram matrix is the central structure of kernel methods; it contains all the necessary information for the learning algorithm. Even if the number of features is infinite, the Gram matrix may still be small, and hence the optimization problem remains solvable.
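A short sketch of building the Gram matrix of equation (2.7) for a handful of points, here using the GRBF kernel introduced in section 2.4.3; the data points and σ are invented for the example.

```python
import numpy as np

def grbf(x, z, sigma=1.0):
    """Gaussian RBF kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])   # toy training points
n = X.shape[0]

# Entry (i, j) of the Gram matrix is K(x_i, x_j), a similarity measure.
K = np.array([[grbf(X[i], X[j]) for j in range(n)] for i in range(n)])

print(K)                        # symmetric, with ones on the diagonal
print(np.allclose(K, K.T))      # True
```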

2.4.2 Properties of Kernels

Kernel functions are used to avoid computing inner products explicitly in the feature space. This section discusses the properties of kernels in order to define a kernel function for an input space.

Mercer’s Theorem

Mercer’s theorem provides the properties that determine whether a function $K(\mathbf{x}, \mathbf{z})$ is a kernel. Obviously, a kernel must be symmetric:

$$K(\mathbf{x}, \mathbf{z}) = \langle \phi(\mathbf{x}) \cdot \phi(\mathbf{z}) \rangle = \langle \phi(\mathbf{z}) \cdot \phi(\mathbf{x}) \rangle = K(\mathbf{z}, \mathbf{x})$$


Proposition[25]. Let $X$ be a finite input space with $K(\mathbf{x}, \mathbf{z})$ a symmetric function on $X$. Then $K(\mathbf{x}, \mathbf{z})$ is a kernel function if and only if the matrix

$$\mathbf{K} = \left( K(\mathbf{x}_i, \mathbf{x}_j) \right)_{i,j=1}^{n}$$

is positive semi-definite (has non-negative eigenvalues).

Theorem[25]. Let $X$ be a compact subset of $\mathbb{R}^n$. Suppose $K$ is a continuous symmetric function such that the integral operator $T_K : L_2(X) \to L_2(X)$,

$$(T_K f)(\cdot) = \int_X K(\cdot, \mathbf{x}) f(\mathbf{x})\, d\mathbf{x},$$

is positive, that is,

$$\int_{X \times X} K(\mathbf{x}, \mathbf{z}) f(\mathbf{x}) f(\mathbf{z})\, d\mathbf{x}\, d\mathbf{z} \geq 0$$

for all $f \in L_2(X)$. Then $K(\mathbf{x}, \mathbf{z})$ can be expanded in a uniformly convergent series (on $X \times X$) in terms of $T_K$'s eigenfunctions $\phi_j \in L_2(X)$, normalized in such a way that $\lVert \phi_j \rVert_{L_2} = 1$, with positive associated eigenvalues $\lambda_j \geq 0$:

$$K(\mathbf{x}, \mathbf{z}) = \sum_{j=1}^{\infty} \lambda_j \phi_j(\mathbf{x}) \phi_j(\mathbf{z}).$$
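For the finite case, the proposition above can be checked numerically; the sketch below tests symmetry and positive semi-definiteness of a candidate kernel matrix via its eigenvalues (the points and functions are this example's assumptions).

```python
import numpy as np

def is_kernel_matrix(K, tol=1e-10):
    """Check the proposition: symmetric and positive semi-definite."""
    return np.allclose(K, K.T) and np.all(np.linalg.eigvalsh(K) >= -tol)

X = np.random.RandomState(2).randn(10, 3)   # 10 random points

# A valid kernel: the linear kernel K(x, z) = <x, z>.
K_lin = X @ X.T
print(is_kernel_matrix(K_lin))               # True

# Not a kernel in general: -<x, z> fails positive semi-definiteness.
print(is_kernel_matrix(-K_lin))              # False for these points
```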

2.4.3 Examples of Kernels

So far many kernels have been proposed by researchers. This thesis introduces some of them, including four basic kernels that are used frequently.

Linear

The linear kernel is the simplest linear model:

$$K(\mathbf{x}, \mathbf{z}) = \langle \mathbf{x}, \mathbf{z} \rangle.$$


Polynomial

Polynomial mapping is a popular method for non-linear modeling:

$$K(\mathbf{x}, \mathbf{z}) = \langle \mathbf{x}, \mathbf{z} \rangle^{d}.$$

To avoid problems with the Hessian becoming zero, a more preferable expression is

$$K(\mathbf{x}, \mathbf{z}) = \left( \gamma \langle \mathbf{x}, \mathbf{z} \rangle + r \right)^{d},$$

where $\gamma, r, d$ are kernel parameters and $\gamma > 0$.

Gaussian Radial Basis Function

The radial basis function is one of the kernels that has received significant attention; the form of the Gaussian radial basis function (GRBF) is

$$K(\mathbf{x}, \mathbf{z}) = \exp\left( -\frac{\lVert \mathbf{x} - \mathbf{z} \rVert^{2}}{2\sigma^{2}} \right).$$

There are three reasons why the Gaussian radial basis function is normally a reasonable choice in most applications:

First, the GRBF kernel non-linearly maps data into a higher-dimensional feature space, so it can handle cases where the relation between target values and attributes is nonlinear.

Second, the results of a model depend on the values of the kernel parameters; in other words, the number of kernel parameters influences the complexity of the model. It is therefore better to select a kernel with as few kernel parameters as possible, and the polynomial kernel clearly has more kernel parameters than the GRBF kernel.

Third, the GRBF kernel has fewer numerical difficulties. Comparing the GRBF kernel with the polynomial kernel, the value of $K(\mathbf{x}, \mathbf{z})$ for the GRBF kernel lies in the interval $(0, 1]$, while the value of $K(\mathbf{x}, \mathbf{z})$ for the polynomial kernel lies in the interval $[0, \infty)$.
