
On the Construction of an Automatic Traffic Sign Recognition System

Academic year: 2021

Master Thesis in Statistics and Data Mining

On the Construction of an Automatic

Traffic Sign Recognition System

Fredrik Jonsson

Division of Statistics

Department of Computer and Information Science

Linköping University


Contents

Abstract
Acknowledgments
List of Tables
List of Figures
Abbreviations

1. Introduction
   1.1. Background
        1.1.1. Challenges
        1.1.2. Traffic sign detection
        1.1.3. Traffic sign classification
   1.2. Objective

2. Data
   2.1. Data sources
   2.2. Raw data
   2.3. Traffic sign characteristics
   2.4. Data – Image templates of road signs in use
   2.5. Data – Colour distribution
   2.6. Data – Real images of road signs

3. Methods
   3.1. Pre-processing
   3.2. Segmentation using colours
   3.3. Data collection of colours
   3.4. Modelling colours using mixture models
        3.4.1. Mixture distribution
        3.4.2. Gaussian Mixture model
        3.4.3. Fitting of a k-component mixture multivariate Normal model
        3.4.4. Parameter estimation and Statistical model
        3.4.5. Beta Mixture model
   3.5. Detection
        3.5.1. Probability model
        3.5.2. MSER
   3.6. Generation of synthetic data
        3.6.1. Small things about road signs
        3.6.2. Rotations
        3.6.3. Affine transformations
        3.6.4. Generative algorithm – various artificial effects
               3.6.4.1. Image geometry
               3.6.4.2. Image texture and image quality
               3.6.4.3. Image illumination
        3.6.5. Drawing shadows
        3.6.6. Resizing
        3.6.7. Negative background images
        3.6.8. Image quality
               3.6.8.1. Pixelating
   3.7. Classification
        3.7.1. Normalization
        3.7.2. Models for classification
        3.7.3. Feature selection
        3.7.4. Evaluating the synthetic data
        3.7.5. Shape classification
   3.8. Finding the superclass
   3.9. Accuracy for full system
   3.10. Software

4. Results
   4.1. Data collection of colours
   4.2. Modelling colours using mixture models
        4.2.1. Mixture multivariate Normal model
        4.2.2. Mixture of Beta distributions
   4.3. Detection
        4.3.1. Probability model
        4.3.2. Background class considered
        4.3.3. MSER algorithm and ROIs
   4.4. Construction of synthetic data
        4.4.1. Collection of image data (image templates)
        4.4.2. Generating synthetic data
   4.5. Classification
        4.5.2. Training – traffic signs / non-traffic signs and shape
        4.5.3. Training – recognizing the sign
   4.6. Accuracy for full system

5. Discussion
   5.1. Data – Colour
   5.2. Modelling colours using mixture models
        5.2.1. Mixture multivariate Normal model
        5.2.2. Model fitting measures
        5.2.3. Mixture of Beta distributions
        5.2.4. Mixture model – number of components
   5.3. Detection
        5.3.1. Probability model
        5.3.2. Background class considered
        5.3.3. MSER
        5.3.4. ROIs
   5.4. Construction of synthetic data
   5.5. Classification
        5.5.1. Real data
        5.5.2. Feature selection
        5.5.3. Grid search of optimal parameters
   5.6. Signs not facing parallel to the image plane

6. Conclusions

Appendix A. Bayesian multivariate Gaussian mixture model
   A.1. Prior distributions
   A.2. Posterior distributions
   A.3. Methodology – sampling the posterior
   A.4. Implementation

Appendix B. Trace plots
   B.1. Blue colour
   B.2. Red colour
   B.3. Yellow colour
   B.4. BG Blue
   B.5. BG Red
   B.6. BG Yellow

Appendix C. Estimated parameters for the Normal mixture model
   C.1. Blue
   C.2. Red
   C.3. Yellow
   C.4. BG Blue
   C.5. BG Red
   C.6. BG Yellow

Appendix D. Algorithm for generating synthetic images

Appendix E. Traffic signs used in our study

Appendix F. HSV conversion

Appendix G. Example of output


Abstract

This thesis proposes an automatic road sign recognition system, including all steps from the initial detection of road signs from a digital image to the final recognition step that determines the class of the sign.

We develop a Bayesian approach for image segmentation in the detection step, using colour information in the HSV (Hue, Saturation, Value) colour space. The segmentation uses a probability model constructed from manually extracted data on road sign colours collected from real images. We show how the colour data are fitted with mixtures of multivariate normal distributions, with parameters estimated by Gibbs sampling. The fitted models are then used to compute the posterior probability that a pixel colour belongs to a road sign. Following the image segmentation, regions of interest (ROIs) are detected using the Maximally Stable Extremal Regions (MSER) algorithm, and the ROIs are then classified using a cascade of classifiers.
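As an illustration of this posterior step, the sketch below computes P(sign | pixel colour) from two fitted mixture densities via Bayes' rule. The function names, the equal class prior, and the toy one-component "mixtures" in the test are illustrative assumptions, not values from the thesis:

```python
import numpy as np

def mixture_pdf(x, weights, means, covs):
    """Density of a multivariate Gaussian mixture at point x."""
    d = len(x)
    total = 0.0
    for w, mu, cov in zip(weights, means, covs):
        diff = x - mu
        norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
        total += w * norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)
    return total

def sign_posterior(x, sign_mix, bg_mix, prior_sign=0.5):
    """P(sign | pixel colour x) by Bayes' rule over two fitted mixtures.
    The equal prior is illustrative; the thesis estimates the mixture
    parameters themselves by Gibbs sampling."""
    num = prior_sign * mixture_pdf(x, *sign_mix)
    den = num + (1 - prior_sign) * mixture_pdf(x, *bg_mix)
    return num / den
```

Applying `sign_posterior` to every pixel of an HSV image yields the grey-scale probability maps that the detection step thresholds and passes to MSER.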

Synthetic images are used to train the classifiers, by applying various random distortions to a set of template images covering most road signs in Sweden, and we demonstrate that such synthetic images provide satisfactory recognition rates. We focus on a large set of the signs on the Swedish road network, including almost 200 road signs. We use classification models such as the Support Vector Machine (SVM) and Random Forest (RF), with Histogram of Oriented Gradients (HOG) descriptors as features.


Acknowledgments

In no particular order, I would like to begin by thanking my supervisor Mattias Villani at Linköping University for his useful guidance and advice throughout this thesis process, and for his many suggestions and improvements on earlier drafts. Many sincere thanks to the people at VTI for giving me the opportunity and confidence to do this project. I would like to express my very great appreciation to my supervisor Olle Eriksson at VTI for his expertise and assistance. I gratefully thank Leif Sjögren for his comments and improvements on an early draft of this thesis, and without whom this project would not have been possible. I am also grateful to Peter Andrén for providing digital copies of the road images collected by the road surface tester vehicle.

Thanks are also owed to my opponent Noelia Borau for providing valuable suggestions and improvements. Special thanks are also due to Anders Nordgaard for giving his time to provide valuable comments that helped improve my thesis.

Moreover, I am grateful to the Division of Statistics and Machine Learning (STIMA) within the Department of Computer and Information Science (IDA) for providing facilities and equipment that were necessary to complete this thesis.

I am especially grateful to Trafikverket for granting me permission to reproduce their photographs.


List of Tables

2.1. The superclasses of traffic signs in our study, which are the combination of shape and colour.
3.1. MSER parameters.
3.2. Image (geometric) properties used to filter MSER regions.
4.1. Frequency distribution of the collected colours: original, after removing duplicates, and after removal of achromatic colours.
4.2. Summary of the real image data of road signs for each superclass: the observed frequencies together with the minimum and maximum number of observations from each observed class.
4.3. Distribution of the classes for the real images of road signs.
4.4. Binary classifier on the two classes signs and non-sign objects (noise). Generalization error is calculated as 5-fold CV error.
4.5. Shape classification. Generalization error is obtained by 5-fold CV.
4.6. Parameters used for the HOG descriptors.
4.7. Recognition rates on validation data sets V1 and V2 for each superclass, and the training time to fit a model. Abbreviation and # of classes: RC (red circles, k = 43); RT (red triangles, k = 74); BC (blue circles, k = 33); BS (blue squares, k = 38); YW (yellow diamonds, k = 2).
4.8. Recognition rates on real data vs. synthetic data.
4.9. Accuracy of the system (detection).
4.10. Accuracy of the system (shape).
4.11. Accuracy of the system (recognition). True positives (TP), false positives (FP), false negatives (FN), and recognition performance: number of correct classifications (Recognized) and recognition rate of the class of sign.


List of Figures

1.1. Situations where road sign recognition is difficult. (Trafikverket)
1.2. Adverse effects such as occlusions and rotational distortions. (Trafikverket)
2.1. Example of a typical image sequence. (Photos courtesy of VTI)
2.2. Images captured on roads inside towns. (Trafikverket)
2.3. Examples of superclasses in our study.
2.4. Chevron sign.
3.1. Schematic overview of the various steps involved in the proposed Traffic Sign Recognition system: online system in blue; the offline part of the system includes the mixture model (green) and the trained classifiers (red).
3.2. Example of images returned by the probability model.
3.3. Sign oriented ‘away’ from the road shoulder to avoid glaring.
3.4. The various (counter-clockwise) rotations possible in each dimension.
3.5. Picturing the concepts cells, blocks and overlapping blocks.
3.6. Input image (gray-scale); example of a block and its corresponding cells (subimages).
4.1. Scatter plots for original colours.
4.2. Histograms for the individual components (all colours).
4.3. Density (estimated) plots for each component.
4.4. Histograms for Blue colour: Hue (left), Saturation (middle), Value (right).
4.5. Histograms for Red colour: Hue (left), Saturation (middle), Value (right).
4.6. Histograms for Yellow (yellow and diamond-yellow combined) colour: Hue (left), Saturation (middle), Value (right).
4.7. Scatter plots with BG (grey).
4.8. Histograms with BG colours superimposed.
4.9. Smoothed histograms with BG colours superimposed.
4.10. Colour distribution in 3D (unit cube).
4.14. Density curves for BG Blue (5 components).
4.15. Density curves for BG Red (5 components).
4.16. Density curves for BG Yellow (5 components).
4.17. Density curves for Blue (10 components).
4.18. Density curves for Red (10 components).
4.19. Density curves for Yellow (10 components).
4.20. Density curves for BG Blue (10 components).
4.21. Density curves for BG Red (10 components).
4.22. Blue colour: Fitted Beta mixture models.
4.23. Red colour: Fitted Beta mixture models.
4.24. Yellow colour: Fitted Beta mixture models.
4.25. Gray-scale images as obtained by the probability model.
4.26. Gray-scale images as obtained by the probability model.
4.27. Gray-scale images as obtained by the probability model.
4.28. Gray-scale images as obtained by the probability model.
4.29. Gray-scale images as obtained by the probability model.
4.30. Blue posterior probability with background versus no background classes modelled.
4.31. Yellow posterior probability with background versus no background classes modelled.
4.32. Yellow posterior probability with background versus no background classes modelled.
4.33. Yellow posterior probability with background versus no background classes modelled.
4.34. Yellow posterior probability with background versus no background classes modelled.
4.35. Found candidates (ROIs) for the Blue colour using the probability model followed by the MSER region detector.
4.36. Found candidates (ROIs) for the Red and Yellow colours using the probability model followed by the MSER region detector.
4.37. Instance where the probability model does not work due to the white colour not being modelled (zebra-crossing sign is split).
4.38. Example images of road signs for the real images and the synthetic images, grouped by superclass.
4.39. The differences from pixelating the image, thereby degrading the image quality.
4.40. The differences from pixelating the image, thereby degrading the image quality.
4.41. Conversion to gray-scale image.
4.42. Test classification error given the number of trees (RF model).
4.43. Pair of classes confusing (∼ 5% misclassified) for synthetic data.
4.44. Pair of classes confusing (∼ 20% misclassified) for synthetic data.
4.45. Pair of classes slightly confusing for the recognizer for the real data.
4.46. Examples of misclassified signs (real images).
4.47. Correct classifications (real images).
5.1. Misclassified signs (low contrast).
5.2. Signs that are difficult to identify (correctly classified).
B.1. Blue colour: MCMC samples for µj and π.
B.2. Blue colour: MCMC samples for Σj.
B.3. Red colour: MCMC samples for µj and π.
B.4. Red colour: MCMC samples for Σj.
B.5. Yellow colour: MCMC samples for µj and π.
B.6. Yellow colour: MCMC samples for Σj.
B.7. BG Blue colour: MCMC samples for µj and π.
B.8. BG Blue colour: MCMC samples for Σj.
B.9. BG Red colour: MCMC samples for µj and π.
B.10. BG Red colour: MCMC samples for Σj.
B.11. BG Yellow colour: MCMC samples for µj and π.
B.12. BG Yellow colour: MCMC samples for Σj.
E.1. Blue circles.
E.2. Blue squares.
E.3. Red circles.
E.4. Red triangles.
E.5. Yellow diamonds.
G.1. Traffic sign recognition system: Correct recognitions (1).
G.2. Traffic sign recognition system: Correct recognitions (2).
G.3. Correct recognition but where some of the bounding boxes are not completely correct to the ground truth: (a)–(b) numeral 0 classified as a sign; (c)–(d) bounding box does not enclose the sign (incorrect segmentation) – the best segment gives correct classification.
G.4. Missed detections: (a) zebra-crossing sign; (b) priority sign; (c)–(e) difficult conditions; (f) missed detection of blue sign, due to blue-painted vehicle in background.
G.5. Recognition – occluded signs.
G.6. Shape misclassified.


Abbreviations

CV Cross-Validation.
EM Expectation-Maximization.
H H (Hue) component of the HSV colour space.
HSV Hue, Saturation, Value.
LAB (L∗a∗b∗) Colour space that includes all colours perceivable by the human eye, where L∗ corresponds to the lightness and the colour channels are given by the two components a∗ and b∗.
LIDAR Light Detection and Ranging.
MCMC Markov Chain Monte Carlo.
MSER Maximally Stable Extremal Regions.
PMS Pavement Management System.
RF Random Forest.
RGB Red-Green-Blue.
ROIs Regions of Interest.
RST Road Surface Tester.
S S (Saturation) component of the HSV colour space.
SVM Support Vector Machine.
TSR Traffic Sign Recognition.
V V (Value) component of the HSV colour space.
VTI Swedish National Road and Transport Research Institute (Swedish: Statens väg- och transportforskningsinstitut).


1. Introduction

Traffic sign recognition (TSR) has been studied since the early 1980s, with the first studies appearing in Japan in 1984. Lately, this research has grown immensely, due to advances in intelligent systems, e.g., Intelligent Transport Systems (ITS), and recent applications such as Mobile Mapping Systems for highway asset management (see e.g. Ishikawa et al. (2007)).

Intelligent vehicles need to be equipped with various sensors to identify objects in their surroundings for better driving safety. This includes most traffic devices (road markings, traffic lights, and road signs), as well as objects such as vehicles and pedestrians. Road signs inform the driver about traffic conditions and prepare the driver for possible risk sections and risk situations on the road ahead.

Most TSR work focuses on implementing recognition systems that run in real time, as in self-driving cars or advanced driver assistance systems, which requires methods that scale well with low complexity. In this thesis the aim is to develop an automatic traffic sign recognition system for the purpose of an automatic inventory of road signs, in which case computational complexity is not of great concern. Maintenance and inventory of road signs on the state road network in Sweden is the responsibility of the Swedish Transport Administration. Unlike municipal roads, for which each municipality is responsible, there is currently no routine inventory of traffic signs on state roads in Sweden; one reason is the sheer size of the road network. The only information that is maintained is limited, to some extent, to speed limit signs.

The objective of this work is to detect and recognize traffic signs based on the digital images gathered by the Laser Road Surface Tester (RST) research vehicle (Arnberg et al., 1991). These images are routinely collected from the state roads every year and stored in the Swedish Transport Administration’s Pavement Management System, PMSv3. Further work would include retrieving the GPS position of the traffic signs, by combining the GPS data and the location of the sign in the image to find an approximation of the sign’s position.


One motivation for identifying and later locating the GPS position of a sign is in building realistic driving simulators. At the Swedish National Road and Transport Research Institute (VTI) (Swedish: Statens väg- och transportforskningsinstitut), an important research field is the development of driving simulators, which allow various types of experiments to be studied under controlled conditions. Building realistic road models is crucial for this task. Some of the virtual road models are generated automatically based on gathered data on existing roads. Currently, the task of building virtual models of existing roads that include information about road signs and their positions has not been automated. An automatic system that could identify and locate signs on existing roads would therefore alleviate the problem of manually inserting this type of information.

1.1. Background

Data sets for training and validation of traffic sign recognition systems are scarce, due to the possible commercial applications (Paclık and Novovicova, 2000), e.g., in developing driver support systems. Examples of public data sets include the German Traffic Sign Recognition Benchmark (GTSRB) data set, see Stallkamp et al. (2011).

Traffic sign recognition comprises both traffic sign detection and traffic sign classification (Mathias et al., 2013). Traffic sign detection is known to be more difficult than traffic sign classification (Farhat et al., 2016).

1.1.1. Challenges

The detection and recognition of road signs in digital images present many challenges due to the complexity of natural scenes. Images are captured in outdoor environments under conditions that we cannot control: the background and foreground of a road sign may change drastically when moving from one point to another.

Signs can be occluded, that is, parts of the sign are partially hidden due to the 3D geometry of the scene. Perspective distortions are also common, due to the viewing geometry between the car-mounted camera and the road sign. Moreover, the perceived colours of road signs can change due to external factors such as weather and lighting conditions. There are many types of colour distortions; for example, the colour of a sign captured with a colour camera next to a tree will appear closer to green (Yuille et al., 1998).


In Figure 1.1 we show examples of conditions in which the identification of road signs can be difficult.

Figure 1.1.: Situations where road sign recognition is difficult: (a)–(b) poor lighting and shadows; (c) at dusk; (d) blur; (e)–(f) motion blur; (g) degraded quality (faded colour); (h) damaged. (Trafikverket)

More difficult conditions are presented in Figure 1.2, including adverse effects such as strong occlusions and perspective projection distortion, which is quite common since road signs are not always parallel to the image plane.

Figure 1.2.: Adverse effects such as occlusions and rotational distortions: (a) partial occlusion; (b) partial occlusion (road signs overlapping); (c)–(d) strong occlusion; (e) heavy perspective distortion; (f) rotated. (Trafikverket)

1.1.2. Traffic sign detection

To detect road signs, one needs to separate them from the environment (background). Common methods include segmentation, where road signs are detected from colour information using carefully chosen colour thresholds. Other methods use edge detectors to locate the signs; edge detectors identify regions where the intensity value of pixels changes abruptly.

Image segmentation using colour information can be problematic in many ways. It is very difficult, and a common problem, to find the correct physical colour of an object based on an image. Colour distortions can be present due to many factors, e.g., variations in illumination, weather conditions, and the image capturing device used to reproduce the colour. Using colour information can nevertheless be useful, e.g., for the detection of red road signs, as red is rarely seen in outdoor environments. Colour information is, however, less precise than edge information; edges are more precise at the expense of being noisier (Smadja et al., 2010). Additionally, edge extraction is more difficult to work with, since the number of edges in an outdoor image can be abundant.
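As a minimal sketch of threshold-based colour segmentation (not the probability model developed in this thesis), the following marks "red-ish" pixels in an HSV image. The thresholds are illustrative, and the hue test must handle the wrap-around of red at 0 degrees on the hue circle:

```python
import numpy as np

def red_mask_hsv(hsv, h_lo=345.0, h_hi=15.0, s_min=0.4, v_min=0.3):
    """Binary mask of 'red-ish' pixels in an HSV image (H in degrees
    0-360, S and V in [0, 1]).  Assumes the hue interval wraps around 0
    (h_lo > h_hi), as it does for red, so the hue test is a disjunction
    of two intervals.  All thresholds are illustrative."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    hue_ok = (h >= h_lo) | (h <= h_hi)   # wrap-around interval
    return hue_ok & (s >= s_min) & (v >= v_min)
```

Fixed thresholds like these are exactly what the illumination and weather variation above breaks, which is the motivation for replacing them with a fitted probability model.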

Mathias et al. (2013) consider road signs that are mandatory (M), danger (D), and prohibitory (P), using data from the German Traffic Sign Detection Benchmark (GTSRB). For detection they use an open-source detector that exhaustively searches for traffic signs with a sliding window over a grid of specific scales and aspect ratios. It achieves a very high detection rate, at the expense of computational speed.

In Timofte et al. (2009) colour information and shape are used to extract candidate regions. In Greenhalgh and Mirmehdi (2012) the Maximally Stable Extremal Regions (MSER) algorithm is used for the detection of candidate regions. De La Escalera et al. (1997) use colour information for image segmentation, followed by a shape analysis to detect the signs. In Zaklouta and Stanciulescu (2011) colour enhancement formulas are used for image segmentation. In Supreeth and Patil (2016) detection is done by colour segmentation, and the resulting candidate regions are then separated from non-traffic signs using computed shape constraints such as circularity. Object detectors such as the Viola-Jones algorithm (Viola and Jones, 2001) can also be used for sign detection and are well suited for real-time applications.
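Several of the cited pipelines filter candidate regions by simple geometric constraints after colour segmentation or MSER. A hypothetical filter along those lines, with thresholds chosen purely for illustration (the scope of this thesis targets signs with aspect ratio roughly 1):

```python
def plausible_sign_region(w, h, img_w, img_h,
                          min_side=12, max_rel_area=0.25,
                          min_aspect=0.5, max_aspect=2.0):
    """Heuristic geometric filter for a candidate bounding box (e.g. from
    MSER): keep boxes that are large enough, that do not cover a huge
    fraction of the frame, and that are roughly square.  Thresholds are
    illustrative, not taken from any cited system."""
    if w < min_side or h < min_side:
        return False                      # too small to classify reliably
    if (w * h) / (img_w * img_h) > max_rel_area:
        return False                      # implausibly large region
    aspect = w / h
    return min_aspect <= aspect <= max_aspect
```

Filters like this discard most false candidates cheaply before the (more expensive) classifier cascade is applied.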

Related research in the detection step is the development of mobile mapping systems, which use gathered 3D data to map the surrounding environment. Mobile mapping systems can be used to build precise, high-resolution 3D digital maps of the road network that include road features such as curbstones, guard rails and road signs (Ishikawa et al., 2007). The 3D maps can be used in road safety assessment to improve driving safety. Car navigation systems in self-driving vehicles also benefit, as detailed information about road features can enhance driving safety. In Japan, the government has taken on the effort of mapping the roads in 3D, as part of a project to have self-driving vehicles on the road by 2020, in time for the Tokyo Summer Olympics.

In mapping the 3D environment, Light Detection and Ranging (LIDAR) is typically used, which uses emitted laser light to measure the distance to objects. LIDAR-based mobile mapping systems use mounted profiling scanners that send beams of (rotating) laser light at high frequency to produce three-dimensional points (point clouds) of the surrounding environment. The point clouds are then used to build 3D models of the road, and depending on the precision required, the number of data points collected each second can be in the millions. The 3D model provides high-resolution data that can be used to detect candidate objects that are likely to be road signs, by looking for point clouds with a pole-like structure and using the geometric property that most signs stand vertically. Kukko et al. (2009) show how data gathered by mobile mapping systems and digital images can be used to classify road features such as road markings, curbstones, and pole-like objects (road signs).
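A crude sketch of the pole-like test described above, for a single point cluster from a LIDAR point cloud; the thresholds (and the clustering that would produce such a cluster) are assumptions for illustration, not taken from the cited systems:

```python
import numpy as np

def is_pole_like(points, max_radius=0.3, min_height=1.5):
    """Test whether a 3D point cluster (N x 3 array, metres, z up) is
    pole-like: small horizontal spread around its centre and a large
    vertical extent.  Thresholds are illustrative."""
    xy = points[:, :2]
    radial = np.linalg.norm(xy - xy.mean(axis=0), axis=1)
    height = points[:, 2].max() - points[:, 2].min()
    return bool(radial.max() <= max_radius and height >= min_height)
```

A real system would additionally check point density and intensity, but the vertical-extent-vs-horizontal-spread contrast is the core of the heuristic.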

1.1.3. Traffic sign classification

The framework for classification of road signs depends heavily on the type of sign to be recognized, and most papers deal only with the recognition of one type of sign. To handle multiple types of road signs, a common methodology is to split the task into two stages, where the sign is first recognized by its shape, followed by an appropriate recognizer to determine the exact sign. Consequently, road signs are split into superclasses, determined by shape (and colour). This is natural, since many classification algorithms suffer from problems such as a high number of classes and unbalanced classes.

In Timofte et al. (2009) AdaBoost classifiers are used together with SVM classifiers, with Histogram of Oriented Gradients (HOG) features. HOG features are also used for feature extraction in Mathias et al. (2013), where various classifiers are applied, such as the Support Vector Machine (SVM) (Cortes and Vapnik, 1995) and the Iterative Nearest Neighbours Classifier (INNC), with INNC having the highest accuracy at 98%. Moreover, they also study different types of dimensionality reduction techniques, e.g. Linear Discriminant Analysis (LDA) (Martínez and Kak, 2001).

Classification of speed limit traffic signs in Sweden can be found in Fleyeh and Roch (2013), where HOG descriptors are used as features and training is done with a Gentle AdaBoost classifier, which is known to handle noisy data. In Greenhalgh and Mirmehdi (2012) training data for road signs in the UK are synthetically generated using template images retrieved from an online database, and classification is done using SVM classifiers for each combination of shape and colour, with HOG descriptors as features. Classification of road signs using neural networks is studied e.g. in De La Escalera et al. (1997) and Supreeth and Patil (2016). A comparison study of commonly used classifiers for road signs is conducted in Jo et al. (2014) for the following classification models: artificial neural network (ANN), k-nearest neighbours (k-NN), SVM, and RF; they find that the k-NN and RF classifiers have the highest performance. Kurnianggoro et al. (2014) propose a variation of the k-NN classifier, referred to as the k-nearest cluster neighbour (k-NCN) classifier, with the motivation that it scales better.
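To make the HOG idea concrete, here is a deliberately simplified HOG-style descriptor: per-cell histograms of unsigned gradient orientations, without the block normalization used in full HOG. It is an illustration only, not the descriptor configuration used in this thesis or the cited papers:

```python
import numpy as np

def hog_cells(gray, cell=8, bins=9):
    """Simplified HOG-style descriptor for a 2D gray-scale image:
    magnitude-weighted histograms of gradient orientation (unsigned,
    0-180 degrees) over non-overlapping cells.  No block normalization."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    ny, nx = gray.shape[0] // cell, gray.shape[1] // cell
    feat = np.zeros((ny, nx, bins))
    for i in range(ny):
        for j in range(nx):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
            for b in range(bins):
                feat[i, j, b] = m[idx == b].sum()
    return feat.ravel()
```

The resulting vector can be fed to any of the classifiers discussed above (SVM, RF, AdaBoost); production HOG additionally normalizes histograms over overlapping blocks, which gives robustness to illumination changes.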

State-of-the-art (competing) classifiers use recognizers based on convolutional neural networks (CNNs), with recognition rates above 99% (Zhou and Deng, 2014).


Mathias et al. (2013) argue that classification of traffic signs is a type of supervised classification of rigid objects, closely related to face and digit classification.

1.2. Objective

The objective is to implement a TSR system that incorporates both traffic sign detection and recognition:

1. Detection: We detect road signs by means of image analysis and statistical modelling, which requires removing the background and locating the sign(s). This step involves finding the Regions of Interest (ROIs), that is, the cropped images containing the road signs.

2. Recognition: We recognize the extracted ROIs found in the detection step and classify them into their correct class, using a cascade of classifiers.
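The two-stage structure above can be sketched as a generic cascade, where the detector and each classifier stage are placeholder callables; none of these names or signatures come from the thesis:

```python
def recognize_signs(image, detect_rois, classifiers):
    """Sketch of a two-stage TSR pipeline.  `detect_rois(image)` yields
    (roi, bbox) candidate pairs; `classifiers` is an ordered cascade where
    each stage refines the label or returns None to reject the candidate
    (e.g. a first stage rejecting non-sign objects).  All callables are
    hypothetical placeholders."""
    results = []
    for roi, bbox in detect_rois(image):
        label = None
        for stage in classifiers:
            label = stage(roi, label)
            if label is None:        # candidate rejected by this stage
                break
        if label is not None:
            results.append((bbox, label))
    return results
```

In this thesis the cascade order would correspond to sign/non-sign filtering, then shape (superclass), then the within-superclass recognizer.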

We will focus on a large set of traffic signs used in Sweden, but not all types of signs will be considered. Specifically, the scope of traffic signs to be included is the following:

• Traffic signs with pictograms (this includes signs with numbers, as in speed limit signs, and some signs such as the Priority sign B4-1 having no pictogram) – see Appendix E for a complete list of road signs used in our study.
• Traffic signs whose aspect ratio is roughly 1.0.

Traffic signs with text will not be considered: partly because Optical Character Recognition (OCR) would need to be implemented, but mainly because the text is sometimes not legible due to low image quality (highly compressed images).

We begin by giving a brief description of the source of the images in Section 2.1 and the raw images in Section 2.2, followed by the collected data of template images, colours, and (real) images of road signs, as described in Section 2.4, Section 2.5, and Section 2.6, respectively. The methods are described in Chapter 3, divided into four parts: modelling the colours in Section 3.4; the detection step in Section 3.5; generation of synthetic images of road signs in Section 3.6; and classification in Section 3.7. Results are presented in Chapter 4, followed by discussion in Chapter 5; conclusions are given in Chapter 6.


2. Data

2.1. Data sources

At the Swedish National Road and Transport Research Institute (VTI) (Swedish: Statens väg- och transportforskningsinstitut), research is done on the transport sector. An important subject is the maintenance of roads, and evaluating the condition of roads based on gathered data describing road surface characteristics. This type of information is important when studying how traffic load and climate affect different design, maintenance, and construction standards of the road. Eventually, roads start to deteriorate, depending on the construction standard of the road, the traffic volume, and environmental causes such as weather and the local environment. By providing information that describes the surface condition of the road pavements, one can make cost-effective decisions about the optimal time to do maintenance or reconstruction, in order to prevent and minimize pavement deterioration. This process of planning maintenance so as to optimize the condition of the road pavements over time, using road surface data, is referred to as a pavement management system (PMS). The PMS used in Sweden is PMSv3, which provides data about the road surface condition on public roads in Sweden.

The collection of road surface characteristics has to be done in an automated system using measurement methods that are as objective and reproducible as possible. The Laser Road Surface Tester (RST) has been specially developed by VTI for this task (see Arnberg et al. (1991)); it can be summarized as a high-precision measurement system that collects various variables related to road surface characteristics, such as road texture, road geometry, unevenness, crossfall, and cross profile of the road. Images of the road are also provided, to visually inspect any occurrence of extreme values etc. The system achieves real-time data processing, handling traffic speeds of up to 90 km per hour (Arnberg et al., 1991). The road surface data is collected on most public roads in Sweden every year. The data collection service is currently outsourced and procured every fourth year, with a comprehensive comparison test to approve companies taking part in the procurement. Some roads are selected to be measured twice a year, as a reliability check that the repeated measurements do not deviate. For minor roads that are part of the main road network and maintained by the national government, the measurements are done in one direction of the road only, and not as often as every year. Roads with high traffic volumes and importance, such as the E4 road, are measured several times a year in both directions.

The Laser RST vehicle is equipped with a front camera that provides digital images of the road, captured at 20 m intervals. These images can be used to give visual information about the road surface, as well as for the identification of road signs, which is the focus of this thesis. Only the image data will be considered here.

2.2. Raw data

An example of a typical image sequence is given in Figure 2.1, where consecutive images are delivered every 20 meters. The sizes of the images may vary throughout the years (due to the specifications on image quality to be met by the company in the procurement process) but generally range from 1300 × 975 to 1600 × 1200 pixels, with 24 colour bits. The collection of data is done from late spring to late autumn, during morning to late afternoon. The images are delivered in all types of weather: sunny, cloudy, rainy, foggy etc., except for snowfall (since collection is not done in winter). The images used in this thesis are the complete set of captured images in the county of Östergötland from measurements taken during the years 2010–2015 (Trafikverket, 2016). In total about 500,000 images have been collected. As explained earlier, the images are taken on major roads, minor roads, etc. Private roads and forest roads are not considered. Occasionally, images are also captured on roads located in smaller towns, see Figure 2.2, which are technically part of the state-maintained roads.


Figure 2.1.: Example of a typical image sequence; panels (a)–(f) show images 000060, 000080, 000100, 000120, 000140, and 000160. (Photos courtesy of VTI)

2.3. Traffic sign characteristics

Figure 2.2.: Images captured on roads inside towns. (Trafikverket)

Road signs in most countries of Europe are based on the Vienna Convention on Road Signs and Signals, and these countries have adopted this standardization. There are some minor country-specific differences, however. Unlike most other European countries, countries such as Sweden (and Finland) have specified the background colour of warning and prohibitive signs to be yellow instead of the standard white. This is to increase the visibility, or contrast against the background, during months of the year when heavy snow loads are typical. The majority of the road signs can be divided into specific superclasses, where the superclass of a sign is determined by the combination of colour and shape. In Table 2.1 we give a summary of the superclasses of traffic signs used in our study; see Figure 2.3 for examples of road signs from each superclass. The octagonal sign (STOP sign) is considered to be a circle. The superclasses used in our study are limited to the following five: red circles, red triangles, blue circles, blue squares, and yellow diamonds.

Table 2.1.: The superclasses of traffic signs in our study, given by the combination of shape and colour.

Colour   Shape                   Message type               # classes
Red      Circle                  Prohibition / Restriction  43
Red      Triangle (vertex up)    Warning / Danger           73
Red      Triangle (vertex down)  Yield                      1
Blue     Circle                  Obligation / Mandatory     33
Blue     Square                  –                          38
Yellow   Diamond                 Priority                   2
Figure 2.3.: Examples of superclasses in our study.

The following are some minor issues related to the given set of signs in our study. Some signs have pictograms together with variable numbers, and can thus have different appearances even though they belong to the same class. These are usually road signs that describe restrictions on the width, height, or the maximum allowed gross weight. It is difficult to find all instances of signs with variable numerals. One way to address this is to generate all possible instances. Enumerating the possible (accepted) range of numbers (currently Tratex is the typeface used for traffic signs in Sweden) for these classes has not been done in our study; we focus only on the pictogram.

Signs that indicate the end of a restriction are challenging to detect due to having a white background and no border; see e.g. Caraffi et al. (2008) for a study of derestriction signs. These signs will not be studied, partly because they are rare.

Since road signs must be visible and legible to a driver of a moving vehicle, the size of the sign needs to be set appropriately. Road signs come in varying sizes, depending on road characteristics such as the prevailing speed, visibility along the road, geometry of the road, etc. The aspect ratios are, however, the same for all the varying sizes (Sektion Utformning av vägar och gator, 2004). Types of road signs:

• Red triangles include A. Danger warning signs (Varningsmärken) and the yield sign.

Shape: Equilateral triangle. Background colour is yellow, border is red. Pictogram in black.

• Red circles include Prohibitory or restrictive signs (Förbudsmärken).

Prohibitory or restrictive signs inform the driver of the restrictions that apply on the road, as well as prohibited actions such as the driving of specific vehicles or specific driving maneuvers. They regulate what the driver must not do.

Background is yellow, border red. Pictogram in black.

• Blue circles include D. Mandatory signs (Påbudsmärken).

Mandatory signs inform the driver of obligations that must be followed. They regulate what the driver must do, such as which traffic rules must be observed by the driver on a road section or at a specific place on the road. The superclass Blue circles also includes some prohibitory signs.

Background is blue, with white pictograms. Some signs have red diagonals as well.

• Blue squares include E. Information, direction and advisory signs (Anvisningsmärken), B. Priority signs (Väjningspliktsmärken) and T. Additional panels (Tilläggstavlor).

Background is blue, with some exceptions – a few signs have a combination of blue and white as background, such as the zebra-crossing sign. Pictogram in white or black (less common). Some signs have red diagonals as well.

2.4. Data – Image templates of road signs in use

For building a classifier, we need (image) data on the types of road signs on the Swedish road network. The Swedish Transport Agency (Swedish: Transportstyrelsen) is responsible for the regulations of road signs in Sweden. Data on road signs in Sweden in the form of computer graphics are provided by the Swedish Transport Agency at Transportstyrelsen (a). Moreover, they also provide the specific colours used on road signs in Sweden at Transportstyrelsen (b). The data that has been retrieved consists of images in the PNG format and vector graphics in the EPS format. For a list of road signs used in our study, see Appendix E. Each sign is identified by an abbreviation, prefixed with the letter that is used to identify the type (category) of the sign, followed by the numbers X-x, where the first number identifies the sign, and the second number specifies possible alternative versions. For example, the road sign (category A) warning of strong crosswinds ahead exists in two versions: one from the right direction, A24-1, and one from the left direction, A24-2.

Figure 2.4.: Chevron sign.

In this study we will consider signs with an aspect ratio of roughly 1.0. However, chevron signs (signs that mark obstacles along the road, or warn of sharp bends, see Figure 2.4) are not included.

2.5. Data – Colour distribution

In the detection step, we use a colour segmentation approach based on collected data on colours of road signs coming from real images. As a means of building a probability model, we have collected an array of colours using the images captured by the Laser RST vehicle, for the colours red, yellow, and blue. Additionally, we have collected colours which are potentially confusing for a probabilistic model, such as yellow-like colours: grass, vegetation; red-like colours: red-painted houses, brick walls; and blue-like colours: sky.

2.6. Data – Real images of road signs

For assessing the quality of the synthetic images that are used for the purpose of training a classifier, we have collected images of road signs taken from real images. The collection of this data has been done manually (in Wolfram Mathematica), by cropping images containing road signs and extracting the regions of interest (ROIs). The bounding boxes are (roughly) the smallest rectangles that fully encompass the road sign. The task of specifying an appropriate rectangle that encompasses the sign is extremely time-consuming. The (real) images of road signs (n = 537) that have been collected vary greatly in image size, ranging from 15 × 15 to 291 × 291 pixels. Likewise, the quality of the images ranges from degraded to high-quality. This collected data, however, only contains a small subset of all existing road signs, since many types of road signs are scarcely found. Manual collection of negative images has also been done by extracting images of non-road sign objects (noise). This set of negative images has been collected in order to build a classifier that can separate road signs from non-road sign objects. The negative images (n = 840) include types of noise such as cars, grass, houses, roofs, walls, sky, etc. that have colours of red, yellow, or blue.

3. Methods

Recognition of traffic signs can be divided into two steps: (1) detection of candidate signs, and (2) classification of the candidate signs. The workflow of the procedure proposed in this thesis can be summarized as follows (see Figure 3.1 for a schematic overview of the proposed Traffic Sign Recognition system):

1. Pre-process the original image.

2. Segment the image using the colour information. Colour segmentation is done using a statistical approach based on data of collected colours. The distribution of each colour is fitted to a mixture model which, using a Bayesian approach, can then detect pixels that are likely to come from a road sign for the colours red, yellow, and blue, respectively.

3. Detect regions of interest (ROIs) on the image produced by the colour segmentation in the previous step, using Maximally Stable Extremal Regions (MSER).

4. Resize and normalize the ROIs to a standard size. The found candidates are then fed to a cascade of classifiers.

5. Confirm that the candidate region is a traffic sign: a binary classifier trained on the classes traffic sign and non-traffic sign objects (noise) is used to filter out false positives.

6. Once a candidate region has been confirmed to be a traffic sign, the shape of the sign is determined by a trained shape classifier.

7. Finally, the appropriate trained sign classifier is used to classify the given road sign. In total, five sign classifiers are trained, one for each superclass. The classifiers are trained on synthetic images that have been generated from template images by adding noise and applying various distortions, to better reflect the ‘quality’ of images of road signs captured in real environments. Given the superclass of a found sign, the sign is then fed to the appropriate classifier to determine the exact type of sign.
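The classifier cascade in steps 4–7 can be sketched as follows. This is an illustrative Python outline, not the thesis implementation; the callables is_sign, shape_of, and class_of are hypothetical stand-ins for the trained binary, shape, and superclass-specific classifiers.

```python
def recognise(roi, is_sign, shape_of, class_of):
    """Run one normalized ROI through the cascade of classifiers.

    is_sign, shape_of and class_of are placeholder callables standing in
    for the trained binary, shape and superclass-specific classifiers.
    Returns None when the candidate is rejected as noise.
    """
    if not is_sign(roi):            # step 5: discard non-sign candidates
        return None
    shape = shape_of(roi)           # step 6: circle / square / triangle / diamond
    return class_of(shape, roi)     # step 7: superclass-specific classification
```

A rejected candidate never reaches the later (more expensive) classifiers, which is the point of arranging the classifiers as a cascade.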


Figure 3.1.: Schematic overview of the various steps involved in the proposed Traffic Sign Recognition system: the online system in blue; the offline part of the system includes the mixture models (green) and the trained classifiers (red).

3.1. Pre-processing

Digital images produced by image-capturing devices are susceptible to artifacts and noise due to both external and internal factors. Removing noise from an image can be restated as trying to recover the original signal, or making the signal-to-noise ratio as high as possible. This is normally done before segmentation as a pre-processing step. Typical pre-processing methods include image enhancements to suppress or remove noise; however, this step is optional. Methods that are commonly used are low-pass filters, such as the median filter or linear smoothing filters. A median filter of relatively small size [2, 2] is used here, to preserve edges.
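As an illustration of the median filtering step, a minimal pure-Python sketch is given below. It uses a 3 × 3 window (an odd window size makes the median unambiguous, whereas the [2, 2] filter above relies on the implementation's tie-breaking) and leaves border pixels untouched for simplicity.

```python
import statistics

def median_filter(img, k=3):
    """Apply a k x k median filter to a 2D grayscale image (list of lists).

    Border pixels are left unchanged for simplicity; a real implementation
    would pad the image instead.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    out = [row[:] for row in img]
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = [img[y + dy][x + dx]
                      for dy in range(-r, r + 1)
                      for dx in range(-r, r + 1)]
            out[y][x] = statistics.median(window)
    return out
```

A single salt-noise pixel surrounded by dark pixels is replaced by the window median, which is why median filtering suppresses impulse noise while preserving edges better than mean filtering.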


3.2. Segmentation using colours

In this section we describe how to use the colour information to segment (‘cluster’) an image into regions that are likely to come from road signs. To alert drivers of potential dangers ahead, road signs are painted in specific (high-contrast) colours to make them distinguishable from the background. The first step in the detection of signs is to exploit this information. However, colour is a complicated property, and the distribution of ‘perceived’ colours is unlikely to be uniform, since colours are affected by factors such as shadows, lighting sources, and the colour model used to reproduce the colour. One way to combat this is to build a mixture model, as any type of (complicated) data can be approximated by a mixture model provided the densities are chosen with care and there are enough components. In theory, any density can be approximated by a mixture of multivariate normal densities, provided there are sufficient components (Gelman et al., 2013). In modelling the colours, we will fit a mixture model for each colour separately, using a collection of colours retrieved from real images depicting road signs. By using mixture models for the colours within a class, we can use Bayes’ theorem to compute the (posterior) probability that a pixel is part of a road sign. We do this for the colours red, yellow, and blue, respectively (these are the most common background and border colours of road signs).

3.3. Data collection of colours

Data on the distribution of colours found in images of road signs has been collected for the following colours: red, yellow, and blue. The collection of colour pixel information is extremely tedious, and to reduce the risk of error during this collection some additional steps and macros had to be used. We use Wolfram Mathematica for the purpose of extracting the pixel values. The collected pixel values are then converted to an appropriate file format using some regular expressions. The pixel values are collected for each visible sign, and normally up to 5 values are extracted from each sign. For larger traffic signs, up to 10 pixel values are retrieved.

3.4. Modelling colours using mixture models

In segmenting the image based on colour, we partition the pixels into clusters that can be part of a road sign and those that can be considered to be part of the background. An important consideration is which colour space to choose. The Red-Green-Blue (RGB) model is commonly used in image-capturing devices such as digital cameras for reproducing colours. In the RGB model, the colour of each pixel is defined by the 3-dimensional vector (R, G, B), where each component measures the intensity of the red, green, and blue light. Note that, to avoid confusion between the ‘components’ of the colour model and those of a mixture model, the ‘components’ of the colour model will henceforth be referred to as colour channels; where it is clear from context, ‘component’ will be used instead. The intensity values for each colour channel are typically of 8 bits, i.e. 256 intensity values are used for each colour channel, with the values stored in the range [0, 255]. In the RGB model, additive mixing of red, green, and blue light is used to reproduce colours. This additive mixing makes RGB unintuitive for specifying the type of colour (e.g. ‘yellow’ is encoded as [255, 255, 0]). Furthermore, the components in the RGB model are known to be highly correlated. There are many other colour models used for representing colours numerically. More suitable colour models for specifying colour include the Hue, Saturation, Value (HSV) colour space, which better matches the human perception of colours. In the HSV colour space, the three components Hue, Saturation, and Value are used, where the Hue, given by the H component, describes (loosely speaking) the perceived colour. The Saturation, S, measures the ‘purity’ of the colour, where the most saturated colour corresponds to ‘pure’ colour, and colours with low saturation have shades of gray. The Value, given by the V component, can be seen as the intensity value that distinguishes white from gray or black.

In transforming the RGB colour space to the HSV colour space, the effect of illumination changes almost vanishes, making it more suitable for our analysis. The Hue component is relatively robust to varying lighting conditions (Coronado et al., 2011).
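The robustness of the hue to illumination changes can be verified with Python's standard colorsys module (shown here only as an illustration; the thesis pipeline is not implemented in Python). Halving the R, G, and B values of a pure yellow changes V but leaves H and S untouched:

```python
import colorsys

# Pure yellow and a darkened yellow, with RGB values scaled to [0, 1].
h1, s1, v1 = colorsys.rgb_to_hsv(1.0, 1.0, 0.0)
h2, s2, v2 = colorsys.rgb_to_hsv(0.5, 0.5, 0.0)

# The hue (1/6, i.e. 60 degrees) and the saturation are identical for
# both colours; only the value V reflects the change in illumination.
```

Note that colorsys returns all three channels in [0, 1], so a hue of 1/6 corresponds to 60° on the usual 0–360° hue circle.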

In deciding which colour space to use for the fitted models, we have experimented with the discriminative power to separate signs from non-signs (= background) in the following four colour spaces: RGB, Lab*, HSV, and YCbCr. The methodology we used was to threshold a typical image containing road signs into a binary image, using the distribution of the collected colour data as a guideline for specifying the threshold values. The thresholds were specified by the greatest lower bound and the least upper bound in each colour channel (in the 3D space) for each colour data set, respectively. These thresholds for the red, yellow, and blue colours were then used as a logical ‘mask’, giving us a binary image to find pixels that can be considered to be part of the foreground (i.e. road signs). We experimented on the four different colour spaces to explore the extent to which the background was removed from the original images. The colour spaces HSV and Lab* were found to separate the foreground best (i.e. the found foreground contained almost no background). However, the Lab* colour space is not practical for detection, since the transformation from the RGB to the Lab* colour space has quite high time complexity (it takes about one second for a typical image). For the generation of synthetic images, however, Lab* will be used, since this colour space has the property of being a perceptually uniform colour space, which is useful for the manipulation of colours.

In using the HSV model to segment the pixels into clusters, we have to consider that at certain values a pixel contains colours of low ‘chromatic’ value, and can be said to contain no chromatic information. E.g., for low intensity values of the saturation component we get shades of grey regardless of hue. That is, for some regions in the HSV colour space, the hue (which carries the majority of the chromatic information) may not be useful for segmentation. Furthermore, pixel values having a low value of the V component can be seen as ‘black’ colours. Vitabile et al. (2002) study the amount of drift in the hue when colours are affected by variations in lighting. They identified three regions in the HSV colour space, depending on the component values S and V, that can affect the behaviour of the hue component: (i) the achromatic region, classified as {s ∈ S, v ∈ V | s ≤ 0.25 or v ≤ 0.2 or v ≥ 0.9}; (ii) the unstable chromatic region, classified as {s ∈ S, v ∈ V | 0.25 < s < 0.5 and 0.2 < v ≤ 0.9}; (iii) the chromatic region, classified as {s ∈ S, v ∈ V | s ≥ 0.5 and 0.2 < v ≤ 0.9}. Based on this, and with some slight adjustments to the lower bounds, we remove colours that are part of the ‘achromatic’ region, specified by {s ∈ S, v ∈ V | s ≤ 0.2 or v ≤ 0.2}. Thus, we also include colours for which V ≥ 0.9, as many of the collected colours were found to have a peak around V = 1 (for yellow colours, about half have values for which V ≥ 0.9).
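The achromatic cut-off described above amounts to a simple predicate on the S and V channels. A minimal sketch, using the adjusted bounds from the text and channel values scaled to [0, 1]:

```python
def is_chromatic(h, s, v):
    """Keep a pixel for colour segmentation only if it carries usable
    chromatic information: S and V both above 0.2 (the adjusted bounds
    from the text; colours with V >= 0.9 are deliberately kept, since
    many collected sign colours peak near V = 1)."""
    return s > 0.2 and v > 0.2
```

Pixels failing this test are discarded before the mixture-model likelihoods are evaluated, since their hue is unreliable.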

3.4.1. Mixture distribution

The colours have been collected under various conditions, and colour is subject to many factors that can adversely affect the ‘perceived’ colour. Examples of such factors are lighting conditions, sunlight, shadows, weather conditions, time of day, and properties of the painted colour on the sign, such as the degree of fading. Therefore, it is realistic to assume that the distribution of colours can be subdivided into (sub)clusters, where the distribution of each cluster may be explained by the type of setting in which the colour was collected. The number of mixture components, k, corresponds to the number of (sub)clusters in the data. A mixture model with carefully chosen parameters should be able to identify any such clusters. In our case we have made the assumption that the subclusters exhibit normal distributions.


3.4.2. Gaussian Mixture model

Given an input image where the colours are, say, stored in the HSV colour space (transformed from the original RGB format), each pixel value x (a colour) is described by the colour channels H, S and V, which we write x = (x_H, x_S, x_V). Our goal is to fit a mixture model to a specific colour class C_i, using a set of observed pixel values, where the set of classes {C_1, . . . , C_k} is the set of (different) colours that we seek to model. We consider colours that appear in road signs: red, yellow, and blue. For each class (or colour), we model the colour distribution using a mixture of k multivariate Gaussians following Titterington et al. (1985):

f(x \mid \theta) = \sum_{j=1}^{k} \pi_j f_j(x \mid \mu_j, \Sigma_j) \qquad (3.1)

where \pi = \{\pi_j ; j = 1, . . . , k\} is the k-vector of probabilities for the mixture components, satisfying \pi_j \geq 0 and \sum_{j=1}^{k} \pi_j = 1. The density functions f_j are normally distributed with mean vector \mu_j and variance-covariance matrix \Sigma_j, that is, N_p(\mu_j, \Sigma_j), with the parameters given by \mu = \{\mu_j ; j = 1, . . . , k\} and \Sigma = \{\Sigma_j : j = 1, . . . , k\}, where \Sigma_j is a p × p covariance matrix. The parameters to estimate in the model are thus \theta = (\mu, \Sigma, \pi). Additionally, we will include a classification vector z = \{z_i : i = 1, . . . , n\}, where z_i = j implies that observation x_i is drawn from component j. The vector z can be seen as a latent variable. We assume here that the number of components k is known – the case in which the number of components is unknown is very difficult to handle efficiently and will not be considered here; see e.g. Dellaportas and Papageorgiou (2006).
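The structure of the mixture density in Equation (3.1) can be illustrated with a one-dimensional sketch; the thesis uses three-dimensional Gaussians over the HSV channels, which only changes the component density f_j to a multivariate normal.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Evaluate f(x | theta) = sum_j pi_j f_j(x | mu_j, sigma_j),
    a 1-D illustration of Equation (3.1)."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
```

With a single component (weight 1), the mixture density reduces to the component density itself, which is a convenient sanity check when implementing the model.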

3.4.3. Fitting of a k-component mixture multivariate Normal model

In fitting a k-component multivariate normal mixture model to some data, the parameters are typically found by means of the Expectation-Maximization (EM) algorithm, which is an iterative algorithm. Implementing the EM algorithm for fitting the normal mixture model with a known number of components is often straightforward, and closed formulas for the parameter estimates exist; see e.g. Titterington et al. (1985) (2 components only) for multivariate normals with equal variance-covariance matrices, or Chen and Tan (2009). However, EM has the drawback of converging slowly in the vicinity of the mode, and it can easily get trapped in local modes; the choice of initial values is therefore crucial. An alternative to finding the parameters using the deterministic EM algorithm is to instead consider Markov Chain Monte Carlo (MCMC) to simulate from the joint posterior distribution of all model parameters:

p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \qquad (3.2)

where p(D|θ) is the likelihood, a function that measures how probable the observed data is given some parameters θ, and p(θ) is our prior, the degree of belief in how the parameters θ are distributed before we observe any data D. The normalizing constant p(D) does not depend on θ, and we can equally well work with the unnormalized posterior

p(θ|D) ∝ p(D|θ)p(θ). (3.3)

In modelling the distribution of colours, we assume that f(x|θ) is a three-dimensional multivariate normal distribution. In fitting the distribution in the HSV colour space, we do not make the assumption that the colour channels are independent, i.e. the covariance matrices Σ_j are not restricted to be diagonal.

3.4.4. Parameter estimation and Statistical model

For the theory and methodology on fitting a mixture model of k multivariate normal distributions using Gibbs sampling, see Appendix A. In estimating the parameters, we drew 2000 MCMC samples, discarded the first 500 draws as a burn-in period, and estimated the parameters by the posterior mean.
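The estimation step can be sketched as follows, assuming each parameter's MCMC draws are stored as a flat list:

```python
def posterior_mean(draws, burn_in=500):
    """Estimate a parameter by the mean of its MCMC draws after
    discarding the first burn_in draws as warm-up (burn-in)."""
    kept = draws[burn_in:]
    return sum(kept) / len(kept)
```

Discarding the burn-in draws removes the influence of the (arbitrary) starting values before the chain has reached its stationary distribution.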

3.4.5. Beta Mixture model

Another model that will be considered is a univariate mixture of Beta distributions (here the assumption of independence between the colour channels is made). The model for a mixture of (univariate) Beta distributions is

f(x_m \mid \theta) = \sum_{j=1}^{k} \pi_j f_j(x_m \mid a_j, b_j), \quad m = 1, . . . , 3 \qquad (3.4)

with class probabilities \pi = \{\pi_1, . . . , \pi_k\}, and Beta density functions f_j with shape parameters a = \{a_1, . . . , a_k\} and b = \{b_1, . . . , b_k\} for the specific colour channel m. Since the Beta distribution is only valid for x ∈ (0, 1) (in the case where a < 1 and b < 1), whereas the colour channels are valid in [0, 1], the boundary values 0 and 1 are replaced by 1/255 and 254/255, respectively. The mixture model that considers all colour channels using the Beta model is then the product of the mixture models for each colour channel, that is:

f(x \mid \theta) = \prod_{m=1}^{3} \sum_{j=1}^{k} \pi_{jm} f_{jm}(x_m \mid a_{jm}, b_{jm}) \qquad (3.5)

where \{\pi_{jm} : j = 1, . . . , k, m = 1, . . . , 3\}, and likewise for the other parameters.

We use the package betareg in R (Grün et al., 2012) to fit a mixture of univariate Beta distributions, which uses an Expectation-Maximization (EM) algorithm to find the parameters.
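A sketch of the Beta mixture density of Equation (3.4) for a single colour channel, using only the Python standard library (the thesis fits the parameters with betareg in R; this snippet only evaluates the density):

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density at x in (0, 1), via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_mixture_pdf(x, weights, a, b):
    """Equation (3.4) for one colour channel: a k-component mixture of
    Beta densities evaluated at x in (0, 1)."""
    return sum(w * beta_pdf(x, aj, bj) for w, aj, bj in zip(weights, a, b))
```

As in the text, boundary channel values 0 and 1 would first be clamped to 1/255 and 254/255 so that the density is always evaluated inside the open interval.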

3.5. Detection

Here we discuss the various steps involved in detection, beginning with how a (Bayesian) probability model is built for the pixel colours.

3.5.1. Probability model

Once we have fitted a mixture model for each colour, we can use Bayes’ theorem to build a probability model for assigning the probability that a pixel belongs to a given class (or colour). Bayes’ theorem tells us that the (posterior) probability is

p(C_i \mid x) \propto p(x \mid C_i)\, p(C_i) \qquad (3.6)

where p(C_i) is the prior probability of the class, and we have ignored the normalization constant since it does not depend on the class.

In using the Bayesian paradigm to build a probability model for calculating the (posterior) probability that a pixel belongs to a given colour, we must assume that the classes are exhaustive; thus we need to consider the (full) set of possible classes. We therefore consider a background class as well, to incorporate all possible values. The classes to model are then all colours plus a background class (BG), that is, y ∈ {C_red, C_blue, C_yellow, C_BG}.

In modelling the background class, we can use flat likelihood functions. Thus, we write the likelihood for the background class as p(x \mid C_BG) = U(0, 1)^3, in which the colour channels are assumed to be independent. Given this specification, the posterior probability can be calculated as

p(y = C_i \mid x) = \frac{p(x \mid C_i)\, p(C_i)}{p(x)} = \frac{p(x \mid C_i)\, p(C_i)}{\sum_{i=1}^{4} p(x \mid C_i)\, p(C_i)} \qquad (3.7)


which is then restricted to [0, 1]. In our case we make the assumption of equal (class) priors, so the posterior probability can be calculated as

p(y = C_i \mid x) = \frac{p(x \mid C_i)}{\sum_{i=1}^{4} p(x \mid C_i)} \qquad (3.8)

With this model we can calculate the (posterior) probability that each pixel belongs to one of the colours found on road signs, based on the real collected colour data. Applying the probability model to a colour image gives us a gray-scale image where pixels with high posterior probabilities are identified by white (bright) pixels, indicating that the pixel colour is likely to be part of a road sign. For example, a colour image stored in the RGB model has three colour channels, and extracting, say, the blue channel gives a gray-scale image with the intensity value indicating the amount of ‘blue’ colour in each pixel (or more specifically, the degree of light that is spectrally sensitive to the blue light in the RGB model). Since this strategy of modelling traffic sign colours using the colours red, yellow, and blue may also model colours that are not part of a road sign, a natural extension is to include additional background classes, in which we model similar colours. The (additional) background classes are therefore pixels that are part of non-traffic sign objects, but with colours similar to those of the traffic signs. The three background classes we use are the following: blue background (BG Blue), such as sky; red background (BG Red), such as red-painted houses or brick walls; and yellow background (BG Yellow), such as vegetation. The likelihood for each of these background classes is then the sum of a mixture model that has been fitted to this class (separately) and a uniform distribution, to handle the complete space of values (the range for each colour channel is [0, 1]). For example, the red background class could be modelled as

p(x \mid C_{BG\,Red}) = \frac{1}{2} \left( \sum_{j=1}^{k} \pi_j f_j(x \mid \mu_j, \Sigma_j) + U(0, 1) \right) \qquad (3.9)

where k is the number of components, x is stored in the HSV colour space, and the weights (1/2) have been set arbitrarily (these weights can, however, be estimated). In using this probability model, we need to make sure that the classes are well separated in this space (HSV is studied here). If the classes overlap, this will not work very well.
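Under equal class priors, Equation (3.8) reduces to normalising the class likelihoods. A minimal sketch, where the class names are illustrative:

```python
def class_posteriors(likelihoods):
    """Posterior p(C_i | x) under equal class priors (Equation 3.8):
    normalise the class likelihoods so that they sum to one.

    likelihoods maps class name -> p(x | C_i); the plain background
    class can simply use the uniform density p(x | BG) = 1 on [0, 1]^3.
    """
    total = sum(likelihoods.values())
    return {c: p / total for c, p in likelihoods.items()}
```

For example, a pixel whose red-class likelihood is three times the background likelihood gets a red posterior of 0.75, which would appear as a bright pixel in the red posterior-probability image.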

Note that many road signs have a combination of colours, e.g. red triangles being red-bordered with a yellow background. To find red triangles, one way is to combine the intensity values (representing the posterior probabilities) for each colour. We can combine the red and yellow posterior probabilities for each pixel, giving us the new intensity image I_{red,yellow}.

An example of the set of gray-scale images returned by applying the probability model is given in Figure 3.2.

Figure 3.2.: Example of images returned by the probability model: (a) original image; (b) blue, (c) red, and (d) yellow posterior probabilities.

3.5.2. MSER

In the colour segmentation, we assign a ‘probability’ to each pixel that indicates how likely it is to belong to a road sign based on its colour. To find the sets of (segmented) pixels that are adjacent and form connected groups, a region detector is used. A region detector finds the set of pixels that can be considered to be part of a connected region based on some heuristic. To find the connected components (regions), we apply the method of Maximally Stable Extremal Regions (MSER) (Matas et al., 2004). The MSER detector finds regions with stable (non-varying) intensity values, in which the shape of the region is maintained when thresholding the image at varying intensity values. The MSER detector takes as input the intensity values of an image (basically a gray-scale image) and considers thresholding the image at all possible intensity values (8 bits for a gray-scale image results in 256 threshold values). The set of pixels that are above or equal to a given threshold is set to ‘white’ and those below to ‘black’. At the first threshold value, the image is completely white, and as the threshold is successively increased, the regions below the threshold will form black regions that are considered to be local intensity minima. The regions that are above or equal to the threshold value are considered to be locally maximal intensity regions. The idea is that a sequence of connected components (white regions) whose shape is maintained while thresholding the image at varying intensity values is considered to be a (maximally) stable region.

The MSER detector is invariant to affine transformations and is suitable for finding regions with stable intensity values and distinctive boundaries. Considering that signs are painted in discriminative (high-contrast) colours to make them separable from the background, this method is well suited to our task. We use detectMSERFeatures in Matlab to detect MSER regions (see Table 3.1 for the parameters used).

Table 3.1.: MSER parameters.

Threshold Delta   Min and Max Area   Max Area Variation
10                [200, 90000]       0.25

The MSER detector gives many detections for the same sign. To remove repeated detections and overlapping regions, we apply non-maximal suppression to the bounding boxes: found MSER regions are merged if their bounding boxes overlap by more than 80% in area.
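The merging step can be sketched as follows. The exact overlap measure is an assumption here (intersection area divided by the smaller box's area); boxes above the threshold are replaced by their enclosing bounding box.

```python
# Sketch of overlap-based merging of detection bounding boxes.

def overlap_ratio(a, b):
    """a, b: boxes (x1, y1, x2, y2). Intersection area / smaller box area."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / min(area(a), area(b))

def merge_boxes(boxes, thresh=0.8):
    """Greedily merge boxes whose overlap ratio exceeds thresh."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlap_ratio(boxes[i], boxes[j]) > thresh:
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

# Two near-identical detections of the same sign collapse into one box.
print(merge_boxes([(10, 10, 110, 110), (12, 12, 112, 112), (300, 300, 350, 350)]))
# [(10, 10, 112, 112), (300, 300, 350, 350)]
```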

Furthermore, the MSER detector generates many false positives, and some additional steps are necessary to deal with this. We use various geometric attributes to discard false positives; some are removed by thresholding computed statistics such as the aspect ratio. Table 3.2 lists the image properties we compute for this purpose. Perimeter ratio is the ratio of the perimeter of the object (as computed by Matlab) to the perimeter of its bounding box. Extent is the area (in pixels) of the object divided by the area of the minimal rectangle that encloses it; e.g., a rectangular region has an extent of 1.0. Solidity is the area of the region divided by the area of its convex hull; e.g., a circular region has a solidity of 1.0.


Table 3.2.: Image (geometric) properties used to filter MSER regions.

Parameter (statistic)   Minimum   Maximum
Aspect ratio            1/1.4     1.4
Width (pixels)          20        400
Height (pixels)         20        400
Perimeter ratio         0.2       1.4
Extent                  0.4       1.0 (max)
Solidity                0.6       1.0 (max)

Based on manual inspection of a few thousand images, the size of signs rarely exceeds 230 pixels in either dimension; the largest found was 291 × 291 pixels. We set the maximum size to 400 × 400 and the minimum size to 20 × 20.
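The filtering step can be sketched as a simple predicate over a region's shape statistics. Only a subset of the properties (size, aspect ratio, extent) is shown; the bounds follow Table 3.2, with the aspect ratio read as width/height constrained to [1/1.4, 1.4], which is an interpretation of the table.

```python
# Sketch of the geometric filter: keep a candidate region only if its simple
# shape statistics fall inside the assumed ranges of Table 3.2.

def keep_region(width, height, region_area):
    aspect = width / height
    extent = region_area / (width * height)   # object area / bounding-box area
    return (20 <= width <= 400 and
            20 <= height <= 400 and
            1 / 1.4 <= aspect <= 1.4 and
            0.4 <= extent <= 1.0)

print(keep_region(100, 100, 7850))   # roughly circular sign: True
print(keep_region(100, 100, 1000))   # sparse region, extent 0.1: False
print(keep_region(500, 100, 40000))  # too wide / wrong aspect: False
```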

However, not all false positives are discarded after this step. As an additional step, a binary classifier is used to verify that the found ROIs indeed contain a road sign. The classifier is trained to separate traffic signs from non-traffic-sign objects, filtering out the remaining false positives.

3.6. Generation of synthetic data

This section deals with generating synthetic (image) data in order to obtain a data set that covers most road traffic signs. By artificially generating data sets in this manner, we avoid the laborious task of finding, collecting, and labelling the great range of signs in use.

Collecting training data that covers all signs on the Swedish road network is not only time-consuming but unrealistic. Some types of road signs are scarce, and some may not even be present in Östergötland (such as the sign for a road tunnel). Using a generative algorithm, we can create synthetic images of signs by adding random distortions to image templates of existing road traffic signs: variations in lighting, blurring, rotational (geometric) distortions, colour variations, and image noise. Creating realistic synthetic images circumvents the need for manual collection and annotation of road sign images. It may also increase generalization performance, as the artificially generated images should to some extent resemble digital images of road signs captured in real environments.
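A minimal sketch of the generative idea: take a clean template and apply random photometric distortions. Only brightness scaling and additive Gaussian noise are shown here; the thesis also uses blur, rotations, and affine transformations. The function and parameter choices are illustrative assumptions.

```python
# Minimal sketch: random photometric distortion of a gray-scale template.
import random

def distort(template, rng):
    """Return a randomly distorted copy of a 2-D gray-scale template."""
    gain = rng.uniform(0.6, 1.4)          # global illumination change
    return [
        [min(255, max(0, round(p * gain + rng.gauss(0, 8)))) for p in row]
        for row in template
    ]

rng = random.Random(0)                    # seeded for reproducibility
template = [[200, 200], [200, 50]]
sample = distort(template, rng)
print(sample)  # a noisy, re-lit variant of the template
```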
