
Institutionen för systemteknik
Department of Electrical Engineering

Master's thesis

Machine Learning
for detection of barcodes and OCR

Master's thesis carried out in Computer Vision at the Institute of Technology, Linköping University
by Olle Fridolfsson

LiTH-ISY-EX--15/4842--SE

Linköping 2015

Department of Electrical Engineering
Linköpings tekniska högskola
Linköpings universitet


Machine Learning
for detection of barcodes and OCR

Master's thesis carried out in Computer Vision
at the Institute of Technology, Linköping University

by
Olle Fridolfsson

LiTH-ISY-EX--15/4842--SE

Supervisors: Freddie Åström, ISY, Linköpings universitet
             Ola Friman, SICK IVP
Examiner:    Lasse Alfredsson, ISY, Linköpings universitet


Division, Department: Computer Vision Laboratory, Department of Electrical Engineering, SE-581 83 Linköping
Date: 2015-06-14
Language: English
Report category: Master's thesis (examensarbete)
URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-XXXXX
ISBN: —
ISRN: LiTH-ISY-EX--15/4842--SE
Title of series, numbering: —
ISSN: —
Title: Maskininlärning för detektion av streckkod och OCR (Machine Learning for detection of barcodes and OCR)
Author: Olle Fridolfsson
Keywords: —


Abstract

Machine learning can be utilized in many different ways in the field of automatic manufacturing and logistics. In this thesis, supervised machine learning has been utilized to train classifiers for detection and recognition of objects in images. The techniques AdaBoost and Random forest have been examined; both are based on decision trees.

The thesis has considered two applications: barcode detection and optical character recognition (OCR). Supervised machine learning methods are highly appropriate in both applications, since both barcodes and printed characters generally are rather distinguishable.

The first part of this thesis examines the use of machine learning for barcode detection in images, both traditional 1D-barcodes and the more recent Maxi-codes, which is a type of two-dimensional barcode. In this part the focus has been to train classifiers with the technique AdaBoost. The Maxi-code detection is mainly done with Local binary pattern features. For detection of 1D-codes, features are calculated from the structure tensor. The classifiers have been evaluated with around 200 real test images containing barcodes, and show promising results.

The second part of the thesis involves optical character recognition. The focus in this part has been to train a Random forest classifier using point pair features. The performance has also been compared with the more proven and widely used Haar-features. The results show that Haar-features are superior in terms of accuracy; nevertheless, the conclusion is that point pairs can be utilized as features for Random forest in OCR.


Contents

1 Introduction
  1.1 Objective
  1.2 Limitations
  1.3 Summary of thesis

2 Machine learning
  2.1 Machine learning in general
  2.2 Supervised learning
  2.3 Decision trees
  2.4 AdaBoost
  2.5 Random forest
  2.6 AdaBoost and Random forest comparison
  2.7 Cascade of classifiers

3 Methods
  3.1 Supervised learning
  3.2 Features
  3.3 Dataset
  3.4 Evaluation
  3.5 References

4 Barcode detection
  4.1 Background
  4.2 Overview
  4.3 Dataset
    4.3.1 Training data
  4.4 Tile size
  4.5 Features
    4.5.1 Standard deviation
    4.5.2 Structure tensor
    4.5.3 Local binary pattern
  4.6 Code detection with AdaBoost
    4.6.1 Step 1: Standard deviation
    4.6.2 Step 2 and 3: Structure tensor
    4.6.3 Step 4: Local binary pattern
    4.6.4 Cascade model
  4.7 Code detection with Random forest

5 Evaluation of barcode detection
  5.1 Parameters
    5.1.1 AdaBoost
    5.1.2 Random forest
  5.2 AdaBoost
  5.3 Random forest
  5.4 Results and conclusions

6 Optical character recognition
  6.1 Background
  6.2 Overview
  6.3 Training dataset
  6.4 Features
    6.4.1 Haar-like features
    6.4.2 Random point pairs
  6.5 Data processing during testing
  6.6 Postprocessing

7 Evaluation of OCR
  7.1 Parameters
    7.1.1 Training
    7.1.2 Testing
  7.2 Evaluation of training
    7.2.1 Training with point pair features
    7.2.2 Training with Haar-like features
    7.2.3 Summary of the result
  7.3 Result for some real test images using point pair features
    7.3.1 Cases when the classifier fails

8 Discussion and conclusions
  8.1 Discussion of methods
  8.2 Conclusions
    8.2.1 Barcode detection
    8.2.2 OCR
  8.3 Future work

Bibliography


1 Introduction

Machine learning is a technology that can be utilized in many ways within the field of automatic manufacturing and logistics. It can involve automatic inspection, recognition, and quality control. The objective of this thesis has been to investigate the use of machine learning for detection and recognition in images. In these applications, it is usually possible to obtain labeled training samples, i.e. each training sample has been marked with a certain class label. For that reason, a suitable method is to use supervised machine learning, where labeled training samples are used to train a classifier, which then can be used to classify unlabeled test samples. In this work, the use of machine learning has been investigated for detection of barcodes in images and for optical character recognition.

1.1 Objective

This thesis is part of a pilot study which is currently under way at SICK IVP in Linköping. The study comprises different ways to utilize machine learning in cameras used in manufacturing. The usage involves recognition of objects in images. Recognition is the process of detecting and labeling objects in images which contain an unknown number of objects. The methods that are currently used for recognition involve template matching. It is therefore of interest to investigate whether methods involving machine learning give a better result.

In this thesis, the objective has been to investigate the use of machine learning for:

• Detection of barcodes, both 1D- and 2D-codes, in images.
• Optical character recognition (OCR).


In figure 1.1 two different types of barcodes are illustrated. To the left is a Maxi-code, which is a type of 2D-code. To the right is a 1D-code.

Figure 1.1: Two different types of barcodes. The Maxi-code, to the left, is a type of 2D-code and to the right is a 1D-code.

Figure 1.2 below illustrates the type of characters which have been used for OCR. This is a special font called OCR-A, which has been designed particularly for OCR.

Figure 1.2: Some different characters in the font OCR-A.

Methods for both applications have been implemented in C++ using OpenCV. The supervised machine learning methods that are of great interest for these applications are:

• Discrete AdaBoost
• Random forest

In both methods, decision trees are used as classifiers and they are frequently referred to as the state of the art in the area, due to their high classification speed and low error rate. Random forest is for instance used by Microsoft in the Xbox together with the Kinect camera [Criminisi et al., 2011]. AdaBoost is used as classifier in Viola and Jones' seminal work on face detection, presented in [Viola and Jones, 2001].

When machine learning is used for recognition in an image, it is necessary to first extract features from the image (this will be explained in more detail later on). When the applications are used in real situations, i.e. in the final products, they need to be relatively fast. For that reason, one challenge is to find suitable


features which are computationally inexpensive. These features can also be combined in different ways, which significantly affects the computational performance of the system.

For barcode detection, the objective has first been to train a classifier using AdaBoost with some different features. The second objective has been to train the classifier with Random forest and compare the results. The question that is of most interest regarding barcode detection is: which technique is most suitable, AdaBoost or Random forest?

For OCR, only Random forest has been used for training, since it is more suitable in cases where the number of classes is more than two. Two different features have been tried out. The first one involves comparison of pixel values in point pairs, which are randomly distributed in the image. This is a rather new method, originally used for face detection and presented in [Nenad et al., 2014]. The second feature is based on the Haar-features presented in [Viola and Jones, 2001]. There are two questions that are of interest regarding OCR:

• Are point pairs suitable as features when used in OCR?

• How do point pair features perform compared to Haar-like features?

1.2 Limitations

Regarding the use of machine learning for barcodes, it should be emphasized that this work only involves detection. The objective is not to read the codes in any way, only to detect where they are located in the images. There exist many different types of 2D barcodes; however, in this work only Maxi-codes have been investigated. While computational performance has not been the ultimate goal, it is desirable to have methods which are as fast as possible, since the system is intended to be used in a real-time environment. For that reason, the focus will be on features that are computationally efficient. The evaluation in this case is focused on classification accuracy and processing time.

1.3 Summary of thesis

This thesis begins in chapter 2 with theory about machine learning, mainly focused on the methods used in this work. This is followed by chapter 3, which is a summary of the methods that are presented in this thesis. After that, the work on and evaluation of barcode detection and OCR are described separately in chapters 4, 5, 6, and 7. These chapters also contain theory about the features used in each application. This is followed by chapter 8, which contains conclusions of the results and answers to the questions stated in the introduction. The report ends with a discussion about the methods used in the thesis.


2 Machine learning

Since techniques involving machine learning are used throughout the thesis, this chapter contains the theory of the methods that have been used. In the first part of the chapter, machine learning is described in a general context; then the focus shifts to the techniques used in the thesis.

2.1 Machine learning in general

The field of machine learning has grown substantially over the last decades and every year numerous new methods are published. The use of machine learning in applications has expanded rapidly and the technique is now common in many different fields. Techniques involving machine learning are used in web search, spam filters, stock trading, weather forecasting and many other applications.

Machine learning is often used when the desired algorithm is too complex for a human to implement, but it is easy to generate examples of what the algorithm should do. It is common that the data of interest is high-dimensional, which makes it difficult for humans to handle. The fundamental goal of machine learning is to make a computer learn and generalize from observed data or from experience. This is usually done by observing patterns and structures in data.

The field of machine learning is typically divided into three categories [Bishop, 2009, chapter 1]:

• Supervised learning: involves methods where the training data is labeled.
• Unsupervised learning: involves methods where the training data is unlabeled.
• Reinforcement learning: the program learns by generating policies that lead to a reward.

In machine learning it is also common to distinguish between classification and regression. In classification, the output variable of the algorithm will have a class label of some sort. Regression is about estimating a response; the output variable will here take a continuous value.

In this thesis the focus is on supervised learning and classification; consequently, this is what will be discussed in the rest of the chapter.

2.2 Supervised learning

Supervised learning is often used in classification problems and it is usually the most suitable technique if labeled training data is available. Figure 2.1 illustrates an overview of a system using supervised learning.

Figure 2.1: Supervised learning, overview.

In supervised learning a classifier is trained from labeled training data pairs $(x_i, y_i)$, where $x_i$ is the training sample, $y_i$ is the corresponding label, and $i$ is the index over the training samples. From the dataset, features need to be extracted in some way. The choice of features is one of the most critical aspects of making a good classifier. Each input sample is described by a feature vector,

$$x_i = \begin{pmatrix} f_{i1} \\ \vdots \\ f_{iM} \end{pmatrix}, \qquad (2.1)$$

where $f_{ij}$ is the value of feature $j$ for sample $i$, and $M$ is the number of features.

The training set needs to be representative of the data which the classifier is going to be applied on. It is common to add some noise to the training data to make it less perfect. This is done to avoid overfitting, which can be a problem in supervised learning. Overfitting means that the classifier memorizes the dataset instead of forming a more general classifier. This will lead to problems if, for example, the training data contains outliers. In figure 2.2 the problem of overfitting is illustrated. The black curve separates the two classes in a more general way, while the green curve completely separates the two classes. However, the separation described by the green curve does not take the outliers into account.

Figure 2.2: The image illustrates the problem of overfitting. Source: [Ignacio Icke, 2008]

There exists a wide range of different supervised learning methods. Some of the most common are listed below.

• Neural network [Bishop, 2009, chapter 5]
• Decision tree [Breiman and J. Friedman, 1984]
  – AdaBoost [Freund and Schapire, 1999]
  – Random forest [Breiman, 2001]
• Support vector machine [Cortes and Vapnik, 1995]
• Nearest neighbour [Keller, 1985]
• Kernel estimator [Bishop, 2009, chapter 6]
• Linear regression [Tibshirani, 1994]
• Logistic regression [Bishop, 2009, p. 205]
• Naive Bayes [Rennie et al., 2003]
• Linear discriminant analysis [Lachenbruch, 1979]

In this thesis two supervised learning techniques have been used: AdaBoost and Random forest. Both methods use decision trees in their classifiers. A decision tree is commonly referred to as a weak classifier, since each individual tree is usually less accurate. The final classifier obtained after training is called the strong classifier. It is obtained by combining many weak classifiers, which is one way to reduce overfitting in supervised learning.

2.3 Decision trees

Both AdaBoost and Random forest classifiers consist of several decision trees which together comprise a strong classifier. This section explains how a single decision tree is trained.

Figure 2.3 illustrates the upper four levels in a decision tree used in OCR. A decision tree is a tree structure where all nodes, except the leaf nodes, have two child nodes. In classification, all leaf nodes have a class label. When a decision tree is used for classification, the test sample starts at the top of the tree. In every node a decision is made whether the test sample shall go to the left or to the right. The test sample will at some point reach a leaf node, where it receives a label.


Figure 2.3: Illustration of the four upper levels in a classification tree used in OCR.

A good explanation of how decision trees are trained can be found in [Sazonau, 2012]. During training the tree is created from the training data and its corresponding labels. This is usually done by maximizing the information gain $E_{\text{inf}}$ in every split, i.e.

$$E_{\text{inf}} = I(P) - \left( \frac{N_L}{N_P}\, I(L) + \frac{N_R}{N_P}\, I(R) \right), \qquad (2.2)$$

where $P$ is the set of training samples before the split and $L$ and $R$ are the training samples in the left and right node. $N_P$ is the number of training samples before the split, and $N_L$ and $N_R$ are the number of samples in the left child node and the right child node, respectively. The function $I(\,\cdot\,)$ is called the impurity and can for example be computed using the Gini impurity [Raileanu and Stoffel, 2000],

$$I_{\text{Gini}}(\,\cdot\,) = 1 - \sum_{c=1}^{C} p_c^2, \qquad (2.3)$$

or with the entropy [Raileanu and Stoffel, 2000],

$$I_{\text{Entropy}}(\,\cdot\,) = - \sum_{c=1}^{C} p_c \log_2(p_c). \qquad (2.4)$$

The proportion $p_c$ of training samples belonging to class $c = 1, \ldots, C$ in the node is calculated in the following way:

$$p_c = \frac{\sum_{n \in N} y_{cn}}{\sum_{n \in N} \sum_{k=1}^{C} y_{kn}}, \qquad (2.5)$$

where $N$ is the set of training samples in the specific node.

When maximizing $E_{\text{inf}}$, all features can be used or just a subset. The maximization is usually done with brute force, i.e., the algorithm searches for a threshold in one dimension at a time in the feature space and chooses the one that best separates the data.

When building a decision tree the algorithm starts at the root node and makes a split for every node recursively until it reaches the leaves. All the leaves have then received a label. The algorithm reaches a leaf node when one or several of the following conditions are met:

• The tree has reached a specified maximum depth.

• The number of training samples in the node has reached a minimum threshold.

• All the samples in the node belong to the same class.

• The best found split does not give a noticeable improvement.

These conditions are referred to as the stop splitting condition in algorithm 1 below.

Algorithm 1: Algorithm for decision trees

Given the dataset P = (x, y) and a set of features f. The function runs recursively and stops when all leaf nodes have been reached.

function trainDecisionTree(P, f)
    if the stop splitting condition is reached then
        label ← the most common class in P
    else
        Maximize E_inf for P
        Receive the left dataset L and the right dataset R
        Decide which features to use next, that is f_L and f_R
        Add left child ← trainDecisionTree(L, f_L)
        Add right child ← trainDecisionTree(R, f_R)
    end if
end function
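To make the split search concrete, the following C++ sketch implements the brute-force threshold search for a single feature dimension, using the Gini impurity from equation (2.3) and the information gain from equation (2.2). It is an illustrative toy for two classes, not the implementation used in the thesis.

#include <algorithm>
#include <array>
#include <utility>
#include <vector>

// Toy sketch: find the threshold on one feature that maximizes the
// information gain E_inf of equation (2.2), using the Gini impurity of
// equation (2.3) for a two-class problem. Labels are 0 or 1.
struct Split { double threshold; double gain; };

static double gini(const std::array<int, 2>& counts)
{
    const double n = counts[0] + counts[1];
    if (n == 0) return 0.0;
    const double p0 = counts[0] / n, p1 = counts[1] / n;
    return 1.0 - p0 * p0 - p1 * p1; // equation (2.3) with C = 2
}

Split bestSplit(std::vector<std::pair<double, int>> samples) // (feature value, label)
{
    std::sort(samples.begin(), samples.end());
    std::array<int, 2> left{{0, 0}}, right{{0, 0}};
    for (const auto& s : samples) ++right[s.second];
    const double nTotal = static_cast<double>(samples.size());
    const double parentImpurity = gini(right); // I(P)

    Split best{0.0, -1.0};
    // Try a threshold between every pair of consecutive sorted values.
    for (std::size_t i = 0; i + 1 < samples.size(); ++i) {
        ++left[samples[i].second];  // move sample i into the left node
        --right[samples[i].second];
        const double nL = static_cast<double>(i + 1), nR = nTotal - nL;
        // Equation (2.2): gain = I(P) - (N_L/N_P * I(L) + N_R/N_P * I(R)).
        const double gain = parentImpurity
                          - (nL / nTotal) * gini(left)
                          - (nR / nTotal) * gini(right);
        if (gain > best.gain)
            best = {0.5 * (samples[i].first + samples[i + 1].first), gain};
    }
    return best;
}

Sorting the samples first lets every candidate threshold be evaluated with incremental class counts, so the whole search costs O(n log n) per feature.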


2.4 AdaBoost

AdaBoost was introduced by Yoav Freund and Robert Schapire, and a short introduction to their work is presented in [Freund and Schapire, 1999]. A more comprehensive description of the method can be found in [Friedman et al., 2000]. The training process and the final classifier are described in algorithm 2. The basic idea is to train a number of decision trees, referred to as weak classifiers, which are then combined into a strong classifier. Each training sample has a corresponding weight, and the weights are equal for all samples at the start. The weak classifiers are trained sequentially, and after each step the weights are adjusted depending on whether the corresponding training samples are correctly or incorrectly classified. In each iteration, all the training samples are used.

Algorithm 2: Algorithm for discrete AdaBoost

Given: training samples $(x_1, y_1), \ldots, (x_N, y_N)$, where $y_i \in \{-1, +1\}$.

Initialize the weights: $D_1(i) = 1/N$ for $i = 1, \ldots, N$.

For $t = 1, \ldots, T$:

• Train a weak classifier using the weights $D_t$. Get the weak classifier $h_t : X \to \{-1, +1\}$ by minimizing the error

$$\epsilon_t = \sum_{i=1}^{N} D_t(i)\, I(y_i \neq h_t(x_i)), \qquad (2.6)$$

where $I$ is an indicator function.

• Choose $\alpha_t = \frac{1}{2} \ln\!\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$

• Update, for $i = 1, \ldots, N$:

$$D_{t+1}(i) = \frac{D_t(i)\, \exp(-\alpha_t y_i h_t(x_i))}{Z_t}, \qquad (2.7)$$

where $Z_t$ is a normalization factor such that

$$\sum_{i=1}^{N} D_{t+1}(i) = 1. \qquad (2.8)$$

Final strong classifier:

$$H(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) - \varphi \right) \qquad (2.9)$$

The parameter $\varphi$ regulates the number of weak classifiers that have to be true to get a true classification in the strong classifier. If $\varphi$ is set to zero, the strong classifier will give a true classification if at least 50% of the weak classifiers give a true classification.

There are some parameters that affect the training time and also the resulting classifier:

• The number of weak classifiers, T
• The depth of the trees
• The trim rate of the weights

AdaBoost most commonly uses tree depths equal to 1. This means that each weak classifier consists of a single decision tree with only one split, i.e. only a root node and two leaves. However, it is possible to use trees with an arbitrary depth.

One alternative in the AdaBoost algorithm is to trim the weights during the training. As the training progresses, some of the training data will become unimportant, i.e., their weights will be very small. One option is then to discard data whose weights are below a certain threshold.
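As a concrete illustration of how these parameters appear in practice, the following sketch configures a discrete AdaBoost classifier with OpenCV, the library used throughout this thesis. The class and method names follow the OpenCV 3.x API (cv::ml::Boost); the parameter values and the variables features and labels are illustrative assumptions, not the settings used in this work.

#include <opencv2/ml.hpp>

// Sketch: training discrete AdaBoost with OpenCV. The three setters below
// correspond to the three parameters listed above; the values are illustrative.
cv::Ptr<cv::ml::Boost> trainAdaBoost(const cv::Mat& features, const cv::Mat& labels)
{
    cv::Ptr<cv::ml::Boost> boost = cv::ml::Boost::create();
    boost->setBoostType(cv::ml::Boost::DISCRETE); // discrete AdaBoost, as in algorithm 2
    boost->setWeakCount(100);                     // T, the number of weak classifiers
    boost->setMaxDepth(1);                        // tree depth 1 gives decision stumps
    boost->setWeightTrimRate(0.95);               // trim samples with very small weights

    // features: one row per sample (CV_32F); labels: one class label per row.
    cv::Ptr<cv::ml::TrainData> data =
        cv::ml::TrainData::create(features, cv::ml::ROW_SAMPLE, labels);
    boost->train(data);
    return boost;
}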

2.5 Random forest

Random forest was first introduced by Leo Breiman [Breiman, 2001]. A thorough description of the method can be found in [Criminisi et al., 2011]. Random forest is a supervised learning method based on decision trees. It is mainly used for classification problems and it has some similarities with AdaBoost. The algorithm creates decision trees that work as weak classifiers; the trees can have an arbitrary depth. When the training is done, the trees are combined into a strong classifier.

During training the trees are created one at a time, each with a random subset of the training data. In machine learning this kind of method is called Bagging or Bootstrap Aggregation [Breiman, 2001]. The training with Random forest is described in algorithm 3.

Algorithm 3: Algorithm for Random forest

Given: training samples $(x_1, y_1), \ldots, (x_N, y_N)$.

For $t = 1, \ldots, T$:

• Pick a random subset $(X_b, Y_b)$ from the training dataset $(X, Y)$, with replacement
• Train a decision tree $f_b$ on $(X_b, Y_b)$
• Add the decision tree to the Random forest

The final strong classifier is obtained from the majority vote of all trees.

One of the special things with Random forest is that in each split node in a tree only a random subset of the features is used. From this subset, the feature that splits the data in the best way is chosen by maximizing the information gain. The reason for only choosing a subset of the features in every split is to make the trees uncorrelated with each other. If, for example, some features are more discriminative than others, these features would be chosen more frequently if all features were used for every split. This would make the trees more correlated with each other. The technique of using random subsets of training samples for each tree and random subsets of features in every split is a way to deal with the problem of overfitting, since the random subsets of training samples will not likely contain the same outliers.

When using the classifier for classification, the sample is put through the trees one at a time. Every tree makes a classification of the sample and the prediction is then made by the majority vote of all the trees. Here it is normal to use a threshold for the number of votes compared to the total number of trees.

When using Random forest as a classifier there are a number of different parameters that can be adjusted:

• The number of trees
• The maximum depth of the trees
• The minimum number of samples in a leaf node
• The number of active variables in a split
• The type of termination criterion

The maximum depth of the trees determines how deep a tree can get before it stops growing.

The minimum number of samples in a leaf node, i.e. $N$ in equation (2.5), is the minimum number of samples required in a node for it to be split. The number of active variables in a split determines how many features each split will choose randomly. By altering this parameter it is possible to control the randomness of the forest; a high parameter value will lead to more correlated trees. A normal value for this parameter is the square root of the total number of features.

It is also possible to choose what type of termination criterion to use, i.e. when the training will stop. The most common criterion is to terminate the training when the number of trees has reached a certain value.
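For comparison with the AdaBoost sketch in section 2.4, the following sketch shows how the parameters listed above map onto OpenCV's Random forest implementation (cv::ml::RTrees, OpenCV 3.x API); all values are illustrative assumptions.

#include <opencv2/ml.hpp>

// Sketch: configuring a Random forest in OpenCV. Each setter corresponds to
// one of the parameters listed above; the values are illustrative.
cv::Ptr<cv::ml::RTrees> trainRandomForest(const cv::Mat& features, const cv::Mat& labels)
{
    cv::Ptr<cv::ml::RTrees> forest = cv::ml::RTrees::create();
    forest->setMaxDepth(10);       // maximum depth of each tree
    forest->setMinSampleCount(5);  // minimum number of samples required to split a node
    forest->setActiveVarCount(0);  // active variables per split; 0 means sqrt(#features)
    // Termination criterion: stop when the forest contains 100 trees.
    forest->setTermCriteria(cv::TermCriteria(cv::TermCriteria::MAX_ITER, 100, 0.0));

    cv::Ptr<cv::ml::TrainData> data =
        cv::ml::TrainData::create(features, cv::ml::ROW_SAMPLE, labels);
    forest->train(data);
    return forest;
}

Setting the active variable count to zero lets OpenCV default to the square root of the number of features, which matches the normal value mentioned above.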

2.6 AdaBoost and Random forest comparison

Random forest and AdaBoost are both techniques based on decision trees. However, there are some distinct differences that should be emphasized:

• In Random forest, every weak classifier only uses a subset of all training data. In AdaBoost, all training data is used in every iteration, together with weights.

• Random forest only uses a subset of all features in every split, while AdaBoost uses the whole set of features.

• Random forest works with an arbitrary number of classes, while the version of AdaBoost, which has been used here, only works for two classes.

Both AdaBoost and Random forest are based on the same idea: to form a strong classifier by combining many weak classifiers. The main reason behind this approach is to avoid overfitting. In Random forest this technique is developed even further. Using only random subsets of training samples and features leads to more uncorrelated trees, as discussed in section 2.5. The risk of overfitting is generally reduced with more uncorrelated trees.

2.7 Cascade of classifiers

Since the system is intended to be used in a real-time environment, it is desirable to decrease the amount of computation as much as possible during testing. To speed up the system it is common to use several classifiers in a cascade. In each step of the cascade only one or a few features are used. In figure 2.4 the principle of a cascade is illustrated. A cascade model can be chosen manually or it can be selected automatically during training. By using a cascade it is possible to quickly discard false test samples.

Figure 2.4: A cascade of classifiers. Test data passes through the classifiers in sequence and samples classified as false are discarded at each step.


3 Methods

In this work supervised learning has been utilized for detection of objects in images. Two different applications have been investigated: barcode detection and OCR. The overall approach has basically been the same for both applications; however, there are some differences. In this chapter a summary of the methods is presented in order to give an overview of the work that has been done.

3.1 Supervised learning

In supervised learning labeled training samples are used to train a classifier. The classifier is then used to classify unlabeled test samples. Two different supervised learning techniques are used throughout the work:

1. AdaBoost
2. Random forest

There exist several different implementations of these two methods and both of them can be used for classification as well as regression. In this work the implementations for classification described in section 2.4 and section 2.5 have been used. For both AdaBoost and Random forest there are also several parameters which might affect the resulting classifier; however, the effect of some of these parameters has not been investigated. This is further explained in the beginning of chapter 5 and chapter 7, where the evaluation is described. Both AdaBoost and Random forest are implemented in OpenCV. This open source library has been used throughout the thesis for the training and testing of classifiers.


3.2 Features

An important part of all supervised learning methods is to extract features from the data samples. The choice of features is a critical step to achieve a good classifier. The features should be chosen in a way that separates the classes as much as possible in the feature space. The features that are used in this work are stated below; they are described in detail in the subsequent chapters.

For barcode detection the purpose is to separate the 1D-codes, the 2D-codes and the background from each other. A barcode is more distinguishable by its texture than by its shape. For this reason, features describing the structures and patterns in the image have been utilized. For barcode detection some different features were considered and three of them are used; these are described in more detail in section 4.5.

1. Standard deviation, section 4.5.1
2. Structure tensor, section 4.5.2
3. Local binary pattern, section 4.5.3

There exist many more features that can be utilized for barcode detection, although the objective in this thesis has not been to make a thorough investigation of all possible features. However, the features that have been used are common in many different fields regarding analysis of texture in images.

In OCR the different characters are distinguished only by their shape. For OCR only two different types of features are tested; these are described in section 6.4.

1. Point pair features, section 6.4.2
2. Haar-like features, section 6.4.1

Using point pairs as features is a rather new method and one of the objectives has been to investigate how it performs in OCR. Haar-features are among the most common features in machine learning and are widely used in applications regarding OCR. For that reason it is of interest to also investigate Haar-like features, in order to have something to compare with.

3.3 Dataset

One critical aspect in supervised learning is the dataset. The way the images used for training and testing have been obtained differs between the barcode detection and the OCR. The training dataset should be chosen in a way that makes it representative of the data the classifier is supposed to work on. This is a very important step to make a good classifier.

The dataset used for barcode detection consists of 230 images containing different types of barcodes; 100 of these images are used for training and 130 are used for testing. The images have been labeled manually: the parts of the images containing codes are labeled as "1D-code" or "2D-code" and the rest is labeled as "background". Both during training and testing the images are divided into tiles of the same size, each corresponding to a certain label: "1D-code", "2D-code" or "background".

For OCR, one reference sample for each character has been created manually. From these reference samples the training dataset is produced automatically by varying the character's position and adding noise. Each training sample will have a label which corresponds to its character. In contrast to the barcode detection, there is no large set of test images available. Instead the testing is done on synthetic images which are automatically generated.
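A minimal sketch of such automatic sample generation, assuming OpenCV, is given below; the jitter range and the noise level are hypothetical values chosen only to illustrate the idea of varying the position and adding noise.

#include <opencv2/imgproc.hpp>

// Sketch: one synthetic OCR training sample from a reference character,
// produced by a random translation plus additive Gaussian noise.
// The +-2 pixel jitter and the noise level 10 are illustrative assumptions.
cv::Mat makeTrainingSample(const cv::Mat& reference, cv::RNG& rng)
{
    // Shift the character by a random offset in x and y.
    cv::Mat M = (cv::Mat_<double>(2, 3) << 1, 0, rng.uniform(-2.0, 2.0),
                                           0, 1, rng.uniform(-2.0, 2.0));
    cv::Mat shifted;
    cv::warpAffine(reference, shifted, M, reference.size(),
                   cv::INTER_LINEAR, cv::BORDER_REPLICATE);

    // Add zero-mean Gaussian noise; convertTo saturates back to [0, 255].
    cv::Mat noisy;
    shifted.convertTo(noisy, CV_32F);
    cv::Mat noise(noisy.size(), CV_32F);
    cv::randn(noise, 0.0, 10.0);
    noisy += noise;
    noisy.convertTo(noisy, CV_8U);
    return noisy;
}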

3.4 Evaluation

For both barcode detection and OCR the most important factor is the accuracy. Both applications are intended to be used in industry, where high reliability is necessary. For that reason, the evaluation of the classifiers focuses on accuracy. The accuracy is measured as the percentage of true detections per image and the number of false detections per image.

The average processing time per image is also calculated. However, this is mainly in order to compare different methods and different parameter settings.

3.5 References

In the beginning of this thesis a literature study was done. The literature comprises the theory of supervised machine learning in general and articles on other work involving barcode detection and OCR. The aim has been that all included literature should be scientific books and articles from well-known universities and companies. The articles have been found using scientific databases such as Google Scholar and Web of Science, as a way to keep to scientific sources.


4 Barcode detection

This chapter presents the methods that have been used for barcode detection. The chapter begins with an overview of barcode detection in general, where some of the more common applications and techniques are described.

4.1 Background

Barcodes of different sorts are widely used, particularly in goods packing for automatic identification of products. Barcodes cannot be read and understood by humans, and for traditional reading devices the detection is usually done manually. In automatic detection of barcodes, both run time and accuracy are of high significance. In an industrial environment, missed codes might for example lead to loss in profit. For that reason it is of high interest to investigate different methods for barcode detection. Originally, barcodes only included 1D-codes, i.e. codes with parallel lines of different width and spacing. In recent years barcodes have evolved to also include different types of 2D-codes that consist of rectangles, dots, hexagons and other geometrical patterns in two dimensions.

In recent years, barcode detection and reading applications for mobile phones have been developed. In the paper [Ohbuchi and Hock, 2004], a method for QR-code detection is presented; the algorithm introduced in this work detects the corners of the QR-code. For mobile phone applications a lot of research has been done on detection and reading of barcodes. However, there is a large difference between mobile phone applications and industrial usage of barcodes regarding the required performance. When barcodes are utilized in industrial environments the required accuracy is usually high.

Techniques for barcode detection can involve different types of template matching. A good example of this approach is presented in [Szentandr and Dubská, 2013]. However, by utilizing machine learning it is possible to make both more accurate and faster detections. A method for detecting QR-codes in images is presented in [Belussi and Hirata, 2012], where a boosting technique has been used together with Haar-like features. Another example is [Bodnar et al., 2014], where neural networks are used to localize QR-codes in images. In their work a method similar to Local binary pattern, described in section 4.5.3, has been used on every tile in the images. On the classified tiles, a clustering method is applied to create bounding boxes around the QR-codes.

In this thesis the objective is to investigate detection of 1D-codes and 2D-codes on packages. The dataset that has been provided contains images of traditional 1D-codes and 2D-codes of the type Maxi-codes. The Maxi-codes consist of a pattern of hexagonal dots and a circular symbol at the center, as was illustrated to the left in figure 1.1 in chapter 1. This type of 2D-code is mainly used for tracking and managing of packages. The classifiers have been trained with AdaBoost and Random forest from the features described in section 4.5. The detection of 1D-codes is effectively done with features generated from the structure tensor, described in section 4.5.2. The detection of Maxi-codes is mainly done with Local binary pattern features. This type of feature has been found to be powerful for texture classification. In the work presented in the paper [Sun et al., 2010], Local binary pattern is used to detect QR-codes with high accuracy.

Barcode detection is in several ways a suitable application for supervised machine learning. Collection of training data can be done relatively easily, by using real or synthetic images. As was mentioned in chapter 3, pixel areas containing barcodes are highly distinguishable by their texture. Consequently, a conceivable solution is to use machine learning to train classifiers with features based on texture analysis.

4.2 Overview

The system for detection of barcodes is primarily divided into two parts. The data consists of a large number of images which contain 1D- and 2D-codes.

Figure 4.1: The left image contains three 1D-codes and one 2D-code, and the right image illustrates the corresponding ground truth. The gray areas in the ground truth image are the 1D-codes and the white area is the 2D-code.

Figure 4.1 illustrates a typical image that contains different codes. Each image has a corresponding ground truth where the parts of the image containing codes have been labeled as "1D-code" or "2D-code" and the rest as "background".

Figure 4.2: Overview of the system used for barcode detection.

The first part of the system is illustrated to the left in figure 4.2. This part involves training using machine learning which produces a classifier. It consists of the following steps:

• Divide images and ground truth into tiles.
• Generate training samples from the tiles.
• Calculate features for each training sample.
• Use AdaBoost or Random forest for training.


The objective of the second part of the system is to classify unlabeled data; this part is illustrated on the right side of figure 4.2. The classification is done with the trained classifiers in a cascade. It involves the following steps:

• In each step of the cascade calculate some specific features.

• Use the corresponding classifier for each feature to reduce the amount of data between each step in the cascade.

An overview of the system for barcode detection is illustrated in figure 4.2.

4.3 Dataset

The dataset, provided by SICK IVP, consists of a large number of gray scale images, similar to figure 4.1. Most of the images contain packages marked with 1D- and 2D-codes of a few different sizes and orientations. All images are of size 2048x2048 pixels. For each image the corresponding ground truth has been labeled. This has been done with MeVisLab, a program primarily used for processing and visualization of medical images. The images are first divided into one training dataset and one testing dataset. Each image is then divided into tiles of the same size. This is illustrated in figure 4.3.

Figure 4.3: An image containing codes has been divided into tiles; to the left is the original image and to the right are the corresponding labels.

To the left is the original image and to the right is the corresponding image containing the labels. The images containing labels are divided into tiles as well. Naturally, a suitable label is not attainable for all tiles, for example if only half of the tile covers a code area. If the classifier shall be able to detect the whole code, it is necessary to include tiles that are partly outside the code area. For that reason it has been decided that a tile will be considered as "code" if more than 70% of its area is covered with code (a sketch of this rule follows the list below). Each tile will then be a data sample with a label. If more than 70% of a tile's area is covered with, for example, 1D-code, it will be labeled as 1D-code. Consequently, every tile will have one of three different labels:


• Background
• 1D-code
• 2D-code
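The 70% rule can be sketched as follows with OpenCV. The ground-truth pixel encoding (128 for "1D-code", 255 for "2D-code" in an 8-bit single-channel image) is a hypothetical assumption based on the gray and white areas in figure 4.1.

#include <opencv2/core.hpp>

// Sketch: assigning a label to a tile from the ground-truth image with the
// 70% coverage rule. The pixel encoding of the ground truth is assumed.
enum class TileLabel { Background, Code1D, Code2D };

TileLabel labelTile(const cv::Mat& groundTruth, const cv::Rect& tile)
{
    const cv::Mat region = groundTruth(tile);
    const double total = tile.area();
    const int n1d = cv::countNonZero(region == 128); // assumed value for "1D-code"
    const int n2d = cv::countNonZero(region == 255); // assumed value for "2D-code"
    if (n1d > 0.7 * total) return TileLabel::Code1D;
    if (n2d > 0.7 * total) return TileLabel::Code2D;
    return TileLabel::Background;
}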

4.3.1 Training data

It is important that the training dataset is representative of the test data. If, for example, the classifier shall be able to detect codes of different sizes, the training dataset has to contain codes with those particular sizes. Another important factor is the brightness. Images of different brightness should be included in the training dataset if the classifier shall be able to handle those cases. However, the dataset used in this work only contains images taken from the same location, so the lighting conditions are similar in all images. Also, only a few different code sizes are included in this dataset.

The training dataset also contains codes with different orientations. However, all the features used for barcode detection are independent of the orientation of the code. For example, the standard deviation will be approximately the same even if the code is rotated. So even if variation in the training dataset is important, it is not necessary in order to obtain a classifier that can detect codes of different orientations.

When training the classifier, a large number of training samples are extracted from the training images. The training samples can contain areas of background, 1D-code or 2D-code.

Figure 4.4: Four examples of training samples containing 1D-code; the samples are of size 48x48 pixels.

Figure 4.5: Four examples of training samples containing 2D-code; the samples are of size 48x48 pixels.

Figure 4.6: Four examples of training samples containing background; the samples are of size 48x48 pixels.


Figure 4.4 illustrates four examples of training samples with the label "1D-code" and figure 4.5 illustrates four examples of training samples with the label "2D-code". Figure 4.6 illustrates four examples of training samples which are labeled as "Background".

The training dataset that has been created contains approximately 50% samples containing code and 50% samples containing background. As explained in section 4.4, four different tile sizes have been tested. For that reason four different training datasets have been created, each containing training samples of one particular size.

4.4 Tile size

The tiles that are extracted from the images have to be of the same size, but it is not obvious which size is optimal. This can only be determined by testing different tile sizes and evaluating the results. It is clear, though, that the tiles cannot be too large; in that case the codes will be difficult to detect. The features that are extracted from the tiles are based, in different ways, on the structures of the pixel values inside the tiles. For that reason the tiles cannot be too small either, since the characteristic structure of the codes might then be lost.

One of the problems that can occur with too large tiles is that the ground truth is not calculated correctly. As was explained in section 4.3, a training sample will be labeled as "1D-code" or "2D-code" if the code covers at least 70% of its area. For this reason there is a chance that, if the tile is too large, some of the codes will be excluded entirely. This is primarily the case for the 1D-codes, since some of them are very thin. Figure 4.7 shows an example of an image containing a 1D-code that is hard to detect. The tile shown in the image is too large, which means that the thin 1D-code most likely will be labeled as background.

Figure 4.7: An example where the tile is too large.

The sizes of the codes in the dataset vary and there are not many images which contain thin 1D-codes. By inspecting the result of some difficult images one can make a rather good assessment of the necessary tile size. If the tile size is more than 64x64 pixels it will be too big and some of the 1D-codes will not be detected. Figure 4.8 shows an example for the case when the tile size is 48x48 pixels.

Figure 4.8: An example for the case when the tile size is 48x48 pixels.

From these observations the following tile sizes have been considered when training and testing the classifiers:

• 24x24 pixels
• 32x32 pixels
• 48x48 pixels
• 64x64 pixels

4.5 Features

This section presents the features used for barcode detection. The purpose of the features was explained in section 3.2. The features used for barcode detection are all based on techniques that, in different ways, describe the structure of the pixel values inside the tiles. As was mentioned in section 4.3, all the features are independent of the orientation of the codes.

4.5.1 Standard deviation

One efficient method to exclude tiles that do not contain any code is to compute the standard deviation of each tile. Tiles that contain code will have a high standard deviation; hence all data with standard deviation under a certain threshold can be discarded. This is a fast method to reduce the amount of data before applying other, more complicated features. The standard deviation is calculated according to equation (4.1), where $x_i$ are the pixel values in the tile, $N$ is the number of pixels and $\bar{x}$ is their mean:

$$\text{std} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}. \qquad (4.1)$$

A tile containing 1D- or 2D-code has a high variation in pixel intensity in all parts of the tile. In order to reduce the number of false tiles even more, the tile is divided into four regions according to figure 4.9, and the standard deviation is calculated on each region.

Figure 4.9: The standard deviation is calculated on four parts of the tile.

For each tile, four different features are calculated:

$$f_j = \text{std}(\text{region}_j), \quad j = 1, 2, 3, 4. \qquad (4.2)$$
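A possible implementation of equations (4.1) and (4.2) with OpenCV is sketched below. It assumes that the four regions are the four quadrants of the tile, which is one interpretation of figure 4.9.

#include <opencv2/core.hpp>
#include <vector>

// Sketch: the four standard-deviation features of equation (4.2), one per
// region, using cv::meanStdDev to evaluate equation (4.1) on each region.
std::vector<double> stdFeatures(const cv::Mat& tile)
{
    const int hw = tile.cols / 2, hh = tile.rows / 2;
    const cv::Rect regions[4] = {
        {0, 0, hw, hh}, {hw, 0, hw, hh},  // assumed: top-left, top-right
        {0, hh, hw, hh}, {hw, hh, hw, hh} // assumed: bottom-left, bottom-right
    };
    std::vector<double> features;
    for (const cv::Rect& r : regions) {
        cv::Scalar mean, stddev;
        cv::meanStdDev(tile(r), mean, stddev); // equation (4.1) on this region
        features.push_back(stddev[0]);         // f_j = std(region_j)
    }
    return features;
}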

4.5.2 Structure tensor

The structure tensor, described in [Bigun and H.Granlund, 1987], is often used to analyze image structure. The structure tensor for a tile is produced by first computing the gradients; this can be done by convolving the tile with Sobel filters, $g_x$ and $g_y$. To handle the borders, the tile is first padded, i.e. the convolution is done on a copy of the tile which is slightly larger, where the edges of the original tile are replicated to the edges of the padded tile. The subindices $x$ and $y$ denote the horizontal and vertical derivatives and $*$ is the convolution operator,

$$\nabla f(x) = \begin{pmatrix} f * g_x \\ f * g_y \end{pmatrix}(x). \qquad (4.3)$$

Another way would be to convolve the whole image and then divide it into tiles. The reason for not doing this is that a cascade is used during testing and a large number of the tiles are discarded in the first step. The structure tensor is then calculated in the following way:

$$T = w_2 * \left( \nabla_{w_1} f\, \nabla_{w_1}^T f \right), \qquad (4.4)$$

where $w_1$ and $w_2$ are optional weight functions, which typically are Gaussian. The structure tensor will be a 2x2 symmetric matrix:

$$T = \begin{pmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{pmatrix}, \quad T_{12} = T_{21}. \qquad (4.5)$$

The structure tensor can be decomposed into its eigenrepresentation using eigenvalues $\lambda_1$ and $\lambda_2$ and orthonormal eigenvectors $\hat{e}_1$ and $\hat{e}_2$:

$$T = \lambda_1 \hat{e}_1 \hat{e}_1^T + \lambda_2 \hat{e}_2 \hat{e}_2^T. \qquad (4.6)$$

The structure tensor can be used in several ways to estimate the structure inside the tile. Areas that contain 1D-codes will have a one-dimensional structure, i.e. the gradient in this area will only vary in one direction. On the other hand, for areas that contain 2D-codes the variation will be fairly equal in both directions. The amount of variation of the gradient is measured by studying the eigenvalues of the structure tensor. For an area which only varies in one direction, the first eigenvalue will be much larger than the second. For areas which have variation in many directions, the two eigenvalues will be more equal.

The structure tensor is primarily used to detect 1D-codes. A tile that covers an area with 1D-code will have the same 1D-structure in all parts of the tile. This is something that highly distinguishes the 1D-codes from background areas of similar structure. For that reason, in similarity to the standard deviation, the tiles are divided into four parts, illustrated in figure 4.9. The structure tensor is then calculated on all four parts. The features, $f_1$ and $f_2$, from the four regions are calculated in the following way:

$$f_1 = \sum_{i=1}^{4} \lambda_1(\text{region}_i), \qquad f_2 = \sum_{i=1}^{4} \lambda_2(\text{region}_i), \qquad (4.7)$$

where $\lambda_1(\text{region}_i)$ and $\lambda_2(\text{region}_i)$ are the first and second eigenvalues of the structure tensor of region $i$. For a tile covering a 1D-code, all four parts of the tile will have a distinct 1D-structure; hence the first eigenvalue will be higher than the second eigenvalue for all four parts. Consequently, for tiles containing 1D-code, the first feature will be very high compared to the second feature.
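The feature computation of equations (4.3) to (4.7) can be sketched per region with OpenCV as follows. The Sobel kernel size and the Gaussian window standing in for the weight function $w_2$ are illustrative assumptions, and the optional pre-smoothing $w_1$ is omitted for brevity.

#include <opencv2/imgproc.hpp>
#include <cmath>

// Sketch: eigenvalues of the structure tensor for one region. The tensor
// components are smoothed with a Gaussian window (the weight w2) and then
// averaged over the region; the 2x2 eigenproblem is solved analytically.
void structureTensorEigenvalues(const cv::Mat& region, double& lambda1, double& lambda2)
{
    cv::Mat fx, fy;
    cv::Sobel(region, fx, CV_64F, 1, 0, 3); // horizontal derivative, as in (4.3)
    cv::Sobel(region, fy, CV_64F, 0, 1, 3); // vertical derivative, as in (4.3)

    cv::Mat T11, T12, T22;
    cv::GaussianBlur(fx.mul(fx), T11, cv::Size(5, 5), 0); // w2 * (fx fx)
    cv::GaussianBlur(fx.mul(fy), T12, cv::Size(5, 5), 0); // w2 * (fx fy)
    cv::GaussianBlur(fy.mul(fy), T22, cv::Size(5, 5), 0); // w2 * (fy fy)

    const double t11 = cv::mean(T11)[0], t12 = cv::mean(T12)[0], t22 = cv::mean(T22)[0];
    // Eigenvalues of a symmetric 2x2 matrix: trace/2 +- sqrt((t11-t22)^2/4 + t12^2).
    const double halfTrace = 0.5 * (t11 + t22);
    const double d = std::sqrt(0.25 * (t11 - t22) * (t11 - t22) + t12 * t12);
    lambda1 = halfTrace + d;
    lambda2 = halfTrace - d;
}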

It has been found that features calculated from the structure tensor work more effectively on thresholded tiles. The reason behind this is that the contrast in the images varies. Even if the training dataset contains samples with different contrast levels, the result is much better with thresholding than without. This approach is used for QR-code detection in the paper [Lin and Fuh, 2013].

Each tile is thresholded by first calculating the mean of its pixel values,

$$\text{mean} = \frac{1}{N}\sum_{i=1}^{N} I(p_i), \qquad (4.8)$$

where $N$ is the number of pixels and $I(p_i)$ is the intensity of pixel $p_i$. The tile is then thresholded by using its mean as the threshold. By using the mean, the threshold is adapted to the current tile. The thresholding is illustrated in figure 4.10.

Figure 4.10: A tile containing parts of a Maxi-code, before and after the thresholding has been applied.
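This per-tile thresholding is straightforward with OpenCV; a minimal sketch, assuming 8-bit gray scale tiles:

#include <opencv2/imgproc.hpp>

// Sketch: threshold a tile with its own mean intensity (equation 4.8)
// as the threshold, producing a binary tile.
cv::Mat thresholdTile(const cv::Mat& tile) // tile assumed CV_8U
{
    const double mean = cv::mean(tile)[0];
    cv::Mat binary;
    cv::threshold(tile, binary, mean, 255, cv::THRESH_BINARY);
    return binary;
}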

The technique of thresholding the tiles in the image separately is called adaptive thresholding [Gonzalez and Woods, 2001]. It is possible to use something other than the mean of the tile as threshold, for example the median. However, the mean has given good results and has therefore been used as threshold. The effect of thresholding the tiles before applying the structure tensor is illustrated by plotting the two features.

Figure 4.11: The features of the structure tensor for 3000 tiles; the red circles are background, the blue circles are 1D-code and the green circles are 2D-code.

The structure tensor has been calculated on 3000 tiles: 1000 tiles containing 1D-code, 1000 tiles containing 2D-code and 1000 tiles containing background. Only background tiles with lots of details have been used; these are of most interest, since some of them have a structure similar to the tiles containing code. Figure 4.11 illustrates the result without thresholding: the tiles containing background are marked with red circles, the tiles containing 1D-code are marked with blue circles and the tiles containing 2D-code are marked with green circles. The features have been normalized between 0 and 1. Figure 4.12 illustrates the result when thresholding has been applied on the tiles before calculating the structure tensor.

Figure 4.12: The features of the structure tensor for 3000 thresholded tiles; the red circles are background, the blue circles are 1D-code and the green circles are 2D-code.

By comparing figures 4.11 and 4.12, the conclusion is that it is easier to distinguish tiles containing 1D-codes from the rest of the tiles if thresholding is used. In figure 4.12 the blue circles are more separated from the other circles compared to figure 4.11.

4.5.3 Local binary pattern

Local binary pattern is a method that can be used for detection of 2D-codes [Hung et al., 2011]. The basic idea is to compute a binary code for every pixel, based on the difference in intensity between the pixel and the surrounding pixels, as illustrated in figure 4.13. The surrounding pixels get the value 1 or 0 depending on whether they are larger or smaller than the center pixel. The binary code is then transformed to a decimal scalar value. If a 3x3 neighbourhood is used there will be 256 different possible values. For each block a histogram is calculated over all these values.

Every bin in the histogram is then used as a feature. If there is a bin for every possible value, there will be 256 features.

Figure 4.13: An illustration of Local binary pattern applied on a pixel. The pixel value is compared to the values of its surrounding pixels.
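A basic 3x3 Local binary pattern histogram can be sketched as follows. This is an illustrative version of the idea described above, not necessarily identical to the thesis implementation; for instance, the bit ordering of the neighbours is an arbitrary choice here.

#include <opencv2/core.hpp>
#include <vector>

// Sketch: a 256-bin Local binary pattern histogram for a tile (CV_8U).
// Each pixel gets an 8-bit code from comparisons with its 8 neighbours,
// and every histogram bin becomes one feature.
std::vector<int> lbpHistogram(const cv::Mat& tile)
{
    const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1}; // neighbour offsets,
    const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0}; // clockwise from top-left

    std::vector<int> hist(256, 0);
    for (int y = 1; y < tile.rows - 1; ++y) {
        for (int x = 1; x < tile.cols - 1; ++x) {
            const uchar center = tile.at<uchar>(y, x);
            int code = 0;
            for (int k = 0; k < 8; ++k) // neighbour >= center sets bit k
                if (tile.at<uchar>(y + dy[k], x + dx[k]) >= center)
                    code |= (1 << k);
            ++hist[code];
        }
    }
    return hist;
}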

4.6 Code detection with AdaBoost

AdaBoost was the first machine learning technique that was tested for barcode detection. As was discussed in section 1.2, the required processing time of the system is not specified. However, if detection of barcodes is going to work on images in a real-time environment, it is desirable to keep the amount of computation low. One effective way to lower the amount of computation is to use a cascade when classifying the test samples, as illustrated in section 2.7. In the cascade, one or several features are calculated at each step, and the corresponding classifier is used to classify the sample. If the sample is classified as true, it will continue to the next step; otherwise it is discarded.

4.6.1 Step 1: Standard deviation

In the first step of the cascade it is natural to use features that remove as many false tiles as possible but still keep all the true tiles. For that reason standard deviation has been used. AdaBoost is not able to handle more than two classes, true and false. When using standard deviation to train a classifier, the tiles are divided into two classes in the following way:

• Tiles with the label "1D-code" belong to the class "true".
• Tiles with the label "2D-code" belong to the class "true".
• Tiles with the label "background" belong to the class "false".

Figure 4.14 illustrates the result when a classifier trained with standard deviation as features has been applied to an image. The rectangles marked in the image are tiles that have been classified as "true". The green rectangles are tiles which have been classified as "true" and also belong to the class "true". The blue rectangles are tiles that have been classified as "true" but belong to the class "false". This example shows that the number of tiles containing background is reduced significantly, while all the tiles containing 1D- and 2D-code have been correctly classified.

Figure 4.14: Result for one example image when using standard deviation as features. The green tiles are true detections and the blue tiles are false detections.

4.6.2 Step 2 and 3: Structure tensor

As was mentioned earlier, an AdaBoost classifier is only able to handle two classes, true and false. Since the images contain both 1D- and 2D-codes, it is necessary to separate them in some way. This is another reason to use a cascade, where one of the steps is to separate 1D- and 2D-codes.

Figure 4.12 shows that the structure tensor effectively separates 1D-codes from the rest of the tiles. In this step of the cascade a classifier has been trained to detect 1D-codes with only the structure tensor as feature. The tiles are divided into two classes in the following way:

• Tiles with the label "1D-code" belong to the class "true".
• Tiles with the label "2D-code" belong to the class "false".
• Tiles with the label "background" belong to the class "false".

Figure 4.15 illustrates the same example image as in figure 4.14. The image to the left shows the remaining tiles after the first step in the cascade, i.e. after the standard deviation classifier has been applied. The image to the right shows the result after the structure tensor classifier has been applied. The green rectangles are tiles which have been classified as "true" and also belong to the class "true".


Figure 4.15: Result for one example image when using structure tensor as features to detect 1D-codes. The green tiles are true detections and the blue tiles are false detections.

This example shows that the structure tensor effectively distinguishes the 1D-code from the 2D-code and the background.

In the next step of the cascade the structure tensor is used again, but this time to detect tiles containing 2D-code. The tiles are divided into two classes in the following way:

• Tiles with the label "1D-code" belong to the class "false".
• Tiles with the label "2D-code" belong to the class "true".
• Tiles with the label "background" belong to the class "false".

Figure 4.16 illustrates the result for the same example image that was used in figure 4.15. The only difference here is that the classifier was trained to detect 2D-codes instead of 1D-codes.

Figure 4.16: Result for one example image when using structure tensor as features to detect 2D-codes. The green tiles are true detections and the blue tiles are false detections.

The example shows that the classifier detects the tiles containing 2D-codes. However, there are also some false detections, which are marked with blue rectangles in the right image in figure 4.16. From the plot in figure 4.12, this is expected: the tiles containing 2D-codes and the tiles containing background are not separated that much. However, since the structure tensor features have already been calculated for the detection of 1D-codes, there is no reason not to use them again for the detection of 2D-codes.

4.6.3 Step 4: Local binary pattern

In step 4 of the cascade Local binary pattern has been used to train a classifier to detect 2D-codes. As was discussed earlier, Local binary pattern has been shown to be effective for detecting 2D-codes in images. The tiles in the training dataset are divided into two classes in the following way:

• Tiles with the label "1D-code" belong to the class "false".
• Tiles with the label "2D-code" belong to the class "true".
• Tiles with the label "background" belong to the class "false".

Figure 4.17 illustrates the result before and after the classifier trained with Local binary pattern has been applied. The image to the left contains the remaining tiles after step 3 in the cascade, i.e. the image to the right in figure 4.16.

Figure 4.17: Result for one example image when using Local binary pattern as features to detect 2D-codes. The green tiles are true detections and the blue tiles are false detections.

The example in figure 4.17 shows that Local binary pattern detects the tiles containing 2D-codes, although there are also two false detections in the image. Figures 4.14 to 4.17 only show one test image. The result is similar for most of the images in the dataset, but there are also some images with more false detections; this is discussed in chapter 5.

4.6.4 Cascade model

Figure 4.18 illustrates the cascade model for AdaBoost. In the first step the standard deviation is calculated to reduce the number of background tiles. In the second step the structure tensor is used to separate the tiles containing 1D-code from the rest of the tiles. After the tiles containing 1D-code have been removed, the calculated structure tensor is used again, but this time to detect 2D-codes. In the fourth step of the cascade the Local binary pattern is applied to the remaining tiles. A schematic sketch of this classification logic follows the figure.

Figure 4.18: Cascade for barcode detection with AdaBoost.
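Below is a schematic Python sketch of the four-step cascade in figure 4.18. The classifier objects and their predict() interface are assumptions made for illustration, not the thesis implementation.

    def classify_tile(feats_std, feats_tensor, feats_lbp,
                      clf_std, clf_1d, clf_2d_tensor, clf_2d_lbp):
        """Run one tile through the four-step AdaBoost cascade of figure 4.18.
        Each clf_* is assumed to be a trained binary classifier whose
        predict() method returns True or False."""
        if not clf_std.predict(feats_std):           # step 1: discard flat background
            return "background"
        if clf_1d.predict(feats_tensor):              # step 2: dominant orientation -> 1D
            return "1D-code"
        if not clf_2d_tensor.predict(feats_tensor):   # step 3: reuse the tensor for 2D
            return "background"
        if clf_2d_lbp.predict(feats_lbp):             # step 4: confirm 2D with LBP
            return "2D-code"
        return "background"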

4.7 Code detection with Random forest

Random forest and AdaBoost work in completely different ways regarding how the training samples and the features are used. When using Random forest the cascade model in section 4.6 would not be a good choice. The reason is that Random forest is most suitable for a high-dimensional feature space, while the classifiers trained with standard deviation and the structure tensor use only a few features. This would not be optimal when training a Random forest classifier. One characteristic of the Random forest algorithm is that it only uses a small subset of all features at every split, which makes the trees in the forest less correlated with each other.

One advantage of Random forest is that it is able to handle more than two classes. This can be utilized for code detection since the images contain both 1D- and 2D-codes. One possible way would be to simply train one classifier with all features. However, this would not be very efficient since it is already known that some features are more computationally expensive than others. The result in section 4.6.1 also indicates that the standard deviation effectively removes false tiles. For that reason the standard deviation has been used separately, in the same way as in section 4.6, to discard tiles, and then the structure tensor and Local binary pattern are used in one classifier. The cascade model for barcode detection with Random forest is illustrated in figure 4.19; a schematic sketch follows the figure.

Figure 4.19: Cascade for barcode detection with Random forest.
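A schematic sketch of the two-step model in figure 4.19, with the same assumed classifier interface as in the AdaBoost sketch above:

    import numpy as np

    def classify_tile_rf(feats_std, feats_tensor, feats_lbp, clf_std, forest):
        """Run one tile through the Random forest cascade of figure 4.19.
        clf_std is the binary standard deviation filter and forest is a
        three-class Random forest trained on the concatenation of the
        structure tensor and Local binary pattern features."""
        if not clf_std.predict(feats_std):   # discard flat background tiles early
            return "background"
        combined = np.concatenate([feats_tensor, feats_lbp])
        # The forest returns one of "1D-code", "2D-code" or "background".
        return forest.predict(combined)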


5 Evaluation of barcode detection

This chapter presents an evaluation of the two cascade models presented in section 4.6 and section 4.7. Each cascade model is evaluated with different tile sizes. For each tile size an estimate of the accuracy of the code detection is calculated, as well as the average processing time per image. As has been mentioned in earlier chapters, the focus is primarily on accuracy.

5.1 Parameters

This section presents the parameter settings that have been used for training and testing in the evaluation. The parameters for AdaBoost and Random forest were explained in sections 2.4 and 2.5.

5.1.1 AdaBoost

For AdaBoost the following parameter settings are used when training the classifiers; a configuration sketch follows the list:

• The number of weak classifiers is 100.
• The depth of the trees is 1.
• The trim rate of the weights is 0.95.
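A sketch of how these parameters could be set with OpenCV's ml module. The thesis does not state which library was used, so this mapping, and the placeholder training data, are assumptions made for illustration.

    import cv2
    import numpy as np

    boost = cv2.ml.Boost_create()
    boost.setWeakCount(100)        # 100 weak classifiers
    boost.setMaxDepth(1)           # decision stumps, i.e. tree depth 1
    boost.setWeightTrimRate(0.95)  # trim the lowest-weighted training samples

    # Placeholder data: one feature row per tile, labels 1 (true) / 0 (false).
    samples = np.random.rand(200, 2).astype(np.float32)
    labels = np.random.randint(0, 2, size=200).astype(np.int32)
    boost.train(samples, cv2.ml.ROW_SAMPLE, labels)
    _, predictions = boost.predict(samples)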

5.1.2 Random forest

For Random forest the following parameter settings are used when training the classifiers; a configuration sketch follows the list:

• The number of trees in the forest is 100.
• The maximum depth of the trees is 15.
• The minimum number of samples in a leaf is 5.
• The number of active variables in a split is the square root of the total number of features.
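As for AdaBoost above, a sketch of these parameters in OpenCV's ml module (an assumed choice of library):

    import cv2

    forest = cv2.ml.RTrees_create()
    forest.setMaxDepth(15)        # maximum depth of the trees
    forest.setMinSampleCount(5)   # minimum number of samples in a node
    forest.setActiveVarCount(0)   # 0 selects the default: sqrt(number of features)
    # The forest size (100 trees) is given through the termination criteria.
    forest.setTermCriteria((cv2.TERM_CRITERIA_MAX_ITER, 100, 0))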

5.2 AdaBoost

In this section the individual classifiers, trained with AdaBoost, are combined according to the cascade model presented in section 4.6. As was discussed in section 4.4, four different tile sizes have been considered. In table 5.1 the result is presented for all four tile sizes. The cascade has been tested with 130 test images. Approximately 100 of the images contain both 1D- and 2D-codes, and the rest contain 1D-codes but no 2D-codes. As was discussed in section 4.3, each test image is divided into tiles, where each tile belongs to one of the three labels, "1D-code", "2D-code" or "background". In the table below the amount of true detections, in percent, and the average number of false detections are presented for 1D- and 2D-codes respectively. The average processing time per image has also been calculated; a sketch of the bookkeeping follows the table.

Tile size    Time per image    True detections        Average false tiles per image
24x24        175 ms            1D: 98%   2D: 84%      1D: 4.9    2D: 8.2
32x32        130 ms            1D: 98%   2D: 90%      1D: 2.4    2D: 1.1
48x48        100 ms            1D: 98%   2D: 96%      1D: 0.1    2D: 1.0
64x64        84 ms             1D: 99%   2D: 94%      1D: 0.03   2D: 1.4

Table 5.1: The result from testing the AdaBoost cascade model.
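To make the reported quantities precise, the following sketches the assumed per-image bookkeeping behind the true detection rate and the average number of false tiles; the dictionary keys are hypothetical.

    import numpy as np

    def detection_metrics(per_image):
        """Aggregate per-image tile counts into the reported quantities.
        Each dict holds, for one code type and one image, the number of
        correctly detected code tiles, the total number of code tiles,
        and the number of falsely detected tiles."""
        detected = sum(r["detected"] for r in per_image)
        total = sum(r["total"] for r in per_image)
        true_rate = 100.0 * detected / total             # true detections, percent
        false_avg = float(np.mean([r["false"] for r in per_image]))  # per image
        return true_rate, false_avg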

The result from table 5.1 is discussed in more detail in section 5.4. It is, however, clear that the detection of 2D-codes is more accurate for larger tile sizes, while the detection of 1D-codes is of high accuracy for all tile sizes. To give a better overview, the results for tile size 48x48 pixels have been plotted in figures 5.1 to 5.4 below.

Figure 5.1 and figure 5.2 illustrate the amount of correctly detected tiles, in percent, per image and the number of falsely detected tiles per image for 1D-codes.


Figure 5.1: The amount of true detections for 1D-codes from evaluation of the cascade for AdaBoost using tile size 48x48 pixels.

Figure 5.2: The number of false detections for 1D-codes from evaluation of the cascade for AdaBoost using tile size 48x48 pixels.

Figure 5.3 and figure 5.4 illustrate the results for 2D-codes for tile size 48x48 pixels. Figure 5.3 presents the amount of true detections for each image; the images that contain only 1D-codes and no 2D-codes have been left out.

Figure 5.3: The amount of true detections for 2D-codes from evaluation of the cascade for AdaBoost using tile size 48x48 pixels.


Figure 5.4: The number of false detections for 2D-codes from evaluation of the cascade for AdaBoost using tile size 48x48 pixels.

Figures 5.1 and 5.3 show that in most images the majority of all tiles containing 1D- and 2D-codes are correctly classified. Figures 5.2 and 5.4 show that some of the images contain falsely detected 1D- and 2D-codes. The result is discussed in more detail in section 5.4.

5.3 Random forest

In this section the Random forest cascade model, presented in section 4.7, is evaluated. The testing is made on the same 130 test images that were used for AdaBoost in section 5.2, and on the same four tile sizes. The result is presented in table 5.2 in the same way as in table 5.1.

Tile size    Time per image    True detections         Average false tiles per image
24x24        143 ms            1D: 97%    2D: 82%      1D: 9.4    2D: 10.1
32x32        111 ms            1D: 99%    2D: 90%      1D: 3.6    2D: 0.4
48x48        99 ms             1D: 99%    2D: 98%      1D: 0.3    2D: 0.3
64x64        89 ms             1D: 100%   2D: 100%     1D: 0.7    2D: 2.8

Table 5.2: The result from testing the Random forest cascade model.

The results when Random forest is used for training the classifiers are very similar to the results for AdaBoost: for 1D-codes the accuracy is very high for all tile sizes, and the accuracy for 2D-codes gets lower for smaller tile sizes.


5.4 Results and conclusions

As has been mentioned earlier, the size of the tile is significant for both the accuracy and the processing time. Figure 5.5 illustrates the result for one of the test images for all four tile sizes. The blue rectangles are tiles that have been correctly classified as 1D-code and the turquoise rectangles are tiles that have been incorrectly classified as 1D-code. The green rectangles are correctly classified 2D-codes and the yellow rectangles are incorrectly classified 2D-codes. The red rectangles are tiles that have been incorrectly classified as background.

Figure 5.5: The result for one of the test images for the different tile sizes that have been tested.

This test image shows the best results for tile sizes 32x32 and 48x48 pixels. For these two tile sizes most of the code areas have been detected and there are few false detections. For tile size 24x24 pixels, parts of the 2D-code are incorrectly classified and there are also some false classifications. For tile size 64x64 pixels large parts of the codes are not detected; this is most apparent for the 2D-code and the smallest of the three 1D-codes.
