
Department of Science and Technology

Linköping University

SE-601 74 Norrköping, Sweden

LiU-ITN-TEK-A--17/016--SE

On vision systems for pallet identification and positioning for autonomous warehouse vehicles

Elias Olsson

Jerry Sundin


Master's thesis carried out in Electrical Engineering at the Institute of Technology, Linköping University


Supervisor: Anna Lombardi

Examiner: Amir Baranzahi


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

Linköping University

Abstract

LITH ITN Master Thesis

On Pallet Identification and Positioning for Autonomous Warehouse Vehicles

by Elias Olsson & Jerry Sundin

This thesis was conducted to provide an overview of different camera systems and possible algorithms to detect and position EUR pallets in a warehouse rack, to be used by autonomous forklifts. Time of Flight cameras, stereo modules and structured light cameras were investigated along with ordinary 2D cameras and their lenses. The implementation evaluated two 2D cameras along with a ZED stereo module from Stereolabs. The resulting system managed to identify and position a pallet with better than 5 mm precision, but the results were based on limited test data.


Acknowledgements

We would like to thank our supervisor at the university, Anna Lombardi, for her great support throughout the project, and Reiner Lenz for giving us feedback on our ideas and how to improve them. We would also like to thank our supervisors at Toyota Material Handling, Martin Wigren and Daniel Krusell.


Contents

Abstract
Acknowledgements
1 Introduction
  1.1 History
  1.2 Purpose of the Project
  1.3 Limitations
  1.4 Outline
    1.4.1 System Specification
      Camera Position at the Base of the Forks
      Camera Position Inside the Forks
2 Other Studies Related to Autonomous Pallet Handling
  2.1 Vision-Based Autonomous Load Handling for Automated Guided Vehicles
    2.1.1 Evaluation
  2.2 Improved Autonomous Load Handling with Stereo Cameras
    2.2.1 Evaluation
  2.3 Focus Based Feature Extraction for Pallet Recognition
    2.3.1 Evaluation
  2.4 A Robust Autonomous Mobile Forklift Pallet Recognition
    2.4.1 Evaluation
  2.5 A Comparison of PMD-Cameras and Stereo-Vision for the Task of Surface Reconstruction using Patchlets
    2.5.1 Evaluation
  2.6 Rectangle Detection based on a Windowed Hough Transform
    2.6.1 Evaluation
3 Image-Capturing Hardware and related Standards
  3.1 Camera Systems
    3.1.1 Time of Flight Camera
    3.1.2 Stereo Camera
    3.1.3 Structured Light Camera
  3.2 The Lens
    3.2.1 Lens Mount
    3.2.2 Image Circle/Sensor Size
    3.2.3 Focal Length
    3.2.4 Resolution & Pixel Size
    3.2.5 Aperture or Iris Range
    3.2.6 Spectral Range
    3.2.7 Operating Distance
    3.2.8 Field of View
  3.3 European Machine Vision Association (EMVA) 1288 Camera Model
    3.3.1 Sensitivity, Linearity and Noise
    3.3.2 Noise Model
    3.3.3 Signal to Noise Ratio - SNR
    3.3.4 Signal Saturation and Absolute Sensitivity Threshold
    3.3.5 Compare Cameras with the EMVA 1288 Standard
4 Image Processing Algorithms
  4.1 Image Crop
  4.2 Projective Transformation
  4.3 Conversion to Gray Scale
  4.4 Histogram Equalization
  4.5 Gaussian Smoothing Filter
  4.6 Image Downsampling
  4.7 Gabor Filter
  4.8 Morphological Image Processing
  4.9 Edge Detection
    4.9.1 Sobel Operator
    4.9.2 Canny
  4.10 Hough Transform
5 Comparison and Choice of Camera Systems
  5.1 Camera Position at the Base of the Forks
    5.1.1 2D Cameras and Lenses
    5.1.2 Stereo Cameras
    5.1.3 Structured Light Cameras
    5.1.4 Time of Flight Cameras
  5.2 Camera Position Inside the Forks
6 Implementation of Heuristic Algorithm to Detect Pallet
  6.1 Pallet Model
  6.2 Identification Process
    6.2.1 Image Preprocessing
    6.2.2 Pallet Outer Bounds Detection
    6.2.3 Pallet Internal Features Detection
  6.3 Pallet Positioning
    6.3.1 Horizontal Positioning
    6.3.2 Vertical Positioning
  6.4 Fork Identification and Positioning
  6.5 Implementation
    6.5.1 Code Development Procedure
    6.5.2 Demo Implementation
7 Experiments and Setup
  7.1 Step One - Find a Pallet
  7.2 Step Two - Repeatability of the Camera System
  7.3 Step Three - Specify Changes in Pallet Placement
  7.4 Step Four - Specify Changes in Pallet Placement in Images Containing Forks
8 Results
  8.1 Step One - Find a Pallet
  8.2 Step Two - Repeatability of the Camera Systems
  8.3 Step Three - Specify Changes in Pallet Placement
    8.3.1 Movement in the Vertical Plane
    8.3.2 Movement in the Horizontal Plane
  8.4 Step Four - Specify Changes in Pallet Placement in Images Containing Forks
9 Discussion
  9.1 Hardware
    9.1.1 Choosing a 2D Camera
    9.1.2 Choosing a 3D Camera
    9.1.3 The Lenses
    9.1.4 SOM - Jetson TX1
  9.2 Software
    9.2.1 Machine Learning
    9.2.2 Image Segmentation
    9.2.3 Depth vs 2D Image Processing
    9.2.4 Results - Difference in Algorithm Implementation between 2D Cameras
    9.2.5 Difference Between MATLAB and C++ Implementation
    9.2.6 MATLAB Code Generation
    9.2.7 The Demo Implementation
  9.3 Circumstances
    9.3.1 Only Short Side Handling
    9.3.2 The Test Setup - Emphasize Lighting More
    9.3.3 The Test Setup - The Coordinate Table
    9.3.4 Different Placements Considered
    9.3.5 Pallets Not Handled
  9.4 Results
    9.4.1 Step One - Find a Pallet
    9.4.2 Step Two - Repeatability of The Camera Systems
    9.4.3 Step Three - Specify Changes in Pallet Placement
    9.4.4 Step Four - Specify Changes in Pallet Placement with Forks
    9.4.5 The Best Camera System
  9.5 Comparing Results with the Existing Solution
10 Conclusion
11 Future Work


List of Figures

1.1 Uniformly scaled EUR pallet viewed from both short and long side with the 1:1 dimensions presented in mm.
1.2 Field of view for a complete vision system with cameras placed in the tip of the forks.
3.1 Setup for stereo vision using two cameras, where x_lp and x_rp represent the left and right image planes. The focal length f and baseline b are known.
3.2 Setup for a structured light system, where the unknown position of the POI is calculated using the known projection angle θ, focal length f and baseline b. The position of the POI in the image plane is given as x_ip and y_ip, where the y-axis and y_ip are projected outwards from the figure.
3.3 The flange focal length for different lens standards [11].
3.4 Relationship between lens type and image sensor size [12].
3.5 Lenses are normally created in these standard sizes [12].
3.6 Focal length relationships [11].
3.7 Field of view illustrated in one dimension, either horizontal or vertical. θ represents the AOV and z' is a measure of the distance to a scene.
3.8 The camera model from EMVA 1288. It contains characterization parameters and noise sources.
3.9 Example of SNR graph from [16].
4.1 Example image of a pallet shot at an angle, producing tilted features.
4.2 Figure 4.1 modified by a projective transform.
4.3 Original image and its histogram, and histogram equalized image with corresponding histogram.
4.4 A Gaussian envelope in 3D space, modulated by a sinusoid.
4.5 The STFT achieves resolution in both time/space and frequency; the resolution in frequency is less than in a normal Fourier transform.
4.6 The Fourier transform achieves great resolution in frequency.
4.7 The morphological procedure, where a shape is convolved with an image.
4.8 The result of the two morphological procedures dilation and erosion.
4.9 The four different angular regions for the non-maxima suppression; if the orientation gradient of two lines falls in the same region the non-maxima are suppressed [9].
4.10 Illustration of the double threshold filter, with τ_u and τ_l as upper and lower thresholds for the edge magnitude. Edges present in the dashed area are only passed if they link up with a segment passing through the τ_u threshold.
4.11 A straight line represented in the parameters θ and ρ.
4.12 A set of Hough lines calculated from an image showing different geometric objects. The lines are represented with different radius ρ and angle θ; the black dots represent maxima where the greatest probability is that a line exists.
5.1 Illustration of the field of view for a single camera, where the big dashed rectangle illustrates the set window size and the point represents the center of the pallet.
5.2 EMVA data for the selected 2D cameras.
5.3 Camera positioning inside the forks of a forklift. The dashed lines illustrate the possible vertical FoV and the distance between the camera and a pallet.
6.1 The tilt of the camera compensated for using projective transform.
6.2 Original and histogram equalized depth image.
6.3 Edge map and the identified Hough lines plotted on the original image.
6.4 Images cropped to find internal features. Their expected positions are marked by the blue dots.
6.5 Each line represents one of the three possible features the heuristic might lock onto.
6.6 Original image downsampled and cropped into two separate images containing the left and right fork openings.
6.7 The image has been cropped for the purpose of identifying the forks.
6.8 Binary edge map of Figure 6.7 resulting from the edge detection algorithm.
7.1 An example from the scenario without pallet, using a fake pallet short side.
7.2 An example from the scenario without pallet, using only the steel platform with extra light.
7.3 The setup used for step one of the evaluation process.
7.4 Test pallet with pallet collar on the coordinate table with extra lighting, using the acA1920-40um camera.
7.5 Test pallet with pallet collar on the coordinate table with normal warehouse lighting, using the acA2500-14uc camera.
7.6 Test pallet with pallet collar on the coordinate table with normal warehouse lighting, using the ZED stereo module.
7.7 One photo was taken at each intersection of the lines; the value of each line is in cm.
7.8 A non-autonomous forklift positioned as closely as possible to the coordinate table; the ZED camera is mounted on a camera tripod between the forks.
7.9 A non-autonomous forklift positioned as closely as possible to the coordinate table; the ZED camera is mounted on a camera tripod between the forks.
7.10 A depth map image produced by the demo program implemented in C++ on the Jetson SOM, after processing.
8.1 Percentage of number of identified pallets for each test scenario of a set of 10 different pallet short sides.
8.2 STD of the estimated center point in the horizontal plane for the different pallet short sides in the different test scenarios.
8.3 STD of the estimated center point in the vertical plane for the different pallet short sides in the different test scenarios.
8.4 Overall performance of the systems with the STD of the errors for each plane separately.
8.5 At each vertical position the STD is calculated from all horizontal positions.
8.6 Misplacement of the summed error for each vertical position visited at different horizontal positions.
8.7 STD of the total error in pallet movement in the vertical and the horizontal
8.8 STD of the error calculated from the misplacement of the pallet relative to the forks.
9.1 Images acquired using the TOF camera used on the forklifts today, displaying

List of Tables

1.1 The length of the fork and total distance between the camera system and pallet.
1.2 Acceptable errors in pallet position inside the pallet rack.
1.3 Acceptable errors in forklift position when facing the pallet racks.
1.4 The required window sizes to cover a pallet in a loading operation, with the possible misplacement of the forklift and pallet accounted for.
1.5 Typical and theoretical maximum fork size for EUR pallet handling.
5.1 Parameters of the chosen cameras.
5.2 Specification for Kowa LM4HC F1.4 f8mm 1" [26].
5.3 Specification for BASLER C125-0418-5M [27].
5.4 A summary of each stereo module and their respective parameters.
5.5 The HFV for the different camera systems at the given distances of 950 mm for short side handling and 1350 mm for long side handling.
5.6 The HFV for the different camera systems at given distances.
5.7 SL cameras considered and their specifications.
5.8 The HFV for the different camera systems at given distances.
5.9 Precision in pixels/mm for each camera.
5.10 Each TOF camera is presented with its respective parameters.
5.11 The HFV for the different camera systems at distances required by short and long side pallet handling.
5.12 The resolution in pixels/mm for each camera.
5.13 Cameras considered for placement inside a fork.
8.1 Horizontal and vertical standard deviation given in mm for ten images taken of the same pallet in the same conditions and at the same position.
8.2 Horizontal and vertical standard deviation given in mm for the prototype using the ZED camera, in two different lighting conditions.
8.3 Offset of the standard deviation of the horizontal and vertical errors.
8.4 Horizontal and vertical error calculated in a scene where the fork and pallet are adequately aligned.

List of Abbreviations

TOF Time of Flight

EMVA European Machine Vision Association

GPU Graphics Processing Unit

SOM System On Module

PAN-Robots Plug & Navigate Robots

ROI Region of Interest

HT Hough Transform

PMD Photonic Mixer Device

SL Structured Light

SNR Signal to Noise Ratio

POI Point of Interest

FOV Field of View

MP Mega Pixel

AOV Angle of View

QE Quantum Efficiency

DYN Dynamic Range

STFT Short Time Fourier Transform

HAV Horizontal Angle of View

HFV Horizontal Field of View


Chapter 1

Introduction

Toyota Material Handling offers a range of different autonomous forklifts that rely on a positioning system which yields a precise position of the forklift. To position the forklift relative to a known cargo, extra information is needed for pick-up and release operations. Pallets might be rotated or misaligned when placed in racks, and the forklift needs to account for this. Today's autonomous forklifts use a vision system together with other sensors to achieve the precision needed. Typically two different cameras are used: one for the pick-up operations and the other for unload operations. For the unload operation, fiducial markers are used to locate the unload position in large racks. The aim of this project was to investigate different camera technologies and determine how suitable they are for the task of pallet identification and placement. Different camera positions on the forklift were investigated as well, along with different image processing approaches to detect and model a pallet. Prototype code was created in MATLAB and implemented on a Jetson TX1 NVIDIA System On Module (SOM).

Historically, camera manufacturers presented different metrics in different units and ways, which was reflected in their data sheets, until 2003 when the European Machine Vision Association (EMVA) was formed and created a standard for cameras and image sensors. Japan and other countries have also adopted the standard. The standard is free to download, and manufacturers can get certified against EMVA 1288. Some parts of the standard are optional, and currently it covers neither Time Of Flight (TOF) cameras nor non-linear cameras. The standard is intended as an aid for choosing cameras, not a replacement for lab testing and evaluation, which will always be needed.

1.1 History

Warehouse automation is an emerging technology. An interesting prior work was the European PAN-Robots project, comprising six partners in five EU countries and supported by EU funding of EUR 3.33 million. The project was charged with providing innovative technologies for automating logistics operations in the Factory of the Future.

Today Toyota uses a Time of Flight camera system. It is used for positioning a pallet relative to the forklift's forks for pick-up and for making sure that a pallet rack is empty before unloading.


1.2 Purpose of the Project

The purpose of the project was to evaluate options for pallet detection and positioning in order to replace the existing camera system with a less expensive or improved system. The project was divided into two parts. One was to research and motivate the use of different cameras in different scenarios and different placements/positions on the vehicle. The other part aimed to create a specific implementation with one or more camera modules and a development board, and then perform comparisons. The project focuses on the pick-up operation. Emphasis was put on creating a product which could be tested in the lab located at the Toyota Material Handling Mjölby site. An effect of this choice was that plug and play sensors were preferred, which could be used directly on the SOM, such as cameras using USB3, Gigabit Ethernet or other well-known interfaces, and with a Software Development Kit (SDK) provided.

The following list describes questions that should be answered during the project:

1. How are objects modeled for computer vision?

2. How can a pallet be modeled for computer vision?

3. Which parameters and features are important when choosing a camera module?

4. Which camera parameter is most crucial to consider for this specific application?

5. What differences are there for pallet recognition and positioning when using 2D or 3D images?

6. Which position on the forklift is the most beneficial for camera placement?

7. Which algorithm and camera type, of the evaluated ones, is the most beneficial for the loading operation with a Toyota autonomous forklift?

1.3 Limitations

Cameras on autonomous warehouse vehicles can be used for many different tasks. This project is limited to the very close proximity of the pallet rack. The task's main focus is limited to retrieving a pallet rather than unloading one.

1. The Jetson TX1 development platform will be used to implement the image processing algorithms.

2. Different kinds of image processing can be used to solve the problem. An attempt to implement machine learning and object classification will be made only if there is enough time.

3. The solution will not be directly transferable to newly manufactured forklifts.

4. Only EUR pallets will be considered.

5. The forklift is placed at a fixed distance and with a specified rotation towards the pallet rack.

6. The pallet will appear in a limited area of the image every time due to the precision of the navigation system.

7. Little effort will be put into considering different lighting conditions, but enough camera lighting will be used that shadows from ambient lighting can be neglected.


1.4 Outline

The outline of the report is as follows:

Chapter 2 covers literature and works relevant to this project which were studied at the very start of the project.

Chapter 3 investigates and describes the theory behind different cameras and an important machine vision standard called EMVA 1288.

Chapter 4 provides a more in-depth treatment of each of the image processing filters that were used to solve the problem.

Chapter 5 compares different camera systems based on how well they fit the stated requirements.

Chapter 6 presents the final implementation of the program which identified and positioned the pallets.

Chapter 7 shows how the cameras were evaluated and how the tests were conducted.

Chapter 8 presents, in a statistical manner, the results from the more than 400 images produced during testing.

Chapter 9 contains the discussion, divided into parts concerning the different parameters considered in this project.

Chapter 10 summarizes whether the questions asked in the problem formulation were answered.

Chapter 11 presents, based on the experience gained from the project, some ideas on how to proceed in the future.

1.4.1 System Specification

The task of pallet recognition and positioning for the forklifts using the camera systems starts at a given position relative to the pallet's supposed position. The system is used for both a pick-up and a drop-off procedure and is limited to handling only EUR pallets, whose measurements are shown in Figure 1.1. The system should be able to identify EUR pallets regardless of their color or cargo (as long as the cargo is positioned inside the boundaries of the pallet).

FIGURE 1.1: Uniformly scaled EUR pallet viewed from both short and long side with the 1:1 dimensions presented in mm.

The specifications of the vision systems are set depending on the camera position, which is either at the base of the forks or inside the tip of the forks. The specifications for the camera system when placed at the base of the forks are based on the specifications of the currently used vision system.

Camera Position at the Base of the Forks

The forklift is limited to handling pallets placed in racks. Depending on whether the forklift is built for handling pallets from the short or long side, the length of the forks differs. The maximum and minimum distances the forklift is positioned from the pallet when the camera system is placed at the base of the forks are presented in Table 1.1. The total distance is the length of the forks plus a distance of 150 mm, which is how far the forks are positioned from the pallet in today's autonomous forklift systems.

TABLE 1.1: The length of the fork and total distance between the camera system and pallet.

Fork type   Fork length [mm]   Total length [mm]
Short       800                950
Long        1200               1350

The misplacement of the pallet relative to its supposed position, together with the accepted misplacement between the forks and the pallet for an autonomous forklift, are presented in Table 1.2. The misplacement of a pallet placed by a human operator is set to ±40 mm.

TABLE 1.2: Acceptable errors in pallet position inside the pallet rack

Misplacement   Long side [mm]   Short side [mm]
Horizontal     ±25              ±25
Vertical       ±20              ±20
Depth (3D)     ±20              ±20

Instead of processing the whole image, a search window can be defined for the actual area of the image where a pallet might be present, based on the precise positioning system of the forklift. The resulting search window depends on whether the forklift is handling pallets by the short or long side and on the possible misplacement by a human operator. Apart from possible misplacement of the pallet, another source of error is that the autonomous forklift itself is misplaced. Table 1.3 presents the possible errors in the positioning system, while the different sizes of the window of interest are presented in Table 1.4.

TABLE 1.3: Acceptable errors in forklift position when facing the pallet racks

Misplacement          Absolute value
Direction of travel   ±10 mm
Side shift            ±5 mm
Rotation              ±0.05 deg

TABLE 1.4: The required window sizes to cover a pallet in a loading operation, with the possible misplacement of the forklift and pallet accounted for

Operation type   Window size [mm]
Long side        1290 x 184
Short side       890 x 184

The camera system must be able to provide the position adjustment information to the forklift with a precision within 5 mm. This is to allow for other sources of error inherent in the hydraulic system and because the forklift is a non-rigid construction. To manage this, the resolution of the camera system has to be greater than 1.5 pixels/mm. This decision was taken with possible noise and possible dead pixels in consideration.
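As a rough illustration of how the window size and the resolution requirement interact, the short C++ sketch below (not part of the thesis code) computes the minimum number of pixels that must fall on the short-side search window from Table 1.4 at a density of 1.5 pixels/mm.

#include <cmath>
#include <cstdio>

// Illustrative sketch, not part of the thesis implementation: estimate the
// minimum number of sensor pixels that must cover the short-side search
// window from Table 1.4 at the required density of 1.5 pixels/mm.
int main() {
    const double window_w_mm = 890.0;   // short-side window width, Table 1.4
    const double window_h_mm = 184.0;   // window height, Table 1.4
    const double px_per_mm   = 1.5;     // required resolution, Section 1.4.1

    const int min_cols = static_cast<int>(std::ceil(window_w_mm * px_per_mm));
    const int min_rows = static_cast<int>(std::ceil(window_h_mm * px_per_mm));

    std::printf("Minimum pixels on the search window: %d x %d\n", min_cols, min_rows);
    return 0;
}

The sensor must in practice provide more pixels than this figure, since the search window only covers part of the full image and margin is needed for noise and dead pixels.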

Camera Position Inside the Forks

The vision system based on cameras inside the forks is limited to EUR pallet short side handling, and it should be able to keep the same specifications for short side handling as in Section 1.4.1. The system requires depth information from the cameras. The size of the camera system is limited by the dimensions of the forks. Typical dimensions of a fork and a theoretical maximum dimension of a fork for EUR pallet handling are presented in Table 1.5.

TABLE 1.5: Typical and theoretical maximum fork size for EUR pallet handling

Fork size   Fork dimensions (width x height) [mm]
Typical     100 x 37
Maximal     175 x 70

The thickness of the fork walls surrounding the camera is set to 5 mm. The camera system and its combined field of view are illustrated in Figure 1.2.

FIGURE 1.2: Field of view for a complete vision system with cameras placed in the tip of the forks.

Chapter 2

Other Studies Related to Autonomous Pallet Handling

This chapter summarizes previous work relevant to this problem, covering pallet detection, computer vision and camera evaluations. Each section summarizes one article, where the heading of the section is the title of the article, followed by a short evaluation of the work and how it relates to this project.

2.1 Vision-Based Autonomous Load Handling for Automated Guided Vehicles

This article presented a method for automatic pallet detection using stereo cameras. The method also estimated the detected pallet's position and orientation in 3D space. For pallet recognition, a sliding window approach was used together with boosted classifiers to speed up the computations. The system was required to have an accuracy of 5 cm and 1 degree at 2.5 meters, and 1 cm and 1 degree at a distance of 2 meters. The Automated Guided Vehicle was required to be able to find its path to the pallet and pick it up.

The 3D image was collected using two separate cameras at a fixed distance from each other, and the depth of the image was then calculated using common trigonometric formulas.

According to the authors, the pallet was modeled as an object with three legs separated by two pockets, where the aspect ratios and dimensional specifications corresponded to the frontal view of an EUR pallet. First the image was preprocessed to remove noisy details and the contrast was equalized, followed by candidate generation. The candidate generation was performed using Sobel filters, one for each axis. Next the horizontal lines were detected, followed by the vertical lines. To remove possible false detections a feature extraction module was used. Four different features were used: difference of mean intensity, mean strength along the x/y directions, mean disparity and disparity difference. Different classifiers were evaluated to find the best type.

The results showed that the optimal classifier was an ensemble of classifiers with 1000 boosted decision trees, with cascaded predictions that relied on both stereo and intensity features. The selected method could perform one computation in under 1 second with the desired accuracy. [1]

2.1.1 Evaluation

The Vision-Based Autonomous Load Handling for Automated Guided Vehicles paper provided a model for a EUR pallet, and a method for pallet detection using stereo vision and mono vision. The problem formulation in the paper closely resembled the problem formulation of this thesis: to detect a pallet from a given position with demands on accuracy.

2.2 Improved Autonomous Load Handling with Stereo Cameras

The work presented was an improvement of the load handling system described in Section 2.1, and also part of the PAN-Robots project, whose aim was to create an automated logistics environment. The system was stereo vision based and able to provide the position of the pallet with an accuracy of 1 cm and the orientation of the pallet with 1 degree accuracy within the common field of view. An improved pallet detection system was used based on aggregate channel features. Eight image channels were computed from the input image for generating classification features: grayscale, gradient magnitude and oriented gradient magnitudes at six orientations. A single sliding window was used, and instead of using several windows the image was resized to different scales to be able to detect pallets of different pixel sizes. A boosting classifier was used to classify the content. Boosting is a machine learning algorithm which converts weak learners to strong ones by reducing bias and variance. This improved pallet detection algorithm was based on advances in pedestrian detection, where a sweet spot for the decision trees is to limit them to a height of two layers; results showed that pallet detection was improved by allowing a depth limit of four layers. The feature vector also contained more descriptive features called Normalized Pair Differences (NPD), which were proposed by a different work for the specific task of pallet recognition. The NPD represents intensity pair differences normalized to ensure certain illumination invariance properties. The conclusions were based on real world testing. [2]

2.2.1 Evaluation

The paper provided a method based on techniques used for pedestrian detection. It also indicates that machine learning can be used to effectively increase the accuracy of the pallet detection process.

2.3 Focus Based Feature Extraction for Pallet Recognition

The body of the article focused on creating a suitable model for pallet recognition and pallet hole extraction. The flow of the recognition process started with limiting the search area to Regions Of Interest (ROI). The ROIs were found by matching possible sections of the image with a model of the loading plane of a EUR pallet. The next step was to search for possible pallet holes within the ROI.

The modeling of the loading plane of the pallet was done using the Hough Transform (HT). The variants of the HT used were the Correlated Hough Transform and the Edge-based Hough Transform, where the Correlated Hough Transform was used to identify and extract long and thin lines with angles ranging between 0 and 2π from the image. Each line was ranked by how close it was to the length of the possible objects. If vertical and horizontal lines together created a region that both in size and form resembled a pallet, the region was marked as a ROI.

To find the pallet holes the algorithm searched the ROI for "Virtual corners" and "Real corners", where the Virtual corners represented mathematically modeled corners and the Real corners represented feature-like corners. The Virtual corners were found using a Canny filter together with the Edge-based Hough Transform to recognize edges. The Virtual corners were identified by intersecting vertical lines within the ROI with the lower straight lines of the loading plane. The corners could then be extracted using the following equations:

x = \frac{\rho_l - \dfrac{\left(\rho_i - \rho_l \dfrac{\cos\theta_i}{\cos\theta_l}\right)\sin\theta_l}{\sin\theta_i - \tan\theta_l \cos\theta_i}}{\cos\theta_l}
\qquad
y = \frac{\rho_i - \rho_l \dfrac{\cos\theta_i}{\cos\theta_l}}{\sin\theta_i - \tan\theta_l \cos\theta_i}

The equations were derived from the intersection point between two lines described by their line equations ρ_i = x cos θ_i + y sin θ_i and ρ_l = x cos θ_l + y sin θ_l. The indexes of the angles θ and radii ρ describe which line they belong to: index i represents the horizontal loading plane and index l represents one of the four possible vertical lines intersecting with it.
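For illustration only, the following C++ sketch solves the same intersection problem by treating the two line equations as a 2x2 linear system; it is algebraically equivalent to the x and y expressions above but is not the cited paper's implementation, and the example angles in main are made up.

#include <cmath>
#include <cstdio>
#include <optional>

// Illustrative helper (not from the cited paper): intersect two lines given in
// Hough normal form rho = x*cos(theta) + y*sin(theta) by solving the 2x2
// linear system, which is equivalent to the closed-form x and y above.
struct Point { double x, y; };

std::optional<Point> intersectHoughLines(double rho_i, double theta_i,
                                         double rho_l, double theta_l) {
    const double det = std::sin(theta_l - theta_i);   // zero when the lines are parallel
    if (std::fabs(det) < 1e-9) return std::nullopt;
    return Point{
        (rho_i * std::sin(theta_l) - rho_l * std::sin(theta_i)) / det,
        (rho_l * std::cos(theta_i) - rho_i * std::cos(theta_l)) / det};
}

int main() {
    constexpr double kPi = 3.14159265358979323846;
    // Example: a horizontal line y = 50 (theta = 90 deg) and a vertical line x = 120 (theta = 0).
    if (auto p = intersectHoughLines(50.0, kPi / 2.0, 120.0, 0.0))
        std::printf("virtual corner at (%.1f, %.1f)\n", p->x, p->y);
    return 0;
}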

The Real corners were extracted using a heuristic algorithm for region and real corner extraction. Both algorithms later used decision trees, which had been trained using a set of several pre-classified images.

The results were presented as a table showing the number of false and missed detections for different conditions. The result of the algorithm using both the real and virtual corner extraction was compared with the results of each method used separately. The results were shown both for a single parameter set and for a set of optimized values.

The authors claimed that a more robust method was needed to reduce the number of falsely detected ROIs, in order to reduce the number of false detections in a more complex environment. [3]

2.3.1 Evaluation

The article provided mathematical formulations for feature detection of pallets, which gave a deeper understanding of the main features of a pallet and how to extract them.

2.4 A Robust Autonomous Mobile Forklift Pallet Recognition

This paper discussed and evaluated a method for identifying a pallet using color segmentation in real time. The aim of the paper was to describe a complete recognition process in a robotic industrial application.

The authors created an algorithm using color segmentation to segment the images. Using an offline register of possible pallet colors, objects that did not fulfill the threshold for a pallet color were rejected. Possible noise on the candidate pallet was removed using a morphological filter. The edges of the candidate pallet were found using a Sobel operator and the lateral sides of the pallet were found using the Hough transform. The forking side of the pallet was set as the line with the maximum length among the Hough lines. The pose of the pallet was calculated using the coplanar Tsai calibration method together with the calculated positions of the pallet's corner coordinates.

The algorithm was tested using an Autonomous Mobile Forklift platform developed by the authors, together with standard EUR pallets. The authors claimed that the discrepancy of a pallet's midpoint coordinate was less than 5 cm. The system proved successful, meeting this discrepancy in 26 of 30 trials with a processing time of 40 ms. The biggest interferences for the system were shadows and reflections. [4]

2.4.1 Evaluation

The article provided information about a different approach to pallet recognition which might be useful in a more demanding environment. Pallet recognition based on color could be used as a complementary part of the image segmentation process used in the previous articles.

2.5 A Comparison of PMD-Cameras and Stereo-Vision for the Task of Surface Reconstruction using Patchlets

This article compares two common 3D camera technologies, the Photonic Mixer Device (PMD) and the stereo vision technique. The PMD measures the depth of each pixel in a 2D field of pixels by measuring the time of flight of coherent infrared light.

The performance and accuracy of the two systems were compared on a rotating rig containing different planar elements called patchlets. The test was performed under optimal conditions for both systems. The authors derived the uncertainty of a patchlet's position and normal for both systems, which showed that it can be expressed in the same way for both systems.

The results were compared using the uncertainty in position and normal of a patchlet as a benchmark. The positioning accuracy of the PMD outperformed the stereo camera by one order of magnitude. However, the angular uncertainty of a patchlet proved to be smaller with a stereo camera system than with the PMD system. Using a PMD with greater resolution evidently decreased the angular uncertainty.

The authors conclude that the PMD system had greater accuracy than a stereo camera system, but that the PMD suffered from low image resolution and only worked within a given range. A combined stereo camera and PMD system was considered a viable option, where the PMD would provide its measurements for a set of points that could be used as a benchmark for the rest of the stereo pixels, yielding a better result. [5]

2.5.1 Evaluation

Different vision techniques were evaluated as part of the thesis problem; this article provided an explanation of some 3D camera systems and a comparison between them.

2.6 Rectangle Detection based on a Windowed Hough Transform

The article introduces a method based on the Hough transform for identification of rectangles. There are several proposed methods for identifying rectangular objects using the Hough transform. One of the most common methods used for rectangle identification is the Generalized Hough Transform; however, this method needs a 5-D accumulator array, which is demanding in both memory and computational power according to the authors.

The authors propose an algorithm which identifies a rectangle based on it being built up by two pairs of parallel lines with an angle of 90° between the pairs. A rectangle is detected if

\Delta\alpha = \left| \, |\alpha_k - \alpha_l| - 90^\circ \right| < T_\alpha

where T_α is an angular threshold that determines whether the two pairs of parallel lines are orthogonal to each other, and α_k and α_l are the mean angles of the proposed pairs of parallel lines. This search was performed on one small section of the image at a time in a sliding window approach.
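A minimal sketch of that test follows, assuming the Hough peaks have already been grouped into two pairs of parallel lines and that their mean angles are given in degrees; the threshold value is an arbitrary example, not one from the paper.

#include <cmath>

// Sketch of the orthogonality test above. alpha_k and alpha_l are the mean
// angles (in degrees) of two pairs of parallel Hough peaks; the default
// threshold is an arbitrary example value, not one from the paper.
bool pairsFormRectangle(double alpha_k, double alpha_l, double T_alpha = 3.0) {
    const double delta_alpha = std::fabs(std::fabs(alpha_k - alpha_l) - 90.0);
    return delta_alpha < T_alpha;
}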

The results show that the algorithm in most cases identifies rectangles in both natural and synthetic images. It also performs well in images regarded as noisy by the authors, and finds rectangles regardless of their size or orientation. The authors conclude that the method works well, with a computational power requirement far lower than that of the Generalized Hough Transform. [6]

2.6.1 Evaluation

The article provided information on rectangle extraction, briefly explaining the different methods available and providing a well documented algorithm. As part of the image segmentation, square extraction could have been used based on the square-like structures of a pallet.

Chapter 3

Image-Capturing Hardware and related Standards

This chapter describes the theory behind the different camera systems and important parameters to consider when choosing a camera for machine vision. It also covers the EMVA 1288 standard which is a standard for measurement and presentation of specifications for machine vision sensors and cameras.

3.1 Camera Systems

This section describes the techniques behind some of the most common approaches for 3D cameras: the Time Of Flight (TOF), stereo and Structured Light (SL) cameras. The description of 2D cameras is left out, as they function along the lines of the cellphone cameras which are commonplace today. The characteristics of the lens are treated with more care because the lens greatly affects the performance of the whole system.

3.1.1 Time of Flight Camera

The TOF camera, also referred to as a Photonic Mixer Device, is a sensor array able to measure the distance and intensity for each pixel. Each sensor element represents one pixel; in one common method the sensor calculates the distance based on the time it takes for a transmitted light signal to be reflected and received again. Since the speed of light is known, the time together with the speed gives the distance. There are several different techniques for time of flight measurements; common ones are pulsed modulation and continuous modulation.

Pulsed modulation sends a pulse and simultaneously starts counting the time until the pulse is received again. With a short pulse it is possible to send a highly energetic pulse during a short time, to increase the Signal to Noise Ratio (SNR) of the incoming signal. The power of the pulse is restricted with regard to eye safety.

Continuous modulation offers the possibility of using a larger variety of light sources, since it is not restricted by short rise and fall times like pulsed modulation. For continuous modulation it is possible to use a sequence of modulated waves, which can be either sinusoidal or digital. The distance to the object is calculated using the phase shift of the received signal. Continuous modulation can be used to get a greater working range for the sensor. Many different continuous modulation techniques are available and are presented in [7].

The PMD camera uses the continuous modulation method, where the phase shift is calculated for each pixel. The possible working range of the camera is restricted to half the wavelength of the modulated signal due to its repetitive nature. The maximum measuring distance can be calculated as in Equation (3.1) according to [5].

\lambda_{max} = \frac{c}{f_m} \qquad (3.1)

where c is the speed of light in vacuum and f_m represents the modulation frequency.
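As a quick numeric illustration of Equation (3.1), the sketch below evaluates the modulation wavelength and the half-wavelength working range for an assumed 20 MHz modulation frequency; the frequency is an example value, not one quoted in this thesis.

#include <cstdio>

// Small numeric sketch of Equation (3.1). The 20 MHz modulation frequency is
// an assumed example value, not a figure from the thesis.
int main() {
    const double c   = 299792458.0;   // speed of light in vacuum [m/s]
    const double f_m = 20.0e6;        // modulation frequency [Hz] (assumed)

    const double lambda_max = c / f_m;            // Equation (3.1)
    std::printf("lambda_max      = %.2f m\n", lambda_max);
    std::printf("half wavelength = %.2f m\n", lambda_max / 2.0);  // repetition-limited range
    return 0;
}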

The limitation on how close the target can be to the camera is set by the point at which the sensor gets overexposed, according to [8].

3.1.2 Stereo Camera

Stereo vision is the use of two separate but identical cameras at a fixed distance from each other. The distance between the cameras, fixed along a one-dimensional axis, is called the baseline b. With the focal length of the cameras known, the distance to a Point Of Interest (POI) can be calculated using common trigonometric formulas, as illustrated in Figure 3.1.

FIGURE 3.1: Setup for stereo vision using two cameras, where x_lp and x_rp represent the left and right image planes. The focal length f and baseline b are known.

Using Figure 3.1 the ratios of the image projection can be derived as in Equation (3.2) and Equation (3.3).

\frac{x}{z} = \frac{x_{lp}}{f} \qquad (3.2)

\frac{x - b}{z} = \frac{x_{rp}}{f} \qquad (3.3)

Combining both ratios, the distance to the POI can be calculated according to Equation (3.4).

z = \frac{b f}{x_{lp} - x_{rp}} \qquad (3.4)
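To make Equation (3.4) concrete, here is a small sketch that evaluates the depth for one pixel pair; the baseline, focal length and disparity are assumed example values, not the parameters of any camera evaluated later in this thesis.

#include <cstdio>

// Sketch of Equation (3.4): depth from disparity for a rectified stereo pair.
// The baseline, focal length and disparity below are assumed example values,
// not the parameters of the ZED module used in the thesis.
double depthFromDisparity(double baseline_mm, double focal_px, double disparity_px) {
    return baseline_mm * focal_px / disparity_px;   // z = b*f / (x_lp - x_rp)
}

int main() {
    const double b = 120.0;    // baseline [mm] (assumed)
    const double f = 700.0;    // focal length [pixels] (assumed)
    const double d = 88.0;     // disparity x_lp - x_rp [pixels] (assumed)
    std::printf("depth = %.1f mm\n", depthFromDisparity(b, f, d));
    return 0;
}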

One problem using stereo vision is that the corresponding POI must be found and identified in both camera images. To limit the search area one can assume that the corresponding POI is located on the same horizontal axis, called the epipolar plane. The pixels on the epipolar plane are compared with corresponding points in the other camera's image.

Edge matching is a common method used for image matching; it filters out all non-changing elements of an image. An epipolar search is then performed on the filtered image. One problem when using the edge matching technique is to be aware of when the depth is not well defined in a region; correlation is used to measure such areas. [9]

Known factors that might cause errors for a stereo camera system are the following: camera calibration, low image resolution, occlusions, changes in brightness, motion and low contrast areas [10].

3.1.3 Structured Light Camera

Structured light uses a camera and a projector that projects a known geometric pattern of light over the camera's field of view. The distance to the illuminated point is calculated using Equation (3.5)

[x, y, z] = \frac{b}{f \cot\theta - x_{ip}} \, [x_{ip}, y_{ip}, f] \qquad (3.5)

where b represents a known baseline between the projector and the camera and θ represents the projection angle. x_ip and y_ip give the position in the image plane. Figure 3.2 illustrates the setup of a structured light system [9].

FIGURE 3.2: Setup for a structured light system, where the unknown position of the POI is calculated using the known projection angle θ, focal length f and baseline b. The position of the POI in the image plane is given as x_ip and y_ip, where the y-axis and y_ip are projected outwards from the figure.
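A small numeric sketch of Equation (3.5) follows; the baseline, focal length, projection angle and image coordinates are assumed example values chosen only to exercise the formula.

#include <cmath>
#include <cstdio>

// Sketch of Equation (3.5): triangulating a point illuminated by the projector.
// All numeric values are assumed examples, not parameters of a real system.
int main() {
    constexpr double kPi = 3.14159265358979323846;
    const double b     = 0.10;                 // baseline projector-camera [m] (assumed)
    const double f     = 600.0;                // focal length [pixels] (assumed)
    const double theta = 60.0 * kPi / 180.0;   // projection angle (assumed)
    const double x_ip  = 40.0, y_ip = 25.0;    // image-plane position [pixels] (assumed)

    const double s = b / (f / std::tan(theta) - x_ip);   // b / (f*cot(theta) - x_ip)
    std::printf("x = %.3f m, y = %.3f m, z = %.3f m\n", s * x_ip, s * y_ip, s * f);
    return 0;
}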

This procedure is repeated for each POI in the image. The illuminated pattern is distorted by the shape of the object, which gives information on discontinuities and changes in orientation and curvature. There are several different approaches used for the projection pattern, ranging from mesh patterns to scattered dots (used in the Xbox Kinect). Structured light is often used in industrial environments, where a common application is scanning objects on conveyor belts. One drawback of the structured light technique is that the object needs to be visible to both the projector and the camera to calculate the distance. [9]

3.2 The Lens

An important part of the camera module, apart from the image sensor, is the lens. The most important parameters to consider when choosing a lens are presented in the following subsections.

3.2.1 Lens Mount

The lens mount is the connection point between the lens and the camera module. The most common mounts for machine vision are F-mount, C-mount, CS-mount and S-mount. They are characterized by different flange focal lengths, according to Figure 3.3. Lenses with C-mount can be used on cameras with CS-mount by using a 5 mm adapter. The camera data sheet specifies which kind of mount is required.

FIGURE 3.3: The flange focal length for different lens standards [11].

3.2.2 Image Circle/Sensor Size

Lenses are circular and image sensors are often rectangular. The maximum image circle parameter of the lens should be equal to or larger than the image sensor size. One effect of having a larger image circle is that the image becomes more centered and suffers less from the degradation of resolution which occurs from the lens center towards the edges. Lens type, or optical format, is often given in fractional inches for historical reasons and defines the outer diagonal of the glass lens rather than the diagonal of the image sensor, as in Figure 3.4. Figure 3.5 displays the most used lens types and Figure 3.4 displays the relationship between the lens types and image sensor sizes.

FIGURE 3.4: Relationship between lens type and image sensor size [12].

FIGURE 3.5: Lenses are normally created in these standard sizes [12].

3.2.3 Focal Length

In order to select an appropriate focal length, the application needs to be considered. Different focal lengths result in different fields of view, where the objects of interest reside. A focal length of 2 mm is considered extra small and results in a "fish eye" view. Large focal lengths of 35 mm or more result in a "tele" view, being zoomed in. The focal length is defined as f = B/(G + B) · g, according to Figure 3.6, where the meaning of the variables is explained. The camera manufacturers often provide lens selector tools optimized for their own product lines, to get a good match between lens and camera, which helps in selecting the focal length. Most lenses have a fixed focal length, but there are some lenses with adjustable focal length. The ability to have a variable focal length takes a toll on the image quality. Lenses with shorter focal lengths often exhibit pronounced distortion. For these fish eye and wide angle lenses a Field Of View (FOV) is typically declared in the data sheet, within which the distortion that appears is regarded as acceptable. [13]
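As an illustration of how this relation can guide lens selection, the sketch below assumes the usual reading of Figure 3.6 (G: object size, B: image size on the sensor, g: distance from lens to object) and plugs in the short-side numbers from Tables 1.1 and 1.4 together with an assumed 6 mm sensor width; it is a back-of-the-envelope aid, not the selection procedure used in Chapter 5.

#include <cstdio>

// Sketch of the focal-length relation f = B / (G + B) * g, assuming the usual
// reading of Figure 3.6: G = object size, B = image size on the sensor,
// g = distance from lens to object. The sensor width is an assumed example.
int main() {
    const double G = 890.0;    // object size to cover [mm], short-side window width (Table 1.4)
    const double B = 6.0;      // image size on the sensor [mm] (assumed sensor width)
    const double g = 950.0;    // working distance [mm], short-side case (Table 1.1)

    const double f = B / (G + B) * g;
    std::printf("required focal length = %.2f mm\n", f);
    return 0;
}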

FIGURE 3.6: Focal length relationships [11].

3.2.4 Resolution & Pixel Size

Resolution for an image sensor or a computer monitor is often given as a pixel count of rows times columns. For lenses the resolution is the shortest possible distance between two points or lines in an image while still being able to tell them apart. Resolution is either given as line pairs per millimeter or as pixel size in micrometers. They have the following relationship: pixel size (µm) = 1000 / (2 · line pairs/mm). An image sensor which provides 5 Megapixel (MP) resolution can only do so if the lens also supports 5 megapixel resolution. If the lens supports less, the image may become blurred. If the lens supports more, it is possible to achieve a greater contrast but the cost increases. It is important to consider the megapixel label in relation to the sensor size or pixel size to get a good match.
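As a quick worked example of this relationship (the 100 line pairs per millimeter is an arbitrary figure, not a value from any data sheet considered here):

\text{pixel size} = \frac{1000}{2 \cdot 100\ \text{lp/mm}} = 5\ \mu\text{m}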

The Modulation Transfer Function describes the level of contrast which is delivered at different resolutions. A 40% contrast rating is considered good and generally a lens labeled with 5 MP will achieve 25% contrast at that resolution. The resolution of the lens also varies across the field of view, being higher near the center of the image [13]. [11]

3.2.5 Aperture or Iris Range

The f-number can be presented in different ways. It gives the relationship between the focal length (f) and the aperture diameter (d): k = f/d. A larger f-number implies a smaller aperture opening and less incident light. Depending on the lens, the aperture can be either a fixed iris or a manual iris, allowing the user to make adjustments. With a larger f-number the effect of image distortions decreases and the depth of focus increases. If the f-number is too large, blurring may occur due to diffraction, and therefore an optimal aperture setting for a machine vision camera is between 1:2.0 (for pixels < 3 µm and very high resolution lenses) and 1:11 (for pixels > 10 µm and low resolution lenses). When choosing between two lenses it is recommended to opt for the one with the larger f-number [11]. If the f-number is low the image may be blurred for other reasons (spherical aberration, astigmatism, field curvature, coma and distortion). The best image results are obtained with F/# = 4, 5.6 or 8 according to reference [14].

3.2.6 Spectral Range

Most lenses are optimized for the visible spectrum of 400-700 nm. When color is not needed a special camera called a Near Infrared camera can be used, which also requires a special lens. Some lenses can handle both cases, but they are also more expensive.

3.2.7 Operating Distance

The majority of machine vision lenses are optimized for a distance of 50 cm, which is the average distance between camera and object in industrial applications. The camera's Minimal Optical Distance is often given in the data sheet. Standard lenses produce the best images for image scales between 1:∞ and 1:10 (sensor size : object size).

3.2.8 Field of View

Field of View (FOV) is a measure of how large an area of the scene the camera can capture. The field of view is measured in length units, while the Angle of View (AOV) describes the visible scene in degrees, as shown in Figure 3.7. In data sheets the AOV is often referred to as field of view. [15]

FIGURE 3.7: Field of view illustrated in one dimension, either horizontal or vertical. θ represents the AOV and z' is a measure of the distance to a scene.

The FOV can be calculated as two separate orthogonal triangles according to Equation (3.6).

FoV = 2 z' \sin\left(\frac{\theta}{2}\right) \qquad (3.6)
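For illustration, the sketch below evaluates Equation (3.6) in the sine form given above, using the 950 mm short-side working distance from Table 1.1 and an assumed 70° horizontal angle of view; the AOV is an example value, not one of the cameras compared in Chapter 5.

#include <cmath>
#include <cstdio>

// Sketch of Equation (3.6) as given in the thesis (sine form). The 70-degree
// horizontal angle of view is an assumed example, not a value from Chapter 5.
int main() {
    constexpr double kPi = 3.14159265358979323846;
    const double z_prime = 950.0;                  // distance to scene [mm], Table 1.1
    const double aov_deg = 70.0;                   // horizontal AOV [deg] (assumed)
    const double aov_rad = aov_deg * kPi / 180.0;

    const double fov = 2.0 * z_prime * std::sin(aov_rad / 2.0);   // Equation (3.6)
    std::printf("horizontal FoV at %.0f mm: %.0f mm\n", z_prime, fov);
    return 0;
}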

3.3 European Machine Vision Association (EMVA) 1288 Camera Model

The EMVA 1288 Release 3.1 is based on the camera model in Figure 3.8. The model is described thoroughly in the standard document, and a scaled-down presentation follows. Starting from a model it is easier to understand how the characterization parameters are related to each other and where noise appears. Version 3.1 of EMVA 1288 covers monochrome and color digital cameras with a linear photo response, where one electron is generated per absorbed photon. One way to compare cameras is to create and compare SNR graphs for each camera, computed from their characterization parameters [16]. Furthermore, the following general assumptions are made:

1. The amount of photons collected by a pixel depends on the product of irradiance E (units W/m²) and exposure time t_exp (units s), i.e., the radiative energy density E·t_exp at the sensor plane.

2. The sensor is linear, i.e., the digital signal increases linearly with the number of photons received.

3. All noise sources are wide-sense stationary and white with respect to time and space. The parameters describing the noise are invariant with respect to time and space.

4. The quantum efficiency is wavelength dependent. The effects caused by light of different wavelengths can be linearly superimposed.

5. The dark current is temperature dependent.

The general assumptions describe the properties of an ideal camera, and one part of EMVA 1288 is to describe by how much the camera deviates from that.

FIGURE 3.8: The camera model from EMVA 1288. It contains characterization parameters and noise sources.

3.3.1 Sensitivity, Linearity and Noise

As illustrated in Figure 3.8, a number of photons (µ_p) hit the pixel area during the exposure time and only a fraction of them generate electrons (µ_e); this fraction is called the Quantum Efficiency and it depends on the wavelength of the incident light: η(λ) = µ_e / µ_p. Inside the camera the accumulated charge gets converted to a voltage, amplified, and then converted into a digital signal through an ADC, as shown in Figure 3.8. The whole process is assumed to be linear and is described by the Overall System Gain K. The mean digital signal results as µ_y = K(µ_e + µ_d), where µ_d is the mean number of electrons present without light. The dark signal µ_y.dark = Kµ_d appears when no light is incident and is mainly due to ambient temperature and the exposure time. The Overall System Gain multiplied by the Quantum Efficiency is defined as the system's responsivity.

3.3.2 Noise Model

Inside the camera the accumulated charge fluctuates statistically and the probability is Poisson distributed, which implies that the variance of the fluctuations is equal to the mean number of accumulated electrons. These fluctuations are called the shot noise, σ_e² = µ_e, and they are the same for all types of cameras. The second noise source is related to the sensor read-out and amplifier circuits. It is signal independent and normally distributed with variance σ_d². The last noise source comes from the analog-to-digital conversion; it is uniformly distributed over the quantization intervals and has a variance of σ_q² = 1/12. The signal model is linear and thus the variances of all noise sources can be summed into a total temporal variance of the digital signal: σ_y² = K²(σ_d² + σ_e²) + σ_q². Using the equation for the shot noise, σ_e² = µ_e, and the mean digital signal, µ_y = K(µ_e + µ_d), the following equation, which is central to the characterization of a camera, can be derived: σ_y² = K²σ_d² + σ_q² + K(µ_y − µ_y.dark).
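The noise model can be verified numerically with a small Monte Carlo sketch in Python/NumPy (not from the thesis); the camera parameters below are made up purely for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative (made-up) camera parameters
eta = 0.5        # quantum efficiency
K = 0.3          # overall system gain [DN/electron]
mu_d = 10.0      # mean dark signal [electrons]
sigma_d = 4.0    # dark (read-out) noise [electrons]
mu_p = 2000.0    # mean number of photons per pixel during exposure

n = 100_000
electrons = rng.poisson(eta * mu_p, n)            # shot noise: variance equals the mean
dark = mu_d + sigma_d * rng.standard_normal(n)    # normally distributed dark noise
gray = np.round(K * (electrons + dark))           # rounding adds quantization noise, variance 1/12

measured = gray.var()
predicted = K**2 * (sigma_d**2 + eta * mu_p) + 1.0 / 12.0   # sigma_y^2 = K^2(sigma_d^2 + sigma_e^2) + sigma_q^2
print(measured, predicted)  # the two values should agree closely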

3.3.3 Signal to Noise Ratio - SNR

The SNR describes the quality of the signal and is defined as SNR = (µ_y − µ_y.dark)/σ_y. The SNR curve has two regions of interest. Using the previous equations and ignoring the small effect caused by the quantization noise, the SNR for low light conditions becomes SNR_low(µ_p) = ηµ_p / √(σ_d² + σ_q²/K²). For bright light conditions, SNR_high = √(ηµ_p). The SNR curve increases linearly in low light conditions and at high brightness it tapers off in a square-root fashion. The SNR of an ideal sensor is defined as SNR_p = √µ_p, because it has a Quantum Efficiency (QE) of η = 1 and a dark noise of σ_d = 0, and it serves as a reference in SNR comparison graphs.
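Putting the noise terms back together gives the full SNR curve SNR(µ_p) = ηµ_p / √(σ_d² + σ_q²/K² + ηµ_p), which reduces to the two limits above. The Python sketch below (not thesis code, parameters invented) plots it against the ideal-sensor reference.

import numpy as np
import matplotlib.pyplot as plt

eta, K, sigma_d = 0.5, 0.3, 4.0       # illustrative parameters

mu_p = np.logspace(0, 6, 200)         # photons collected per pixel
snr = eta * mu_p / np.sqrt(sigma_d**2 + (1.0 / 12.0) / K**2 + eta * mu_p)
snr_ideal = np.sqrt(mu_p)             # eta = 1, no dark or quantization noise

plt.loglog(mu_p, snr, label="real sensor")
plt.loglog(mu_p, snr_ideal, "--", label="ideal sensor")
plt.xlabel("photons per pixel")
plt.ylabel("SNR")
plt.legend()
plt.show()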

3.3.4 Signal Saturation and Absolute Sensitivity Threshold

The range of gray values for a k-bit camera is 0 to 2^k − 1. The useful range is however limited at the lower bound by the absolute sensitivity threshold µ_p.min and at the upper bound by the saturation capacity µ_e.sat = ηµ_p.sat, where µ_p.sat is the saturation irradiation. The limitations arise mainly because of the dark signal and the temporal noise. The absolute sensitivity threshold is defined as the light level at which the signal is exactly as strong as the noise, i.e., SNR_y = 1. Another common camera parameter, the dynamic range, can be derived from these measures: DYN = µ_p.sat/µ_p.min.

3.3.5 Compare Cameras with the EMVA 1288 Standard

As stated in the standard, it is important to remember that the standard does not define what nature of data should be disclosed. It is up to the component manufacturer to decide whether to publish typical data, data of an individual component or guaranteed data. However, the component manufacturer shall clearly indicate the nature of the presented data. The three most important parameters when comparing cameras are:

1. Quantum efficiency η in [%]

2. Dark noise σ_d in [electrons]

3. Saturation capacity µ_e.sat in [electrons]

Additionally, these four derived parameters are useful:

1. Maximum signal to noise ratio (SNR_max) in [dB or bits]

2. Absolute sensitivity threshold µ_p.min in [photons]

3. Photon saturation capacity µ_p.sat in [photons]

4. Dynamic range (DYN) in [dB or bits]

If the camera data sheet complies with EMVA 1288 all of these parameters are given, while in a non-standardized data sheet only a few of them might be presented.
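Given the three primary parameters and the system gain, the derived quantities listed above can be computed as in the Python sketch below (not from the thesis; the input numbers are fictitious). The absolute sensitivity threshold is found numerically from the definition SNR = 1.

import math

# Fictitious EMVA 1288 parameters for one camera
eta = 0.55          # quantum efficiency
sigma_d = 6.0       # dark noise [electrons]
mu_e_sat = 20_000.0 # saturation capacity [electrons]
K = 0.25            # overall system gain [DN/electron]

def snr(mu_p):
    return eta * mu_p / math.sqrt(sigma_d**2 + (1 / 12) / K**2 + eta * mu_p)

# Absolute sensitivity threshold: smallest mu_p with SNR >= 1 (bisection)
lo, hi = 1e-3, 1e6
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if snr(mid) >= 1.0 else (mid, hi)
mu_p_min = hi

mu_p_sat = mu_e_sat / eta           # photon saturation capacity
snr_max = math.sqrt(mu_e_sat)       # shot-noise limited maximum SNR
dyn = mu_p_sat / mu_p_min           # dynamic range

print(f"SNR_max = {20*math.log10(snr_max):.1f} dB ({math.log2(snr_max):.1f} bits)")
print(f"mu_p.min = {mu_p_min:.1f} photons, mu_p.sat = {mu_p_sat:.0f} photons")
print(f"DYN = {20*math.log10(dyn):.1f} dB ({math.log2(dyn):.1f} bits)")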

FIGURE 3.9: Example of an SNR graph from [16].

Figure 3.9 displays an example of how these measures create an SNR graph. The x-axis displays the number of photons captured during the exposure time and the y-axis displays the SNR value of the image. Both axes use a base-2 logarithm, so the resulting values are in bits. The green diagonal line passing through the origin is the SNR of light unaffected by QE or dark noise and represents the ideal case. It has a gradient of 1/2, which in a double-logarithmic coordinate system corresponds to the square-root function. The other straight line is shifted one bit in the positive x direction because that sensor has a QE of 50 %.

Because of dark noise, the curve starting at µ_p.min has a gradient of 1 and asymptotically approaches a gradient of 1/2 further up. This happens because at low irradiation the dark noise is large relative to the signal. The DYN of the camera is visible between µ_p.min and µ_p.sat and it is in this region that the camera provides a useful image. At µ_p.sat the SNR curve of a real sensor becomes flat because the sensor saturates and any increase in photons will not produce more electrons; this value is the maximum signal to noise ratio SNR_y.max.

Once the SNR curve of a camera has been created, that camera can be used to determine the quantity of light required. The operating point should be configured as given in the data sheet, which typically means using a minimum of gain and an offset that places the gray value in darkness, µ_y.dark, just above the zero level. When the camera is then illuminated such that the gray value nearly saturates at µ_y.sat, the camera is collecting µ_p.sat photons per pixel. Every other gray value scales linearly between these two operating points.


Chapter 4

Image Processing Algorithms

This chapter discusses different algorithms commonly used for object detection. A raw unprocessed image contains an abundant amount of information. The aim of preprocessing is to remove unnecessary information and thereby enhance the features of interest. In this chapter the conversion from a full color image to gray scale and histogram equalization are presented. An effective way to reduce the computational time of the preprocessing algorithms is to crop the image, which reduces the total number of pixels and encompasses only the interesting areas. The preprocessing algorithms in this chapter are presented in the general order in which they are applied in the code, followed by the image filters used. Finally, different edge detection methods are presented together with the Hough transform.

4.1 Image Crop

The image is cropped according to the measurements defined in Table 1.4 in order to remove noise and reduce computation time by reducing the number of pixels.
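In practice the crop is a plain array slice; the sketch below (Python/NumPy, with invented bounds standing in for the measurements of Table 1.4) illustrates the operation.

import numpy as np

image = np.zeros((1080, 1920, 3), dtype=np.uint8)  # placeholder full-resolution frame

# Hypothetical region of interest (rows and columns) covering the rack opening
top, bottom, left, right = 200, 900, 400, 1500
cropped = image[top:bottom, left:right]

print(cropped.shape)  # (700, 1100, 3)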

4.2 Projective Transformation

The projective transform maps lines to lines but does not retain parallelism. It is a compound transformation which combines translation, rotation, scaling and perspective distortion into a single matrix. The most prominent features of the pallet are marked by the circles in Figure 4.1, with the lines being neither vertical nor parallel. By using the projective transform the features become parallel and more easily processed in the following stages. Depending on the angle of the camera towards the pallet, different transform matrices are required. Figure 4.2 displays the pallet in Figure 4.1 after applying the projective transform.


FIGURE 4.1: Example image of a pallet shot at an angle, producing tilted features.

FIGURE 4.2: Figure 4.1 modified by a projective transform.
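As an illustration of such a warp (not the thesis implementation), the OpenCV sketch below maps four manually picked corners of the pallet front onto an axis-aligned rectangle; the pixel coordinates and file names are invented.

import cv2
import numpy as np

image = cv2.imread("pallet.png")  # hypothetical input image

# Four points on the tilted pallet front (invented coordinates) and
# the rectangle they should map to after the projective transform.
src = np.float32([[310, 220], [980, 250], [1010, 640], [290, 600]])
dst = np.float32([[0, 0], [800, 0], [800, 400], [0, 400]])

H = cv2.getPerspectiveTransform(src, dst)      # 3x3 homography matrix
warped = cv2.warpPerspective(image, H, (800, 400))

cv2.imwrite("pallet_rectified.png", warped)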

4.3 Conversion to Gray Scale

To increase the computational speed, images that contain the full RGB spectrum can be converted to gray scale, yielding a single color channel instead of three. The sum of the weighted channels is calculated and set as the gray scale value, see Equation (4.1).

Z = w1R + w2G + w3B (4.1)

Z gives the gray scale value and w_1 to w_3 represent weights which can be adjusted depending on the application. In MATLAB the weights are defined as w_1 = 0.2989, w_2 = 0.5870 and w_3 = 0.1140.
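A minimal sketch of Equation (4.1) with those weights, written in Python/NumPy rather than MATLAB (not the thesis implementation):

import numpy as np

def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
    """Weighted sum of the R, G and B channels, Equation (4.1)."""
    weights = np.array([0.2989, 0.5870, 0.1140])
    return (rgb[..., :3] @ weights).astype(rgb.dtype)

# Example: a random 8-bit color image
gray = rgb_to_gray(np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8))
print(gray.shape)  # (480, 640)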


4.4 Histogram Equalization

Histogram equalization is used to increase the contrast of an image. The occurrences of the gray scale levels in the histogram are redistributed to yield a more uniform histogram, as illustrated in Figure 4.3.

FIGURE 4.3: Original image with its histogram, and the histogram-equalized image with its corresponding histogram.

This is done by assigning a new gray value to each pixel, calculated using Equation (4.2).

eqHist(i) = (L · histcf(i) − N²) / N²    (4.2)

where eqHist(i) represents the equalized gray value i, L is the number of gray levels available, histcf is the cumulative frequency of each gray level and N² is the number of pixels. Each gray value i is then replaced with the calculated eqHist(i). [18]
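A small Python/NumPy sketch of Equation (4.2) (not the thesis code), assuming an 8-bit gray-scale image and clipping negative results to zero:

import numpy as np

def equalize(gray: np.ndarray, L: int = 256) -> np.ndarray:
    """Histogram equalization per Equation (4.2) for an 8-bit image."""
    n_pixels = gray.size                             # N^2 in the thesis notation
    hist = np.bincount(gray.ravel(), minlength=L)    # occurrences per gray level
    histcf = np.cumsum(hist)                         # cumulative frequency
    eq_hist = (L * histcf - n_pixels) / n_pixels     # Equation (4.2)
    lut = np.clip(np.round(eq_hist), 0, L - 1).astype(np.uint8)
    return lut[gray]                                 # replace each gray value i

# Example on a random low-contrast image
img = np.random.randint(100, 156, (480, 640), dtype=np.uint8)
print(equalize(img).min(), equalize(img).max())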

4.5 Gaussian Smoothing Filter

The Gaussian smoothing filter is used in early-stage image processing; it is an effective low pass filter. The filter performs a convolution between a mask and the image. For the Gaussian smoothing filter the values chosen in the kernel follow the two-dimensional Gaussian function, Equation (4.3).

g[i, j] = (1 / (2πσ²)) exp(−(i² + j²) / (2σ²))    (4.3)

where σ² represents the variance and the indices i and j represent rows and columns.

One common mask used for the Gaussian smoothing operation is the 3×3 matrix shown in Equation (4.4). [19]

(1/16) ·
[ 1  2  1
  2  4  2
  1  2  1 ]    (4.4)
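The kernel in Equation (4.4) can be applied with an ordinary 2-D convolution, as in the Python/SciPy sketch below (not the thesis code):

import numpy as np
from scipy.ndimage import convolve

# The 3x3 Gaussian kernel from Equation (4.4)
kernel = np.array([[1, 2, 1],
                   [2, 4, 2],
                   [1, 2, 1]], dtype=float) / 16.0

def gaussian_smooth(gray: np.ndarray) -> np.ndarray:
    """Convolve a gray-scale image with the 3x3 Gaussian mask."""
    return convolve(gray.astype(float), kernel, mode="nearest")

img = np.random.randint(0, 256, (480, 640)).astype(np.uint8)
smoothed = gaussian_smooth(img)
print(smoothed.shape)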

4.6 Image Downsampling

Downsampling, or decimation, of an image is the process of retaining the motive while using fewer pixels. The scale between the original and the downsampled image is often 0.5, but can be any integer fraction. The operation can be seen as low pass filtering, since very fine structures in the image disappear and only the large features remain.
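One simple way to downsample by a factor of two is to average 2×2 pixel blocks, as in the Python/NumPy sketch below (an illustration, not necessarily the method used in the thesis):

import numpy as np

def downsample_by_two(gray: np.ndarray) -> np.ndarray:
    """Average 2x2 blocks, halving the resolution (acts as a low pass filter)."""
    h, w = gray.shape[0] // 2 * 2, gray.shape[1] // 2 * 2   # drop odd edge rows/columns
    blocks = gray[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

img = np.random.randint(0, 256, (481, 641)).astype(float)
print(downsample_by_two(img).shape)  # (240, 320)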

4.7 Gabor Filter

The Gabor filter is used in the process of edge detection. The response of the filter is greater for specific edges based on their angle and sharpness. The most important feature of the Gabor filter is that it minimizes the uncertainty in its information by minimizing the product of its standard deviations in the space and frequency domains. The Gabor filter is implemented as a Short Time Fourier Transform (STFT). The STFT can be compared with the more well known Fourier transform: the Fourier transform of a signal yields very high resolution information about the frequency content of the whole signal, but no temporal or spatial information, whereas the STFT provides resolution in both frequency and space by using a fixed window size. The Gabor function is defined as a Gaussian envelope modulated by a sinusoid according to Equation (4.5) and can be visualized as in Figure 4.4, where σ is the Gaussian width, θ is the filter orientation, W is its frequency and φ is the phase shift. X and Y define the center of the filter. [20]

G(x, y | W, θ, φ, X, Y) = exp(−[(x − X)² + (y − Y)²] / (2σ²)) · sin(W(x cos θ − y sin θ) + φ)    (4.5)
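A Python/NumPy sketch (not the thesis implementation) that builds a Gabor kernel directly from Equation (4.5) and applies it with a convolution; the parameter values are arbitrary:

import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(size=31, sigma=4.0, theta=0.0, W=0.5, phi=0.0):
    """Gabor kernel per Equation (4.5), centered in a size x size window (X = Y = 0)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.sin(W * (x * np.cos(theta) - y * np.sin(theta)) + phi)
    return envelope * carrier

# Response of a filter oriented for roughly horizontal edges on a random image
img = np.random.rand(240, 320)
response = convolve(img, gabor_kernel(theta=np.pi / 2), mode="nearest")
print(response.shape)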


FIGURE 4.4: A Gaussian envelope in 3D space, modulated by a sinusoid.

Figure 4.5 displays a visualization of the resolution of the STFT and Figure 4.6 displays the resolution of the normal Fourier transform. It is not possible to have both high frequency resolution and high spatial resolution at the same time; it is a trade-off.

FIGURE 4.5: The STFT achieves resolution in both time/space and frequency; the resolution in frequency is less than in a normal Fourier transform.
