
Institutionen för systemteknik

Department of Electrical Engineering

Master's Thesis (Examensarbete)

Registration of 2D-Objects in 3D-Data

Master's thesis in computer vision carried out at the Institute of Technology at Linköping University

by

Benjamin Ingberg

LiTH-ISY-EX–15/4848–SE

Linköping 2015

Department of Electrical Engineering, Linköpings tekniska högskola, Linköpings universitet, SE-581 83 Linköping, Sweden


Supervisor: Ola Petersson

Sick IVP

Examiner: Klas Nordberg

ISY, Linköpings universitet


Sammanfattning

In the field of industrial automation, large savings can be realized if the position and orientation of objects are known in their delivered state. Knowledge about this allows advanced robotic systems to work with complex items. Specifically, 2D-objects are a large enough sub-domain to motivate special solutions. Traditionally this problem has been solved with large mechanical systems which, besides being expensive, take up a lot of space, have difficulty handling fragile objects and also have to be constructed as a specific solution for each type of object. This thesis examines the possibility of using registration algorithms based on computer vision in 3D-data to find flat objects. There are systems that handle localization of 3D objects, but they have problems handling essentially flat objects since their positioning is mainly a function of their contour. The thesis consists of an examination of some 2D-algorithms and their extension to 3D, as well as results from an implementation that works well.


Abstract

In the field of industrial automation large savings can be realized if position and orientation of an object is known. Knowledge about an objects position and orien-tation can be used by advanced robotic systems to be able to work with complex items. Specifically 2D-objects are a big enough sub domain to motivate special attention. Traditionally this problem has been solved with large mechanical sys-tems that forces the objects into specific configurations. Besides being expensive, taking up a lot of space and having great difficulty handling fragile items, these mechanical systems have to be constructed for each particular type of object. This thesis explores the possibility of using registration algorithms from computer vi-sion based on 3D-data to find flat objects. While systems for locating 3D objects already exists they have issues with locating essentially flat objects since their positioning is mostly a function of their contour. The thesis consists of a brief examination of 2D-algorithms and their extension to 3D as well as results from the most suitable algorithm.


Contents

1 Introduction
  1.1 Research Motivation
  1.2 Purpose
  1.3 Questions
  1.4 Limitations
2 Theory
  2.1 Coordinate System
  2.2 Input Data
  2.3 Essentially Flat Objects
  2.4 Registration
  2.5 Projection
  2.6 Error
    2.6.1 Chamfer Distance Matching
    2.6.2 Normal Matching
    2.6.3 Iterative Closest Point
    2.6.4 Automatic Differentiation
  2.7 Interpolation
    2.7.1 Basis spline
  2.8 Performance Considerations & Dimensionality Reduction
    2.8.1 Orthographic Rectification
    2.8.2 Tilt and Rotation Approximation
    2.8.3 Calculation at Multiple Scales
  2.9 Pre-Computations
    2.9.1 Teach Data Pre-Computations
    2.9.2 Live Pre-Computations
  2.10 Grid Search
  2.11 Refinement
3 Method
  3.1 Evaluation Setup
    3.1.1 Ability to Determine Tilt-Angle
    3.1.2 Industrial Scenes
  3.2 Hardware & Implementation
    3.2.1 Camera system
4 Results
  4.1 Reasoning Behind the Chosen Error Method
  4.2 Single Method Result
  4.3 Matching Ability
    4.3.1 Errors in Tilt-Angle Estimation
    4.3.2 Scenario Solutions
5 Discussion
  5.1 Summary of the Answers Presented by this Thesis
  5.2 Reviewed Algorithms
  5.3 Continuation
A Scenario Result Images
  A.1 Brake Pads
  A.2 Cylinders

1 Introduction

Autonomous object picking is a common task in industrial automation. It allows complex, heavy and repetitive tasks to be automated with almost no human labour. While using industrial robots to pick up and mount objects solves many issues in this task, the objects have to have relatively precise, known placements for the robot to be capable of picking them up.

1.1 Research Motivation

Objects often get delivered in bins with fairly random orientation. In order for robotic systems to pick up the objects they need to know the orientation and position of objects in the bin, i.e. their poses. This can be done using pose estimation algorithms. With the pose of the objects known, a bin could be emptied and its contents conveyed to the rest of the production line by a robotic system without human intervention and with a minimal space overhead. This problem is known as autonomous random bin-picking (or bin-picking) and is a problem with many practical applications and several proposed methods described in the literature. See figure 1.1 for an illustration of the bin-picking problem.

Finding the orientation and position of a known object is called the registration problem. We are registering the 3D-pose of objects, which can often be solved by some implementation of the Iterative Closest Point algorithm (see 2.6.3 for a description). However, for this master thesis we will look at objects that are essentially flat. In bin-picking situations ICP performs poorly on objects that are very flat: for example, a perfectly flat surface (such as the wall or floor of a bin) would give zero error for an ICP-based solution if matched against a perfectly flat object.


Figure 1.1: A robotic system set up to perform autonomous random bin-picking. The green rectangle highlights a 3D-camera system (a Sick Scanning Ruler) and the red rectangle highlights a bin with objects in random orientations. The industrial application is to locate the pose of the individual objects in the bin so that the robot arm can pick them up for further processing.

1.2 Purpose

This master thesis explores the process of determining the 3D-pose of essentially 2D-objects using calibrated depth images from a 3D-camera system. A 3D-camera system, in contrast to a 2D-camera, calculates a 3D-position for each pixel in the image. There are several different implementations of 3D-cameras: stereo systems, laser triangulation systems, time-of-flight cameras etc. However, in this thesis we will not take any special consideration to the underlying system. Similar research was also performed by Wikander [8], where the research was focused on conveyor belt picking. Similarly to Wikander, we will attempt to generalize algorithms used for finding 2D-poses; however, while Wikander's focus was to find 2D-poses of 3D-objects (i.e. the position and orientation of an object on a conveyor belt), this research focuses on finding 3D-poses of 2D-objects.

1.3 Questions

We will begin by formulating a few questions to be answered by this master thesis:

• Are there any methods for locating 2D-objects that can be applied on 3D-data?

• Which issues does the application of those methods have to take into consideration?


• Are the methods capable of performing pose estimations on essentially flat objects in 3D-data for industrially realistic scenes?

• How do the methods behave with regard to the tilt angles of an object?

The last question is interesting since tilt is a property that doesn't exist in 2D-data.

1.4 Limitations

This master thesis is not looking for a general solution to the bin-picking problem but for one that could work well in an industrial setting for the type of objects we are looking for. Specifically, we will implement a system and evaluate it on a given setup with brake pads and cylinders in different scenes.

A description of the scenes together with a motivation on what kind of property the scene illustrates can be found in 3.1.2. A description of the setup of the camera system can be found in 3.2.1.

While choosing methodologies and implementations certain limitations have been taken into account:

• All items are essentially flat as defined in section 2.3

• If objects have rotational symmetries then any rotation among those symmetries is a valid rotation

• Objects do not deform (i.e. they are rigid)

• The system does not have to determine that there are no objects in the scene nor if there is more than one

• Objects may have tilt angles with regard to the camera system, but the objects point roughly towards the camera. See 2.1 for clarification about the coordinate system and orientations.


2 Theory

We are looking for the rigid transform in 3D that gives us the best estimation of the pose of our essentially flat objects. A 3D pose can be described by 6 parameters: three describing the position of the object and three describing the orientation of the object.

Formally we will describe our pose with a rigid transform: a rotation describing the orientation of the object with regard to some reference orientation, and a translation describing the position of our pose.

Finding the pose of an object is often called the registration, pose estimation or matching problem.

We will attempt to find this pose with a two step system. In the first step the system will perform a grid search over the space spanned by the parameters in order to find promising candidates. In the second step, called the refinement step, the system will perform mathematical optimization over those candidates. A more thorough description of the system can be found in section 3.2.

Since a rigid transform in 3D is a six parameter problem, the grid search must have low resolution in the parameter space and each evaluation of the grid search must be computationally cheap. In the following sections we will look at methodologies that allow us to reduce the dimensionality of the problem as well as construct error functions over which we can perform mathematical optimization.

2.1 Coordinate System

In the following text we will have 3D-coordinates supplied by our camera as described by figure 3.4. A rigid transform in 3D has six parameters: three describing the position of the object, which we will call x, y, z, and three describing the rotation of the object, which we will call θ, ψ, φ.

The thesis uses the following terminology to describe the coordinate system:

θ, ψ, φ — the rotations around the x-, y- and z-axis respectively
Tilt — the rotation described by the θ and ψ parameters
Rotation — the rotation described by the φ parameter

2.2 Input Data

Our input data are distance images, not to be confused with the edge distance images described in section 2.6.1. A distance image is an image that describes the distance from the focal point of the camera system to each pixel. These images are in perspective projective space, which has the property that objects that are further away appear smaller in the image.

The input to our system are the calibrated distance maps from a Sick Scanning Ruler. The camera system also supplies us with the calculated x, y and z-coordinates of each pixel, taking lens distortion into account.

For performance reasons outlined in section 2.8 we will transform our input data into height map images using the technique described in section 2.8.1. All further calculations will be done on height map images, which are in orthographic projective space.

2.3 Essentially Flat Objects

An essentially flat object is an object which can be described by a surface with low self-occlusion without necessarily being flat. A cylinder is a good example of a non-flat object that can be described by a flat shape, a circle.

A consequence of introducing essentially flat objects is that we can perform pose-estimation without having to take self-occlusion into consideration.

While the system we describe in this thesis does not require the object to be perfectly flat, it has been constructed with the assumption that it will be used on objects with tilt angles that occur in an industrial setting. This often means that the object should have sufficiently low self-occlusion such that most of the surface profile is visible with tilts smaller than 30 degrees.

2.4 Registration

In the registration problem we have two sets of data for each object to be registered. One set is called the teach set and the other is called the live set. From the teach set we extract a template that describes the object we are looking for.


Figure 2.1: A reference picture illustrating a mask that marks the center point (green), surface (blue) and edges (red) of the object. Note that in this example, pieces of the image that had bad measurements when the picture was taken have been removed.

This template consists of two sparse 3D point clouds describing an essentially flat object’s surface and edges. The task at hand is to label the live data with the transform that best describes the pose of the object in the live set.

In an industrial application there are several sub-aspects of this task that have to be resolved:

• The extraction of a template from the teach set
• Determining whether the current live data actually contains the object
• Determining if there is more than one object in the live data
• Finding the pose(s) of the object(s) in the live data

In this thesis we will focus on finding only one pose in a given live scene. Even if there are several objects in a scene we will settle for finding only one of them. The template will be created manually from a good reference picture (the teach set thus only consists of this one sample) or generated from a formal description of the object. See figure 2.1 for an illustration of how the template was extracted from a good reference picture.

2.5 Projection

In the following sections of this thesis we will describe methods to evaluate a given rigid transformation based on the properties of projecting the template into images and measuring the errors through the error functions described in section 2.6.

A projection of a template can be done through matrix multiplication. The projected image coordinates of a specific point can be calculated with a 2x4 matrix multiplication

$$\begin{pmatrix} i' \\ j' \end{pmatrix} = \begin{pmatrix} k_i & 0 & 0 & c_i \\ 0 & k_j & 0 & c_j \end{pmatrix} \cdot \begin{pmatrix} R_{3\times 3} & T_3 \\ 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}, \quad (2.1)$$

where $R_{3\times 3}$ and $T_3$ are the rotation and translation of our rigid transform respectively, $(k_i, k_j)$ are the resolutions (i.e. pixels per length unit of our image) and $(c_i, c_j)$ are the center offsets of our image (i.e. the position of the world origin in the image).

This matrix is only a 2x4 matrix since we will work on orthographically rectified images as outlined in 2.8.
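As a concrete illustration, a minimal C++ sketch of equation 2.1 might look as follows (the struct and function names are illustrative, not the thesis implementation):

    struct Pose {
        double R[3][3];  // rotation R3x3
        double T[3];     // translation T3
    };

    // Project a 3D template point p into pixel coordinates of the rectified
    // image. ki, kj are pixels per length unit; ci, cj are the image
    // coordinates of the world origin.
    void projectPoint(const Pose& pose, const double p[3],
                      double ki, double kj, double ci, double cj,
                      double& iOut, double& jOut)
    {
        double q[3];  // rigid transform: q = R * p + T
        for (int r = 0; r < 3; ++r)
            q[r] = pose.R[r][0] * p[0] + pose.R[r][1] * p[1]
                 + pose.R[r][2] * p[2] + pose.T[r];
        // Orthographic projection: the z-component is simply dropped.
        iOut = ki * q[0] + ci;
        jOut = kj * q[1] + cj;
    }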

2.6 Error

The mathematical optimization problem we will solve is a non-linear least squares problem: finding the parameters x that minimize the square sum of the error function ε(x), where x describes a rigid transform and ε is a vector-valued error function we construct to solve our problem.

That is, our error function receives a rigid transformation x and creates a set of error values for which we attempt to minimize the square sum. A good error function has the following properties:

Wide The function converges to its global minimum from a wide distance, i.e. there are no local minima close to the global minimum.

Deep The global minimum is easy to distinguish from a local minimum, i.e. the error of the global minimum is significantly smaller than for any of the local minima.

Cheap The function can be evaluated quickly.

See figure 2.2 for an illustration of an error function.

To minimize the error function we use the Levenberg-Marquardt optimization algorithm [4] (LMA). In practice we cannot guarantee that a solution from applying LMA is a global minimum; therefore we attempt to construct our error functions such that if we get stuck in a local minimum, it should still be an acceptable solution for our system. There are countless possible methods that could be used as an error function; in the following sections we will bring up a few notable ones.
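Assuming reference [1] (Agarwal et al.) is the Ceres solver, which combines Levenberg-Marquardt with automatic differentiation (see section 2.6.4), a hedged sketch of such a refinement could look like the following. The residual here is a toy point-to-point term standing in for the chamfer and normal error terms described below, the pose is parameterized as angle-axis rotation plus translation, and all names are illustrative:

    #include <array>
    #include <vector>
    #include <ceres/ceres.h>
    #include <ceres/rotation.h>

    struct PointResidual {
        PointResidual(std::array<double, 3> p, std::array<double, 3> q)
            : p_(p), q_(q) {}
        template <typename T>
        bool operator()(const T* const pose, T* residual) const {
            T p[3] = {T(p_[0]), T(p_[1]), T(p_[2])};
            T rp[3];
            ceres::AngleAxisRotatePoint(pose, p, rp);          // pose[0..2]: rotation
            for (int i = 0; i < 3; ++i)
                residual[i] = rp[i] + pose[3 + i] - T(q_[i]);  // pose[3..5]: translation
            return true;
        }
        std::array<double, 3> p_, q_;
    };

    // Refine a 6-parameter pose seeded by the grid search against matched points.
    void refinePose(double* pose,
                    const std::vector<std::array<double, 3>>& tmplPts,
                    const std::vector<std::array<double, 3>>& livePts)
    {
        ceres::Problem problem;
        for (size_t i = 0; i < tmplPts.size(); ++i)
            problem.AddResidualBlock(
                new ceres::AutoDiffCostFunction<PointResidual, 3, 6>(
                    new PointResidual(tmplPts[i], livePts[i])),
                nullptr, pose);
        ceres::Solver::Options options;  // a Levenberg-Marquardt trust region is the default
        ceres::Solver::Summary summary;
        ceres::Solve(options, &problem, &summary);
    }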

2.6.1 Chamfer Distance Matching

Chamfer distance matching[2] is a very old method for solving the registration problem in 2D-images. By performing edge detection on our live image, with for example the Canny edge detection algorithm, we get a set of edge pixels. We can then attempt to perform a rigid transformation on the edge pixels of the template described in 2.4 such that the edge pixels resemble the live image edges.

Figure 2.2: An illustration of a one-dimensional error function. Point 1 is a deep minimum that is probably the global minimum. Points 2 and 3 are local minima; point 2 is a wide local minimum that could still correspond to an acceptable solution for our system, and point 3 is a shallow local minimum that should not be regarded as a candidate solution.

To make the error function smoother (i.e. wider) we instead transform the edge images into edge distance images (see figure 2.3) and perform our matching on those images. Edge distance images are images where each pixel corresponds to the distance to the nearest edge.

Figure 2.3: An edge image and its corresponding edge distance image generated with sequential distance transforms, i.e. a chamfer distance image. The image has been rendered into an intensity image where bright areas have short distances to edges while dark areas have long distances to edges.

An edge image can be transformed into an edge distance image through the application of several algorithms. The algorithm we have used is known as chamfer 3-4 and was invented by Gunilla Borgefors [2]. Matching against a chamfer distance image is thus called chamfer matching.

A chamfer distance is a distance received through sequential application of a distance transform. In equation 2.2 we have the kernel corresponding to a chamfer 3-4 distance transform, where the distance from the current pixel to the closest edge is the minimum of the neighbouring pixels' distances plus a cost described by the kernel. This distance transform gets applied sequentially and thus we end up with a chamfer distance.

$$\begin{pmatrix} 4 & 3 & 4 \\ 3 & 0 & 3 \\ 4 & 3 & 4 \end{pmatrix} \quad (2.2)$$

That is, the distance is the minimum of the current distance plus zero, the diagonal distances plus four, and the horizontal/vertical distances plus three. If we initialize the distance image with infinity for non-edges and zero for edges, this can be calculated with only two passes through the image using two half kernels, as illustrated in figure 2.4 and the following code.

Figure 2.4: Illustration of serial chamfer distance mapping. The kernel does two serial sweeps where in each sweep only half the kernel is evaluated.

    #include <algorithm>
    #include <vector>

    // Two-pass chamfer 3-4 distance transform. dist[i][j] (i: column, j: row)
    // is initialized to 0 at edge pixels and to a large value everywhere else.
    void chamfer34(std::vector<std::vector<int>>& dist)
    {
        const int cols = dist.size();
        const int rows = dist[0].size();
        // First sweep: columns and rows ascending, using the upper half kernel.
        for (int i = 0; i < cols; ++i)
            for (int j = 0; j < rows; ++j) {
                int& d = dist[i][j];
                if (i > 0)                 d = std::min(d, dist[i-1][j] + 3);
                if (j > 0)                 d = std::min(d, dist[i][j-1] + 3);
                if (i > 0 && j > 0)        d = std::min(d, dist[i-1][j-1] + 4);
                if (i > 0 && j < rows - 1) d = std::min(d, dist[i-1][j+1] + 4);
            }
        // Second sweep: columns and rows descending, using the lower half kernel.
        for (int i = cols - 1; i >= 0; --i)
            for (int j = rows - 1; j >= 0; --j) {
                int& d = dist[i][j];
                if (i < cols - 1)                 d = std::min(d, dist[i+1][j] + 3);
                if (j < rows - 1)                 d = std::min(d, dist[i][j+1] + 3);
                if (i < cols - 1 && j < rows - 1) d = std::min(d, dist[i+1][j+1] + 4);
                if (i < cols - 1 && j > 0)        d = std::min(d, dist[i+1][j-1] + 4);
            }
    }


This algorithm has a computational complexity that is linear in the number of pixels, only having to find the minimum of five values per pixel twice (the 8 neighbours and the origin, split over two passes). It is also a very good approximation of the Euclidean distance[2]: the maximum error of this 3-4 distance transform with regard to the Euclidean distance is 8%, while that of the well known taxicab distance is 41%. While the Euclidean distance transform has an error of 0% with regard to itself, it is also very expensive, being quadratic in the number of pixels. Since the edge positions are subject to noise in the edge detection step, it is a waste of effort to calculate the exact distance to noisy positions.

Chamfer Matching Methods

When performing chamfer matching we have explored two different manners of matching the chamfer distances. One is to match chamfer distances against other chamfer distances, which we will call "chamfer-chamfer matching"; the other is to match chamfer distances against edges, which we will call "edge-chamfer matching".

Chamfer-Chamfer Matching When performing chamfer-chamfer matching you use the rigid transformation to project the template edges into an edge image, then transform that edge image into a chamfer distance image, and then take the difference between that distance image and the distance image created from the live data.

This essentially gives one potential error per pixel in the rendering. When matching chamfers against chamfers we attempt to find a projection of our template edges such that edge-free areas have large distances to edges and edges have short distances to edges.

This methodology was tested and rejected for several reasons:

Firstly, you need to define how large of a region you should consider. Figuratively, if you are looking for a needle in a haystack and find a needle, you have to accept that there is a lot of hay around your needle that your template didn't describe.

Secondly, it puts a demand on the template edges being described by curves, and on the ability to separate the inside and outside of a curve. If the template is described by a point cloud then you get issues as illustrated in figure 2.5.

Thirdly, it is computationally expensive to create a chamfer distance image for each possible projection.

The third issue could be partially solved by discretizing the rotation of the system, pre-computing those rotations and then sampling a subset of the pixels in the chamfer distance images.

Edge-Chamfer Matching In edge-chamfer matching you use the rigid transformation to project the template edges directly into the chamfer image created from the live data. The value at each projected point is an error.


Figure 2.5: Illustration of a problem with chamfer-chamfer matching. Figure (a) shows a rendering of an ideal circle's edge based on a curve. Figure (b) shows a circle based on a point cloud where the points are too sparse. If you attempt to produce a chamfer distance image from these sparse points you introduce unnecessary errors.

Unlike chamfer-chamfer matching, this method does not take into consideration that empty areas should be far away from an edge. However, it is much simpler and cheaper to perform. The calculation can be done by multiplying the point cloud by a projection matrix, which gives sample positions in the chamfer image.

One issue that does arise is that cluttered parts of the live image will have a lot of edges and thus give false positives, since any projection in that region will be close to an edge. Therefore edge-chamfer matching is not enough by itself.
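A minimal sketch of the edge-chamfer error evaluation under two stated assumptions (template edge points already projected with equation 2.1; samples falling outside the image simply skipped — both simplifications, not necessarily the thesis's exact handling):

    #include <array>
    #include <vector>

    // Sum of chamfer-distance samples at the projected template edge points.
    // A small sum means the projected edges lie close to live image edges.
    double edgeChamferError(const std::vector<std::array<double, 2>>& projected,
                            const std::vector<std::vector<float>>& chamfer)
    {
        const int rows = chamfer.size();
        const int cols = chamfer[0].size();
        double error = 0.0;
        for (const auto& p : projected) {
            const int i = static_cast<int>(p[0]);
            const int j = static_cast<int>(p[1]);
            if (i < 0 || j < 0 || i >= rows || j >= cols)
                continue;  // assumption: out-of-image samples contribute nothing
            error += chamfer[i][j];
        }
        return error;
    }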

Tilt-Angle Bias

Another issue that we have to take into consideration when we allow tilt angles in chamfer matching is that, since our objects are considered to be essentially flat, the projected distances will be shorter when the tilt-angle becomes greater. Normally, highly tilted objects are not good candidates for the following systems. Therefore we divide the error by the cosine of the tilt-angle. This moves our bias back to objects facing the camera, but creates a singularity for objects with 90 degree tilts. This is not an issue for our application: real flat objects are not supposed to get stuck in such positions, and any line of sufficient length would be able to match all edges perfectly.

This step is not necessary if tilted objects are not an issue.

2.6.2 Normal Matching

As described above, chamfer-edge matching will have issues with cluttered areas; to compensate for this we introduced normal matching as an additional source of error values. As described by for example Wang, Peterson and Staib [7], we perform normal matching by first differentiating our height image along the x and y axes. We then calculate the 3D surface normal in each pixel. This gives us a normal image, illustrated in figure 2.6.

We get our errors from normal matching by projecting the template surface directly into the normal image and taking the difference between the rotated template normals and the live image normals corresponding to each template surface point.

Hypothetically, the error values from normal matching should complement the errors from edge-chamfer matching nicely, and experimentally the implemented system is unable to solve trivial scenes if you use only one or the other of the errors. See section 4.2.

As described above, edge-chamfer matching will give low errors in cluttered regions, while the normals of cluttered regions will be pointing randomly, giving a high error for normal matching. For large flat surfaces, such as the floor or walls of a bin, the normals will give low errors while edge-chamfer matching will give high errors due to the lack of edges.

Figure 2.6: An edge image of a scene together with a normal image of the same scene. The normals are of unit length with the x-, y- and z-axes color coded into 0% to 100% of blue, green and red respectively. Pink, being 1.0 red, 0.5 green and 0.5 blue, thus represents a normal pointing towards the camera.

Normal matching gives better results on objects with a 3D-structure. Flat surfaces are common in an industrial environment, while a surface having the same 3D-structure as the object we are looking for would be a rare coincidence. However, the 3D-structure must not be too strong, since that would violate the assumption of our objects being essentially flat.

Normal matching can also be interpreted as matching the derivative of the height signal, which allows us to remove the effect of an unknown height offset[7].

2.6.3 Iterative Closest Point

Iterative Closest Point, ICP, is a point cloud based methodology for solving the registration problem. As described by Zhengyou Zhang[9], the steps are essentially:

1. For each point in the template, find the closest point in the live data. Assume these points correspond to your template points.


2. Estimate the rigid transformation best moving the template points to these points. This can be done with a closed form solution.

3. Transform the template points.

4. Repeat the process with the transformed points until satisfied; in practice, until the effect of the transformation in step 3 is below some threshold.

ICP is a conceptually simple method, but it is essentially a refinement process that requires a fairly good initial position for the template.
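As a concrete illustration of these steps, here is a deliberately simplified, translation-only ICP iteration; full ICP also estimates the rotation in step 2 with a closed-form solution, and the point type and brute-force matching here are illustrative only:

    #include <cmath>
    #include <vector>

    struct P3 { double x, y, z; };

    // One translation-only ICP step: match each template point to its closest
    // live point (step 1), then shift the template by the mean residual
    // (steps 2-3). Returns the shift length so the caller can stop when small.
    double icpTranslationStep(std::vector<P3>& tmpl, const std::vector<P3>& live)
    {
        P3 mean{0, 0, 0};
        for (const P3& t : tmpl) {
            const P3* best = &live[0];
            double bestD = 1e300;
            for (const P3& l : live) {  // brute-force closest point search
                double d = (l.x - t.x) * (l.x - t.x)
                         + (l.y - t.y) * (l.y - t.y)
                         + (l.z - t.z) * (l.z - t.z);
                if (d < bestD) { bestD = d; best = &l; }
            }
            mean.x += best->x - t.x;
            mean.y += best->y - t.y;
            mean.z += best->z - t.z;
        }
        mean.x /= tmpl.size(); mean.y /= tmpl.size(); mean.z /= tmpl.size();
        for (P3& t : tmpl) { t.x += mean.x; t.y += mean.y; t.z += mean.z; }
        return std::sqrt(mean.x * mean.x + mean.y * mean.y + mean.z * mean.z);
    }
    // Step 4: repeat until the returned shift falls below a threshold.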

Using ICP as a refinement step instead of our chosen method (Levenberg-Marquardt optimization on our previously mentioned error values) could give more accurate results and/or convergence from worse guesses. However, this has not been investigated in this master thesis.

2.6.4 Automatic Differentiation

To solve our optimization problem without resorting to numeric or symbolic differentiation, we will make use of automatic differentiation by use of dual numbers, as done by Agarwal et al[1].

Having the exact derivatives of our optimization problem greatly increases the performance of our optimization step. While doing the symbolic differentiation might be infeasible, automatic differentiation can be done by the compiler with minor implementation effort.

The basic premise of automatic differentiation through the use of dual numbers is to extend your parameter variables analogously to how it is done with complex numbers. While complex numbers consist of a real and an additional imaginary term, $a + bi$, dual numbers consist of a real and an infinitesimal term, which we will call $\varepsilon$, where $\varepsilon^2 = 0$. The infinitesimal term then corresponds to the derivative of the function.

For any differentiable function $f$ we can then calculate $f(x)$ and $f'(x)$ simultaneously for any specific $x$ by simply supplying our function with the dual number $x + 1\varepsilon$, i.e. $f(x + \varepsilon) = f(x) + f'(x)\varepsilon$. For example, the automatic differentiation of the trivial function $f(x) = x^2 + 2x + 2$ for $x = 3$ is calculated as

$$f(3 + \varepsilon) = (3 + \varepsilon)^2 + 2(3 + \varepsilon) + 2 = 17 + 8\varepsilon. \quad (2.3)$$

With the usage of a programming language that allows operator overloading (in our case C++) the differentiation can be done by simply changing the data type used in the equations from a vector of real numbers to a vector of dual numbers.
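A minimal sketch of such a dual number type, with only the operators needed to reproduce equation 2.3 (illustrative, not the thesis code):

    #include <iostream>

    struct Dual {
        double value;  // real part: f(x)
        double deriv;  // infinitesimal part: f'(x)
    };

    Dual operator+(Dual a, Dual b) { return {a.value + b.value, a.deriv + b.deriv}; }

    Dual operator*(Dual a, Dual b) {
        // (a + a'e)(b + b'e) = ab + (ab' + a'b)e, since e^2 = 0
        return {a.value * b.value, a.value * b.deriv + a.deriv * b.value};
    }

    int main() {
        Dual x{3.0, 1.0};                // x = 3 + 1e
        Dual two{2.0, 0.0};              // constants carry a zero derivative
        Dual f = x * x + two * x + two;  // f(x) = x^2 + 2x + 2
        std::cout << f.value << " + " << f.deriv << "e\n";  // prints 17 + 8e
    }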

2.7 Interpolation

One issue with trying to perform optimization is the pixel grid. The pixel grid represents sampling of a signal with discrete steps. Usage of exact derivatives through automatic differentiation assumes that the function is actually differentiable, which so far is not the case for our described functions, since there is a discrete step between the different pixel values.

This is not an issue when doing the grid search. However, because of this discontinuous step, the error function is actually not differentiable and we can therefore not use automatic differentiation. If we want to use our proposed refinement step, we have to resort to numerical approximation of the gradient for the optimization. This is both more computationally expensive and less robust than having the gradient available.

We can, however, solve this issue if we implement interpolation of our pixel values. The methodology we will use for interpolation is basis spline interpolation, as described by Michael Unser[6].

2.7.1 Basis spline

Splines are piecewise polynomials smoothly connected together at join points called knots. The terminology comes from ship construction where the shape in a drawing was specified by specific knot points.

Figure 2.7: A drawing of an actual spline. Source: Wikimedia

A spline of degree n can be described by piecewise polynomials of degree n with the additional constraint that the full function, as well as its derivatives up to degree n-1, is continuous, in particular at the knot points.

This results in splines being uniquely describable by repeated convolutions with the rectangular pulse function $\beta^0(x)$, which is one for $|x| < 0.5$, $0.5$ for $|x| = 0.5$ and zero otherwise. $\beta^n(x)$ is then given by

$$\beta^n(x) = (\beta^0 * \beta^{n-1})(x). \quad (2.4)$$

The continuous signal from our discrete system can be extracted using the function

$$f(x, y) = \sum_k \sum_l c(k, l)\, \beta^n(x - k)\, \beta^n(y - l), \quad (2.5)$$

where $c(k, l)$ is a grid of constants determined by our original function. This is a fairly cheap computation to do since $\beta^n(x)$ is zero for $|x| > \frac{n+1}{2}$.

If the constants $c(k, l)$ are the pixel values $s(k, l)$, the continuous function $f(x, y)$ will only be an interpolant for $n < 3$ and an approximant for greater $n$. By adding the additional constraint that we should have a perfect fit at the knots we get the equation

$$\sum_k c(k)\, \beta^n(x - k) = s(k), \quad (2.6)$$

which, when rewritten as a convolution for $n = 3$, gives us the separable filter

$$c(k) = (b_1^3)^{-1} * s(k), \quad \text{where} \quad (b_1^3) \xleftrightarrow{\;z\;} \frac{z + 4 + z^{-1}}{6}. \quad (2.7)$$
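For the cubic case used in this thesis, the repeated convolution in equation 2.4 has a well-known closed form, and equation 2.5 reduces to a sum over the four nearest knots. A one-dimensional sketch (illustrative code, not the thesis implementation):

    #include <cmath>
    #include <vector>

    // Closed form of the cubic basis spline beta^3(x); zero for |x| >= 2.
    double beta3(double x)
    {
        x = std::fabs(x);
        if (x < 1.0) return 2.0 / 3.0 - x * x + 0.5 * x * x * x;
        if (x < 2.0) { double t = 2.0 - x; return t * t * t / 6.0; }
        return 0.0;
    }

    // One-dimensional analogue of equation 2.5: evaluate the continuous
    // signal at x from the coefficients c(k); only four knots contribute.
    double evalSpline(const std::vector<double>& c, double x)
    {
        double f = 0.0;
        const int k0 = static_cast<int>(std::floor(x)) - 1;
        for (int k = k0; k <= k0 + 3; ++k)
            if (k >= 0 && k < static_cast<int>(c.size()))
                f += c[k] * beta3(x - k);
        return f;
    }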

2.8 Performance Considerations & Dimensionality Reduction

Doing a grid search through the 6-dimensional search space of our rigid transformation requires extremely sparse searching, since the parameters cannot be separated. To suit the application, the system has to run on general purpose hardware and take no longer than a few seconds in its final form.

Using order of magnitude calculations, a CPU can perform roughly 1 billion calculations per second. With ten steps in each of the six dimensions the grid has 10^6 points, so in one second we have about a thousand calculations per grid point; with a hundred steps in each direction it has 100^6 = 10^12 points, leaving only about a thousandth of a calculation per point. Ten steps in each dimension is too sparse and a hundred steps in each direction is far too slow, therefore we have to do some form of dimensionality reduction.


2.8.1 Orthographic Rectification

Since our images are already distance images with real world coordinates, we can rectify them into actual height maps through the process of orthographic rectification. This moves them from perspective projective space to orthographic projective space. We will perform the rectification such that the z-axis is orthogonal to the image plane and the x- and y-axes are perfectly aligned to the pixel grid.

Since we have constructed our error functions so that they are not dependent on the absolute height values, this results in a z-component that has no effect on the error, and therefore reduces the search to a 5-parameter search.

The rectification can be performed in two different manners. One is to consider the distance image as a point cloud and use bucketing to construct the rectified image.

In our case bucketing means that we create a grid of x-y values where each 3D-point of our distance image gets assigned to the bucket that has the closest x-y value. A height image is then generated by taking the median value of each bucket's assigned points, as in the sketch below.
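A minimal sketch of the bucketing approach (the point type and the use of NaN for holes are assumptions, not the thesis implementation):

    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct Point3 { double x, y, z; };

    // Rectify a point cloud into a height map: each pixel takes the median z
    // of the points whose (x, y) fall into its bucket; empty buckets are NaN.
    std::vector<std::vector<double>> bucketHeightMap(
        const std::vector<Point3>& cloud,
        double x0, double y0,  // world coordinates of pixel (0, 0)
        double res,            // length units per pixel
        int width, int height)
    {
        std::vector<std::vector<std::vector<double>>> buckets(
            height, std::vector<std::vector<double>>(width));
        for (const Point3& p : cloud) {
            const int i = static_cast<int>(std::floor((p.y - y0) / res));
            const int j = static_cast<int>(std::floor((p.x - x0) / res));
            if (i >= 0 && i < height && j >= 0 && j < width)
                buckets[i][j].push_back(p.z);
        }
        std::vector<std::vector<double>> hmap(
            height, std::vector<double>(width, std::nan("")));
        for (int i = 0; i < height; ++i)
            for (int j = 0; j < width; ++j) {
                auto& b = buckets[i][j];
                if (b.empty()) continue;  // a hole in the rectified image
                std::nth_element(b.begin(), b.begin() + b.size() / 2, b.end());
                hmap[i][j] = b[b.size() / 2];
            }
        return hmap;
    }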

The other manner is to regard every 2x2 pixel set as two triangles that span the area, and then paint the resulting triangles into the new image. This is similar to displacement mapping, a methodology used in computer graphics to create 3D-structure using height maps.

Figure 2.8: An illustration of a displacement in one direction. Source: Wikimedia.

The advantage of using the painting method over the point bucketing method is that it preserves the property that the image is connected, while the bucketing methodology could lead to holes where you can see through a solid object to other parts of the image. See figure 2.9 for the result.


Figure 2.9: Illustration of the problems with point cloud bucketing. Figure (a) shows an intensity image of the scene. The green rectangle shows the area of interest, while the red rectangle shows a piece of the floor that has the same x and y coordinates as the area of interest. If the bucketing is done too densely we get the rectified result in figure (b), where points on the floor are visible between points on the object. In figure (c) the buckets are sparse enough that the issue does not appear.

2.8.2 Tilt and Rotation Approximation

Since the objects we are working with are essentially flat, the surface normals of the object should all be facing roughly the same direction. By regarding the surface normals in the region we are currently looking at, we can get an approximate value of the tilt. Using this approximate value, we can reduce the search space by another two dimensions. This leaves us with a 3-dimensional search space over the x, y, φ parameters, the same search space as used in the original chamfer matching problem described by Borgefors[2].

The extent of the object in the image is unknown before we know the tilt and rotation of the object; therefore we look only in a small region around where we guess the object is, and use the average of the tilt values as a guess of the tilt.

2.8.3 Calculation at Multiple Scales

Another method of reducing the search space is to do the calculation at multiple scales. The constructed height map is calculated at several different scales by performing the rectification described in section 2.8.1 with different resolutions. The grid search is then done on the coarsest scale. The top n_i candidates from the previous scale are then refined at a finer scale, and this process is repeated (with each n_{i+1} < n_i) until the finest scale is reached[3].

In order to give a reasonable estimate, multi-scale processing assumes that the object can be reliably found in the coarsest scale with a transform that is similar to the correct transformation.


In our application we perform the grid search on the coarsest scale and switch to Levenberg-Marquardt optimization on the finest scale. We chose n = 10, since increasing it above that did not give any improved results. The final result of the system was the pose that had the lowest residual error after optimization of those ten candidates.

2.9 Pre-Computations

The system has to perform several pre-computational steps. Some are done in the setup stage of the system, when the system has been supplied with teach data, and some are done before each image is searched, each time the system has been supplied with live data. We call these teach data pre-computations and live pre-computations respectively.

2.9.1 Teach Data Pre-Computations

When we get teach data we construct the template corresponding to the object we are searching for. In our case the template image was constructed by manually marking which parts of a scan were the surface of the searched object and which parts were the interesting edges, as well as an origin point for the object, as illustrated in figure 2.1.

In a real world application the template could come from CAD-drawings, or from an automatically processed scan where relevant edges and surfaces would be isolated with background subtraction and edge-detection algorithms.

From this labeled data we can construct two point clouds: one describing the edges of the object, the other describing a vector field of the surface normals. After the relevant point clouds have been constructed, the clouds are sampled randomly into a sparse set for use in the grid search and a dense set for use in the refinement.

The sparse point clouds are then rotated so that the normal at the template's origin is perpendicular to the x-y plane. This ensures that the tilt and rotation approximation described in 2.8.2 actually corresponds to the tiny region around which we search. Naturally, this means that the origin of the object has to actually be on a flat surface.

2.9.2 Live Pre-Computations

Given the live image, the system performs rectification as described in section 2.8.1. The rectified height image is then processed into an edge image using the Canny edge detection algorithm. Chamfer distances are calculated on this edge image as described in 2.6.1 and then made into a continuous image using third degree B-splines as described in section 2.7.

Besides the continuous and non-continuous chamfer images, the rectified image is also turned into a normal image as described in section 2.6.2 and illustrated in figure 2.6.


The normal image is created by normalizing the vector

$$n(j, i) = \begin{pmatrix} (s_x * I)(j, i) \\ (s_y * I)(j, i) \\ 4k \end{pmatrix}, \quad (2.8)$$

where $s_x$ and $s_y$ are the 3x3 Sobel kernels in the x and y directions respectively, $k$ is the pixel sampling distance, i.e. how many length units there are per pixel in the rectified image, and $I$ is the rectified height image.
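A sketch of this computation for a single interior pixel, with the Sobel responses written out explicitly (assuming a row-major height image I[j][i]; illustrative code, not the thesis implementation):

    #include <cmath>
    #include <vector>

    struct Vec3 { double x, y, z; };

    // Unit surface normal at interior pixel (j, i) of the rectified height
    // image I, following equation 2.8; k is the pixel sampling distance.
    Vec3 surfaceNormal(const std::vector<std::vector<double>>& I,
                       int j, int i, double k)
    {
        // 3x3 Sobel responses (s_x * I) and (s_y * I) at (j, i).
        const double sx = (I[j-1][i+1] + 2 * I[j][i+1] + I[j+1][i+1])
                        - (I[j-1][i-1] + 2 * I[j][i-1] + I[j+1][i-1]);
        const double sy = (I[j+1][i-1] + 2 * I[j+1][i] + I[j+1][i+1])
                        - (I[j-1][i-1] + 2 * I[j-1][i] + I[j-1][i+1]);
        const double len = std::sqrt(sx * sx + sy * sy + 16 * k * k);
        return {sx / len, sy / len, 4 * k / len};
    }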

An alternative but not explored approach for the normal image would be to make the height image continuous and calculate the gradient for this new continuous signal.

The result of this has not been explored, but Sobel filtering as a method for calculating the derivative contains an implicit denoising that would otherwise have to be done on the height signal.

2.10 Grid Search

The grid search is the most expensive part of the computation and is done over as dense a grid in the parameter space as can be afforded. At each point in the image the tilt is estimated as described in 2.8.2 and the closest pre-computed tilt is used for the grid search. The errors from the chamfer-edge matching and normal matching methods are computed and the rigid transformations corresponding to the smallest errors are saved for the refinement step.

2.11 Refinement

The refinement step is done by performing Levenberg-Marquardt optimization as described in section 2.6 over the error functions. This optimization uses the continuous chamfer image described in the previous steps.

The refinement process, although performing much more complicated calculations than the exhaustive search, only requires a small fraction of the amount of computation done in the grid search.


3 Method

A system is implemented and evaluated according to its ability to solve pose estimation for bin picking. The methodologies chosen for the errors are evaluated based on their performance on scenes representing different industrial problems.

3.1 Evaluation Setup

The system will be evaluated on two categories of scenes. One determines the ability of the system to determine the tilt-angle of an object; the second is its ability to solve some typical industrial scenes.

3.1.1 Ability to Determine Tilt-Angle

A series of real world images has been taken with a robot arm that has tilted an object into several different positions (see figure 3.1). While the robot arm has high precision movement with regard to itself, the position of the origin of the object (as illustrated in figure 2.1) is difficult to measure. Nevertheless, a best attempt has been made to configure the robot to rotate the object around its origin.


Figure 3.1: Intensity image of the scenes captured with the robot holding a brake pad.

These images allow us to see how the system acts on different tilt angles.

3.1.2 Industrial Scenes

A few scenes have been constructed that correspond to different, more or less realistic, industrial scenarios. The scenarios illustrate difficulties that could arise in industrial settings. The system has been configured to be able to locate objects in the test scenes, and then each scenario has been tested using progressively fewer points from the template.

Brake Pad Scenes

A brake pad is a flat steel part cut out of sheet metal. Stamped or cut steel items are common industrial items. These scenes start out with only the brake pads we are looking for and get increasingly more cluttered with other stamped steel parts. The final scene puts the pads in a pile where separation of objects is difficult.

In a real industrial application, mixing different types of brake pads and separating them would be a costly and unnecessary step. However, the similar brake pads are there to show that the system has some robustness against noise, trash and clutter in the scenes.

Cylinder Scenes

Cylinders and other shapes describable by a circle are common enough in industrial settings to warrant special attention. They are also a great example of semi-ordered bin-picking, that is, where the objects are packed in a certain manner but their exact positions are not known.

In semi-ordered bin picking the objects can often be so close to each other that the 3D-structure is not visible in the image data. A realistic solution for this problem is to search for the holes that several objects create when stacked together, rather than for the actual objects. However, for a bin picking solution, we are only interested in the outermost items, since those are the easiest to pick up.

The first scene consists of cylinders that are completely separated (independent pattern). In the second and third scenes the cylinders are semi-ordered: the second scene has them packed in a grid layout and the third scene has them packed in a brick layout. The grid and brick layouts are illustrated in figure 3.2.

Figure 3.2: Illustration of semi-ordered cylinders. To the left are cylinders in a grid layout and to the right are cylinders in a brick layout.

For cylinders, a brick layout is much tighter than a grid layout, and distinguishing individual cylinders becomes very difficult for the system, since two items that are edge to edge cannot be distinguished at our image resolution.

3.2 Hardware & Implementation

In this section we will describe the hardware and implementation used in the project. The project is written in C++ due to the need to interface with certain libraries implementing much of the required functionality. Figure 3.3 is a flow chart giving an overview of the implemented system.

[Figure 3.3: Flow chart of the implemented system. Teach path: Start Teach → Create Template Model. Match path: Start Match → input preprocessing (perform image rectification, calculate surface normals, calculate chamfer distances) → coarse search (choose tilt angles to evaluate, residuals from surface normals, residuals from edge distances) → refinement (Levenberg-Marquardt optimization) → poses → Stop.]


3.2.1 Camera system

The input comes from a distance camera. The camera system we use is a Sick IVP Scanning Ruler, which performs a laser plane sweep across the image and calculates at which moment maximum intensity was reached for each pixel. Combined with a model to compensate for lens distortions and a world coordinate calibration, it can then use triangulation to calculate the 3D world coordinates of each pixel. See figure 3.4 for an illustration.

Figure 3.4: An illustration of how the camera system works. During the sweep the camera system knows the angle of the laser plane at every given moment and by detecting the laser line in the image it can then triangulate the 3D world position of that pixel.

The laser sweep allows it to reach high robustness and be resistant to environmental factors common in industrial settings, e.g. dust and poor lighting conditions. The camera is calibrated to work at a distance of around 1-2 meters and has a precision of around 3 mm in Euclidean 3D-distance.


Using a laser sweep is not without its drawbacks. Depending on the scene, the laser and the camera might be occluded from each other, and depending on the material there is a risk of the laser scattering. These effects can lead to some points in the distance image missing information about the distance. Occlusion is illustrated in figure 3.5.

Figure 3.5: An illustration of occlusion. The system will see that the laser doesn’t seem to sweep past certain pixels and mark these pixels as missing data.

The implemented system does not take any special consideration with regard to the specific camera system, but it is configured to handle reasonable amounts of missing data.


4 Results

In this chapter we will describe the results from running the system and our evaluation. There are several images showing a projection of the mask from figure 2.1 onto the edge image with the pose calculated by the system. Detailed explanations about the results can be found in the discussion chapter. The system attempts to solve the problem by minimizing errors in the matching domains illustrated in figure 4.1.


Figure 4.1: Illustration of the different matching domains. Figure (a) is a normal surface image as described in figure 2.6. Figure (b) is an illustration of chamfer distances. The algorithm tries to minimize the reprojection errors of the control points into the data represented by these images.

4.1 Reasoning Behind the Chosen Error Method

The system uses a combination of chamfer-edge matching and normal matching in rectified data as error methods. The following is a summary of the other methods and why they weren't used.

Iterative Closest Point

The motivation behind the thesis is that ICP performs poorly on flat objects; using the error from ICP as errors for the grid search in combination with other methods is computationally prohibitive. Using ICP as a refinement step and the other methods in the grid search has not been explored.

Searching in Non-Rectified Data

The rectification of our input images creates a domain where the errors are independent of the z-position of our objects. Grid search in non-rectified images thus increases the search space multiplicatively. Even if we supply the algorithm with the correct z-position, the perspective projection space disallows certain pre-computations. See section 2.9.

Chamfer-Chamfer Matching

Requires extra information about the object for the projection to be correctly done. See section 2.6.1 for a summary of issues for chamfer-chamfer matching.

Chamfer-Edge Matching without Normal Matching

Technically a subset of chamfer-chamfer matching where only edge-points are considered. Fails catastrophically in cluttered environments. See figure 4.2.

Normal Matching without Chamfer-Edge Matching

Has the same issues as ICP has in regard to flat objects. See figure 4.2.

In conclusion, chamfer-edge matching in combination with normal matching can be implemented with the supplied template data while still solving all the scenes described in 3.1.2 using only a three parameter search space.

4.2 Single Method Result

In section 2.6.2 we describe the hypothesis that the edge-chamfer distance matching and normal matching methodologies would complement each other. Figure 4.2 shows a scene solved without chamfer matching and without normal matching respectively, demonstrating the issue with relying on only one of the measurements.

While each method can by itself give good results in some scenes, they are unreliable without each other. They are prone to getting stuck in bad optima, and the system gives greatly improved results when errors from both algorithms are used.

4.3 Matching Ability

The following results were gathered by using edge-chamfer matching and normal matching while working on rectified data.


Figure 4.2: Result from relying on only chamfer-edge matching or only normal matching. Figure (a) shows the result of relying on only chamfer-edge matching, where the cluttered edges in the middle of the figure create a local minimum that the system can get stuck in. Figure (b) shows the result of relying on only normal matching, where the perfectly flat surface of the floor was considered a better match to the perfectly flat object than the actual objects.

4.3.1 Errors in Tilt-Angle Estimation

This section contains the results from the tilt-angle evaluation described in 3.1.1. The results are the differences between the system's approximated tilt and position as a function of different tilts. Figure 4.3 is a Matlab color plot where the error is rendered as different colors. The tilt error is the angular difference between the actual surface normal and the calculated surface normal, and the positioning error is the Euclidean 2D-distance between the approximated (x, y) position and the object's actual (x, y) position. The measurements correspond to rotations from -45 to 45 degrees, in steps of 5 degrees, around both the x-axis and the y-axis.

One interesting phenomenon is that the errors are not centred around zero tilt, especially the positioning error. Instead, the error is smaller when trying to estimate tilt towards the negative rotation about the x-axis. This is due to the camera setup (see figure 3.4).

The objects we are looking for are essentially flat, and objects that are in the xy-plane are preferential for picking. However, the system consists of two parts, a camera and a laser plane, and both the camera and the laser have an angle towards the object (when the object has no tilt). For this application, the loss of accuracy from having an angle towards the camera is greater than the loss of accuracy from having an angle towards the laser.

Notice that high angles around the x-axis give very poor results. This is because, as the angle of the object with regard to the laser plane or the camera comes closer to 90 degrees, the available information drastically reduces (at greater than 90 degree angles the surface is completely occluded).

[Figure 4.3: two color plots. Panel (a): "Position error as a function of tilt (mm)". Panel (b): "Tilt error as a function of tilt (deg)". Axes in both panels: "Tilt along sweep direction (deg)" and "Tilt orthogonal to sweep direction (deg)", ranging from -40 to 40.]

Figure 4.3: The errors as functions of tilt angles. Illustrated in (a) is the 2D Euclidean positional error. Illustrated in (b) is the difference between the calculated normal and the actual normal. The blue to green areas correspond to regions with an error smaller than 10 degrees.

This could be partially solved by mounting the camera system so that the camera image plane corresponds more closely to the xy-plane.

The system has good tilt angle results (here defined as less than 10 degrees of error) for an interval of about 75 degrees of rotation around the y-axis and 35 degrees of rotation around the x-axis. This will obviously vary with the camera setup, but the large interval for which the system gives good results along the sweep axis indicates that difficulty in determining tilt-angles is mostly a problem of image acquisition.

4.3.2 Scenario Solutions

This section contains the scenario solutions described in section 3.1.2. The system is configured to solve these scenarios. Then the system gets fewer and fewer projected surface and edge points, hereafter called control points, until the minimum number of required points for each scene is found. This also gives us the number of grid search positions possible to test in one second.


Scenario                         Control Points   Grid Samples
Lone brake pads                  14               240
Brake pads with low clutter      12               240
Brake pads with medium clutter   16               220
Brake pads with high clutter     18               220
Pile of brake pads               28               180
Loose cylinders                  4                6700
Semi-packed cylinders            4                6700
Tightly packed cylinders         4                6700

Table 4.1: The number of control points necessary before the different scenarios give wrong answers, and the number of tested positions per dimension in the grid search that can be completed in under one second on the test computer (Athlon II X4 635, 2.9 GHz). This does not include time for pre-computations or refinement.

The results are summarised in table 4.1 and an illustration of the actual results from each scene is available in appendix A.

Brake Pad

For the brake pad scenarios, around a dozen control points were necessary for the system to produce a correct solution. As the control points become too few, the depth of the error function (see figure 2.2) becomes too shallow to distinguish a true result from a false positive. Often this is because the projections of the control points are close to edges and normals from other objects; this becomes increasingly unlikely with more control points. None of the scenarios failed when the number of control points was above 28.

Cylinders

In the case of the cylinders, the system manages to estimate a pose with extremely few control points; the system didn't fail for any scenario until there was only one surface normal and one edge to match, after which it failed spectacularly. The rotational symmetry of cylinders further reduces the search space of the rigid transform to a 2-parameter search. This, in combination with the very sparse control point requirement, leads to a very large number of possible grid search combinations.


5 Discussion

In this chapter we will discuss the results and unexplored areas of research that could be interesting.

5.1 Summary of the Answers Presented by this Thesis

The motivation behind the thesis is the difficulty for industry standard registration algorithms based on ICP to estimate the pose of objects which are essentially flat. Therefore this thesis explores 2D-registration algorithms and their application on 3D-data.

Are there any algorithms for registering 2D-objects that can be applied on 3D-data? We have explored chamfer matching and identified two significantly different methods of performing it, chamfer-chamfer and chamfer-edge matching; see section 2.6.1. We have also explored the matching of surface normals, which complements the weak points of chamfer-edge matching.
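To make the distinction concrete, here is a minimal Python sketch of the two variants, assuming binary edge maps and SciPy's Euclidean distance transform; the function names are illustrative and do not correspond to the thesis implementation.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def chamfer_edge_score(scene_edges, template_points):
        """Mean scene-edge distance sampled at the template's edge points.

        scene_edges: 2D bool array, True on detected scene edges.
        template_points: (N, 2) integer array of template edge pixels,
            already transformed into scene coordinates.
        """
        # Distance from every pixel to its nearest scene edge pixel.
        dt = distance_transform_edt(~scene_edges)
        return dt[template_points[:, 0], template_points[:, 1]].mean()

    def chamfer_chamfer_score(scene_edges, template_edges):
        """Compare the full distance transforms of scene and template."""
        dt_scene = distance_transform_edt(~scene_edges)
        dt_template = distance_transform_edt(~template_edges)
        return np.abs(dt_scene - dt_template).mean()

Under this reading, the template's own distance transform is zero at its edge points, so the terms summed by chamfer-edge matching appear as a subset of the terms compared by chamfer-chamfer matching.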

Which issues does the application of those algorithms have to take into consideration? When extending the coarse search from a 3-parameter search (rigid 2D-transform) to a 6-parameter search (rigid 3D-transform) the computational complexity grows to impractical levels. Using image rectification and a heuristic for the tilt angles (see sections 2.8.1 and 2.8.2) we have reduced this back to a 3-parameter search over the x-y-φ space. We have also identified a bias toward larger tilt angles, which can be compensated for by penalizing larger tilt angles in the coarse search (see the heading tilt-angle bias in section 2.6.1).
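As a sketch of the reduced coarse search, the loop below runs over the rectified x-y-φ grid with the tilt heuristic and the tilt-angle penalty folded in. Here `match_error` and `estimate_tilt` are hypothetical callables standing in for the combined error functions and the tilt heuristic, and the penalty weight is an illustrative assumption.

    import itertools
    import numpy as np

    def coarse_search(xs, ys, phis, match_error, estimate_tilt,
                      tilt_weight=0.1):
        """Exhaustive search over the rectified x-y-phi grid.

        match_error(x, y, phi, tilt_x, tilt_y) -> float, combined error.
        estimate_tilt(x, y, phi) -> (tilt_x, tilt_y), the tilt heuristic.
        """
        best_error, best_pose = np.inf, None
        for x, y, phi in itertools.product(xs, ys, phis):
            tilt_x, tilt_y = estimate_tilt(x, y, phi)
            error = match_error(x, y, phi, tilt_x, tilt_y)
            # Penalize large tilt angles to compensate for the bias
            # toward them.
            error += tilt_weight * (abs(tilt_x) + abs(tilt_y))
            if error < best_error:
                best_error, best_pose = error, (x, y, phi, tilt_x, tilt_y)
        return best_pose, best_error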


Are the methods capable of performing pose estimations on essentially flat objects in 3D-data for industrially realistic scenes? An implemented system based on a combination of chamfer-edge matching and normal matching has been evaluated based on its performance on estimating the pose of brake pads and cylinders in scenes that represent realistic industrial scenarios; see section 3.1.2 for a description. Given enough control points the system is capable of finding acceptable poses for the described scenes.

How do the methods behave with regard to the tilt angles of an object? The tilt angle characteristics are presented in section 4.3.1 and illustrated in figure 4.3. The system behaves well for tilt around the axis that corresponds to the sweep direction (see figure 3.4 for an illustration of the camera setup), while for tilt around the axis orthogonal to the sweep direction it performs poorly. This indicates that the system can handle fairly large tilts as long as the camera's image acquisition handles the tilts well. However, the system has not been tested on more than one camera system.

5.2 Reviewed Algorithms

In the thesis we bring up chamfer-chamfer matching, chamfer-edge matching and normal matching.

Normal matching and chamfer-edge matching on rectified data were chosen as the algorithms for the evaluation. Rectified data was chosen because it reduces the number of search-space parameters, and normal and chamfer-edge matching were chosen for their ability to complement each other's weaknesses and for their relatively cheap computation. Since chamfer-edge matching is a strict subset of chamfer-chamfer matching, it could be replaced with chamfer-chamfer matching if the issues with using chamfer-chamfer matching described in section 2.6.1 are acceptable. One advantage of this is that chamfer-chamfer matching does not have the problems with cluttered scenes that chamfer-edge matching has, as described in section 4.2. However, whether or not this would give increased performance has not been explored by this thesis.

The refinement step could have been decoupled from the grid search so that it works on non-rectified data while the grid search is still performed on the rectified data. This could possibly reduce the risk of losing information in the rectification step; however, this was not explored.

5.3 Continuation

When trying to generalize the results from this application one has to consider that the search results are greatly data dependent. The system should work on any object that has strong contours and a well behaved surface; however, to remove some of the limitations described in the introduction chapter, certain things will have to be done.


For objects that are almost rotationally symmetric it is sometimes very important to recover the actual rotation, perhaps for the purpose of gripping the object, but there is a great risk of getting stuck in one of the rotational pseudo-symmetries. This could be mitigated by passing, for each chosen candidate from the grid search, versions of the object rotated by its symmetries to the refinement step, but it is not at all certain that the refinement can differentiate between the correct rotation and the pseudo-symmetries.
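A minimal sketch of that mitigation, assuming an n-fold pseudo-symmetry and a hypothetical `refine` callable standing in for the refinement step:

    import numpy as np

    def refine_with_symmetries(x, y, phi, n_fold, refine):
        """Refine a coarse candidate and its n-fold symmetric rotations.

        refine(x, y, phi) -> (error, refined_pose).
        """
        results = []
        for k in range(n_fold):
            rotated_phi = phi + k * 2.0 * np.pi / n_fold
            results.append(refine(x, y, rotated_phi))
        # Keep the rotation with the lowest refined error; as noted above,
        # this may still fail to separate the true pose from a
        # pseudo-symmetry.
        return min(results, key=lambda r: r[0])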

Another important step in the system is the method by which we get candidates for the refinement step. In this thesis we have used a simple grid search over the x-y-φ space with a simple heuristic to figure out the tilt angles. Analogous to the research done by Paglieroni et al. [5], the grid search could use the results from nearby grid checks to exclude certain positions and orientations, thus greatly reducing the search space.
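In that spirit, the sketch below masks positions along one grid row of a pure translation search. It relies on the assumption that the error changes by at most one unit per unit of translation (which holds for a mean of distance-transform samples); it illustrates the masking idea only and is not Paglieroni et al.'s exact algorithm.

    import numpy as np

    def masked_row_search(error_at, xs, y, threshold):
        """Scan one grid row, skipping positions masked by poor scores.

        error_at(x, y) -> float, assumed to change by at most 1 per
        unit of x.
        """
        candidates = []
        x_skip = -np.inf
        for x in xs:
            if x < x_skip:
                continue  # masked: cannot reach the threshold from here
            e = error_at(x, y)
            if e <= threshold:
                candidates.append((x, y, e))
            else:
                # The error can drop by at most 1 per unit moved, so every
                # position closer than (e - threshold) is excluded.
                x_skip = x + (e - threshold)
        return candidates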

The normal matching method tries to make sure that the surface of the region we are looking for is similar to the surface of the template. However, it does not take into consideration discontinuities in the height of the object we are looking for. An object with small discontinuities could still be regarded as essentially flat. An object's discontinuities are not visible in the normals, but they could be handled by adding additional error terms that try to match the expected height of the surface. While this would mean that the errors are no longer independent of the z-position of the object, the computational complexity could perhaps be preserved by estimating the z-position with a heuristic, as was done with the tilt angles.
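A sketch of such an added height error, with the object's z-position handled by a median-offset heuristic so that the error stays independent of the absolute z-position; all names are illustrative assumptions, not part of the implemented system.

    import numpy as np

    def height_error(scene_z, template_z, points):
        """Mean absolute height mismatch at the surface control points.

        scene_z, template_z: 2D height maps (scene and projected template).
        points: (N, 2) integer control-point coordinates.
        """
        rows, cols = points[:, 0], points[:, 1]
        scene_h = scene_z[rows, cols]
        template_h = template_z[rows, cols]
        # Heuristic z-position: align the median heights before comparing,
        # so no extra search dimension over z is needed.
        offset = np.median(scene_h) - np.median(template_h)
        return np.mean(np.abs(scene_h - (template_h + offset)))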


A Scenario Result Images

This appendix contains the images from the scenario results, illustrating the answer both when the system succeeds and when it fails. The images show a projection of the template with the best estimated pose. Each pair of images shows a scene with an acceptable pose estimation and a scene with an unacceptable pose estimation (a failed scene). The failure is triggered by using too few control points in the pose estimation.

A.1 Brake Pads

Figure A.1: Lone brake pads. This scenario failed because the system got stuck in a local optimum.


Figure A.2: Brake pads with low clutter. In this scenario the local optimum was found on another object.

Figure A.3: Brake pads with medium clutter. As the amount of clutter increases the number of possible local optima increases; in this scenario the solver got stuck on an object which is superficially similar to the correct brake pad.

Figure A.4: Brake pads with high clutter. With few control points and a large amount of clutter, the combination of two objects can give a local optimum that is too difficult for the system to identify.


Figure A.5: Brake pads in a pile. In the pile there are many visible objects and, through sheer coincidence, the control points can fall upon several objects which together form a better solution than a correct pose.

A.2 Cylinders

The cylinders are a specialized application that could be solved with very few control points. To induce an error in the cylinder scenes the number of control points has to be reduced to 2: one surface normal and one edge distance. The errors illustrated are therefore simply cases where one of the points happened to fall on an edge and the other fell on a surface.

Figure A.6: Sparse cylinders. Visible in these scenes are shadow edges: edges that are formed between regions where data is available and regions where data is missing due to occlusion.

Figure A.7: Cylinders in grid layout. In this scene the edges of the objects start to become hard to differentiate because the objects are so close to each other.


Figure A.8: Cylinders in brick layout. In this scene the edges are very difficult to differentiate since the cylinders maximize their touching surfaces.


Bibliography

[1] Sameer Agarwal, Keir Mierle, and others. Ceres solver. http://ceres-solver.org. Cited on page 14.

[2] G. Borgefors. Hierarchical chamfer matching: a parametric edge matching algorithm. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 10(6):849–865, Nov 1988. ISSN 0162-8828. doi: 10.1109/34.9107. Cited on pages 8, 9, 11, and 18.

[3] Tony Lindeberg. Scale-space theory in computer vision. Kluwer Academic, Boston, 1994. ISBN 0-7923-9418-6. Cited on page 18.

[4] Manolis I. A. Lourakis. A brief description of the Levenberg-Marquardt algorithm implemented by levmar. Foundation of Research and Technology, 4, 2005. Cited on page 8.

[5] D.W. Paglieroni, G.E. Ford, and E.M. Tsujimoto. The position-orientation masking approach to parametric search for template matching. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 16(7):740–747, Jul 1994. ISSN 0162-8828. doi: 10.1109/34.297956. Cited on page 35.

[6] Michael Unser. Splines: A perfect fit for signal and image processing. Signal Processing Magazine, IEEE, 16(6):22–38, 1999. Cited on page 15.

[7] Yongmei Michelle Wang, Bradley S. Peterson, and Lawrence H. Staib. 3D brain surface matching based on geodesics and local geometry. Computer Vision and Image Understanding, 89(2-3):252–271, 2003. doi: 10.1016/S1077-3142(03)00015-8. Cited on pages 12 and 13.

[8] Gustav Wikander. Three dimensional object recognition for robot conveyor picking. Linköping University, 2009. Cited on page 2.

[9] Zhengyou Zhang. Iterative point matching for registration of free-form curves and surfaces. International Journal of Computer Vision, 13(2):119–152, 1994. Cited on page 13.
