
Robot Tool Center Point Calibration

using Computer Vision

Master's Thesis in Computer Vision

Department of Electrical Engineering, Linköping University

by

Johan Hallenberg

LiTH-ISY-EX--07/3943--SE

Department of Electrical Engineering
Linköping University, SE-581 83 Linköping, Sweden

Distributed by:

Linköping University
Department of Electrical Engineering
SE-581 83 Linköping, Sweden

Robot Tool Center Point Calibration

using Computer Vision

Master’s Thesis at Computer Vision Laboratory

Linköping University

by

Johan Hallenberg

Reg nr: LiTH-ISY-EX--07/3943--SE

Supervisors: Klas Nordberg

ISY, Linköping University

Ivan Lundberg

ABB Corporate Research Center, Sweden

Examiner: Klas Nordberg

ISY, Linköping University

Linköping, 2007-02-05


Abstract

Robot Tool Center Point Calibration using Computer Vision
By: Johan Hallenberg
MSc, Linköping University, February 2007

Examiner: Klas Nordberg
Supervisors: Klas Nordberg and Ivan Lundberg

Today, tool center point calibration is mostly done by a manual procedure. The method is very time consuming and the result may vary depending on how skilled the operator is.

This thesis proposes a new automated iterative method for tool center point calibration of industrial robots, making use of computer vision and image processing techniques. The new method has several advantages over the manual calibration method. Experimental verifications have shown that the proposed method is much faster while delivering a comparable or even better accuracy. The setup of the proposed method is very simple: only one USB camera connected to a laptop computer is needed, and no contact with the robot tool is necessary during the calibration procedure.

The method can be split into three different parts. Initially, the transformation between the robot wrist and the tool is determined by solving a closed loop of homogeneous transformations. Second, an image segmentation procedure is described for finding point correspondences on a rotation symmetric robot tool. The image segmentation part is necessary for performing a measurement with six degrees of freedom of the camera to tool transformation. The last part of the proposed method is an iterative procedure which automates an ordinary four point tool center point calibration algorithm. The iterative procedure ensures that the accuracy of the tool center point calibration only depends on the accuracy of the camera when registering a movement between two positions.


Acknowledgements

I have had the opportunity to write my Master's Thesis at ABB Corporate Research Center, Mechatronics department, in Västerås. This period has been a great experience for me and I have really enjoyed working on the project.

I want to thank my supervisor Development Engineer Ivan Lundberg at ABB Corporate Research Center for his great enthusiasm, all the conversations, and for the extremely good supervision throughout the project.

I also want to thank my supervisor and examiner Associate Professor Klas Nordberg at Linköping University, for all the great phone conversations, the good tips and all the support and guidance throughout the project.

Special thanks also to Dr. Mats Andersson at the Center for Medical Image Science and Visualization for his great interest in the project and helpful ideas.

Linköping, February 2007

Johan Hallenberg


Notation

Symbols

• f(x, y) denotes a 2D function or an image.
• x, X denote scalar values.
• x, X denote vectors or coordinates.
• X denotes a matrix.

Operators

• f (x, y) ∗ g(x, y) denotes the convolution between the image f (x, y) and the image g(x, y).

• f (x, y) ? g(x, y) denotes the cross correlation between the image f (x, y) and the image g(x, y).

• XT denotes the transpose of X.

• X† denotes the pseudo inverse of X.

Glossary

TCP: Tool Center Point.
Jog: Manually move the robot with the joystick.
Base frame: The robot's base coordinate system.
Robtarget: Cartesian target.
DOF: Degrees of freedom.
Q.E.D.: "Quod erat demonstrandum", a Latin phrase meaning "which was to be demonstrated".


Contents

1 Introduction
  1.1 Background
  1.2 Problem Specification/Objectives
  1.3 Delimitation
  1.4 Thesis Outline

2 Robotics
  2.1 Overview
  2.2 Coordinate Systems
    2.2.1 Tool Center Point, TCP
    2.2.2 Base frame
    2.2.3 World frame
    2.2.4 Wrist frame
    2.2.5 Tool frame
  2.3 TCP Calibration algorithm

3 Theory
  3.1 Image Processing Theory
    3.1.1 Threshold
    3.1.2 Morphological operations
    3.1.3 Structure Element
      3.1.3.1 Erosion
      3.1.3.2 Dilation
      3.1.3.3 Opening
      3.1.3.4 Closing
  3.2 Computer Vision Theory
    3.2.1 Camera System
    3.2.2 Pinhole Camera Model
    3.2.3 Camera Calibration
  3.3 Transformation Theory
    3.3.1 Homogeneous Transformation
      3.3.1.1 3D translation
      3.3.1.2 3D Rotation
      3.3.1.3 Homogeneous Transformation Matrix
    3.3.2 Screw Axis Rotation representation
    3.3.3 Rodrigues's formula

4 Determination of Tool Center Point
  4.1 Overview of the system
    4.1.1 Equipment
  4.2 Determine T2
  4.3 Finding X1 and X2 transformation matrices
  4.4 Importance of measuring A and B with 6 DOF
  4.5 Iterative Method for increasing the accuracy
  4.6 Alternative method for rotation symmetric tools

5 Image Segmentation of Robot Tool
  5.1 Overview
    5.1.1 Open Computer Vision Library (OpenCV)
  5.2 Creation of binary mask image
    5.2.1 Image registration
    5.2.2 Background subtraction
      5.2.2.1 The Hough filling method (HFM)
  5.3 Edge Detection
    5.3.1 Canny's Edge Detection
  5.4 Contour Retrieving
    5.4.1 Freeman Chain Code Contour representation
    5.4.2 Polygon Representation
    5.4.3 Freeman Methods
    5.4.4 Active Contour (The Snake algorithm)
  5.5 Finding the contour matching the tool shape
    5.5.1 Convex Hull and its defects
    5.5.2 Polygonal Approximation of Contours
      5.5.2.1 Douglas-Peucker Approximation Method
  5.6 Geometric constraints
  5.7 Corner Detection
    5.7.1 Harris corner detector
      5.7.1.1 Sub pixel Accuracy
    5.7.2 Fast radial

6 Results
  6.1 Repetition Accuracy of the Camera
    6.1.1 Averaging to achieve higher accuracy
    6.1.2 How does the number of points affect the repetition accuracy
  6.2 Camera versus Robot measurements
    6.2.1 Rotation measurements
    6.2.2 Translation measurements
      6.2.2.1 Averaging to achieve higher accuracy
  6.3 Accuracy of the TCP Calibration method
    6.3.1 Averaging to obtain a higher accuracy
  6.4 Repetition Accuracy of the TCP Calibration method
    6.4.0.1 How does the number of points affect the repetition accuracy of the TCP calibration method

7 Discussion
  7.1 Conclusion
    7.1.1 Accuracy
    7.1.2 Major Problems/known difficulties
      7.1.2.1 Finding distinct points
      7.1.2.2 Light reflections in the tool
    7.1.3 Fulfillment of objectives
  7.2 Future Development
    7.2.1 Rotation symmetric tools
    7.2.2 Light reflections in the tool
    7.2.3 Image Segmentation controlled by CAD drawings
    7.2.4 Neural network based methods
    7.2.5 Online correction of the TCP calibration
    7.2.6 Other image processing libraries

A Appendix
  A.1 Finding intrinsic parameters

Chapter 1

Introduction

This chapter begins with a short background, then the problem is specified and the objectives and delimitations are determined. Finally the disposition of the thesis is presented.

1.1 Background

All robots delivered by ABB are calibrated in production. When the calibration procedure is done, the robot is calibrated up to the last axis and is thereby able to calculate the positions of all axes during a movement. When a tool is mounted on the last axis (the tool flange), the robot needs to know the actual position of the active point of the tool, the tool center point, which for instance can be the muzzle of a spot welding tool. For this reason a tool center point calibration has to be performed every time the tool is changed.

Today the TCP calibration is done manually by moving the robot, letting the TCP brush against a fixed point in the environment. The fixed point is typically the tip of a nail. The tool center point needs to brush against the tip of the nail with great precision and from at least four different angles. Then the coordinate of the tool center point can be calculated by the robot in relation to the robot's tool flange coordinate system. The method is time consuming and the result may vary depending on the skill of the robot operator. The resulting accuracy of today's method is approximately ±1 mm.

Products for automating the TCP calibration exist, for instance BullsEye for calibration of arc welding tools and TCP-Beam for calibration of spot welding tools. The disadvantage with these methods is that each is specialized to a single type of robot tool.

1.2 Problem Specification/Objectives

The purpose of this thesis is to investigate whether or not a computer vision method can be used to automate the tool center point calibration process. If such a method can be found, then its accuracy is also of great interest. The objective is to find a method with an accuracy of at least ±1 mm that automates the tool center point calibration procedure. For development of the method C#, C/C++ and Matlab are allowed to be used, but the final demonstration software shall be programmed in C# and C/C++.

1.3 Delimitation

Delivering a method with the ability to calibrate all kinds of robot tools during a five-month Master's thesis is of course impossible. Instead, the method itself is what is of importance, and any kind of tool is permitted to be used to demonstrate the method.

1.4 Thesis Outline

Chapter 1 describes the background to the project. The problem specification, delimitations and the thesis outline are also presented.

Chapter 2 gives a short introduction to robots and concludes with a description of a four point tool center point calibration algorithm.

Chapter 3 presents the theory needed to understand the rest of the thesis. The chapter is divided into image processing theory, computer vision theory and transformation theory.

Chapter 4 describes the proposed method in detail and presents the configuration of the devices.

Chapter 5 describes in detail how the image segmentation of the robot tool was done.

Chapter 6 presents the results of several tests performed to investigate the accuracy of the proposed method.

Chapter 7 presents the conclusions and difficulties of the project. The chapter concludes with a discussion of future development.

Appendix A presents some results from the camera calibration, performed with the Camera Calibration Toolbox for Matlab.


Chapter 2

Robotics

This chapter will give an overview of robots and their coordinate systems. The chapter concludes with a description of a four point tool center point calibration algorithm used in the robots today.

2.1 Overview

Traditionally a robot consists of a mechanical arm and a computer controlling its position and movement. In this project only serial kinematics manipulators will be considered. A serial kinematics manipulator consists of several axes connected by joints, see figure 2.1.

Figure 2.1: The ABB Robot IRB 6600 (Courtesy of ABB).


Usually the mechanical arm (the manipulator) is divided into an arm, a wrist and an end effector (the tool). All the manipulators used during this project have six degrees of freedom, making it possible to position the tool anywhere in the robot workspace with a given orientation. Internally, the robot end effector orientation is represented by quaternions.

The ABB robots can be manually controlled using the FlexPendant, a hand-held controller connected to the computer controlling the robot. A robot program telling the robot how to move can be written in the programming language RAPID. RAPID programs can be written directly on the FlexPendant, but it is also possible to develop the RAPID program on a PC and then transfer it (by a TCP/IP connection or a USB connection) to the computer controlling the robot.

2.2 Coordinate Systems

2.2.1 Tool Center Point, TCP

When a robot movement is programmed by specifying a path of robtargets for the robot to follow, all the robot movements and robot positions are relative to the Tool Center Point (TCP). Normally the tool center point is defined to be the active point of the tool, e.g. the muzzle of a spot welding tool or the center of a gripper.

Several tool center points can be defined e.g. one for each tool, but only one tool center point can be active at a given time. When the robot is programmed to move along a given path, it is the tool center point that will follow the actual path. It is possible to define other coordinate systems and program the robot to move according to these coordinate systems. The tool center point is then expressed in relation to the coordinate system used in the program [1].

2.2.2 Base frame

The robot base coordinate system is located on the robot base. The origin of the base frame is located at the intersection of axis 1 and the robot’s mounting surface. The x axis is pointing forward, the y axis points in the robot’s left side direction and the z axis coincides with the rotational axis of axis 1. Figure 2.2 illustrates the base frame [1].


Figure 2.2: The base coordinate system (Courtesy of ABB).

2.2.3 World frame

If several robots are working in the same area, a world frame can be set up and the robots' base coordinate systems can be expressed in relation to the world coordinate system. It is then possible to make RAPID programs telling the robot to move to a certain position in the world frame. If one of the robots is mounted upside down, the definition of a world frame will of course simplify the programming of the robot. See figure 2.3 [1].

Figure 2.3: The world coordinate system (Courtesy of ABB).

2.2.4 Wrist frame

The wrist frame is fixed to the tool flange/mounting flange (the surface where the tool is to be mounted). The origin of the wrist frame is positioned in the center of the tool flange and the z axis coincides with axis six of the robot. There is a calibration mark on the tool flange, see figure 2.5. The x axis of the wrist frame always points in the opposite direction of the calibration mark, and the y axis is obtained by constructing an axis orthogonal to the x and z axes [1].

Figure 2.4: The wrist coordinate system (Courtesy of ABB).

2.2.5 Tool frame

To be able to define a tool center point, a tool frame is needed. The tool frame is a coordinate system on the tool, and the origin of the tool frame coincides with the tool center point. The tool frame can also be used to obtain information about the direction of the robot movement [1].

2.3 TCP Calibration algorithm

Today, the TCP calibration is done by jogging the robot, making the TCP brush against a fixed point in the robot's close surroundings. The fixed point is for instance the tip of a nail. By jogging the robot, making the TCP brush against the tip of the nail from at least four different orientations, the coordinate of the TCP in relation to the tool flange coordinate system is calculated.

Let T1 be the position of the tool flange in relation to the robot base frame, i.e. the robot's forward kinematics. Assume N different positions of the robot are known, making the TCP brush against the fixed point in the environment. Then N different transformations T1i are also known, where i ∈ [1...N]. Let [TCPx TCPy TCPz]^T be the translation from the origin of the tool flange coordinate system to the tool center point.

Let Q be the homogeneous transformation from the tool flange to the tool center point:

$$Q = \begin{pmatrix} 1 & 0 & 0 & TCP_x \\ 0 & 1 & 0 & TCP_y \\ 0 & 0 & 1 & TCP_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

It is obvious that equation 2.1 is satisfied, due to the fact that the tool center point is at the same coordinate independently of which of the N robot positions is examined:

$$T_{1i} Q = T_{1j} Q \quad (2.1)$$

where i, j ∈ [1...N] and i ≠ j. Denote:

$$T_{1i} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix} = \begin{pmatrix} R_a & t_a \\ \mathbf{0} & 1 \end{pmatrix} \quad (2.2)$$

$$T_{1j} = \begin{pmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \\ b_{31} & b_{32} & b_{33} & b_{34} \\ b_{41} & b_{42} & b_{43} & b_{44} \end{pmatrix} = \begin{pmatrix} R_b & t_b \\ \mathbf{0} & 1 \end{pmatrix} \quad (2.3)$$

Examining one row of equation 2.1 gives

$$a_{11}TCP_x + a_{12}TCP_y + a_{13}TCP_z + a_{14} = b_{11}TCP_x + b_{12}TCP_y + b_{13}TCP_z + b_{14}$$

$$(a_{11} - b_{11})TCP_x + (a_{12} - b_{12})TCP_y + (a_{13} - b_{13})TCP_z = -(a_{14} - b_{14}) \quad (2.4)$$

For every instance of equation 2.1, a system of three equations like equation 2.4 is retrieved. The system of three equations can be rewritten as

$$\begin{bmatrix} R_a - R_b \end{bmatrix} \begin{pmatrix} TCP_x \\ TCP_y \\ TCP_z \end{pmatrix} = -\begin{bmatrix} t_a - t_b \end{bmatrix} \quad (2.5)$$

By using all distinct combinations i, j of the N robot positions, a system of equations of the form 2.5 is obtained. [TCPx TCPy TCPz]^T is then easily calculated as the linear least squares solution of this system.

As the rotation part of the transformation from the tool flange coordinate system to the tool coordinate system is of no importance for determining the tool center point, all information needed to determine the coordinate of the tool center point has hereby been obtained.
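As a concrete illustration, the stacked system of equations 2.5 could be solved numerically as in the following sketch. It assumes the Eigen C++ linear algebra library (not part of the thesis software), and the function and variable names are hypothetical.

```cpp
// Sketch of the four point TCP calibration from section 2.3, assuming Eigen.
#include <Eigen/Dense>
#include <vector>

// flangePoses: N homogeneous transforms T_1i (robot base -> tool flange),
// all recorded while the TCP touches the same fixed point.
// Returns [TCPx, TCPy, TCPz] expressed in the tool flange frame.
Eigen::Vector3d calibrateTcp(const std::vector<Eigen::Matrix4d>& flangePoses)
{
    const int n = static_cast<int>(flangePoses.size());
    std::vector<std::pair<int, int>> pairs;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            pairs.emplace_back(i, j);

    // Stack (R_a - R_b) * tcp = -(t_a - t_b) for every distinct pair (eq. 2.5).
    Eigen::MatrixXd A(3 * static_cast<int>(pairs.size()), 3);
    Eigen::VectorXd b(3 * static_cast<int>(pairs.size()));
    for (int k = 0; k < static_cast<int>(pairs.size()); ++k) {
        const Eigen::Matrix4d& Ta = flangePoses[pairs[k].first];
        const Eigen::Matrix4d& Tb = flangePoses[pairs[k].second];
        A.block<3,3>(3 * k, 0) = Ta.block<3,3>(0, 0) - Tb.block<3,3>(0, 0);
        b.segment<3>(3 * k)    = -(Ta.block<3,1>(0, 3) - Tb.block<3,1>(0, 3));
    }

    // Linear least squares solution of the stacked system.
    return A.colPivHouseholderQr().solve(b);
}
```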


Chapter 3

Theory

This chapter will describe theory necessary for understanding the rest of the thesis. The chapter is divided into image processing theory, computer vision theory and transformation theory.

3.1 Image Processing Theory

3.1.1 Threshold

The threshold operation is a grey level/single colour channel image processing method resulting in a binary image. Let the image be f(x, y), the resulting binary image b(x, y) and the threshold value T. Then

$$b(x, y) = \begin{cases} 1 & \text{if } f(x, y) \geq T \\ 0 & \text{if } f(x, y) < T \end{cases}$$

3.1.2 Morphological operations

Morphological operations are methods working on binary images. Throughout this section the following terminology will be used:

• f(x, y) is the binary image.
• r(x, y) is the resulting binary image.
• a(x, y) is a structure element.

See [2] for more information.

3.1.3 Structure Element

A structure element is a binary image a(x, y) that defines the erosion or the dilation operation. The structure element is used as a convolution kernel in the dilation operation and as a cross correlation kernel in the erosion operation.

3.1.3.1 Erosion

The erosion operation is accomplished by setting all object pixels within a certain distance from a background pixel to 0. In practice a structure element is used. The origin of the structure element is translated to every object pixel. If all of the structure element is accommodated in the object at a certain position, the pixel is set to 1 in the resulting image, otherwise 0.

$$r(x, y) = a(x, y) \ominus f(x, y)$$

where $\ominus$ is the erosion operator.

The erosion operation can also be seen as a cross correlation between the structure element a(x, y) and the image f(x, y):

$$r(x, y) = \begin{cases} 1 & \text{if } a(x, y) \star f(x, y) = A \\ 0 & \text{if } a(x, y) \star f(x, y) \neq A \end{cases}$$

where A is the number of pixels in the structure element.

Notice that $a(x, y) \ominus f(x, y) \neq f(x, y) \ominus a(x, y)$, due to the fact that correlation does not fulfill the commutative law.

Figure 3.1: Original image before morphological operation.

Figure 3.2: Erosion operation applied to figure 3.1. An 8 × 8 structure element was used.

3.1.3.2 Dilation

The dilation operation is the opposite of the erosion operation. It is accomplished by setting all background pixels within a certain distance from an object pixel to 1. In practice a structure element is used. The origin of the structure element is translated to every object pixel, and in every position a pixel wise OR operation is carried out.

$$r(x, y) = a(x, y) \oplus f(x, y)$$

where $\oplus$ is the dilation operator.

The dilation operation can also be seen as a convolution between the structure element a(x, y) and the image f(x, y):

$$r(x, y) = \begin{cases} 1 & \text{if } a(x, y) * f(x, y) \geq 1 \\ 0 & \text{otherwise} \end{cases}$$

Figure 3.3: Dilation operation applied to figure 3.1. A 4 × 4 structure element was used.

3.1.3.3 Opening

The opening operation consists of one erosion operation followed by one dilation operation, where the distance is the same in both operations. When implemented with a structure element, the same structure element is used in the two operations. The opening operation will split two objects which border on each other.

3.1.3.4 Closing

The closing operation consists of one dilation operation followed by one erosion operation, where the distance is the same in both operations. When implemented with a structure element, the same structure element is used in the two operations.
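As an illustration of the opening and closing operations, the sketch below applies them with OpenCV's morphology functions. The thesis used the OpenCV 1.0 C API (cvErode, cvDilate); this sketch assumes the modern C++ API and a hypothetical input file.

```cpp
// Minimal sketch of the opening and closing operations from section 3.1.3,
// using the modern OpenCV C++ API (the thesis itself used the OpenCV 1.0 C API).
#include <opencv2/opencv.hpp>

int main()
{
    // "mask.png" is a hypothetical binary mask image.
    cv::Mat mask = cv::imread("mask.png", cv::IMREAD_GRAYSCALE);

    // 8x8 rectangular structure element, as in figure 3.2.
    cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(8, 8));

    cv::Mat opened, closed;
    // Opening: erosion followed by dilation with the same structure element.
    cv::morphologyEx(mask, opened, cv::MORPH_OPEN, element);
    // Closing: dilation followed by erosion with the same structure element.
    cv::morphologyEx(mask, closed, cv::MORPH_CLOSE, element);

    cv::imwrite("opened.png", opened);
    cv::imwrite("closed.png", closed);
    return 0;
}
```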

3.2 Computer Vision Theory

3.2.1 Camera System

The lens inside the camera refracts all rays of light from a certain object point to one single point in the image plane. If the lens is thin, implying that the distortion can be neglected, the lens law is valid:

$$\frac{1}{\alpha} + \frac{1}{\beta} = \frac{1}{f} \quad (3.1)$$

where α is the distance between the lens and the object, β is the distance between the lens and the image plane and f is the focal length. Figure 3.4 illustrates the lens law.

Figure 3.4: Illustration of the lens law.

By the lens law it is obvious that an object at the distance α from the lens will be reproduced with complete sharpness on the image plane. If the distance between the object and the lens differs from α, the reproduction on the image plane will be more or less blurred. How blurred the image will be is determined by the depth of field s,

$$s = 2\lambda \left( \frac{f}{D} \right)^2 \quad (3.2)$$

where D is the diameter of the aperture and λ is the wavelength of the incoming ray of light. The depth of field can also be defined as the interval where the resolution is greater than $\frac{1}{2d}$, where the resolution $\frac{1}{d}$ is defined as

$$\frac{1}{d} = \frac{1}{\lambda} \frac{D}{f} \quad (3.3)$$

[3]

3.2.2 Pinhole Camera Model

One common way of modeling a camera is to use the pinhole camera model. The model performs well as long as the lens is thin and no wide-angle lens is used. In practice the image plane is located behind the lens, but to simplify calculations and relations between the coordinate systems, the image plane can be put in front of the lens. Figure 3.5 illustrates the pinhole camera model with the image plane located in front of the lens.

Figure 3.5: Illustration of the pinhole camera model, image plane in front of the lens to simplify calculations (Courtesy of Maria Magnusson Seger).

Equation 3.4 shows the relation between the coordinate systems.

$$W \begin{pmatrix} u_n \\ v_n \\ 1 \end{pmatrix} = \begin{pmatrix} U \\ V \\ W \end{pmatrix} = \begin{bmatrix} R & t \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \quad (3.4)$$

where [U V W]^T are the camera coordinates, [un vn]^T are the ideal normalized image coordinates and [X Y Z]^T are the world coordinates.

The matrix [R t] contains the extrinsic parameters and describes the translation and the rotation between the coordinate systems, i.e. how the camera is rotated and translated in relation to the origin of the world coordinate system.

The real image plane usually differs from the ideal normalized image plane. Equation 3.5 describes the relation between the two planes.

$$\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = A \begin{pmatrix} u_n \\ v_n \\ 1 \end{pmatrix} \quad (3.5)$$

where A defines the intrinsic parameters:

$$A = \begin{pmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{pmatrix}$$

Here α and β are scaling factors for the u and v axes, [u0 v0]^T are the image coordinates of the intersection between the image plane and the optical axis, and γ describes the skewing between the u and v axes.

Observe that the relation in equation 3.5 implies that equation 3.4 can be rewritten as

$$W \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = k A \begin{bmatrix} R & t \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \quad (3.6)$$

where k is an arbitrary constant. Letting $s = \frac{W}{k}$, equation 3.6 can be rewritten as

$$s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = A \begin{bmatrix} R & t \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \quad (3.7)$$

[3]

3.2.3 Camera Calibration

Equation 3.7 describes how a point in the environment maps to the image plane up to a scale factor s.

Let C = A[R t] be the 3 × 4 camera matrix. If N corresponding points in the image, [ui vi]^T, and in the world, [Xi Yi Zi]^T, are found, with i ∈ [1...N], then C can be determined up to a scale factor.

$$C = \begin{pmatrix} C_{11} & C_{12} & C_{13} & C_{14} \\ C_{21} & C_{22} & C_{23} & C_{24} \\ C_{31} & C_{32} & C_{33} & C_{34} \end{pmatrix} \quad (3.9)$$

By setting $C_{34} = 1$ in equation 3.9, the scale factor is determined.

Let:

$$c = \begin{bmatrix} C_{11} & C_{12} & C_{13} & C_{14} & C_{21} & C_{22} & C_{23} & C_{24} & C_{31} & C_{32} & C_{33} \end{bmatrix}^T$$

To determine c (i.e. to determine the intrinsic and extrinsic parameters), a system of equations given by the corresponding points can be used:

$$Dc = f \quad (3.10)$$

$$D = \begin{pmatrix}
X_1 & Y_1 & Z_1 & 1 & 0 & 0 & 0 & 0 & -u_1 X_1 & -u_1 Y_1 & -u_1 Z_1 \\
0 & 0 & 0 & 0 & X_1 & Y_1 & Z_1 & 1 & -v_1 X_1 & -v_1 Y_1 & -v_1 Z_1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & X_N & Y_N & Z_N & 1 & -v_N X_N & -v_N Y_N & -v_N Z_N
\end{pmatrix}
\qquad
f = \begin{pmatrix} u_1 \\ v_1 \\ \vdots \\ v_N \end{pmatrix}$$

c is then given by

$$c = \left( D^T D \right)^{-1} D^T f = D^\dagger f \quad (3.11)$$

Note, at least six corresponding points are needed to determine the intrinsic and extrinsic parameters.
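For illustration, the overdetermined system Dc = f could be assembled and solved as in the following sketch. It assumes the Eigen library and hypothetical data structures; it is not the thesis software.

```cpp
// Sketch of the linear camera calibration step (equations 3.9-3.11), assuming Eigen.
#include <Eigen/Dense>
#include <vector>

struct Correspondence { Eigen::Vector3d world; Eigen::Vector2d image; };

// Returns the 3x4 camera matrix C with C34 fixed to 1.
Eigen::Matrix<double, 3, 4> solveCameraMatrix(const std::vector<Correspondence>& pts)
{
    const int n = static_cast<int>(pts.size());   // at least 6 points are needed
    Eigen::MatrixXd D(2 * n, 11);
    Eigen::VectorXd f(2 * n);

    for (int i = 0; i < n; ++i) {
        const double X = pts[i].world.x(), Y = pts[i].world.y(), Z = pts[i].world.z();
        const double u = pts[i].image.x(), v = pts[i].image.y();
        D.row(2 * i)     << X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z;
        D.row(2 * i + 1) << 0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z;
        f(2 * i)     = u;
        f(2 * i + 1) = v;
    }

    // Least squares solution c = D^+ f (equation 3.11).
    Eigen::VectorXd c = D.colPivHouseholderQr().solve(f);

    Eigen::Matrix<double, 3, 4> C;
    C << c(0), c(1), c(2),  c(3),
         c(4), c(5), c(6),  c(7),
         c(8), c(9), c(10), 1.0;
    return C;
}
```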

3.3 Transformation Theory

3.3.1 Homogeneous Transformation

3.3.1.1 3D translation

Let point P = [x y z]^T be translated to a point P' = [x' y' z']^T by the translation vector t = [tx ty tz]^T. Then P' can be expressed as

$$P' = P + t \quad (3.12)$$

3.3.1.2 3D Rotation

h

x0 y0 z0 iT. Point P0 can be expressed according to [4] as

x0 = x cos θ − y sin θ (3.13) x0 = x sin θ + y cos θ (3.14) z0 = z (3.15) Let: Rz=    cos θ − sin θ 0 sin θ − cos θ 0 0 0 1   

Then P0 can be written as

P0 = RzP (3.16)

3.3.1.3 Homogeneous Transformation Matrix

Translation and multiplicative terms for a three dimensional transformation can be combined into a single matrix. Expand the three dimensional coordinates P and P' in sections 3.3.1.1 and 3.3.1.2 to four element column vectors as

$$\tilde{P} = [x_h \; y_h \; z_h \; h]^T \qquad \tilde{P}' = [x'_h \; y'_h \; z'_h \; h]^T$$

where h is the nonzero homogeneous parameter, and

$$x = \frac{x_h}{h}, \quad y = \frac{y_h}{h}, \quad z = \frac{z_h}{h}, \qquad x' = \frac{x'_h}{h}, \quad y' = \frac{y'_h}{h}, \quad z' = \frac{z'_h}{h}$$

For a geometric transformation the homogeneous parameter h can be set to any nonzero value. Suitably, h is set to 1, implying $e = e_h$ where $e \in \{x, y, z\}$.

The translation in equation 3.12 can be represented by a homogeneous matrix M as

$$\tilde{P}' = M \tilde{P} \quad (3.17)$$

$$M = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

The rotation in equation 3.16 can be represented by the homogeneous matrix M if

$$M = \begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

A complete transformation of a rigid body, e.g. a transformation between two different coordinate systems, can be expressed as a homogeneous transformation matrix T:

$$T = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

3.3.2 Screw Axis Rotation representation

A rotation matrix

$$R = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

is completely defined by the axis of rotation [nx ny nz]^T and the rotation angle θ as

$$a_{11} = (n_x^2 - 1)(1 - \cos\theta) + 1 \quad (3.18)$$
$$a_{12} = n_x n_y (1 - \cos\theta) - n_z \sin\theta \quad (3.19)$$
$$a_{13} = n_x n_z (1 - \cos\theta) + n_y \sin\theta \quad (3.20)$$
$$a_{21} = n_y n_x (1 - \cos\theta) + n_z \sin\theta \quad (3.21)$$
$$a_{22} = (n_y^2 - 1)(1 - \cos\theta) + 1 \quad (3.22)$$
$$a_{23} = n_y n_z (1 - \cos\theta) - n_x \sin\theta \quad (3.23)$$
$$a_{31} = n_z n_x (1 - \cos\theta) - n_y \sin\theta \quad (3.24)$$
$$a_{32} = n_z n_y (1 - \cos\theta) + n_x \sin\theta \quad (3.25)$$
$$a_{33} = (n_z^2 - 1)(1 - \cos\theta) + 1 \quad (3.26)$$

Equation 3.26 is called the screw axis representation of the orientation [5].

Given a rotation matrix R, the angle of rotation θ and the rotation axis [nx ny nz]^T can be obtained by the following equations:

$$\theta = \cos^{-1}\left( \frac{a_{11} + a_{22} + a_{33} - 1}{2} \right) \quad (3.27)$$
$$n_x = \frac{a_{32} - a_{23}}{2\sin\theta} \quad (3.28)$$
$$n_y = \frac{a_{13} - a_{31}}{2\sin\theta} \quad (3.29)$$
$$n_z = \frac{a_{21} - a_{12}}{2\sin\theta} \quad (3.30)$$

This representation makes it possible to compare two different measuring units which use different coordinate systems to measure the rotation. In this project the method was used to ensure that the camera measured the same rotation angle as the robot, when the robot was told to perform a rotation.
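A minimal sketch of equations 3.27-3.30, assuming the Eigen library, is shown below; note that the axis is undefined when sin θ = 0.

```cpp
// Sketch of equations 3.27-3.30: recovering the rotation angle and screw axis
// from a rotation matrix, assuming the Eigen library.
#include <Eigen/Dense>
#include <cmath>

struct AxisAngle { Eigen::Vector3d axis; double angle; };

AxisAngle rotationToAxisAngle(const Eigen::Matrix3d& R)
{
    AxisAngle out;
    // Equation 3.27: the angle from the trace of R.
    out.angle = std::acos((R.trace() - 1.0) / 2.0);

    // Equations 3.28-3.30: the axis from the antisymmetric part of R.
    // (Undefined for theta = 0 or pi, where sin(theta) = 0.)
    const double s = 2.0 * std::sin(out.angle);
    out.axis = Eigen::Vector3d(R(2,1) - R(1,2),
                               R(0,2) - R(2,0),
                               R(1,0) - R(0,1)) / s;
    return out;
}
```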

3.3.3 Rodrigues's formula

One way of finding the rotation matrix R in section 3.3.2 is to make use of the Rodrigues’s formula for a spherical displacement of a rigid body.

Let point P1 rotate around the rotation axis n, resulting in a new position P2 of the point, see figure 3.6. Rodrigues's formula is then defined according to equation 3.31:

$$r_2 = r_1 \cos\theta + (n \times r_1)\sin\theta + n\,(n^T r_1)(1 - \cos\theta) \quad (3.31)$$

Figure 3.6: Definition for Rodrigues's formula.

According to [5], equation 3.31 can be rewritten as equation 3.32:

$$r_2 = {}^{1}R_{2}\, r_1 \quad (3.32)$$


Chapter 4

Determination of Tool Center Point

This chapter describes how the problem of finding the tool center point was solved. The chapter starts by describing the proposed method for determination of the camera to tool, tool to robot wrist and camera to robot base coordinate system transformations respectively. The chapter also describes the proposed iterative method for finding the tool center point. The iterative method is used to remove uncertainties due to camera calibration errors and robot calibration errors.

4.1 Overview of the system

Figure 4.1: Overview of the system.

Figure 4.1 shows the arrangement of the system. A computer is connected to a camera by a USB cable and to the robot by a TCP/IP cable.

4.1.1 Equipment

The robot used during the project was an IRB 1400 ABB robot controlled by an IRC5 computer. A laptop computer with an Intel Pentium M 1600 MHz processor and 1 GB RAM was connected to the IRC5 computer by a TCP/IP cable. A program was then developed in C# and C/C++ making it possible to control the robot movements directly from the laptop. The laptop computer was also used for image segmentation of the tool. The images were retrieved at 50 frames per second by an uEye (UI221x-M V2.10) CCD grayscale USB camera with a resolution of 640 × 480 pixels.

In addition to the equipment, four different transformations T1, T2, X1 and X2 are displayed in figure 4.1.

By defining a tool coordinate system TF (tool frame) with its origin at the tool center point, the problem of finding the TCP becomes equivalent to finding the transformation X1 from the tool frame TF to the TFF (tool flange frame), the coordinate system fixed at the last axis of the robot manipulator.

To make it possible to determine transformation X1, a closed loop of homogeneous transformation matrices can be written as an equation:

$$T_2 X_1 = X_2 T_1 \quad (4.1)$$

Transformation matrix T1 in equation 4.1 is the robot's forward kinematics. Consequently T1 is known, making the problem of finding X1 equivalent to determining X2 and T2.

4.2 Determine T2

By calibrating the camera using the method proposed by Zhengyou Zhang [6] the intrinsic parameters of the camera were determined and the lens distortion coefficients were retrieved, see Appendix A.

Let m = [u v]^T denote a 2D image point and M = [X Y Z]^T denote a 3D point. The augmented representations of these vectors are obtained by adding 1 as the last element of the vector: $\tilde{m} = [u \; v \; 1]^T$ and $\tilde{M} = [X \; Y \; Z \; 1]^T$. By modeling the camera as a perfect pinhole camera, the relation between a 3D point M and its projected image point m fulfills equation 4.2:

$$s\tilde{m} = A T \tilde{M} \quad (4.2)$$

where A contains the intrinsic parameters, s is a scale factor and T = [r1 r2 r3 t] contains the extrinsic parameters.

By setting up a plane in the world, the homography between the plane and its image can be determined. Let the plane fulfill the constraint Z = 0. Equation 4.2 is then rewritten as

$$s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = A \begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix} \begin{pmatrix} X \\ Y \\ 0 \\ 1 \end{pmatrix} = A \begin{bmatrix} r_1 & r_2 & t \end{bmatrix} \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}$$

Let the homography be H = [h1 h2 h3] = A[r1 r2 t]. According to [6] this implies that the extrinsic parameters can be determined as

$$r_1 = \lambda A^{-1} h_1, \quad r_2 = \lambda A^{-1} h_2, \quad r_3 = r_1 \times r_2, \quad t = \lambda A^{-1} h_3$$

where

$$\lambda = \frac{1}{\|A^{-1} h_1\|} = \frac{1}{\|A^{-1} h_2\|}$$

By letting the plane be expressed in the tool coordinate system TF, finding transformation T2 in figure 4.1 becomes equivalent to finding the extrinsic parameters T = [r1 r2 r3 t]. From equation 4.2 it is obvious that T is totally determined by the intrinsic parameters A and the homography H. The Camera Calibration Toolbox for Matlab was used to determine the intrinsic parameters A of the camera, see Appendix A. This toolbox uses the same technique as mentioned in this section and in [6]. The homography H = [h1 h2 h3] can be obtained according to [6] by minimizing

$$\sum_i \left( m_i - \frac{1}{h_3^T M_i} \begin{pmatrix} h_1^T M_i \\ h_2^T M_i \end{pmatrix} \right)^T (\sigma_i^2 I)^{-1} \left( m_i - \frac{1}{h_3^T M_i} \begin{pmatrix} h_1^T M_i \\ h_2^T M_i \end{pmatrix} \right)$$

where $h_j$ is the j:th row of H, I is the identity matrix and $\sigma_i$ is the standard deviation of the Gaussian noise that is assumed to affect $m_i$. The problem can be solved as a non-linear least squares problem:

$$\min_H \sum_i \left\| m_i - \frac{1}{h_3^T M_i} \begin{pmatrix} h_1^T M_i \\ h_2^T M_i \end{pmatrix} \right\|^2$$

Due to the fact that a plane is defined by at least three points, at least three corresponding points $m_i$ and $M_i$ belonging to the plane need to be determined. An image segmentation method for finding three distinct points in the plane lying in the tool coordinate system TF is described in chapter 5. The tool used for the segmentation was a calibration tool with an awl-like shape. The TCP of the tool was the tip of the awl, and the TCP was one of the points determined by the method.
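The extraction of the extrinsic parameters from the homography could look as in the sketch below, which assumes the Eigen library; the re-orthonormalization comment is an addition, not part of the thesis text.

```cpp
// Sketch of section 4.2: recovering the extrinsic parameters T = [r1 r2 r3 t]
// from a plane homography H and intrinsic matrix A, assuming the Eigen library.
#include <Eigen/Dense>

Eigen::Matrix<double, 3, 4> extrinsicsFromHomography(const Eigen::Matrix3d& A,
                                                     const Eigen::Matrix3d& H)
{
    const Eigen::Matrix3d Ainv = A.inverse();
    const double lambda = 1.0 / (Ainv * H.col(0)).norm();

    Eigen::Vector3d r1 = lambda * Ainv * H.col(0);
    Eigen::Vector3d r2 = lambda * Ainv * H.col(1);
    Eigen::Vector3d r3 = r1.cross(r2);
    Eigen::Vector3d t  = lambda * Ainv * H.col(2);

    // Because of noise, [r1 r2 r3] is only approximately a rotation matrix;
    // in practice it would be re-orthonormalized (e.g. via SVD) before use.
    Eigen::Matrix<double, 3, 4> T;
    T << r1, r2, r3, t;
    return T;
}
```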

4.3 Finding the X1 and X2 transformation matrices

When the transformations T1 and T2 were found, only X2 remained to be determined. Jintao and Daokui proposed a method in [7] where the transformation X2 was determined by moving the robot to a fixed position, e.g. the tip of a nail, with known relation to a calibration pattern. Zhuang et al. [8] proposed a method that solved X1 and X2 simultaneously by applying quaternion algebra to derive a linear solution. Fadi Dornaika and Radu Horaud [9] proposed one closed form method and one method based on non-linear constrained minimization for solving equation 4.1 of homogeneous matrices.

The method implemented was first described by Roger Y. Tsai and Reimar K. Lenz [10]. By moving the robot to a second position and finding T1 and T2 at the new location of the robot, a system of two equations, 4.3 and 4.4, can be obtained, see figure 4.2:

$$T_{21} X_1 = X_2 T_{11} \quad (4.3)$$
$$T_{22} X_1 = X_2 T_{12} \quad (4.4)$$

By multiplying equation 4.3 by the inverse of equation 4.4, equation 4.6 is obtained:

$$(T_{22} X_1)^{-1} T_{21} X_1 = (X_2 T_{12})^{-1} X_2 T_{11} \quad (4.5)$$
$$\Leftrightarrow X_1^{-1} T_{22}^{-1} T_{21} X_1 = T_{12}^{-1} X_2^{-1} X_2 T_{11}$$
$$\Leftrightarrow T_{22}^{-1} T_{21} X_1 = X_1 T_{12}^{-1} T_{11}$$
$$\Leftrightarrow A X_1 = X_1 B \quad (4.6)$$

where $A = T_{22}^{-1} T_{21}$ and $B = T_{12}^{-1} T_{11}$ are known. Observe that A and B are the transformations from position 1 to position 2 of the tool flange frame and the tool frame respectively, see figure 4.2.

The transformation matrices A, B and X1 are all homogeneous matrices. A homogeneous matrix consists of one rotation matrix R and one translation vector t.

Figure 4.2: Transformations AX1 = X1B. RB is the robot base frame, CF is the camera frame, TF is the tool frame and TFF is the tool flange frame. A and B are the transformations from position 1 to position 2 of the tool flange frame and the tool frame respectively.

$$X_1 = \begin{pmatrix} R_{X_1} & t_{X_1} \\ \mathbf{0} & 1 \end{pmatrix} \qquad A = \begin{pmatrix} R_A & t_A \\ \mathbf{0} & 1 \end{pmatrix} \qquad B = \begin{pmatrix} R_B & t_B \\ \mathbf{0} & 1 \end{pmatrix}$$

The rotation matrix R can be written as

$$R = \begin{pmatrix}
n_1^2 + (1 - n_1^2)\cos\theta & n_1 n_2 (1 - \cos\theta) - n_3 \sin\theta & n_1 n_3 (1 - \cos\theta) + n_2 \sin\theta \\
n_1 n_2 (1 - \cos\theta) + n_3 \sin\theta & n_2^2 + (1 - n_2^2)\cos\theta & n_2 n_3 (1 - \cos\theta) - n_1 \sin\theta \\
n_1 n_3 (1 - \cos\theta) - n_2 \sin\theta & n_2 n_3 (1 - \cos\theta) + n_1 \sin\theta & n_3^2 + (1 - n_3^2)\cos\theta
\end{pmatrix} \quad (4.7)$$

where [n1 n2 n3] is the rotation axis and θ is the rotation angle. Obviously it is possible to specify R in equation 4.7 by specifying [n1 n2 n3] and θ.

By using a modified version of Rodrigues's formula, see section 3.3.3, a function Pr depending on [n1 n2 n3] and θ can be defined as

$$P_r = 2\sin\frac{\theta}{2}\,[n_1 \; n_2 \; n_3], \quad 0 \leq \theta \leq \pi \quad (4.8)$$

Let PA, PB and PX1 be the rotation axes defined according to equation 4.8 for RA, RB and RX1 respectively, and let

$$P'_{X_1} = \frac{1}{\sqrt{4 - |P_{X_1}|^2}}\, P_{X_1} \quad (4.9)$$

For a vector v = [vx vy vz] let

$$\mathrm{Skew}(v) = \begin{pmatrix} 0 & -v_z & v_y \\ v_z & 0 & -v_x \\ -v_y & v_x & 0 \end{pmatrix} \quad (4.10)$$

By setting up a system of linear equations according to equation 4.11, P'X1 can be solved for using linear least squares techniques:

$$\mathrm{Skew}(P_B + P_A)\, P'_{X_1} = P_A - P_B \quad (4.11)$$

Proof:

$$P_A - P_B \perp P'_{X_1} \quad (4.12)$$
$$P_A - P_B \perp P_A + P_B \quad (4.13)$$

From equations 4.12 and 4.13 it follows that

$$P_A - P_B = s\,(P_A + P_B) \times P'_{X_1} \quad (4.14)$$

where s is a constant, since $P_A - P_B$ and $(P_A + P_B) \times P'_{X_1}$ are parallel.

Let α be the angle between P'X1 and (PA + PB). Then

$$|(P_A + P_B) \times P'_{X_1}| = |P_A + P_B|\,|P'_{X_1}|\sin\alpha = \{\text{use equation 4.9}\} = |P_A + P_B|\,2\sin\tfrac{\theta}{2}\left(4 - 4\sin^2\tfrac{\theta}{2}\right)^{-\frac{1}{2}}\sin\alpha = |P_A + P_B|\tan\tfrac{\theta}{2}\sin\alpha = |P_A - P_B|$$

This implies that equation 4.14 can be written as

$$P_A - P_B = (P_A + P_B) \times P'_{X_1} \quad (4.15)$$

By equation 4.15 and the relation $a \times b = \mathrm{Skew}(a)\,b$ it follows that

$$\mathrm{Skew}(P_B + P_A)\, P'_{X_1} = P_A - P_B$$

Q.E.D.

Skew(PB + PA) is always singular, and therefore at least two pairs of positions are needed to create the system of equations 4.11, i.e. three different robot positions are needed. When P'X1 is determined, PX1 can be retrieved by equation 4.16. The rotation RX1 is then determined by equation 4.17.

$$P_{X_1} = \frac{2 P'_{X_1}}{\sqrt{1 + |P'_{X_1}|^2}} \quad (4.16)$$

$$R_{X_1} = \left(1 - \frac{|P_{X_1}|^2}{2}\right) I + \frac{1}{2}\left(P_{X_1} P_{X_1}^T + \sqrt{4 - |P_{X_1}|^2}\,\mathrm{Skew}(P_{X_1})\right) \quad (4.17)$$

Let:

$$T_{X_1} = \begin{pmatrix} I & t_{X_1} \\ \mathbf{0} & 1 \end{pmatrix} \quad (4.18)$$

$$T_A = \begin{pmatrix} I & t_A \\ \mathbf{0} & 1 \end{pmatrix} \quad (4.19) \qquad T_B = \begin{pmatrix} I & t_B \\ \mathbf{0} & 1 \end{pmatrix} \quad (4.20)$$

To find the translation tX1 of the homogeneous transformation X1, the linear system of equations 4.21 can be solved by a linear least squares technique:

$$(R_B - I)\, T_{X_1} = R_{X_1} T_A - T_B \quad (4.21)$$

Proof:

$$AX_1 = X_1B$$
$$\Leftrightarrow (T_B + R_B - I)(T_{X_1} + R_{X_1} - I) = (T_{X_1} + R_{X_1} - I)(T_A + R_A - I)$$

Expanding both sides, cancelling equal terms and removing all terms not containing a translation gives

$$R_B T_{X_1} - T_{X_1} = R_{X_1} T_A - T_B$$
$$\Leftrightarrow (R_B - I)\, T_{X_1} = R_{X_1} T_A - T_B$$

Q.E.D.

At least three different positions of the robot are needed to calculate equations 4.11 and 4.21 by a linear least squares technique. In the implementation, four different positions were used. The result varied considerably depending on how the positions were chosen. The best estimation of X1 was achieved when the positions fulfilled the constraint of completely exciting all degrees of freedom. In practice this constraint is equivalent to ensuring that the first three positions differ both in a linear movement along the orthogonal axes x, y, z and in rotations around each axis. The fourth position was chosen to be a movement along [x y z]^T = [1 1 1]^T and a rotation around the same axis.
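The linear solution described above could be implemented as in the following sketch, which follows equations 4.8, 4.9, 4.11, 4.16, 4.17 and 4.21 as printed and assumes the Eigen library. The motion pairs (A, B) must be formed exactly as defined in equation 4.6; all names are hypothetical.

```cpp
// Sketch of the linear hand-eye solution of AX1 = X1B from section 4.3.
// At least two (A, B) motion pairs, i.e. three robot positions, are required.
#include <Eigen/Dense>
#include <vector>
#include <cmath>

static Eigen::Matrix3d skew(const Eigen::Vector3d& v)
{
    Eigen::Matrix3d S;
    S <<     0, -v.z(),  v.y(),
         v.z(),      0, -v.x(),
        -v.y(),  v.x(),      0;
    return S;
}

// Modified Rodrigues vector Pr = 2 sin(theta/2) * n (equation 4.8).
static Eigen::Vector3d rotationToPr(const Eigen::Matrix3d& R)
{
    Eigen::AngleAxisd aa(R);
    return 2.0 * std::sin(aa.angle() / 2.0) * aa.axis();
}

Eigen::Matrix4d solveAXequalsXB(const std::vector<Eigen::Matrix4d>& As,
                                const std::vector<Eigen::Matrix4d>& Bs)
{
    const int n = static_cast<int>(As.size());

    // Step 1: Skew(PB + PA) * P'X = PA - PB (equation 4.11), stacked over all pairs.
    Eigen::MatrixXd M(3 * n, 3);
    Eigen::VectorXd d(3 * n);
    for (int i = 0; i < n; ++i) {
        Eigen::Vector3d PA = rotationToPr(As[i].block<3,3>(0, 0));
        Eigen::Vector3d PB = rotationToPr(Bs[i].block<3,3>(0, 0));
        M.block<3,3>(3 * i, 0) = skew(PB + PA);
        d.segment<3>(3 * i)    = PA - PB;
    }
    Eigen::Vector3d Pp = M.colPivHouseholderQr().solve(d);

    // Equations 4.16 and 4.17: recover PX and the rotation RX.
    Eigen::Vector3d PX = 2.0 * Pp / std::sqrt(1.0 + Pp.squaredNorm());
    Eigen::Matrix3d RX = (1.0 - PX.squaredNorm() / 2.0) * Eigen::Matrix3d::Identity()
        + 0.5 * (PX * PX.transpose() + std::sqrt(4.0 - PX.squaredNorm()) * skew(PX));

    // Step 2: (RB - I) tX = RX tA - tB (equation 4.21), stacked over all pairs.
    Eigen::MatrixXd C(3 * n, 3);
    Eigen::VectorXd e(3 * n);
    for (int i = 0; i < n; ++i) {
        C.block<3,3>(3 * i, 0) = Bs[i].block<3,3>(0, 0) - Eigen::Matrix3d::Identity();
        e.segment<3>(3 * i)    = RX * As[i].block<3,1>(0, 3) - Bs[i].block<3,1>(0, 3);
    }
    Eigen::Vector3d tX = C.colPivHouseholderQr().solve(e);

    Eigen::Matrix4d X = Eigen::Matrix4d::Identity();
    X.block<3,3>(0, 0) = RX;
    X.block<3,1>(0, 3) = tX;
    return X;
}
```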

4.4 Importance of measuring A and B with 6 DOF

Actually, if only five degrees of freedom can be measured in T1 or T2, no unique solution X1 exists to the equation AX1 = X1B. This situation occurs for instance if the tool has an axis of symmetry.

Proof:

First, four lemmas have to be defined.

Lemma 1:
Matrices A and B are similar if a matrix X exists such that $B = X^{-1}AX \Leftrightarrow AX = XB$. If A and B are similar, then A and B have the same trace, determinant and eigenvalues [11].

Lemma 2:
The eigenvalues of a rotation matrix R are

$$\lambda_1 = 1, \quad \lambda_2 = \cos\theta + i\sin\theta, \quad \lambda_3 = \cos\theta - i\sin\theta$$

where θ is the angle of rotation specified by equation 4.7 [12].

Lemma 3:
A rotation matrix R fulfills, according to [13], the orthogonality conditions $R^{-1} = R^T$ and $R^T R = I$.

Lemma 4:
For a matrix R fulfilling the orthogonality conditions, the trace of R is the sum of the eigenvalues of R according to [12]:

$$\mathrm{Tr}(R) = \sum_i \lambda_i$$

Using Lemma 2 and Lemma 3 in Lemma 4 gives

$$\mathrm{Tr}(R) = 1 + \cos\theta + i\sin\theta + \cos\theta - i\sin\theta$$
$$\Leftrightarrow \theta = \arccos\left(\frac{\mathrm{Tr}(R) - 1}{2}\right) \quad (4.22)$$

If the robot is told to perform a rotation θ around the axis of symmetry, the robot will measure a rotation θB = θ. However, the camera will not be able to distinguish any difference between the orientation before and after the rotation around the axis of symmetry. This implies that the camera will measure a rotation θA = 0.

It is then obvious by equation 4.22 that in general Tr(A) ≠ Tr(B), implying, according to Lemma 1, that there is no solution X1 to the equation AX1 = X1B.

Q.E.D.

Observe that once transformation X1 is determined, the transformation X2 between the camera coordinate system and the robot base frame can be obtained as $X_2 = T_2 X_1 T_1^{-1}$.

The calibration tool described in chapter 5 had an axis of symmetry, making it impossible to find a solution X1 to the equation AX1 = X1B. Instead, the non-symmetric chessboard in figure 4.3 was used as a robot tool. The TCP was defined as the upper left corner of the chess pattern.

Even though it is impossible to find a solution X1 to the equation AX1 = X1B if the robot tool has an axis of symmetry, it is still possible to perform the tool center point calibration; an alternative method for rotation symmetric tools is described in section 4.6.

Figure 4.3: Chessboard pattern.

4.5 Iterative Method for increasing the accuracy

Although the transformation X1 is known and the coordinate of the tool center point is found, the accuracy of the method is influenced by errors in the camera calibration and errors in the robot calibration. To ensure a high accuracy an iterative method had to be used. Denote the tool center point coordinate found by the method as TCPguess and the correct coordinate of the TCP as TCPcorrect.

Assume the robot reorients around TCPguess. If TCPguess = TCPcorrect, then TCPcorrect would stay at the same point TCPguess during the reorientation, while TCPcorrect would move away from TCPguess if TCPguess ≠ TCPcorrect. This phenomenon can be used to achieve a high accuracy of the tool center point calibration.

Denote the error vector between TCPguess and TCPcorrect after a reorientation around TCPguess as ε. By measuring the coordinate of TCPcorrect after the reorientation around TCPguess, ε can be retrieved as

$$\epsilon = TCP_{correct} - TCP_{guess} \quad (4.23)$$

Of course, the new measurement of TCPcorrect is only a guess, retrieved in the same way as TCPguess and with the same accuracy. This implies e.g.

$$\epsilon = TCP_{guess2} - TCP_{guess} \quad (4.24)$$

An error ε = 0 would imply TCPguess = TCPcorrect, but in most cases TCPcorrect ≠ TCPguess2, implying TCPguess ≠ TCPcorrect. Instead, TCPcorrect is measured again by the camera and a new ε is calculated. This procedure is done iteratively until |ε| < β, where β is the desired accuracy. Observe that to obtain faster convergence of the iterative algorithm, the robot can be told to move −αε, where α is a constant < 1. When the desired accuracy β is reached, two positions of the tool are known where TCPcorrect = TCPguess. The orientation of the tool differs between the two positions, because only linear movements are used when moving the robot by −ε.

By performing reorientations around TCPguess in six different directions and using the iterative procedure to linearly move TCPcorrect back to TCPguess, six different positions are obtained. All of the six robot positions ensure that TCPcorrect is at the same point TCPguess, but with six different orientations.

Let T1i, i ∈ [1...6], be the homogeneous transformations from the robot base frame coordinate system to the tool flange coordinate system, i.e. the robot's forward kinematics for the six different positions. Let [TCPx TCPy TCPz]^T be the translation from the origin of the tool flange coordinate system to the tool center point, and let:

$$Q = \begin{pmatrix} 1 & 0 & 0 & TCP_x \\ 0 & 1 & 0 & TCP_y \\ 0 & 0 & 1 & TCP_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Due to the fact that the tool center point is at the same coordinate independently of which of the six robot positions is examined, the algorithm described in section 2.3 can be used to determine the true position of the tool center point. By using the iterative procedure and calculating the tool center point as the least squares solution described in section 2.3, the accuracy of the TCP calibration method becomes independent of the camera calibration errors and the robot calibration errors. Instead, only the accuracy of the camera when registering a movement between two different positions affects the final accuracy of the tool center point calibration achieved by the iterative method. This can be assured by considering that the camera is an external observer during the iterative procedure, and the method will iterate until the camera measures (with the desired accuracy) that the true tool center point is at the same location as the initial guess point.
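The control flow of the iterative procedure can be summarized by the following sketch. The Robot and Camera interfaces and every name below are hypothetical; only the loop structure follows the description above.

```cpp
// High-level sketch of the iterative refinement loop in section 4.5, assuming Eigen.
#include <Eigen/Dense>
#include <vector>

struct Robot {
    virtual void reorientAround(const Eigen::Vector3d& tcpGuess, int direction) = 0;
    virtual void moveLinear(const Eigen::Vector3d& translationInBaseFrame) = 0;
    virtual Eigen::Matrix4d forwardKinematics() const = 0;   // T1i, base -> tool flange
    virtual ~Robot() = default;
};

struct Camera {
    // Error vector between the true TCP and the guess, expressed in the robot
    // base frame (this requires the rotation part of X2, see section 4.6).
    virtual Eigen::Vector3d measureTcpError() const = 0;
    virtual ~Camera() = default;
};

// Collects six flange poses where the true TCP coincides with the guess point;
// the poses can then be fed to the least squares algorithm of section 2.3.
std::vector<Eigen::Matrix4d> collectCalibrationPoses(Robot& robot, const Camera& camera,
                                                     const Eigen::Vector3d& tcpGuess,
                                                     double beta, double alpha)
{
    std::vector<Eigen::Matrix4d> flangePoses;
    for (int dir = 0; dir < 6; ++dir) {              // six reorientation directions
        robot.reorientAround(tcpGuess, dir);
        Eigen::Vector3d eps = camera.measureTcpError();
        while (eps.norm() >= beta) {                 // iterate until |eps| < beta
            robot.moveLinear(-alpha * eps);          // damped linear correction
            eps = camera.measureTcpError();
        }
        flangePoses.push_back(robot.forwardKinematics());
    }
    return flangePoses;
}
```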

4.6 Alternative method for rotation symmetric tools

If the transformation between the camera and the robot's base frame, X2, is known, the iterative method for increasing the accuracy will work even if the tool is symmetric.

During the iterative method, the robot performs a reorientation and the camera only needs to measure the translation error between the positions before and after the reorientation. This assures that the transformation T2 between the camera frame and the tool frame only needs to be determined with five degrees of freedom during the iterative procedure.

However, the direction of the movement −ε must be expressed in relation to the robot's base frame. Therefore the rotation part of the transformation X2 between the camera frame and the robot base frame needs to be determined. The rotation part RX2 of the homogeneous transformation X2 can be obtained according to [4] by moving the robot a certain distance in the X, Y and Z directions respectively in the robot's base frame.

Let u = [ux uy uz]^T, v = [vx vy vz]^T and w = [wx wy wz]^T be the vectors measured by the camera when the robot moves a specified distance in the X, Y and Z directions respectively. According to [4] the rotation part RX2 of the homogeneous transformation X2 is then obtained as

$$R_{X_2} = \begin{pmatrix} u_x & v_x & w_x \\ u_y & v_y & w_y \\ u_z & v_z & w_z \end{pmatrix} \quad (4.25)$$

The translational scaling factor between the two coordinate systems is also possible to determine, because the distances the robot moves in the X, Y and Z directions are known and the distances of the same movements measured by the camera are also retrieved. This assures that it is possible to find the translation error ε between TCPguess and the true coordinate of the tool center point TCPcorrect, even if the robot tool is rotation symmetric. It is thereby possible to perform a tool center point calibration even if the robot tool is rotation symmetric.
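Equation 4.25 could be evaluated as in the sketch below, assuming the Eigen library; the normalization and the final orthonormalization step are safeguards added here and are not described in the thesis.

```cpp
// Sketch of equation 4.25: estimating the rotation between the camera frame and
// the robot base frame from three known robot translations, assuming Eigen.
#include <Eigen/Dense>

// u, v, w: camera-measured displacement vectors for robot moves along the base
// frame X, Y and Z axes.
Eigen::Matrix3d estimateCameraToBaseRotation(const Eigen::Vector3d& u,
                                             const Eigen::Vector3d& v,
                                             const Eigen::Vector3d& w)
{
    Eigen::Matrix3d R;
    R.col(0) = u.normalized();
    R.col(1) = v.normalized();
    R.col(2) = w.normalized();

    // Measurement noise means R is not exactly orthonormal; project it onto the
    // closest orthonormal matrix via SVD before using it.
    Eigen::JacobiSVD<Eigen::Matrix3d> svd(R, Eigen::ComputeFullU | Eigen::ComputeFullV);
    return svd.matrixU() * svd.matrixV().transpose();
}
```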


Chapter 5

Image Segmentation of Robot Tool

This chapter will describe the methods evaluated during this project for image segmentation of the robot tool.

5.1 Overview

Since one of the goals of this thesis was to deliver a software program made in C/C++ and C#, all of the image processing had to be written in either of these programming languages. The HMI part of the program was written in C# and the image processing functions were written in C/C++.

To keep the production costs as low as possible only open source libraries were allowed to be used during this project. Due to this constraint the image processing functions in Open Computer Vision Library 1.0 (OpenCV 1.0) were mainly used for image segmentation.

5.1.1 Open Computer Vision Library (OpenCV)

OpenCV 1.0 is an open source project written in C consisting of a large collection of image processing algorithms. The library is also compatible with Intel's IPL image package and has the ability to utilize the Intel Integrated Performance Primitives for better performance [14] [15].

5.2 Creation of binary mask image

To retrieve a rough approximation of the image region where the object is located numerous approaches can be used. In this section two different methods will be evaluated.

5.2.1 Image registration

The idea of image registration is to find the transfer field $v(x): \mathbb{R}^2 \rightarrow \mathbb{R}^2$, x = [x y]^T, making a reference image I2(x, y) fit as well as possible in a target image I1(x, y), i.e. finding v(x) minimizing ε in equation 5.1:

$$\epsilon^2 = \|I_2(x + v(x)) - I_1(x)\| \quad (5.1)$$

The method outlined in this section is described in more detail in [16]. By letting the reference image be a picture of the desired tool and the target image be the image acquired by the camera, this method should iteratively be able to find the location of the tool in the target image. One constraint on the target image has to be fulfilled.

Constraint I:
The image can locally be described as a sloping plane:

$$I(x, t) = I(x, t) - \nabla I^T(x, t)\,v(x) + \Delta_t$$

$$\Delta_t = I(x, t + 1) - I(x, t)$$

$$\nabla I(x, t) = \begin{pmatrix} \nabla_x I(x, t + 1) \\ \nabla_y I(x, t + 1) \end{pmatrix}$$

Constraint I can be rewritten as

$$I(x, t) = I(x, t) - \nabla I^T(x, t)\,v(x) + \Delta_t$$
$$I(x, t) = I(x, t) - \nabla I^T(x, t)\,v(x) + (I(x, t + 1) - I(x, t))$$
$$I(x, t) = -\nabla I^T(x, t)\,v(x) + I(x, t + 1) \quad (5.2)$$

Let:

$$I_1 = I_{x, t+1}, \qquad I_2 = I_{x, t}, \qquad v(x) = B(x)\,p$$

$$B(x) = \begin{pmatrix} 1 & 0 & x & y & 0 & 0 \\ 0 & 1 & 0 & 0 & x & y \end{pmatrix}, \qquad p = \begin{pmatrix} p_1 \\ \vdots \\ p_6 \end{pmatrix}$$

Then equation 5.2 can be written as

$$\nabla I_2^T B(x)\,p = I_1 - I_2 \quad (5.3)$$

Let:

$$A = \begin{pmatrix} \nabla I_2(x_1)^T B(x_1) \\ \vdots \\ \nabla I_2(x_n)^T B(x_n) \end{pmatrix}, \qquad b = \begin{pmatrix} I_1(x_1) - I_2(x_1) \\ \vdots \\ I_1(x_n) - I_2(x_n) \end{pmatrix}$$

where n is the number of pixels in I2. Equation 5.3 is then rewritten as equation 5.4:

$$Ap = b \quad (5.4)$$

This implies that p and v(x) are determined as

$$p = (A^T A)^{-1} A^T b = A^\dagger b \quad (5.5)$$

$$v(x) = B(x)\,p \quad (5.6)$$

The new location I2new of the reference image is then obtained by interpolating I2 according to v(x). The method is then iterated with I2new as I2 until |v(x)| < β, where β is the desired accuracy.
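One iteration of this registration scheme could be implemented as in the sketch below. The thesis evaluated the method in Matlab; this OpenCV-based version is only an illustration with hypothetical names.

```cpp
// Sketch of one iteration of the affine registration step (equations 5.3-5.6),
// using the modern OpenCV C++ API.
#include <opencv2/opencv.hpp>

// Returns the 6-parameter affine update p for reference image I2f against target I1f.
cv::Mat estimateAffineUpdate(const cv::Mat& I1f, const cv::Mat& I2f)
{
    CV_Assert(I1f.type() == CV_32F && I2f.type() == CV_32F && I1f.size() == I2f.size());

    cv::Mat gx, gy;
    cv::Sobel(I2f, gx, CV_32F, 1, 0);   // image gradients of the reference image
    cv::Sobel(I2f, gy, CV_32F, 0, 1);

    const int n = I1f.rows * I1f.cols;
    cv::Mat A(n, 6, CV_32F), b(n, 1, CV_32F);

    int row = 0;
    for (int y = 0; y < I1f.rows; ++y) {
        for (int x = 0; x < I1f.cols; ++x, ++row) {
            const float Ix = gx.at<float>(y, x), Iy = gy.at<float>(y, x);
            // One row of A: gradient(I2)^T * B(x), with B(x) the 2x6 affine basis.
            float* a = A.ptr<float>(row);
            a[0] = Ix;       a[1] = Iy;
            a[2] = Ix * x;   a[3] = Ix * y;
            a[4] = Iy * x;   a[5] = Iy * y;
            b.at<float>(row, 0) = I1f.at<float>(y, x) - I2f.at<float>(y, x);
        }
    }

    cv::Mat p;
    cv::solve(A, b, p, cv::DECOMP_SVD);  // least squares solution p = A^+ b (eq. 5.5)
    return p;                            // v(x) = B(x) * p can then be evaluated per pixel
}
```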

The method described in this section was implemented and evaluated in Matlab. Unfortunately the method was not reliable. The resulting position of the reference image I2 was heavily affected by the lighting conditions, although when the method was used on synthetic binary test images it performed very well. It would of course be possible to threshold the image I1, retrieving a binary image Ibinary, and then apply the image registration method on the binary image. However, finding the threshold value that separates the background completely from the tool is a hard problem to solve.

There might of course exist image registration methods better suited for solving the problem. A phase based image registration method would perhaps give a better result than the method described here. However, due to the fact that it is possible to move the robot to different positions, the background subtraction method was evaluated instead.

5.2.2 Background subtraction

If several image frames can be acquired and the object moves between the frames, the background subtraction method can be used to retrieve a smaller search area for the object.

The background subtraction is done by subtracting two images from each other and performing a threshold. If the scene is completely static except for the moving object, all of the background will be eliminated, resulting in a binary mask image where background pixels are set to 0 and object pixels are set to 1.

Let

• f1(x, y) be the image acquired at the start position.
• f2(x, y) be the image acquired at the final position.
• b(x, y) be the binary result image.
• T be the threshold value.

$$b(x, y) = \begin{cases} 1 & \text{if } \|f_1(x, y) - f_2(x, y)\| \geq T \\ 0 & \text{if } \|f_1(x, y) - f_2(x, y)\| < T \end{cases}$$

The resulting binary mask image will of course have its pixels set to one at both the start location and the final location of the object. To distinguish the object's latest location from the previous one, the object can be moved to a third position and a new image f3 can be retrieved.

By applying the background subtraction method to f1 − f3 and f2 − f3 separately and applying a pixel wise logical AND operator to the resulting binary images, a new binary mask image giving only the last location of the object is retrieved.
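The three-frame background subtraction could be implemented as in the sketch below, which uses the modern OpenCV C++ API (the thesis used OpenCV 1.0) and a hypothetical threshold value.

```cpp
// Sketch of the background subtraction with three frames described above.
// f1, f2, f3 are grayscale frames with the tool at three different positions.
#include <opencv2/opencv.hpp>

cv::Mat lastLocationMask(const cv::Mat& f1, const cv::Mat& f2, const cv::Mat& f3,
                         double T = 30.0)
{
    cv::Mat d13, d23, b13, b23, mask;

    // |f1 - f3| and |f2 - f3| followed by thresholding.
    cv::absdiff(f1, f3, d13);
    cv::absdiff(f2, f3, d23);
    cv::threshold(d13, b13, T, 255, cv::THRESH_BINARY);
    cv::threshold(d23, b23, T, 255, cv::THRESH_BINARY);

    // Pixel wise AND keeps only the latest location of the tool (frame f3).
    cv::bitwise_and(b13, b23, mask);
    return mask;
}
```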

In practice the resulting binary mask image was not perfect; several holes occurred in the object region, see figure 5.1.

Figure 5.1: Binary mask image from background subtraction.

To fill the holes in the mask image, morphological operations were applied to the binary image. By gradually applying dilate and erode operators (cvDilate and cvErode in OpenCV) on the mask image, the holes were filled.

However, due to the quadrangular structure element used by the morphological operators, the tip structure of the object mask image was deteriorated. Later on during the segmentation process, the damaged tip structure turned out to be an obvious drawback.

Instead of using the morphological operators, a new method referred to as the Hough Filling Method (HFM) was invented.

5.2.2.1 The Hough filling method (HFM)

By applying the Progressive Probabilistic Hough Transform (PPHT) described in [17] to the binary mask image in figure 5.1, all probabilistic lines were found. If the permitted gap constraint was set to a suitable value and the minimum distance between pixels which were to be considered as a line was set to zero, all holes were perfectly filled when the lines found by PPHT were drawn on the binary mask image. The HFM method will not increase the mask image boundary as long as the curvature of the boundary remains low. The PPHT algorithm is implemented in the OpenCV function cvHoughLines2 [18].
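A sketch of the HFM idea is shown below; it uses the modern OpenCV function cv::HoughLinesP instead of cvHoughLines2, and the parameter values are hypothetical.

```cpp
// Sketch of the Hough Filling Method (HFM): fill holes in a binary mask by
// drawing the probabilistic Hough line segments back onto it.
#include <opencv2/opencv.hpp>
#include <vector>

void houghFill(cv::Mat& mask)
{
    std::vector<cv::Vec4i> lines;
    // rho = 1 px, theta = 1 degree, low vote threshold, minLineLength = 0,
    // maxLineGap chosen large enough to bridge the holes.
    cv::HoughLinesP(mask, lines, 1, CV_PI / 180.0, 20, 0, 10);

    // Drawing every detected line segment on the mask closes the gaps without
    // growing the boundary much, as long as its curvature stays low.
    for (const cv::Vec4i& l : lines)
        cv::line(mask, cv::Point(l[0], l[1]), cv::Point(l[2], l[3]), cv::Scalar(255), 1);
}
```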

By applying the background subtraction method and filling gaps and holes with the Hough filling method, almost perfect binary mask images were achieved for all three locations of the tool, see figure 5.2. After the binary mask had been applied, only a small search area was left in the image.

Figure 5.2: Binary mask image from background subtraction after applying the HFM.

5.3 Edge Detection

An edge can be treated as a locally odd function in an image,

$$f(-x, -y) = -f(x, y)$$

and such a function will always have a gradient. The easiest way of finding the gradient is probably to convolve the image with the Sobel operators:

$$G_x = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{pmatrix}, \qquad G_y = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix}$$

The edge image may then be retrieved as

$$f_{edge} = \sqrt{(f * G_x)^2 + (f * G_y)^2}$$

Unfortunately this method will not, according to [2], work properly in noise affected environments. Irregular lighting conditions will also decrease the functionality of this method.

5.3.1 Canny's Edge Detection

In 1986, J. Canny presented a more robust technique for edge detection in [19]. In contrast to the previous method, Canny's edge detection algorithm makes use of the second derivatives, which are zero when the first derivatives reach their maxima. Let

$$n = \frac{\nabla f(x, y)}{\sqrt{f_x^2(x, y) + f_y^2(x, y)}}$$

be the unit vector in the gradient direction, where $f_i(x, y)$ is the image f(x, y) differentiated in the i direction. The edge image $f_{canny}$ can be found by differentiating the absolute value of the gradient image in the direction of the gradient:

$$f_{canny} = n \cdot \nabla\|\nabla f(x, y)\| = \frac{f_x^2(x, y) f_{xx}(x, y) + 2 f_x(x, y) f_y(x, y) f_{xy}(x, y) + f_y^2(x, y) f_{yy}(x, y)}{f_x^2(x, y) + f_y^2(x, y)}$$

To be able to find the contour of the tool, the Canny edge detection algorithm [19] was used on the masked image

$$f(x, y)_{masked} = \begin{cases} f(x, y) & \text{if } b(x, y) = 1 \\ 0 & \text{if } b(x, y) = 0 \end{cases} \quad (5.7)$$

where b(x, y) is the resulting image given by the background subtraction and HFM methods. To ensure that all of the tool was left in the masked image $f_{masked}$, the dilation operator was applied to b(x, y) before calculating $f_{masked}$ according to equation 5.7.

After applying the Canny edge detection algorithm on the masked image $f_{masked}$, the binary edge image $b_{edge}$ was obtained. The Canny algorithm is implemented in the OpenCV library function cvCanny [18].
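The masking and edge detection steps could be combined as in the sketch below, using the modern OpenCV C++ API; the structure element size and threshold values are hypothetical and, as noted below, had to be tuned to the current environment.

```cpp
// Sketch of the masking and edge detection step (equation 5.7 plus Canny),
// using the modern OpenCV C++ API (cvCanny in the OpenCV 1.0 C API).
#include <opencv2/opencv.hpp>

cv::Mat toolEdges(const cv::Mat& frame, const cv::Mat& binaryMask)
{
    // Dilate the mask first so that no part of the tool is cut away.
    cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5));
    cv::Mat grownMask;
    cv::dilate(binaryMask, grownMask, element);

    // Equation 5.7: keep the original gray values inside the mask, zero elsewhere.
    cv::Mat masked = cv::Mat::zeros(frame.size(), frame.type());
    frame.copyTo(masked, grownMask);

    // Canny edge detection on the masked image.
    cv::Mat edges;
    cv::Canny(masked, edges, 50, 150);
    return edges;
}
```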

The threshold values in the cvCanny function had to be determined according to the current environment. If the image contained a lot of noise, the threshold had to be set to a high value to ensure the elimination of the noise. Unfortunately the high threshold sometimes resulted in discontinuities of the tool edges in the edge image $b_{edge}$. To solve the problem of discontinuities of the tool edges, the Hough Filling Method (see section 5.2.2.1) was used, with an excellent result. Another disadvantage was the presence of light reflections in the metallic surface of the tool. The contour of a reflection had the same shape as the actual tool, resulting in possible false hits later in the segmentation procedure. To avoid the influence of light reflections, the tool was painted in a non-reflecting colour.

5.4 Contour Retrieving

To retrieve the contours from the binary edge image, bedge, two different methods were evaluated. This section will illuminate the two methods, but first two different contour representations will be described.

Note that it would be possible to retrieve an approximation of the tool contour from the binary mask image, see figure 5.2. In practice, however, the result became more accurate when the tool contour was retrieved from the binary edge image bedge given by the Canny edge detection algorithm.

5.4.1 Freeman Chain Code Contour Representation

The Freeman chain code representation is a compact representation of contours. By denoting the neighbours of the current pixel with different digits, see figure 5.3, the contour can be stored as a succession of digits. Each digit then contains all necessary information for finding the next pixel in the contour [18].

See the example in figure 5.4.


Figure 5.4: An example of the Freeman Chain Code.

5.4.2 Polygon Representation

The polygonal representation of a contour is, according to [18], a more suitable representation for contour manipulations. In the polygonal representation the sequence of points is stored as vertices, i.e. the pixel coordinates are stored.

5.4.3 Freeman Methods

The four scan line methods described in [20] are implemented in the OpenCV function cvFindContours [18]. The scan line methods scan the image line by line until an edge is found. When an edge is found, the method starts a border following procedure until the current border has been retrieved in the Freeman representation.

The first method described in [20] only finds the outermost contours, while the other methods discover contours on several levels. The methods were evaluated on the binary edge image with satisfactory results.

The methods with the ability to find contours on different levels had a disadvantage because unnecessary contours were found, increasing the risk of mixing the true object with a light reflection of the same shape.
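A minimal sketch of retrieving only the outermost contours in Freeman chain code representation with cvFindContours is given below; the retrieval mode and chain code output shown here are one possible parameter choice, not necessarily the exact settings used in the thesis.

#include <cv.h>

/* Retrieve the outermost contours of a binary edge image as Freeman chain
   codes. Note that cvFindContours modifies its input image. */
CvSeq* outer_contours(IplImage* bedge, CvMemStorage* storage)
{
    CvSeq* first_contour = NULL;

    cvFindContours(bedge, storage, &first_contour, sizeof(CvChain),
                   CV_RETR_EXTERNAL,     /* outermost contours only   */
                   CV_CHAIN_CODE,        /* Freeman chain code output */
                   cvPoint(0, 0));

    return first_contour;  /* linked list of contours; follow ->h_next */
}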

5.4.4 Active Contour (The Snake Algorithm)

Active contours or Snakes are model-based methods for segmentation. The active contour is a spline function v(s) with several nodes.

v(s) = (x(s), y(s)) (5.8)

where x(s), y(s) are the x, y coordinates along the contour and s ∈ [0, 1].

Each node has an internal energy and the algorithm tries to minimize the total energy, E, of the spline function

E = ∫₀¹ Esnake(v(s)) ds


The spline function is affected by internal forces Eint(v(s)) and external forces Eimg(v(s)) and Econ(v(s)),

Esnake = Eint(v(s)) + Eimg(v(s)) + Econ(v(s))

The internal forces affecting the active contour are divided into tension forces and rigidity forces. The tension forces give the spline function a spring-like behaviour, while the rigidity forces make the spline function resist bending.

Eint(v(s)) = α(s) ‖dv(s)/ds‖² + β(s) ‖d²v(s)/ds²‖²

where α(s) specifies the elasticity and β(s) specifies the rigidity of the spline function.

The external forces consist of image forces Eimg and user specified constraint forces Econ. The image force is an image where each pixel value defines a force. The constraint forces can be used to guarantee that the active contour does not get stuck at a local minimum. The constraint forces might be set by a higher level process [21].

One can compare the snake algorithm to a rubber band expanded to its maximum when the algorithm is initialized. The rubber band then iteratively contracts until the energy function in each node has reached equilibrium.

By using the binary edge image as the external force and defining the initial contour to be the result of applying the Freeman scan line method to the binary mask image, the active contour will successively contract until it perfectly encloses the object, see figure 5.5.

Figure 5.5: The snake contour at initial state to the left and at the final state to the right. The contour is randomly color coded from the start point to the end point of the contour.

In fact, the mask image and the initial active contour have to be larger than the object itself to be able to enclose the object. If there are structures or patterns behind the object, the active contour might get stuck due to forces in the edge image belonging to these structures, see figure 5.6. This drawback, together with the fact that the algorithm is iterative and therefore time consuming, made the Freeman method better suited for retrieving the contour.

Figure 5.6: The snake contour could get stuck at the background pattern. The contour is randomly color coded from the start point to the end point of the contour.

The snake algorithm is implemented in the OpenCV function cvSnakeImage [18].
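For illustration, a call to cvSnakeImage could look roughly like the sketch below, using the OpenCV 1.x C API. The elasticity, rigidity and image-energy coefficients, the search window and the iteration count are all assumptions and would have to be tuned; they are not the settings used in the thesis.

#include <cv.h>

/* Relax an initial contour (pts, n points) towards the edges of edge_image. */
void run_snake(const IplImage* edge_image, CvPoint* pts, int n)
{
    float alpha = 0.1f;   /* elasticity (tension) coefficient, assumed     */
    float beta  = 0.4f;   /* rigidity coefficient, assumed                 */
    float gamma = 0.5f;   /* weight of the image (external) energy, assumed */

    cvSnakeImage(edge_image, pts, n, &alpha, &beta, &gamma,
                 CV_VALUE,                       /* one coefficient for all nodes */
                 cvSize(15, 15),                 /* search window per node        */
                 cvTermCriteria(CV_TERMCRIT_ITER, 100, 0.0),
                 1);                             /* use the image gradient as force */

    /* pts now contains the contracted contour. */
}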

5.5 Finding the contour matching the tool shape

When all contours had been retrieved by the Freeman method, some logic had to be applied to actually find a contour, or a part of a contour, matching the desired tool object. Two different methods were implemented and evaluated.

5.5.1 Convex Hull and its Defects

A widely used methodology for finding objects with a shape reminiscent of a hand or a finger is to analyse the convex hull of the object. The awl shape of the tool object closely resembles a finger shape. In the OpenCV library a function named cvConvexHull2 exists for finding the convex hull of a contour. There is also a function for finding the deepest defects of the convex hull, i.e. the points farthest away from each line segment of the convex hull. The two functions were used to retrieve all points outlining the convex hull and the points constituting the deepest defects of the hull. All five points defining the tool were found by these functions. By finding the two deepest defect points, two of the five desired points could be determined (the two points closest to the robot's wrist), see figure 5.7. To determine the three remaining points of the tool, lines were drawn between these points and their nearest neighbours, resulting in four different lines. The four lines were then compared two by two to find the two lines being most parallel to each other, see figure 5.8. By this method four out of the five desired points were easily determined. Finding the last point of interest (the TCP) then became easy, since it was the only point on the convex hull fulfilling the constraint of being a nearest neighbour of two of the points already found. The method was fast and performed well as long as the robot only made reorientations around axis five during the background subtraction method. When more complex robot movements were used, a huge number of defect points in the convex hull were found, which made the method time consuming. Instead the polygonal approximation method was evaluated.

Figure 5.7: The green lines are the convex hull found by the method, and the red dots are the points defining the convex hull and the defect points.

Figure 5.8: The four lines are illustrated in this figure. The green lines are the two lines being most parallel, found by the logic. The red lines are not parallel and will therefore be neglected.
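A minimal sketch of how the convex hull and its deepest defects could be retrieved with cvConvexHull2 and cvConvexityDefects is given below; the contour is assumed to be stored in polygonal (point sequence) representation.

#include <cv.h>

/* Find the convex hull of a contour and keep its two deepest defects. */
void hull_and_deepest_defects(CvSeq* contour, CvMemStorage* storage)
{
    /* return_points = 0 gives hull elements as pointers into the contour,
       which is the form cvConvexityDefects expects. */
    CvSeq* hull    = cvConvexHull2(contour, storage, CV_CLOCKWISE, 0);
    CvSeq* defects = cvConvexityDefects(contour, hull, storage);

    CvConvexityDefect deepest[2] = { {0}, {0} };

    for (int i = 0; i < defects->total; i++)
    {
        CvConvexityDefect* d = (CvConvexityDefect*)cvGetSeqElem(defects, i);
        if (d->depth > deepest[0].depth)      { deepest[1] = deepest[0]; deepest[0] = *d; }
        else if (d->depth > deepest[1].depth) { deepest[1] = *d; }
    }

    /* deepest[0].depth_point and deepest[1].depth_point would correspond to
       the two tool points closest to the robot wrist. */
}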

5.5.2 Polygonal Approximation of Contours

To further compress the retrieved contour, several methods can be used, such as Run Length Encoding compression and polygonal approximation. In this implementation the Douglas-Peucker polygonal approximation was used (cvApproxPoly in the OpenCV library). The reason for using a polygonal approximation method was the possibility of setting the approximation accuracy to a level where the tool object only consisted of five points, see figure 5.9.

Figure 5.9: Five-point polygonal approximation of the tool.

5.5.2.1 Douglas-Peucker Approximation Method

The method starts with the two points p1 and p2 on the contour having the largest internal distance. The algorithm loops through all points on the contour to retrieve the largest distance from the line p1p2. If the maximum distance is lower than the desired accuracy threshold, the process is finished. If not, the point p3 at the longest distance from the line p1p2 is added to the resulting contour. The line p1p2 is then split into the two line segments p1p3 and p3p2. The same methodology is then applied recursively until the desired accuracy constraint is fulfilled [18].
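A minimal sketch of the Douglas-Peucker approximation with cvApproxPoly is shown below. Gradually increasing the accuracy parameter until only five vertices remain is an illustrative strategy matching the description above; the actual accuracy handling in the thesis may differ, and the starting value is an assumption.

#include <cv.h>

/* Approximate a contour (point sequence) with the Douglas-Peucker method,
   increasing the accuracy parameter until at most five vertices remain. */
CvSeq* approximate_tool_polygon(CvSeq* contour, CvMemStorage* storage)
{
    double eps = 1.0;                       /* starting accuracy, assumed */
    CvSeq* poly;

    do {
        poly = cvApproxPoly(contour, sizeof(CvContour), storage,
                            CV_POLY_APPROX_DP, eps, 0);
        eps += 1.0;                         /* coarser approximation next pass */
    } while (poly->total > 5);

    return poly;                            /* the five-point tool polygon */
}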

5.6 Geometric constraints

When the five points had been found, some logic was applied to make sure that the five points fulfilled the geometry of the tool shape. The angles α1, α2 and α3 between the four different lines L1, L2, L3 and L4 of the shape, illustrated in figure 5.10, were determined and the area A of the tool was calculated. The longest sides L1 and L2 of the tool had to fulfill the constraint of being parallel, and the length of the long sides L1 and L2 had to be at least twice the length of the short sides L3 and L4. If all these constraints were fulfilled, the five points were classified as part of the tool shape. To ensure that only one tool was found, the constraints were initially set very strictly. If the method did not find a tool with the strictest constraints, the constraints were gradually relaxed.
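As an illustration of this kind of check, the sketch below tests the parallelism and length-ratio constraints for a hypothetical ordering of the line segments; the 5-degree angle tolerance and the segment ordering are assumptions, not values from the thesis.

#include <cv.h>
#include <math.h>

static double seg_angle(CvPoint a, CvPoint b)
{
    return atan2((double)(b.y - a.y), (double)(b.x - a.x));
}

static double seg_length(CvPoint a, CvPoint b)
{
    return hypot((double)(b.x - a.x), (double)(b.y - a.y));
}

/* Check that the two long sides are (almost) parallel and at least twice as
   long as a short side. a1-a2 and b1-b2 are the long sides, c1-c2 a short side. */
int matches_tool_geometry(CvPoint a1, CvPoint a2,
                          CvPoint b1, CvPoint b2,
                          CvPoint c1, CvPoint c2)
{
    double diff = fabs(seg_angle(a1, a2) - seg_angle(b1, b2));
    if (diff > CV_PI)
        diff = 2.0 * CV_PI - diff;               /* wrap to [0, pi]          */
    if (diff > CV_PI / 2)
        diff = CV_PI - diff;                     /* compare undirected lines */

    int parallel    = diff < 5.0 * CV_PI / 180.0;      /* 5 degrees, assumed */
    int long_enough = seg_length(a1, a2) >= 2.0 * seg_length(c1, c2) &&
                      seg_length(b1, b2) >= 2.0 * seg_length(c1, c2);

    return parallel && long_enough;
}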
