
International Master’s Thesis

Part Detection in Online-Reconstructed 3D Models.

Carlos Gil Camacho

Studies from the Department of Technology at Örebro University


Studies from the Department of Technology

at Örebro University

Carlos Gil Camacho

Part Detection in

Online-Reconstructed 3D Models.

Supervisors: Todor Stoyanov, Henrik Andreasson, Robert Krug


© Carlos Gil Camacho, 2016

Title: Part Detection in Online-Reconstructed 3D Models.


Abstract

This thesis introduces a system to identify objects in a reconstructed 3D model. In particular, it is applied to automate the inspection of a truck engine by detecting certain parts in an online-reconstructed 3D model. In this way, the work shows how augmented reality and computer vision can be applied in a real application to automate an inspection task. To do this, the system employs the Signed Distance Function for the 3D representation, which has been shown in other research to be an efficient method for the 3D reconstruction of environments. Then, some of the common processes for shape recognition are applied to identify the pose of a specific part of the 3D model.

This thesis explains the steps taken to achieve this task. The model is built using an industrial robot arm with a depth camera attached to the end effector. This allows taking snapshots from different viewpoints, which are fused in a common frame to reconstruct the 3D model. The path for the robot is generated by applying translations to the initial pose of the end effector. Once the model is generated, the identification of the part is carried out. The reconstructed model and the model to be detected are analysed by detecting keypoints and feature descriptors. These features can be combined to obtain several candidate instances over the target model, in this case the engine. Last, these instances are filtered by applying some constraints to obtain the true pose of the object in the scene.

Last, some results are presented. The models were generated from a real truck engine. These models were then analysed to detect the oil filters using different keypoint detectors. The results show that the quality of the recognition is good in almost all cases, but it still presents some failures for some of the detectors. Keypoints that are too distinctive are more prone to produce wrong registrations due to the differences between the target and the scene. At the same time, more constraints make the detection more robust but also make the system less flexible.


Acknowledgements

I want to send my best thanks to my supervisors for their support and assistance during the development of my project. Especially, I would like to thank Todor Stoyanov for his patience and guidance during the entire project, since without his advice the development of the project would not have been possible. I also want to thank Martin Magnusson for giving me the opportunity to do this project, and ARHO and Volvo, because they treated me very well while I was running tests at their facilities. Last, I want to thank my family and my friends for their support and contributions.


Contents

1 Introduction 1

1.1 Project Objective . . . 2

1.2 Project Outline . . . 2

2 State of the Art 3

3 System Overview 7

3.1 Description of the Software . . . 7

3.1.1 Robot Operator System . . . 7

3.1.2 Gazebo . . . 8

3.1.3 Point Cloud Library . . . 8

3.2 Description of the Hardware . . . 9

3.3 Integration of the System . . . 10

4 Building the SDF Model 13

4.1 SDF Model . . . 13

4.2 SDF Model From Simulation Data . . . 14

4.2.1 Replication of the lab in Gazebo . . . 15

4.2.2 Motion Planning . . . 16

4.2.3 SDF Tracker Integration . . . 17

4.3 SDF Model from Lab Scenario . . . 19

4.3.1 Design of the mount of the camera. . . 19

4.3.2 Camera Calibration . . . 21

4.3.3 Motion Planning . . . 22

4.3.4 Adapting the previous work for the lab Scenario . . . 23

4.3.5 Collecting Data at ARHO . . . 24

5 Object Recognition 27

5.1 Segmentation . . . 29

5.2 Surface Normals Estimation . . . 31

5.3 Keypoints Extraction . . . 32


5.3.1 Uniform Sampling . . . 33

5.3.2 Intrinsic Shape Signatures . . . 33

5.3.3 Harris 3D . . . 34

5.3.4 SIFT 3D . . . 35

5.4 Feature Descriptors . . . 36

5.4.1 Signature of Histograms of Orientations (SHOT) . . . . 37

5.5 Matching Correspondences . . . 38

5.6 Correspondences Grouping . . . 38

5.7 Pose Detection . . . 39

5.8 ICP refinement and Validation . . . 39

6 Results 43

6.1 Resulting 3D Models . . . 43

6.2 Result of the Identifications . . . 48

7 Conclusions 61

7.1 Future Work . . . 62

References 63

A Instructions to run the packages 69

A.1 Simulation Mode . . . 69


List of Figures

3.1 UR10 from Universal Robots. . . 9

3.2 Asus Xtion Pro Live. . . 10

3.3 Diagram of the system. . . 11

4.1 2D example of the Signed Distance Function. . . 14

4.2 Virtual representation of the lab in Gazebo. . . 16

4.3 MoveIt! Motion planner in RViz. . . 17

4.4 CAD model of the mount of the camera. . . 20

4.5 Final Design modified by Robert Krug. . . 21

4.6 Diagram of the calibration of the linear errors. . . 22

4.7 Diagram of the path planning. . . 23

4.8 Representation of the lab in RViz. . . 24

4.9 Robot arm over a movable platform in ARHO. . . 25

4.10 Simplified representation of the environment at ARHO. . . 25

5.1 Real oil filters and their 3D model. . . 27

5.2 Virtual scan of the oil filters 3D model. . . 28

5.3 Identification process for the input pipeline. . . 29

5.4 Region of interest to be analyzed. . . 30

5.5 Segmentation of the engine point cloud. . . 31

5.6 Example of Uniform Sampling. . . 33

5.7 Representation of the 2D SIFT algorithm. . . 36

5.8 Visual representation of the support structure used by SHOT. . . 37

6.1 Simulated model engine. . . 44

6.2 3D models of a simulated engine. . . 44

6.3 Object in the lab and the 3D model generated. . . 45

6.4 Generated 3D Model Engine 1. . . 46

6.5 Generated 3D Model Engine 2. . . 46

6.6 Generated 3D Model Engine 3. . . 47

6.7 Generated 3D Model Engine 4. . . 47


6.8 Comparison between the real engine and the generated model. . 48

6.9 Number of correspondences depending on the radius of the SHOT descriptor. 49

6.10 Number of correspondences depending on the size of the leaf. . . 50

6.11 Wrong registration process performed using Uniform Sampling. 52

6.12 Reorientation of the Oil Filter. . . 53

6.13 Reorientation of the Oil Filter. . . 53

6.14 Registration with a good position but a wrong orientation. . . . 54

6.15 Diagram explaining how the instances are discarded. . . 55

6.16 Models found over the whole engine. . . 55

6.17 Image of a positive registration. . . 56

6.18 Examples of wrong registrations. . . 58


List of Algorithms

1 Segmentation algorithm. . . 30

2 Surface normals estimation algorithm. . . 32

3 Matching Correspondences Algorithm. . . 38

4 ICP refinement algorithm. . . 40

5 Verification of instances algorithm. . . 41


List of Tables

6.1 Table of the error for the generated model. . . 46

6.2 Constant Parameters. . . 48

6.3 Translational Errors. The eRMST is the total error measured from the errors along the three

6.4 Rotational Errors. The eRMST is the total error measured from the errors along the three

6.5 Result of the experiments in 10 engines. . . 59


Chapter 1

Introduction

The automation of industrial processes is one of the main subjects in the field of robotics [6], [3], [1]. A well-implemented automation system increases the productivity and efficiency of a process and reduces its costs. This research studies different techniques to identify parts in a reconstructed 3D model in order to apply them to an automated inspection process for a truck engine. Trucks are intended to travel long distances and to transport heavy goods. From time to time, they need to be checked to avoid possible accidents. Maintenance is necessary for all the elements of the truck, and one of its main components is the engine. This maintenance is usually a repetitive task, which makes it suitable for automation.

During the inspection of the engine, one of the tasks consists of replacing the oil containers with new ones. This project aims to automate that process of changing the oil. This is done in two steps. The first step is building a 3D model of the engine. The model is built using a depth camera attached to a robot arm. The robot arm is used to place the camera in different positions so that the 3D model can be reconstructed from the different viewpoints. The model is computed using a Truncated Signed Distance Function (SDF), which merges the data from the different viewpoints into a voxel grid structure with a common frame.

Once the 3D reconstruction is done, the second step is carried out. During this step, the pose of the oil filters in the reconstructed model is computed. The pose is calculated by registering a model of the oil filters into the previously reconstructed engine. The registration is done by matching points of interest belonging to both models. This thesis presents some of the most common techniques to extract and match these keypoints. Finally, the calculation allows the position of the filters relative to the robot base to be sent to the robot arm. Then, the robot can reach the oil filters and grab one of them to change it.


1.1 Project Objective

The main objective of the project is to calculate the pose of known parts of the engine relative to the base of the robot. This is done taking into account some initial assumptions. The robot must be located under the engine, since in a real truck the engine can only be accessed from the bottom. The pose of the engine is known with a given uncertainty (e.g. +/- 20 cm and +/- 15 degrees). For the maintenance, the truck is placed on an elevator with a fixed position relative to the robot, but its pose can vary slightly due to small movements. The CAD models of the parts to be detected are provided by the truck manufacturer. In addition, a rough location of the parts with respect to the engine is known, since the engine always has the same configuration.

1.2 Project Outline

The rest of this document is organised as follows.

Chapter 2 gives a brief description of other related works.

Chapter 3 explains the software and the hardware utilized in the system.

Chapter 4 details the methods used for the reconstruction of the 3D models.

Chapter 5 gives an explanation of the method used for 3D object recognition.

Chapter 6 shows the results of the reconstructed models and the localization of the parts in the model.

Chapter 7 provides some final conclusions about the previous results and outlines future work.


Chapter 2

State of the Art

This chapter describes related research that uses techniques similar to the ones used in this project, and explains the motivation for the methods chosen for this thesis.

The inspection process in this research relies on the construction of a 3D model in order to identify the different parts. Currently, there exist multiple instruments for 3D scanning [4]. Some of them use range sensors to acquire the data and build the model [10], [36], [31]. Some of these sensors produce very robust 3D models, but they may require a relatively high investment due to their cost [21]. Others, such as the one proposed in [32], reduce the cost of the sensor at the expense of being less flexible and needing more development time. In addition, some of them require powerful hardware in order to compute the 3D data acquired by the sensor. The accumulation of acquired 3D points for the reconstruction of the 3D structure makes expensive hardware necessary for the computation [17].

The models built for this research use the previous work proposed in [8]. This work provides a method to build 3D models by fusing the information from a depth sensor at different viewpoints. The models are built using the Signed Distance Function, which has been shown to be very efficient for this kind of task. The work of Curless and Levoy [11] was one of the first to employ this kind of 3D representation. Later, Newcombe [26] improved these reconstruction techniques with the KinectFusion algorithm. The algorithm uses the Kinect sensor to track the pose of the camera and reconstruct the environment. The technique employed in this thesis is similar to these methods, but allows the use of depth sensors other than the Kinect.

In [7], a comparison of the computation time of some of the best-known techniques for this kind of 3D reconstruction is presented. The comparison confirms the efficiency of the algorithm used in the present research, the SDF Tracker.


Thus, this method allows the use of a cheap sensor that can be acquired directly on the market. At the same time, it allows the reconstruction of the 3D model using a compact structure that avoids the accumulation of points seen in some of the methods reviewed in the previous paragraph. The simplicity of the structure makes it possible to use cheaper hardware to compute and analyse the 3D data.

At the same time, to accomplish the objective proposed for this research, some elements in the reconstructed 3D model of the environment must be identified. This requires matching and registering points between real CAD models and reconstructed models. This is also an important topic in the field of automated industrial applications. There are several methods for detecting objects. Some of them make use of a brute-force search approach to identify the models in the scene. In [20], a method is proposed to compare CAD models with scanned models represented by point clouds. This is done by segmenting the models into surface patches and then using techniques based on the iterative closest point algorithm to match these individual patches. Techniques based on a brute-force approach can achieve accurate results in simple environments. For environments with a lot of noise or many parts, this technique may be inefficient.

Other methods use neural networks and databases for the detection process. For example, [19] explains a method that also divides the models into different parts. It then performs a classification of the parts based on previous training with a local database. The drawback of these methods lies in the need for a large database to train the neural network. In our case, we do not have a database of models, which makes these techniques inapplicable.

Alternatively, other works propose recognizing the object based on different features of the models [12], [23]. These methods perform recognition and matching between global or local distinctive features of the model and the scene. This makes these methods suitable for environments with many elements. Of special interest is the work proposed in [22], which uses a method similar to the one used in this project. The authors also build an SDF model of the scene and then convert it to a point cloud to perform the identification task. They employ a 3D/2D corner detector and compute the descriptor using an unsigned distance function generated from the database mesh. Then, the system matches the corners between the objects and the scene to locate the elements in the environment.

In our case, the techniques of feature detection and brute-force matching are applied together to obtain a more robust system. The feature detection is applied to find several keypoints that characterize the scene and the object. These keypoints are then matched to obtain several possible instances of the model in the scene. These instances are used as initial positions for the brute-force approach, which performs the registration and finds the model that best fits the scene.


Chapter 3

System Overview

In this chapter, the different elements involved in the research are described. First, a brief description of the software is given in Section 3.1. Then the hardware used is introduced in Section 3.2. These descriptions provide the background necessary to understand the system.

3.1 Description of the Software

3.1.1 Robot Operating System

The Robot Operating System (ROS) is a set of libraries and tools employed for robot applications [29]. ROS is used to coordinate and control the components of the system. The processes that perform the computations are called nodes in ROS. These nodes can be combined, and they can communicate with each other to share information.

The software in ROS is organized in packages. These packages may contain nodes, an independent library, configuration files, third-party software, or other modules that provide useful functions. Several ROS packages are utilized during this project and, in order to help understand the approach adopted for the research, two of them need to be explained in more detail: the "SDF Tracker" [30] and "MoveIt!" [28] packages. The visualization of data in ROS is done with the application RViz.

SDF tracker Package

The SDF Tracker is a package developed at Örebro University [8]. The package provides an implementation of the truncated Signed Distance Function with automated tracking of the sensor. This package is used to build the 3D model of the engine that is used for the identification task. A more detailed explanation of the truncated Signed Distance Function can be found in Section 4.1.

MoveIt! Package

This software provides the necessary tools for mobile manipulation, motion planning, navigation control and 3D perception in robotics applications. MoveIt! works with a wide variety of robots. The platform is used in the project to execute the motion plan in a smooth way. It also allows imposing constraints on the robot during path planning.

3.1.2 Gazebo

The use of a simulator makes it possible to test algorithms in a controlled environment with properties similar to the real one. Gazebo is an open-source 3D simulator which allows simulating robots in complex indoor and outdoor environments [27]. It offers tools to design robots and to perform physics simulations. Gazebo provides multiple physics engines, a large library of robot models and environments, a wide variety of sensors, and a friendly graphical interface.

Besides this, Gazebo offers integration with ROS through the necessary packages. It provides interfaces to simulate a robot in Gazebo using ROS messages, services and other ROS features.

ROS uses XML files in the Unified Robot Description Format (URDF) to describe all the elements of a robot. Gazebo supports this format with some modifications and is able to represent the elements from the description file in a 3D simulated environment.

For these reasons, the simulator was used during a phase of the research to test the algorithm without the real components of the system.

3.1.3 Point Cloud Library

The Point Cloud Library (PCL) is a set of tools for 2D/3D image and point cloud processing [16]. It is open source and free for commercial and research use. It also offers integration with ROS. The library offers many methods used for the recognition of 3D objects in point clouds. It also offers implementations of tree searches, which are useful when working with 3D data. The ease of use of its tools and its wide variety of implementations make this library a good choice for the recognition part.


3.2 Description of the Hardware

The objective of this section is to describe the hardware that is necessary to perform the inspection of the engine. The main components involved during the inspection are the industrial robot and the sensor. A computer to manage the system and perform all the calculations is also necessary, but since it can have many different configurations its description is skipped. Next, the other two components are described in more detail:

• Industrial Robot: the robot used in this case is the model UR10 from Universal Robots [42]. It is a collaborative industrial robot with 6 DOF and a CB3 controller, which is connected to the computer through an Ethernet connection. This kind of robot is usually used to automate industrial processes such as packaging, palletizing, assembly, and pick and place. This model can handle payloads of up to 10 kg. A "MoveIt!" package contains the necessary drivers to control the robot using ROS. Figure 3.1 shows the robot.

Figure 3.1: UR10 from Universal Robots [42].

• Sensor: the sensor used is the ASUS Xtion Pro Live camera [2]. This model comprises two sensors: an RGB camera and a depth camera. The depth sensor works optimally between 0.8 and 3.5 meters. The sensor is attached to the end effector of the robot through a 3D-printed mount. The sensor also has drivers integrated with ROS. The depth camera is the one used during the project. The RGB camera could also be used to acquire useful information that helps the recognition process. The depth camera uses an infrared sensor to project a light pattern whose appearance the camera relates to a distance. This results in a depth image, which is a two-dimensional array of elements where each element contains the distance to the closest surface measured by a ray passing through that element.

Figure 3.2: Asus Xtion Pro Live.

3.3 Integration of the System

This part of the document briefly describes how the previously explained software and hardware are integrated to create the whole system.

In this system, the robot arm, which is connected to the central agent by means of an Ethernet connection, sends information about its current state. This information is used by the central system to compute new movement commands using "MoveIt!". At the beginning of an inspection process, the robot sends its current pose to the central computer. A path plan is then calculated to move the robot to several positions and take depth images from different viewpoints. While the robot executes the plan, the camera streams the depth images to the central agent, which reconstructs the 3D scene by fusing all the depth images with the "SDF Tracker" package. Once the robot has reached all the poses, the model is complete and the identification process starts.

At this point, the camera stops streaming images. The SDF model is then converted to a point cloud, and the PCL library is used to detect a specific part of the engine. The analysis with the PCL library results in one or more instances of a model of the oil filters registered into the engine. These instances, represented as point clouds, are refined using the "SDF Tracker", and then a verification step rejects the wrong identifications to obtain the final result. Figure 3.3 illustrates the connections of the system.



Chapter 4

Building the SDF Model

In this chapter, the building of the SDF Model is described. This model represents a 3D reconstruction of the environment. First, the SDF models are briefly explained. Then the different steps taken to adapt the work to the different environments are explained.

4.1 SDF Model

As mentioned before, the main task of the project lies in identifying different parts in a 3D representation of an engine model. The 3D representation employed during this project is called an SDF model. To build these models, a depth camera is used. This section provides an explanation of how these models are generated.

These SDF models are built by means of the truncated signed distance function (TSDF), which is an implicit function defined over a 3D space using discrete samples on a cubic lattice, also called voxels. In other words, this structure can be described as a 3D matrix of small cubes (voxels). Each voxel stores an approximate distance to the nearest surface observed by the depth camera. This distance value is calculated as the difference in depth between the coordinate of the voxel and the coordinate of the nearest measured surface point. The comparison is made in the frame of the sensor. Therefore, the value of a voxel is smaller when it is close to a surface. The value is positive if the voxel is located in front of the observed surface and negative if it is located behind it (taking the sensor as reference). The region that represents the actual surface of the object is defined by the "zero distance". The distances are truncated to pre-defined maximum and minimum values, which determine whether the value represents the actual surface. Figure 4.1 shows a 2D example of the TSDF.


Figure 4.1: 2D example of the Signed Distance Function.

In the image, the black line represents the observed surface, and the coloured circles are the voxels. Positive distances range from dark green to light green and negative distances from dark red to light red. The zero-crossing values, indicated in grey, give the estimate of the surface.

Therefore, the TSDF representation provides an accurate 3D representation where the voxels can be used to compute information such as surface position, orientation and curvature. This representation is also fast to build and, at the same time, avoids the accumulation of points that other 3D representations may produce, resulting in a more compact structure. For example, if a point cloud were used directly, the number of elements in the structure (points) would be much larger. This information is used for the identification of the parts in the 3D model.
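To make the voxel update concrete, the following is a minimal sketch (not the API of the SDF Tracker package) of a projective TSDF update for a single voxel, assuming a pinhole camera with intrinsics fx, fy, cx, cy and a depth image stored as a row-major array of metric depths. In a full tracker each voxel would additionally keep a weighted running average over all observations.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

// Returns the truncated signed distance for one voxel centre expressed in the
// camera frame, or NAN if the voxel does not project into the image.
double tsdfValue(const Vec3& voxel_in_cam,
                 const std::vector<float>& depth, int width, int height,
                 double fx, double fy, double cx, double cy,
                 double trunc_dist)
{
  if (voxel_in_cam.z <= 0.0) return NAN;  // voxel behind the camera

  // Project the voxel centre into the image plane (pinhole model).
  int u = static_cast<int>(std::round(fx * voxel_in_cam.x / voxel_in_cam.z + cx));
  int v = static_cast<int>(std::round(fy * voxel_in_cam.y / voxel_in_cam.z + cy));
  if (u < 0 || u >= width || v < 0 || v >= height) return NAN;

  float measured = depth[v * width + u];  // depth of the observed surface
  if (!std::isfinite(measured) || measured <= 0.0f) return NAN;

  // Positive in front of the surface, negative behind it, truncated to
  // [-trunc_dist, +trunc_dist].
  double d = static_cast<double>(measured) - voxel_in_cam.z;
  return std::max(-trunc_dist, std::min(trunc_dist, d));
}
```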

4.2 SDF Model From Simulation Data

Once the concept of the SDF has been explained, the steps to build the 3D model of the engine can be described. The first step in this phase is the replication of the real setting in a simulated environment. In this case, the robot and its environment were represented in the simulator Gazebo. This allows building a 3D model of the simulated object with simulated sensors. The Gazebo simulator also allows simulating the movements of a robot by establishing the connection with the MoveIt! controller.

4.2.1 Replication of the Lab in Gazebo

In this section, the integration of the different models to replicate the real situation is explained. As mentioned before, the simulations were performed using Gazebo. First, the model of the robot, the UR10, was represented using the Universal Robots package for ROS. This is one of the packages that compose the ROS-Industrial project, and it contains all the files necessary to load the model and the controllers of the UR10 into Gazebo. The files used to represent the model of the robot are URDF files in .xacro format, an XML macro language used to simplify the syntax of the file. The description file contains all the measurements of the joints of the robot as well as the kinematic and collision properties.

This description file was modified to include a simple model of a table under the robot. This model was created directly in Gazebo, since it allows creating simple shapes. The robot was attached to a table to represent a more realistic situation. Each time a new element is added to the URDF file, a joint and a link need to be defined. In the joint, a parent and a child link must be specified, and this is used to attach the child link to the parent. In the joint it is also possible to specify the position of the child link with respect to the parent link. The link, on the other hand, is used to define the properties of the element, in this case the visual properties, the collision geometry and the physical properties. The links are also used to publish the location of each frame in the scene on a ROS topic (called tf). This can be used to obtain the transformation matrix from one frame to another, which in turn can be used to track the pose of the camera.

Similarly, the model of a Kinect camera was included in the scene to represent the depth camera used in the project, since both have similar specifications. The camera was attached to the end effector of the robot, as on the real robot. The depth camera plugin was also used in order to obtain depth images of the scene. This plugin simulates a real depth camera by publishing the necessary topics in ROS. The plugin allows specifying all the properties of the camera, such as the resolution or the frame rate.

In addition, a model of an engine was introduced in the scene to make the simulations closer to the real case [37]. The representation of the simulated environment is shown in Figure 4.2.


Figure 4.2: Virtual representation of the lab in Gazebo. The engine is the model in gray. The table is represented in white colour and the robot is placed above it.

4.2.2 Motion Planning

In this section, a simple motion plan to control the robot arm in the simulator is explained. This helps to understand how "MoveIt!" works. The plan consists of a simple point-to-point trajectory. The points are represented by the joint configuration that the robot needs to reach each point. The robot was placed in different locations using RViz and the state of the joints was read to obtain the input points.

All this was performed with the MoveIt! package for ROS. This package has controllers for a large number of industrial robots, one of them the UR10. The aforementioned Universal Robots package for ROS contains the files necessary to control the robot using MoveIt!. The robot can be controlled either as a real robot or as a simulated one by setting one of the input parameters. Figure 4.3 shows the MoveIt! motion planner in RViz.


Figure 4.3: MoveIt! Motion planner in RViz.

In order to build the SDF model of the environment the robot needs to be moved to different poses to have different viewpoints. A simple “MoveIt!” planer follows the structure specified next. First, the joint that ar going to be moved are declared into an object structure. In this case the structure contains all the links that compose the robot arm from the base to the end effector. Then the poses or objectives have to be specify. This can be done as joint configura-tions or directly as Cartesian points. “MoveIt!” can calculates a plan with that and if a plan is computable, it sends the movement movements instruction to the robot. This needs to be done for all the poses that are going to be used in the path to build the SDF model.
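The following is a minimal sketch of this plan-and-execute step for a single target pose, using the MoveGroupInterface C++ API of recent MoveIt! releases (older releases name the class MoveGroup). The planning group name "manipulator" is an assumption taken from the usual UR10 MoveIt! configuration and may differ.

```cpp
#include <ros/ros.h>
#include <geometry_msgs/Pose.h>
#include <moveit/move_group_interface/move_group_interface.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "simple_scan_path");
  ros::AsyncSpinner spinner(1);
  spinner.start();

  // Planning group defined in the MoveIt! configuration of the UR10.
  moveit::planning_interface::MoveGroupInterface group("manipulator");

  // One target pose of the end effector; in the thesis a list of such poses
  // forms the scan path and this block is repeated for each of them.
  geometry_msgs::Pose target;
  target.orientation.w = 1.0;
  target.position.x = 0.4;
  target.position.y = 0.1;
  target.position.z = 0.5;
  group.setPoseTarget(target);

  moveit::planning_interface::MoveGroupInterface::Plan plan;
  if (group.plan(plan) == moveit::planning_interface::MoveItErrorCode::SUCCESS)
    group.execute(plan);  // send the motion to the (real or simulated) robot

  ros::shutdown();
  return 0;
}
```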

Furthermore, a ROS publisher was added to publish a flag that stops the registration of the points. This is done to stop the reconstruction process automatically when the robot reaches the last pose, allowing the recognition process to start.

4.2.3 SDF Tracker Integration

Once the path of the robot is defined, the "SDF Tracker" package is used to build a 3D model of the environment. This package uses the depth images from the sensor to build a 3D model using the signed distance function. It tracks the position of the camera at each moment by estimating the transformation matrix between the last two depth images. Next, the pixels of the depth images are converted to 3D using the parameters of a pinhole camera and transformed with the previously estimated transformation matrix. Last, they are queried against the current SDF model, and the voxel corresponding to each point is updated with a distance value. The algorithm iterates over all the pixels of the depth image to produce the signed distance function representation of the model.

The SDF Tracker package automatically tracks the position of the camera using the transformation between depth images. This means that the robot arm can only move in small intervals between one position and the next. If the robot movements are larger than approximately 50 cm, the registration of the points may accumulate large errors, producing wrong registrations. Similarly, significant changes in the orientation of the camera may also produce failures in the registration. To avoid this, the SDF Tracker was modified to track the movement of the camera using the kinematics of the robot arm, generating more robust registrations. This also requires less computation, since the transformation matrices needed for the tracking can be obtained with simple calculations.

The SDF Tracker needs the pose of the camera with respect to the centre of the object that is going to be modelled. This transformation can be calculated using the transformation matrix of the camera relative to the world (the origin of our environment) and the transformation matrix of the object with respect to the world. Both of these matrices are known, since they are published on the tf topic in ROS.
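A minimal sketch of how such a transform can be obtained from tf is shown below. The frame names "engine_link" and "camera_depth_frame" are placeholders standing in for the frames defined in the URDF, not names taken from the thesis.

```cpp
#include <ros/ros.h>
#include <tf/transform_listener.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "camera_pose_lookup");
  ros::NodeHandle nh;

  tf::TransformListener listener;
  ros::Rate rate(30.0);
  while (nh.ok())
  {
    tf::StampedTransform object_T_camera;
    try
    {
      // tf chains world->engine and world->camera internally and returns the
      // pose of the camera expressed in the engine frame.
      listener.lookupTransform("engine_link", "camera_depth_frame",
                               ros::Time(0), object_T_camera);
      ROS_INFO("camera height in engine frame: %.3f m",
               object_T_camera.getOrigin().z());
    }
    catch (tf::TransformException& ex)
    {
      ROS_WARN("%s", ex.what());
    }
    rate.sleep();
  }
  return 0;
}
```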


Initially, the SDF Tracker was modified to stop the registration during the robot movements. This means that the SDF was updated only when the robot was stopped at one of the poses of the path. This modification was made to avoid coordination problems, because the robot uses an Ethernet connection and the camera a USB connection. However, during the tests it was found that updating the model during the movements as well does not affect the quality of the result, so this change was not kept.

4.3 SDF Model from Lab Scenario

The next step in the research was to adapt the previous work to a real environment with a real robot. The next sections describe the different tasks performed to achieve this objective.

4.3.1 Design of the Mount of the Camera

The camera needs to be attached to the robot arm to acquire the depth images from different viewpoints. The point-to-point movements of the robot can be executed at high velocity, so the camera has to be firmly fixed to avoid movements that may produce failures in the reconstruction of the environment. In particular, a camera position different from the one specified in the URDF file invalidates the transformation matrix, since it no longer represents the correct pose of the camera, which produces an inaccurate 3D model.

To solve this, a CAD model of a mount for the camera was designed and then printed on a 3D printer. The model was built in two pieces. One of the pieces is a flange plate that is screwed to the end effector of the robot. The camera is screwed to this part using its support mounting hole (Figure 3.2). The other piece is a small clamp that keeps the camera firmly in place. A first concept of this model is shown in Figure 4.4.

Figure 4.4: CAD model of the mount of the camera. The flange plate is the large rectangular piece. The clamp is attached to the plate with two screws.

Some modifications were made to the previous model so that the 3D printer could produce it. The printer used is a basic model which does not use support material, so it needs to print layer over layer. Figure 4.5 illustrates the final version of this mount, which saves material by reducing the dimensions. This mount is a provisional version, since the final version used in the full inspection of the truck will be an interchangeable mount that allows exchanging the camera for another tool.


Figure 4.5: Final Design modified by Robert Krug. The dimensions of the clamp are significantly reduced. The flange plate is also modified to save material and to make the printing faster.

4.3.2 Camera Calibration

A calibration of the sensor is essential in order to take more accurate measurements with it. A basic "manual" calibration of the camera pose was done in order to reduce the errors. In this kind of calibration, the values that define the pose of the camera mount link are slightly modified to offset the linear and angular errors.

For the linear error, the distance from the lens of the depth camera to a flat surface (a table) was measured (Figure 4.6). This was done by subscribing to the topic published by the depth sensor. This topic contains a 2D matrix with the distance from the lens to the surface point corresponding to each pixel of the image. The area of interest of the depth image was reduced to a 10 by 10 pixel area centred on the lens of the camera to discard noise. Then, the mean of the distance values in these pixels is taken and compared with the real value. The difference between these values is the offset that needs to be included in the camera mount link. This is done by averaging over different points of the surface.


Figure 4.6: Diagram of the calibration of the linear errors.
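A minimal sketch of this measurement is given below, assuming the depth images are published as 32-bit float metric depths; the topic name "/camera/depth/image" is a placeholder for whatever the driver actually publishes. The mean of the central 10x10 window is compared against the hand-measured distance to obtain the linear offset.

```cpp
#include <cmath>
#include <ros/ros.h>
#include <sensor_msgs/Image.h>
#include <cv_bridge/cv_bridge.h>
#include <opencv2/core/core.hpp>

// Averages the central 10x10 pixels of the depth image; the difference between
// this value and the measured ground-truth distance is the linear offset to be
// added to the camera mount link.
void depthCallback(const sensor_msgs::ImageConstPtr& msg)
{
  cv_bridge::CvImageConstPtr cv = cv_bridge::toCvShare(msg, "32FC1");
  const cv::Mat& depth = cv->image;  // depths in metres

  cv::Rect roi(depth.cols / 2 - 5, depth.rows / 2 - 5, 10, 10);
  cv::Mat centre = depth(roi);

  double sum = 0.0;
  int n = 0;
  for (int r = 0; r < centre.rows; ++r)
    for (int c = 0; c < centre.cols; ++c)
    {
      float d = centre.at<float>(r, c);
      if (std::isfinite(d) && d > 0.0f) { sum += d; ++n; }  // skip invalid pixels
    }
  if (n > 0)
    ROS_INFO("mean depth of central region: %.4f m", sum / n);
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "depth_offset_check");
  ros::NodeHandle nh;
  ros::Subscriber sub = nh.subscribe("/camera/depth/image", 1, depthCallback);
  ros::spin();
  return 0;
}
```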

For the angular error, the camera was rotated by a specified angle about one of the axes. The point cloud was then saved and its normal was calculated. This normal allows computing the angle between the reference axis and the plane of the point cloud. A transformation is then applied to the point cloud to check whether it matches the point cloud captured before the rotation of the camera. This needs to be done by rotating the camera about the three axes to correct the errors in the pitch, roll and yaw angles.

The results of the calibration can be checked in the SDF models. Errors in the angles can produce distortions in the created 3D models. For this reason, corners and flat surfaces are a good indication of the quality of the calibration.

4.3.3 Motion Planning

In this section, the motion planning is modified to allow the robot to find a suitable path in any situation. The previously explained motion planning uses pre-defined poses for the robot. These positions need to be changed whenever the environment is modified, which makes the process inefficient and inflexible. This section proposes a path that is generated dynamically from the current pose of the robot.


To do this, a node was created that builds a grid of poses around the initial pose of the end effector, which is the pose that the robot has at the time the algorithm is executed. Figure 4.7 illustrates the process:

Figure 4.7: Diagram of the path planning. The image on the left represents the end effector and its plane in space. The image on the right shows the poses (circles) generated by the algorithm. The initial position is indicated in red, the others in gray. The parameter d indicates the distance between poses, and the parameters M and N indicate the number of poses, in this case 3x3.

The points in the grid of positions are generated in the same plane as the end effector. This allows a user to place the robot in an arbitrary initial position; the grid of positions is then generated along the plane of the end effector, independently of the pose of the robot. To calculate the points, a translation is applied to the initial pose. This is used to generate the first row. The other rows are obtained by applying another translation (forward or backward) to all the points of the first row. The orientation is the same for all the positions, and the camera points along the normal of the plane of the end effector.
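A minimal sketch of this grid generation, written with Eigen and assuming the initial end-effector pose is available as an Eigen::Isometry3d in the robot base frame (the helper name and parameters are illustrative):

```cpp
#include <vector>
#include <Eigen/Geometry>
#include <Eigen/StdVector>

using PoseVector =
    std::vector<Eigen::Isometry3d, Eigen::aligned_allocator<Eigen::Isometry3d>>;

// Builds an M x N grid of camera poses by translating the initial end-effector
// pose within its own x-y plane, keeping the orientation fixed so the camera
// keeps pointing along the end-effector normal (cf. Figure 4.7).
PoseVector makeScanGrid(const Eigen::Isometry3d& initial, int M, int N, double d)
{
  PoseVector poses;
  // Local axes of the end effector expressed in the base frame.
  const Eigen::Vector3d x_axis = initial.rotation().col(0);
  const Eigen::Vector3d y_axis = initial.rotation().col(1);

  for (int row = 0; row < M; ++row)
    for (int col = 0; col < N; ++col)
    {
      Eigen::Isometry3d p = initial;
      // Offsets are centred on the initial pose; spacing d between poses.
      p.translation() += (col - (N - 1) / 2.0) * d * x_axis
                       + (row - (M - 1) / 2.0) * d * y_axis;
      poses.push_back(p);
    }
  return poses;
}
```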

4.3.4 Adapting the Previous Work for the Lab Scenario

Last, some changes were made to adapt the previous work to the new environment. The URDF of the robot was modified to add the new elements present in the laboratory. A box model was created to represent the aluminium profile on which the robot is located. The position of the robot was also adjusted. In the real laboratory, a wall close to the robot limited the robot movements, and for this reason it was also included in the URDF. Updating the description files is important both for the path planning and for the 3D reconstruction of the environment. On the one hand, representing these objects in the URDF file allows them to be taken into account during path planning. In this way, the planner knows the location of the objects in the environment and can avoid possible collisions with them. On the other hand, the correct location of the objects (especially the robot) allows the correct calculation of the transformation matrices. Figure 4.8 shows the situation in the real environment and in the URDF represented in RViz.

Figure 4.8: Representation of the lab in RViz.

The 3D model of the sensor was modified to have the same size as the Asus Xtion Pro used in the real experiments. A simplified 3D model of the camera mount was also introduced into the description file of the robot. The measurements of these models were adjusted to be as close as possible to the real objects. As mentioned before, more accurate measurements allow more accurate estimates for the tracking of the camera. After making these changes, several SDF models were captured in the laboratory. Some results are illustrated in Chapter 6.

4.3.5 Collecting Data at ARHO

The final test of this phase of the project was done at the company ARHO with a real engine. The description file of the robot was modified to reflect the new environment. The robot arm was placed on a low, movable aluminium table to inspect the lower part of the engine. Figure 4.9 shows the robot on the movable platform with the engine in the background.


Figure 4.9: Robot arm over a movable platform in ARHO.

Accordingly, the model of the previous table was modified and two walls were added to the URDF to represent the limited workspace of the robot. Because of the limited space, the robot can only perform the movements starting from a specific initial pose. The robot has to be placed in an initial position that ensures it does not reach any of its joint limits. A 3D model of an engine was also introduced as a visual reference in the simulator. Figure 4.10 shows the representation of the environment in RViz:

Figure 4.10: Simplified representation of the environment at ARHO.


Chapter 5

Object Recognition

This chapter describes the process for recognizing the parts. Once the model is built, the identification process can be carried out. The oil filters are the parts to be detected in this research. The recognition of this part requires an individual model of the oil filters, used as a template to be found in the engine model. For this reason, a simple representation of these filters was created in 3D modelling software. The representation is composed of three cylinders, because in the real engine the three oil filters are always together and with the same separation. Figure 5.1 shows the real filters in the engine and the created model.

Figure 5.1: The real oil filters (left) and their 3D model (right).

The model was created in STL format, which is a stereolithography file format used for 3D representation [33]. After the creation of the model, a virtual scan in the simulator was done in order to acquire the point cloud that represents the model. The point cloud format makes the computation of the registration process more accurate than a mesh format, since it makes the analysis possible at a smaller scale, in this case individual points. At the same time, it allows employing the methods available in the Point Cloud Library, which require this format for the analysis. Figure 5.2 shows the point cloud acquired with the virtual scan.

Figure 5.2: Virtual scan of the oil filters 3D model.

The conversion of the model to a point cloud allows the identification of this part in the engine model. There are several methods to recognize parts in a reconstructed scene. The simplest one is a brute-force matching using the iterative closest point (ICP) algorithm to register the points directly into the SDF model [9]. This needs less computation than applying ICP to the point cloud, due to the compactness of the voxel grid representation. The drawback of this method lies in the need for a good initial guess for the point cloud of the model. The method converges to the true position only if the initial guess is close to it, because it performs the registration using the points close to the initial estimate. This makes it necessary to find a method to obtain this initial guess.
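As an illustration of the role of the initial guess, the following is a minimal PCL sketch of an ICP refinement seeded with a coarse transform. In the thesis the refinement is actually run against the SDF model rather than a point cloud; the point-to-point version below only shows the idea, and the parameter values are assumptions.

```cpp
#include <Eigen/Core>
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/registration/icp.h>

// Refines an initial guess of the oil-filter pose against the engine cloud.
// ICP converges to the true pose only when initial_guess is already close.
Eigen::Matrix4f refinePose(pcl::PointCloud<pcl::PointXYZ>::Ptr model,
                           pcl::PointCloud<pcl::PointXYZ>::Ptr scene,
                           const Eigen::Matrix4f& initial_guess)
{
  pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
  icp.setInputSource(model);
  icp.setInputTarget(scene);
  icp.setMaxCorrespondenceDistance(0.05);  // metres; assumed value
  icp.setMaximumIterations(50);

  pcl::PointCloud<pcl::PointXYZ> aligned;
  icp.align(aligned, initial_guess);       // registration seeded by the guess
  return icp.getFinalTransformation();
}
```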

The method chosen here involves a recognition process using keypoints and feature descriptors. For simplicity, the SDF model is converted to a point cloud and the PCL library is used to accomplish the identification process. The keypoints together with their descriptors are used to match several correspondences. These correspondences are filtered to reject the wrong ones. Then, the transformation matrices that produce the good correspondences are calculated. The models transformed with these matrices are used as initial guesses for the ICP over the SDF, and the registration is refined. Last, a verification process is done to reject wrong identifications. Figure 5.3 illustrates the different processes involved in the identification of the oil filters.


Figure 5.3: Identification process for the input pipeline.

The next sections describe the different sub-processes in more detail.

5.1 Segmentation

After converting the SDF model to a point cloud, a segmentation is done to discard those parts of the model that are not relevant for the analysis. Segmentation is the process of partitioning a 3D model into different sub-models to simplify the representation of the object. In this case, the point cloud is divided into several subsets of points.

The segmentation process is applied to our model to discard the metal structure under the engine. This structure is used to keep the engine elevated, but it is not part of the engine. Excluding this part discards the keypoints over this area, which facilitates the matching in the area around the correct position of the filters.

The side of the engine that does not contain the oil filters is also discarded to simplify the detection process. The segmentation is done as follows. First, the centroid of the point cloud is calculated. Then, the points that lie beyond the centroid plus an offset are discarded. The offset value can be adjusted to discard only the structure under the engine. The same is done for the side of the engine that does not contain the filters. In this case, the volume of the region of interest to be analyzed is approximately 1.2 x 0.45 x 0.4 m (see Figure 5.4). Algorithm 1 summarizes the segmentation process.


Figure 5.4: Region of interest to be analyzed. The right image shows the bottom part of the engine. The left image shows one of the sides of the model.

Algorithm 1 Segmentation()

Ensure: Segmentation of the point cloud.

1: Define a new point cloud C2.
2: Compute the centroid C of the input cloud.
3: for all points p(i) in the input point cloud do
4:   if p(i)_Y < C_Y + offset and p(i)_Z < C_Z + offset then
5:     Add p(i) to the point cloud C2
6:   end if
7: end for

Figure 5.5 shows the three point clouds, labeled with different colors, after applying the segmentation algorithm (Alg. 1) to the point cloud of the engine.


Figure 5.5: Segmentation of the engine point cloud. The structure that holds the engine is labeled in blue. The part of the engine that contains the oil filters is labeled in green. The rest of the engine is labeled with red color.
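A minimal PCL sketch of this centroid-plus-offset segmentation is given below (the function name and offset parameters are illustrative; the thesis implementation may differ in detail):

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/common/centroid.h>

// Keeps only the points whose Y and Z coordinates stay below the centroid plus
// an offset, following Algorithm 1. The offsets are tuned so that the support
// structure and the far side of the engine are cut away.
pcl::PointCloud<pcl::PointXYZ>::Ptr
segmentEngine(const pcl::PointCloud<pcl::PointXYZ>::Ptr& input,
              float offset_y, float offset_z)
{
  Eigen::Vector4f centroid;
  pcl::compute3DCentroid(*input, centroid);

  pcl::PointCloud<pcl::PointXYZ>::Ptr region(new pcl::PointCloud<pcl::PointXYZ>);
  for (const pcl::PointXYZ& p : input->points)
    if (p.y < centroid[1] + offset_y && p.z < centroid[2] + offset_z)
      region->push_back(p);
  return region;
}
```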

5.2 Surface Normals Estimation

The next step in the registration process involves the calculation of the surface normals [34]. Surface normals are one of the most important features in computer graphics, since they can be used for multiple purposes (e.g. shadow and light generation). In 3D recognition, surface normals are commonly used for the computation of feature descriptors. In this work, the surface normals are computed directly at each point of the point cloud.

The surface normal is estimated by analysing the eigenvectors and eigenvalues of a covariance matrix created from the nearest neighbours of the point under analysis. Mathematically, for each query point p the covariance matrix C is

C = \frac{1}{k} \sum_{i=1}^{k} (p_i - p_c)(p_i - p_c)^T \qquad (5.1)

C \cdot \vec{v}_j = \lambda_j \cdot \vec{v}_j, \quad j \in \{0, 1, 2\} \qquad (5.2)

where k is the number of neighbours of p, the p_i are those neighbours, p_c is their 3D centroid, \lambda_j is the j-th eigenvalue of the covariance matrix and \vec{v}_j the j-th eigenvector.

In the PCL library, the surface normals can be computed as explained in Algorithm 2. The algorithm requires adjusting the parameter k, which determines the number of nearest neighbours of the current point. A small value of k produces normals that capture finer surface details, but the estimate becomes more sensitive to noise. Larger values of k smooth the result and increase the computational cost, since more neighbours have to be processed for every point.

Algorithm 2 NormalComputation()

Ensure: Computation of the surface normals.

1: Declare k as a fixed number of neighbours.
2: for all points p in the input point cloud do
3:   for the k neighbours of p do
4:     Compute the centroid of the neighbours.
5:   end for
6:   Compute the covariance matrix C as explained previously.
7:   Analyse the eigenvectors and eigenvalues to estimate the surface normal n.
8:   if n is consistently oriented towards the viewpoint then
9:     Add n to the normal point cloud.
10:  else
11:    Flip n and add it to the normal point cloud.
12:  end if
13: end for
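The equivalent computation using PCL's NormalEstimation class is sketched below; the viewpoint is set to the sensor origin so that the normals are flipped consistently towards the camera, as in steps 8-12 of Algorithm 2.

```cpp
#include <pcl/point_types.h>
#include <pcl/features/normal_estimation.h>
#include <pcl/search/kdtree.h>

// Estimates a surface normal for every point using its k nearest neighbours.
pcl::PointCloud<pcl::Normal>::Ptr
computeNormals(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud, int k)
{
  pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
  ne.setInputCloud(cloud);
  ne.setSearchMethod(pcl::search::KdTree<pcl::PointXYZ>::Ptr(
      new pcl::search::KdTree<pcl::PointXYZ>));
  ne.setKSearch(k);                   // number of neighbours per point
  ne.setViewPoint(0.0f, 0.0f, 0.0f);  // orient normals towards the sensor

  pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
  ne.compute(*normals);
  return normals;
}
```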

5.3 Keypoints Extraction

The next step in the detection process is the calculation of keypoints [35]. Keypoints are points in the point cloud that have two main properties:

• Repeatability: it must be possible to detect the keypoint over several frames, even if the environment is captured from different angles or the scene presents noise.

• Distinctiveness: the keypoint should be descriptive enough to allow an easy matching. This characteristic also depends on the chosen descriptor.

The use of keypoints in combination with a local feature descriptor forms a compact and descriptive representation of the point cloud. This representation can be used to match two models with similar geometric characteristics.

There is a wide variety of keypoint detectors [39]; some of them are specifically proposed for 3D point clouds and others are derived from 2D keypoint detectors. In [13], a comparison between the most common techniques to extract keypoints is made. Next, the detectors that have been tested during the evaluations are briefly described.

5.3.1 Uniform Sampling

This detector downsamples the point cloud and distributes the points uniformly, with the same Euclidean distance between their centroids. These points are then taken as keypoints. The uniform sampling implemented in the PCL library uses a 3D voxel grid created over the input point cloud data. This 3D voxel grid can be seen as a set of small 3D boxes in space. All the points inside each voxel are then approximated by their centroid. The size of the voxels can be modified to change the number of keypoints: a larger voxel size results in fewer keypoints. This size is therefore a tunable parameter, named the "leaf size".

Figure 5.6: Example of Uniform Sampling. The mug on the right shows the result of applying uniform sampling to the mug on the left. The voxel size was 0.1 m. The model of the mug was taken from [37].
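A minimal sketch of this step with PCL is shown below, using the UniformSampling filter of PCL 1.8 or newer (in older releases the class lives in the keypoints module and returns indices instead):

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/filters/uniform_sampling.h>

// Keeps one representative point per voxel of the given leaf size.
pcl::PointCloud<pcl::PointXYZ>::Ptr
uniformKeypoints(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud,
                 double leaf_size)
{
  pcl::UniformSampling<pcl::PointXYZ> us;
  us.setInputCloud(cloud);
  us.setRadiusSearch(leaf_size);  // the "leaf size" parameter

  pcl::PointCloud<pcl::PointXYZ>::Ptr keypoints(new pcl::PointCloud<pcl::PointXYZ>);
  us.filter(*keypoints);
  return keypoints;
}
```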

5.3.2 Intrinsic Shape Signatures

Intrinsic Shape Signatures (ISS) is a method created for 3D point clouds which is based on region-wise quality measurements [43]. The method uses eigenvalues to perform the extraction. The smallest eigenvalues are used to include points with large variations along each principal direction. At the same time, the ratio between two successive eigenvalues is used to exclude points that have a similar spread along the principal directions.


More specifically, the algorithm has two main steps. Taking the eigenvalues in decreasing order of magnitude as Λ1(p), Λ2(p), Λ3(p), a pruning step is performed to discard points with similar spreads along the principal directions. The points are discarded using the thresholds shown in Equation 5.3. A smaller value for these thresholds filters out a larger number of weak keypoints.

\frac{\Lambda_2(p)}{\Lambda_1(p)} < Th_{12} \quad \wedge \quad \frac{\Lambda_3(p)}{\Lambda_2(p)} < Th_{23} \qquad (5.3)

During the second step, a non-maxima suppression over the saliency is applied. The saliency, defined as ρ(p), is the value of the third eigenvalue, so ρ(p) = Λ3(p). Thus, only the points which show a large variation along each principal direction are considered as keypoints. This non-maxima suppression step also uses a threshold as a tunable parameter to discard a lower or higher number of keypoints. As in the previous step, smaller values for this parameter will discard more keypoints.
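A minimal PCL sketch of the ISS extraction is given below. The two ratio thresholds correspond to Th12 and Th23 in Equation 5.3; the radii, expressed as multiples of the cloud resolution, are indicative values only, not those used in the experiments.

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/keypoints/iss_3d.h>

// Extracts ISS keypoints from a cloud given its (average) point resolution.
pcl::PointCloud<pcl::PointXYZ>::Ptr
issKeypoints(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud, double resolution)
{
  pcl::ISSKeypoint3D<pcl::PointXYZ, pcl::PointXYZ> iss;
  iss.setInputCloud(cloud);
  iss.setSalientRadius(6.0 * resolution);  // support radius for the covariance
  iss.setNonMaxRadius(4.0 * resolution);   // non-maxima suppression radius
  iss.setThreshold21(0.975);               // Th12: Lambda2 / Lambda1
  iss.setThreshold32(0.975);               // Th23: Lambda3 / Lambda2
  iss.setMinNeighbors(5);

  pcl::PointCloud<pcl::PointXYZ>::Ptr keypoints(new pcl::PointCloud<pcl::PointXYZ>);
  iss.compute(*keypoints);
  return keypoints;
}
```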

5.3.3 Harris 3D

This detector is derived from the 2D implementation and is based on the detection of corners and edges [18]. The 2D method uses changes in intensity along the horizontal and vertical directions. The 3D version employs the surface normals to calculate the covariance matrix (Cov) around each point in a 3 x 3 neighborhood.

The method defines a term called the keypoint response r(x, y, z), which is measured at each point as

r(x, y, z) = \det(Cov(x, y, z)) - k\,(\mathrm{trace}(Cov(x, y, z)))^{2} \qquad (5.4)

The parameter k is a positive real value. It is used as a lower bound for the ratio between the magnitude of the weaker edge and that of the stronger edge.

Once the keypoint responses are computed, a non-maxima suppression process is applied to suppress weak keypoints, followed by a thresholding process. This prevents having too many keypoints close to each other.

There are multiple variants of this keypoint detector. During this research, the Kanade-Lucas-Tomasi implementation [38] is used. This version of the algorithm has the same basis as the normal Harris 3D detector. The main difference lies in the calculation of the covariance matrix, which is calculated directly on the input points instead of the surface normals. The calculation of the keypoint response is also different: in this case, the first eigenvalue of the covariance matrix is used. More specifically, this value is evaluated around each point in a 3x3 neighborhood. The keypoints with the smallest eigenvalues are then discarded by a threshold parameter; a smaller value for the threshold will discard more keypoints. This variation of the Harris 3D algorithm was used due to its slightly better results in the comparison made in [13].
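A minimal PCL sketch of this detector is shown below, selecting the Tomasi response (the smallest eigenvalue of the covariance matrix); the radius and threshold values are placeholders to be tuned per model, not the values used in the thesis.

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/keypoints/harris_3d.h>

// Harris 3D keypoints; the output type carries the response in its intensity.
pcl::PointCloud<pcl::PointXYZI>::Ptr
harrisKeypoints(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud)
{
  pcl::HarrisKeypoint3D<pcl::PointXYZ, pcl::PointXYZI> harris;
  harris.setInputCloud(cloud);
  harris.setMethod(pcl::HarrisKeypoint3D<pcl::PointXYZ, pcl::PointXYZI>::TOMASI);
  harris.setRadius(0.01f);           // neighbourhood radius in metres
  harris.setThreshold(1e-6f);        // discards points with a weak response
  harris.setNonMaxSupression(true);  // note PCL's spelling of this setter

  pcl::PointCloud<pcl::PointXYZI>::Ptr keypoints(new pcl::PointCloud<pcl::PointXYZI>);
  harris.compute(*keypoints);
  return keypoints;
}
```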

5.3.4 SIFT 3D

The Scale Invariant Feature Transform (SIFT) is a method originally proposed for 2D data [23]. In this method, the features are represented by vectors of local measurements. The version of the algorithm for 3D data [15] employs a 3D version of the Hessian to select the keypoints. In particular, a density function f(x, y, z) is approximated by sampling the data regularly in space. A scale space is then computed over this density function. Last, a search is done for local maxima of the Hessian determinant. The algorithm is explained below in more detail.

First, the input point cloud is convolved with a number of Gaussian filters whose standard deviations {σ1, σ2, ..., σn} differ by a fixed scale factor, such that σi+1 = k·σi, where k is the fixed scale factor. The results of these convolutions are smoothed point clouds defined as G(x, y, z, σi). Next, adjacent smoothed point clouds are subtracted to produce a small number (between 3 and 4) of Difference of Gaussian clouds D(x, y, z, σi).

These steps are repeated until a number of Difference of Gaussian clouds (DoG) over the scale space is acquired. Once this is computed, the keypoints are detected as local minima/maxima of the DoG clouds. To do this, each point in the DoG clouds is compared with its neighbour at the same scale and with nine corresponding neighborhood points in each of the neighborhood scales. If the value of the point is the maximum or minimum among all compared points, the point is selected as possible keypoint.

Last, the previously calculated keypoints are examined to eliminate the less stable ones. More specifically, keypoints are discarded if the two local principal curvatures of the intensity profile around the keypoint exceed a threshold value. In this case, the local principal curvatures are the maximum and the minimum curvature values of that intensity profile. Figure 5.7 illustrates the application of SIFT in a 2D system.



Figure 5.7: Representation of the 2D SIFT algorithm. For each scale, the input image is convolved with different Gaussians that differ by a fixed scale value. Then the Gaussian images are subtracted to obtain the difference-of-Gaussian images. Last, the keypoint is detected as a local minimum or maximum among its neighbours in the same scale.
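The scale-space search described above can be sketched in a few lines: the cloud is rasterized onto a regular density grid, smoothed with Gaussians whose standard deviations grow by the factor k, adjacent smoothed grids are subtracted to form the DoG volumes, and voxels that are extrema over space and scale are kept. The voxel size, sigma values and contrast threshold below are illustrative assumptions, and this grid-based version is only an approximation of the point-based implementation used in [15].

import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def sift3d_extrema(cloud, voxel=0.005, sigma0=1.0, k=1.6, n_scales=5, contrast=1e-3):
    # Approximate the density function f(x, y, z) on a regular voxel grid.
    idx = np.floor((cloud - cloud.min(axis=0)) / voxel).astype(int)
    density = np.zeros(idx.max(axis=0) + 1)
    np.add.at(density, tuple(idx.T), 1.0)

    # Smoothed grids G(x, y, z, sigma_i) with sigma_{i+1} = k * sigma_i.
    sigmas = sigma0 * k ** np.arange(n_scales)
    smoothed = [gaussian_filter(density, s) for s in sigmas]
    # Difference-of-Gaussian volumes D(x, y, z, sigma_i).
    dog = np.stack([smoothed[i + 1] - smoothed[i] for i in range(n_scales - 1)])

    # Keep voxels that are extrema over their spatial and scale neighbourhood
    # and whose response is not negligible.
    is_max = dog == maximum_filter(dog, size=3)
    is_min = dog == minimum_filter(dog, size=3)
    return np.argwhere((is_max | is_min) & (np.abs(dog) > contrast))   # (scale, x, y, z)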

5.4 Feature Descriptors

In computer vision, feature descriptors characterize visual properties such as the shape, colour or texture of the data to be analysed. In point clouds, they represent the signature of a point, which contains a significant quantity of information about the surrounding geometry. This allows identifying keypoints in the point cloud in spite of the presence of noise, changes in resolution or transformations.

Nowadays, the number of feature descriptors is large, and each has its own method for computing a unique signature for a keypoint. For example, some use the distances between points while others use the differences between the angles of the normals of a point and its neighbours. This means that some of them are better or worse suited for certain situations.

Once the value that defines the unique signature is calculated for a keypoint, the result is binned into a histogram to reduce the descriptor size. To perform this, the value range of each variable that constitutes the descriptor is partitioned into n subdivisions called bins, and the number of occurrences in each bin is counted. The value of a bin is incremented by one whenever a measurement falls between the partition boundaries that delimit that bin.
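As a tiny illustration of this binning step, the snippet below counts how the cosine of a normal angle (a value in [-1, 1]) falls into n = 11 bins; the data is random and only stands in for real descriptor measurements.

import numpy as np

cosines = np.random.uniform(-1.0, 1.0, size=200)            # stand-in measurements
hist, edges = np.histogram(cosines, bins=11, range=(-1.0, 1.0))
hist = hist / hist.sum()                                     # optional normalisation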

Descriptors can be divided into two main categories: local and global descriptors. The process of recognition differs depending on the descriptor that is used. Global descriptors are applied directly after a segmentation process, whereas local descriptors require the extraction of keypoints before their computation. In this way, a single global descriptor is used for each segmented part, while multiple local descriptors are used to perform the matching. The use of several descriptors makes the matching process more robust, as not all the descriptors need to be matched. For this reason, local descriptors have been used in this work. The chosen local descriptor is described next.

5.4.1 Signature of Histograms of Orientations (SHOT)

SHOT is a descriptor that uses a sphere as a support structure [40]. This sphere is centred at the keypoint for which the descriptor is being computed, with a given search radius. The structure is divided into 32 bins or volumes, with 8 azimuth divisions, 2 elevation divisions and 2 radial divisions. See Figure 5.8.

Figure 5.8: Visual representation of the support structure used by SHOT. Only 4 azimuths are represented. Figure taken from [40].

A one-dimensional local histogram is computed for every volume. To compute the histogram, the cosine of the angle between the normal of the keypoint and the normal of the current point within the volume is used. Then, a final descriptor is computed by combining all histograms together in the same local reference frame. The local reference frame provides a unique orientation for each keypoint. This improves the stability of the descriptor, reduces its size and makes the descriptor rotation invariant.
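A rough sketch of this construction is shown below, assuming the neighbours of a keypoint are already expressed in its local reference frame (neigh_xyz, neigh_normals) and that the keypoint normal kp_normal is available. The bin interpolation and the exact local-reference-frame computation of [40] are omitted; with 32 volumes and 11 cosine bins the concatenated histogram has 352 entries.

import numpy as np

def shot_like(neigh_xyz, neigh_normals, kp_normal, radius, n_cos_bins=11):
    # Assign every neighbour to one of 8 azimuth x 2 elevation x 2 radial volumes.
    az = (np.arctan2(neigh_xyz[:, 1], neigh_xyz[:, 0]) + np.pi) / (2 * np.pi)
    azimuth_bin = np.minimum((az * 8).astype(int), 7)
    elevation_bin = (neigh_xyz[:, 2] > 0).astype(int)
    radial_bin = (np.linalg.norm(neigh_xyz, axis=1) > radius / 2).astype(int)
    cosines = neigh_normals @ kp_normal                 # cos(angle) with the keypoint normal

    descriptor = np.zeros((8, 2, 2, n_cos_bins))
    for a, e, r, c in zip(azimuth_bin, elevation_bin, radial_bin, cosines):
        b = min(int((c + 1.0) / 2.0 * n_cos_bins), n_cos_bins - 1)
        descriptor[a, e, r, b] += 1.0                   # per-volume cosine histogram
    descriptor = descriptor.ravel()
    return descriptor / (np.linalg.norm(descriptor) + 1e-12)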

5.5 Matching Correspondences

After computing descriptors for the models of the filter and the engine, a matching between them is performed to find point-to-point correspondences. This is done using the Euclidean distances between descriptors. Each correspondence is composed of a descriptor of the engine and its nearest neighbour among the oil filter descriptors. A matching threshold, MinDist, is used to discard weak correspondences. The search is done using a k-d tree as data structure; specifically, the Fast Library for Approximate Nearest Neighbors (FLANN) is used to speed up the calculations [25].

Algorithm 3 Matching()
Ensure: matching of the descriptors.
 1: for all keypoint descriptors p_eng in the engine do
 2:     Find the nearest neighbour p_fil in the oil filter keypoint descriptor cloud.
 3:     if the squared descriptor distance between p_eng and p_fil is less than MinDist then
 4:         Add the pair of keypoints as a good correspondence to the vector of correspondences.
 5:     end if
 6: end for
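A compact sketch of Algorithm 3, using SciPy's cKDTree in place of FLANN, is shown below. engine_desc and filter_desc are assumed to be arrays with one descriptor per row, and min_dist mirrors the MinDist threshold on the squared descriptor distance.

import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(engine_desc, filter_desc, min_dist=0.25):
    tree = cKDTree(filter_desc)
    dist, idx = tree.query(engine_desc, k=1)            # nearest filter descriptor
    good = dist ** 2 < min_dist                         # keep only strong matches
    # Pairs (engine keypoint index, filter keypoint index) kept as correspondences.
    return np.column_stack([np.nonzero(good)[0], idx[good]])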

5.6 Correspondences Grouping

The previous step generates a list of correspondences between keypoints in the engine and keypoints from the filters. This list may contain wrong correspondences, so in order to filter them out, a process usually called Correspondence Grouping is applied. During this process, correspondences are discarded by enforcing geometrical consistency between them. This is done by grouping the set of correspondences between the target and its instances in the engine into subsets. Each subset contains the consensus for a specific translation and rotation that fit the target into the scene. The value of the consensus is then compared with a threshold and, if it is less than the threshold, the subset is discarded.

A wide variety of methods can be applied to enforce this geometric consistency, as proposed in [41] or [24]. In this case, the used algorithm performs an iterative approach based on simple geometric consistency between pairs of correspondences. More specifically, the algorithm takes a correspondence c_i = {p_i^fil, p_i^eng} and iterates over all correspondences that are not yet grouped, c_j = {p_j^fil, p_j^eng}. The correspondence c_j is then added to the group if:

\left\| p^{fil}_i - p^{fil}_j \right\|_2 - \left\| p^{eng}_i - p^{eng}_j \right\|_2 < \varepsilon \qquad (5.5)

where p^fil is a point of the oil filter point cloud, p^eng is a point of the engine point cloud and ε is a threshold parameter.
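A greedy sketch of this grouping is shown below: a seed correspondence opens a group and another correspondence joins it when the distance between their two filter keypoints matches the distance between the two engine keypoints up to ε (Equation 5.5). The arguments corrs (pairs of engine/filter keypoint indices), kp_eng, kp_fil and the minimum consensus size are illustrative assumptions.

import numpy as np

def group_correspondences(corrs, kp_eng, kp_fil, eps=0.01, min_group=3):
    remaining = list(range(len(corrs)))
    groups = []
    while remaining:
        seed = remaining.pop(0)                          # start a new group
        group = [seed]
        for j in list(remaining):
            d_fil = np.linalg.norm(kp_fil[corrs[seed, 1]] - kp_fil[corrs[j, 1]])
            d_eng = np.linalg.norm(kp_eng[corrs[seed, 0]] - kp_eng[corrs[j, 0]])
            if abs(d_fil - d_eng) < eps:                 # Equation 5.5
                group.append(j)
                remaining.remove(j)
        if len(group) >= min_group:                      # consensus threshold
            groups.append(corrs[group])
    return groups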

5.7 Pose Detection

The previous step discards a high number of geometrically inconsistent correspondences, but it does not guarantee a unique 6-DoF pose of the oil filters. This implies the need for an additional process that reduces the number of selected correspondences by eliminating those which are not consistent with the same pose. This is done using random sample consensus (RANSAC) [14]. The algorithm takes the previous set of correspondences between the filters and the engine and determines the rotation matrix \tilde{R} and the 3D translation vector \tilde{T} which define the transformation that best fits them. This transformation is computed using least squares minimization. For a set of N exact correspondences c_1, c_2, ..., c_N, where each correspondence contains the corresponding points of the oil filters and the engine, c_i = [p_i^fil, p_i^eng], the transformation is calculated as:

\underset{R,T}{\operatorname{arg\,min}} \sum_{i=1}^{N} \left\| p^{eng}_i - R \cdot p^{fil}_i - T \right\|_2^2 \qquad (5.6)
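For a given set of correspondences, the minimization in Equation 5.6 has a closed-form solution via the usual SVD construction, sketched below; a RANSAC wrapper would repeatedly call this estimate on random correspondence subsets and keep the transformation with the largest consensus. The function name and interface are illustrative.

import numpy as np

def estimate_pose(p_fil, p_eng):
    # Return R, T minimising sum ||p_eng_i - R p_fil_i - T||^2 (Equation 5.6).
    mu_f, mu_e = p_fil.mean(axis=0), p_eng.mean(axis=0)
    H = (p_fil - mu_f).T @ (p_eng - mu_e)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # avoid reflections
    R = Vt.T @ D @ U.T
    T = mu_e - R @ mu_f
    return R, T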

5.8 ICP Refinement and Validation

In some cases, the clustering made during the correspondence grouping step results in the detection of a single good instance placed in the correct position. Normally, however, several instances of the model are found in the engine, some of them close to the correct position and others in wrong locations. This step registers all the found instances in the SDF by refining their poses, and the instance with the best location is then selected as the good one.

The refinement process is done using the iterative closest point (ICP) technique, which calculates the transformation necessary to minimize the distance from the oil filters to the engine [9]. In this case, to make the calculation faster, a variation of the classical ICP is applied which is performed directly over the SDF structure. The method is described in detail in [8]. The process consists of iterating through each point of the oil filter model and evaluating the SDF of the engine model at that point. The iteration is done at least once and the SDF is evaluated at each individual point. If the evaluations at all points result in a value equal to the maximum truncated distance Dmax, the oil filter is not close to a surface and the registration is skipped. Thus, the registration process is only carried out if the model is near a surface. The time needed for each registration is proportional to the number of points of the object to be detected. Algorithm 4 illustrates this process.

Algorithm 4 ICP()
Ensure: refinement of the pose of the instances.
 1: for all point clouds ¯P_i in the list of found instances over the engine model do
 2:     for all points ¯p in point cloud ¯P_i do
 3:         Compute the value D of the closest surface in the SDF.
 4:         if D > Dmax for all the points in ¯P_i then
 5:             ¯p is not close to a surface, go to the next point.
 6:         else
 7:             Compute the closest point q to ¯p in the SDF model.
 8:             Estimate the translation and rotation that minimize the distance between ¯p and q.
 9:         end if
10:     end for
11:     Transform the point cloud ¯P_i with the obtained transformation.
12: end for
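The pre-check that decides whether an instance is registered at all can be sketched as follows; sdf is assumed to be a callable returning the truncated distance of a 3D point and d_max the truncation value. If every model point evaluates to the truncation distance, the instance is nowhere near a surface and the registration is skipped.

import numpy as np

def should_register(instance_points, sdf, d_max):
    # Evaluate the truncated SDF at every point of the transformed instance.
    d = np.array([sdf(p) for p in instance_points])
    return not np.all(d >= d_max)                        # False means: skip this instance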

The ICP gives as a result a list of instances registered into the SDF; let us call these instances ¯P. To select the good instance, a score value S is calculated. This value represents the quality of the registration by measuring the distance value of each point of the instance in the SDF representation. The score S is therefore the sum of all the D values calculated for each point, which means that a good registration has a low score, since D represents the distance to the closest surface of the model. The best registered instance can then be taken as the one with the minimum score value. The drawback of this method is that instances may be registered in a wrong region of the engine and still have a low score value. This is due to the fact that some regions of the engine may present the same geometry as the model to be registered. In addition, occlusions may cause some regions of the filters not to be visible in the engine model. To solve this, a threshold value was determined through experimentation, since the registrations around the correct position in the engine always show score values within the same range. The algorithm then discards those instances with a lower score than the threshold value. This threshold can be tuned to discard those instances that may have lower score values than the ones found at the true position. If the threshold is set to 0, the algorithm simply takes the found instance with the lowest score value. The process of verification is described in more detail in Algorithm 5.

Algorithm 5 Verification()
Ensure: good registered instance into the engine.
 1: for all point clouds ¯P_i in the list of registered instances do
 2:     for all points ¯p in point cloud ¯P_i do
 3:         Compute the value D of the closest surface in the SDF.
 4:     end for
 5:     Compute the score S_i as the sum of all the D values of ¯P_i.
 6:     if S_i < minimum S value and S_i > Thr then
 7:         Minimum S value = S_i
 8:         Store index i since it is the current best model.
 9:     end if
10: end for
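The selection rule of Algorithm 5 can be summarised in a few lines; here sdf is again a callable returning the distance value at a point, instances is the list of registered point clouds and thr the experimentally chosen threshold, all of them illustrative names.

import numpy as np

def select_best_instance(instances, sdf, thr):
    best_idx, best_score = None, np.inf
    for i, pts in enumerate(instances):
        score = sum(sdf(p) for p in pts)                 # S_i, sum of SDF values
        if thr < score < best_score:                     # keep the lowest score above Thr
            best_score, best_idx = score, i
    return best_idx, best_score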


References
