
International Master's Thesis

Evaluation of a Bisensor System for 3D Modeling of Indoor Environments

Athanasia Louloudi

Technology


Studies from the Department of Technology

at Örebro University

Athanasia Louloudi

Evaluation of a Bisensor System for 3D Modeling of Indoor Environments


© Athanasia Louloudi, 2010

Title: Evaluation of a Bisensor System for 3D Modeling of Indoor Environments


Abstract

Three dimensional (3D) sensing, analysis and reconstruction of objects and scenes in unknown environments has become an important area of research in several disciplines. Recently, Time-of-Flight (ToF) cameras have attracted attention because of their ability to generate full 3D range information at video frame rates. However, several systematic and random errors can occur and thus decrease the quality of the acquired data. The usability of such systems therefore depends heavily on an accurate camera calibration. In addition, ToF cameras still have limited resolution (e.g. 176x144) and they can only provide monochromatic (grayscale) information.

This work concentrates on the fusion of 2D vision and 3D range data in order to obtain accurate and visually appealing representations of real indoor environments. In particular, two major fields of research are treated. The first concerns the compensation of the common errors that ToF cameras exhibit. This is achieved through a calibration scheme that additionally lays the groundwork for a common reference frame for both sensors. The second task is to incorporate these two different sensors into one system that exhibits improved accuracy and high quality output data. The calibration process addresses both photogrammetric and depth correction techniques. Moreover, because real world scenes increase the level of difficulty that such a system has to confront, validating the ToF camera's performance emerges as an important need. For this purpose, a novel approach to evaluate and gain better insight into the accuracy of the ToF camera is introduced, based on the 3D Normal Distribution Transform.

Finally, as sensor fusion plays an important role in this work, multi-sensor applications are further investigated based on the incorporation of such a biform sensory system. The principal goal is to transfer the color information and enhance the overall result through the use of 3D surface reconstruction approaches.


Acknowledgments

I would like to extend my gratitude to my supervisor Todor Stoyanov, who offered me the chance to work on this study. I am grateful for his patient guidance, trust and encouragement during this project.

Many thanks to Achim Lilienthal and the Learning Systems Lab “family”, for being always available in offering help, knowledge and valuable equipment! Thanks also to all the people at AASS, for their ideas and comments.

I am very thankful to my close friends and classmates for the delightful time we spent together during these two years.

Finally I want to warmly thank my parents, Athina and Dimitris, for their boundless and continuous support through all the steps in my life. Thank you!

Athanasia Louloudi July 2010, Örebro


Contents

1 Introduction 19

1.1 Motivation . . . 19

1.2 Contributions . . . 20

1.3 Thesis Outline . . . 20

2 Sensory System Description 23

2.1 SR4000 . . . 23

2.1.1 Time-of-Flight . . . 24

2.1.2 Specifications and Characteristics . . . 25

2.2 DFK 41F02 . . . 27

2.2.1 Specifications and Characteristics . . . 28

2.3 Physical Setup and Output Relations . . . 28

3 Related Work 31

4 Errors and Error Handling 33

4.1 Errors in Digital cameras . . . 33

4.1.1 Image Distortion . . . 34

4.2 Errors in Time-of-Flight cameras . . . 35

4.2.1 Random Errors . . . 36

4.2.2 Systematic Errors . . . 38

5 System Calibration and Error Handling 41

5.1 System’s Calibration Process . . . 41

5.2 Photogrammetric Camera Calibration . . . 42

5.2.1 Calibration Dataset . . . 43

5.3 Distance Calibration . . . 45

5.3.1 Ambient Light . . . 45

5.3.2 Amplitude Thresholding . . . 46

5.3.3 Light Scattering . . . 46

5.3.4 Jump Edges . . . 47

5.3.5 Depth Refinement and Noise Reduction . . . 48

5.4 Stereo Calibration . . . 48

5.4.1 Stereo Vision . . . 49

5.5 Discussion . . . 50

6 Evaluation of ToF accuracy 53

6.1 3D-NDT . . . 53

6.1.1 Scan evaluation using 3D-NDT with interpolation . . . 54

6.2 Experiment . . . 56

6.2.1 Environment . . . 56


6.2.3 Methodology . . . 57

6.2.4 Ground truth . . . 58

6.2.5 Results . . . 59

7 Sensor Fusion and Applications 67

7.1 Coloring the pointcloud . . . 67

7.2 Surface Reconstruction from Point Sets . . . 68

7.2.1 Polygonal Mesh Reconstruction . . . 68

8 Conclusions and Future Work 71


List of Figures

2.1 The MESA Imaging SR4000 ToF camera. . . 23

2.2 Principle of the phase shift measurement . . . 25

2.3 SR4000: Image outputs . . . 26

2.4 SR4000: Available types of Outputs (Images and Depth Maps) . . . . 27

2.5 The Image Source DFK 41F02. FireWire color CCD camera . . . 27

2.6 Sensory System and Sensors’ Output Relation . . . 29

4.1 Radial Distortion . . . 34

4.2 Tangential Distortion . . . 35

4.3 Multiple Reflection . . . 37

5.1 Overview of the calibration procedure . . . 42

5.2 Time-of-Flight and Standard RGB camera Calibration Datasets . . . . 43

5.3 SR4000 Distortion Correction . . . 44

5.4 DFK 41F02 Distortion Correction . . . 45

5.5 Effect of ambient light . . . 46

5.6 Jump edge filter main functionality . . . 47

5.7 Jump Edges Correction . . . 48

5.8 Depth Filtering . . . 49

5.9 The four step procedure towards Stereo Vision . . . 50

5.10 Left and Right Images, after rectification (with horizontal epipolar lines) 50

6.1 3D Normal Distribution representation of the test indoor environment 54

6.2 Scenery test setups . . . 57

6.3 Lighting conditions setups. . . 58

6.4 Pointcloud Matching . . . 58

6.5 Setup 1: Evaluation and results . . . 59

6.6 Setup 2: Evaluation and results . . . 60

6.7 Setup 3: Evaluation and results . . . 61

6.8 Setup 4: Evaluation and results . . . 62

6.9 Setup 5: Evaluation and results . . . 63

6.10 Scan Performance . . . 64

6.11 Filter Performance . . . 65

7.1 Color Fusion Task . . . 68

7.2 Coloring the pointcloud. . . 69


List of Tables

2.1 SR4000 SPECIFICATIONS . . . 26

2.2 DFK 41F02 SPECIFICATIONS . . . 28

4.1 OPTIC ABERRATIONS . . . 34

4.2 LENS DISTORTION PARAMETERS . . . 35

4.3 ToF ERRORS AND ERROR HANDLING . . . 40

5.1 ToF INTRINSIC PARAMETERS . . . 44

5.2 DFK 41F02 INTRINSIC PARAMETERS . . . 45

5.3 SENSORY SYSTEM’S FINAL PARAMETERS . . . 51


Chapter 1

Introduction

Perceiving the world in all three dimensions is an important task in the fields of Computer Vision and Image Analysis. Three dimensional representations of the environment have proved to be an important tool in several fields, such as robotics. Much of the research focus lies on the 3D analysis and reconstruction of objects and scenes based on range sensing. Digitized scenes can provide useful information for a wide variety of tasks, such as object recognition, position determination and localization, mapping, or even the retrieval of semantic information. With a greater variety of sensors, better information about the environment can be obtained.

There are several techniques that provide the ability to perform 3D imaging. In all cases, the sensor technology plays an important role. Apart from conventional imaging sensors (digital cameras), which perceive the world projected in two dimensions, there are sensors known as range finders which are capable of acquiring distance information of an observed scene. A range finder is a range sensing device which can measure the distance to a specific object in its field of view. It may rely on technologies such as sonar or laser, or on geometrical reasoning, in order to determine the range to a scene.

In recent years, a new generation of range finders has been developed as an alternative to common laser scanners and stereo vision setups. Compact and low-priced, these sensors have only become available in the last decade and are based on the Time-of-Flight (ToF) principle. Although their resolution amounts to no more than a few thousand pixels, they have captured much attention due to their advantageous way of measuring distance, offering great potential for real-time measurement of both static and dynamic scenes.

Time-of-Flight cameras enable interesting applications but, unfortunately, several error sources limit their performance, and thus their full metric abilities cannot be realized without a proper calibration. However, even a well calibrated ToF sensor suffers from low resolution and the absence of color information. A multi-sensor data fusion approach can increase such a system's performance and robustness, addressing in this way the problem of combining data from multiple heterogeneous sensors.

1.1 Motivation

By definition, sensor fusion is the combination of data originating from homogeneous sensors or disparate sources, such that the resulting information is improved in some sense when compared to the initial data [1].

In the context of this thesis, the term indicates the process of integrating data from distinctly different imaging sensors in order to estimate and enhance several parameters and exchange information. The overall goal is to obtain better information about the environment and thereby enrich the final visual 3D outputs. This goal emerges as an important requirement especially when the target application is demanding in terms of precision and high resolution true color textures (e.g. an indoor environment).

The principal objective of the work presented in this thesis is the use of vision and 3D range data in order to obtain accurate and visually appealing representations of indoor environments. This attempt implies two major fields of research. The first deals with the compensation of the common errors that ToF cameras present, by the use of several calibration techniques. The second task is to incorporate these two different sensors into one system that is capable of interpreting information from diverse vision systems, fusing the information with other sensor modalities, and making use of it so as to build 3D models of unknown environments.

This work is based upon the use of the Swiss Ranger SR4000 Time-of-Flight camera and a standard color camera. The focus lies on the investigation of the Time-of-Flight camera's performance in real world scenes. As it is a quite new type of 3D imaging sensor, most research is grounded in controlled environments. Therefore the need to evaluate such a camera's performance in real environment setups is crucial. The specific model belongs to the latest generation; thereby not much information regarding its performance is reported in the literature.

1.2 Contributions

The main contributions of the work presented in this thesis are:

• Distance Calibration and Error Handling - An investigation of the errors that can be found in the SR4000 is presented along with their possible corrections.

• Novel evaluation of the SR4000 ToF camera accuracy - A useful procedure to evaluate the ToF performance is applied, based on 3D Normal Distribution Transform (3D-NDT) surface representations.

• Toolbox (open source) for ToF calibration - This work presents a first attempt to obtain a custom made calibration toolkit for ToF cameras as a single sensor as well as a system that enables data fusion.

1.3 Thesis Outline

The presentation of this work is distributed over eight chapters. In more detail, it is organized as follows:

Chapter 2 describes the sensory equipment used in this work, consisting of two imaging sensors: a Time-of-Flight camera and a standard RGB camera. An overview of the general characteristics and specifications is presented, accompanied by a discussion of the functionality of the sensors when performing as a system (setup, available outputs, similarities and differences which need to be considered when using this biform sensory system).

Chapter 3 describes past research on the two main aspects of this project (ToF calibration and sensor fusion). In addition, it reports an overview of the work related to the SR4000 model, which dominates the investigation of this work.

Chapter 4 is a literature-based report that introduces the common errors with respect to each sensor. The detection of and a deep knowledge about these fluctuations are important before proceeding to the system's calibration because they can influence the overall performance.


Chapter 5 presents an investigation and the calibration scheme which is responsible for the error correction and, in a more general view, the sensors' functional enhancement. The two main types of calibration underlying this work are a single camera calibration and a stereo calibration.

Chapter 6 deals with the experimental evaluation of the Time-of-Flight camera accuracy. An explanation of the experiment along with the methodology and the arising results are presented.

Chapter 7 presents applications of the sensory system towards the enhancement of the 3D representations, always with respect to its performance under a common framework. The use of a Time-of-Flight (ToF) camera is investigated along with its possible enhancement when collaborating with a high resolution digital camera.

Chapter 8 summarizes the thesis, discusses the limitations of the system and proposes directions for future work.


Chapter 2

Sensory System Description

One of the main ideas which implicitly underlies the sensor fusion in this work is 3D vision enhancement using 2D/3D sensors. The sensory system chosen for this project is composed of a 2D digital camera and a 3D range sensor. The principal goal is the 3D reconstruction of indoor environments through the incorporation of these two sensors into a biform system that communicates and fuses information. Therefore, each single sensor is presented in this chapter in order to better understand its characteristics and utilities.

2.1 SR4000

The first sensor to be described is the MESA Imaging SR4000, presented in Fig. 2.1. It is a Time-of-Flight (ToF) camera, which means that it is a range measuring device that can be used for creating 3D models of the real world.

From a more general point of view, ToF cameras are 3D imaging sensors which only became available about a decade ago [2]. Compared to well-established range finders (e.g. lidars, laser triangulation systems, stereo vision systems, etc.), they seem quite promising and are attracting attention in many fields. Among the Time-of-Flight sensors, this specific model belongs to the latest generation; thereby not much information regarding its performance is reported in the literature.

Figure 2.1: The MESA Imaging SR4000 ToF camera.

It is meaningful to have a deep understanding of the sensor's characteristics and utilities before proceeding to discuss its possible applications. Therefore, the following section introduces the working principle of this type of camera along with the design specifications of the SR4000.


2.1.1 Time-of-Flight

Time-of-Flight, as a term, describes a variety of methods which measure the amount of time that it takes for a signal to travel over some distance. In contrast to laser scanners, which also provide a sampled representation of a scene, ToF cameras do not need to make a series of range measurements in uniform angular increments; instead they can capture a scene at once. They can therefore be called scannerless, precisely because they do not need to scan the environment in order to represent it. Two basic variations of the ToF principle have been implemented:

1. Time-of-Flight systems which use the time-correlated single-photon counting technique to produce three-dimensional depth images.

2. Time-of-Flight systems which use amplitude modulated light.

In ToF systems which use the time-correlated single-photon counting technique to produce three-dimensional depth images, the distance is calculated based on single-photon detection. By using arrays of single-photon avalanche diodes (SPADs), the depth information can be obtained by means of direct measurement of the runtime of a traveled light pulse [3].

Conversely, in the second type of ToF systems, those which use amplitude modulated light, the distance is measured via the phase difference between the emitted (reference) and the reflected signal. This ranging system normally uses frequency modulated LEDs (Light Emitting Diodes) as the active illumination source and a CMOS/CCD sensor as demodulator and detector [4].

The impact of the SPAD technology has not been broad: the low frame rate and the complex readout schemes have limited the use of SPAD arrays. In contrast, ToF systems which use amplitude modulated light have already become quite popular despite their limitations [5], which are discussed in a later section.

Phase-Shift Measurement Principle

In Time-of-Flight systems, the illumination LEDs are modulated at a few tens of MHz, and the CCD/CMOS imaging sensor measures the phase of the returned modulated signal at each pixel. Given the speed of light c, the depth value at each pixel is determined as a fraction of one full cycle of the modulated signal, where the distance corresponding to one full cycle is given by:

D = \frac{c}{2 f_{mod}} \qquad (2.1)

where c is the speed of light and f_{mod} is the modulation frequency of the emitted light signal.

In Figure 2.2, the principle of the phase shift measurement is depicted. The received signal comes along with an offset in intensity because of the background light (the total light incident on the sensor: background plus modulated signal) [6].

The received signal is sampled four times in each wave cycle, i.e. at phase intervals of π/2. If we denote the four samples A_{0,rec}, A_{1,rec}, A_{2,rec}, A_{3,rec}, we can easily calculate the parameters A_{rec}, A_{em} and the phase φ of the returned modulated signal at each pixel, so as to measure the distance in Equation (2.1) [5]. In practice, the information that every pixel holds is a combination of two parameters: amplitude (or in some cases intensity) and depth.

A_{rec} = \frac{\sqrt{(A_{0,rec} - A_{2,rec})^2 + (A_{1,rec} - A_{3,rec})^2}}{2} \qquad (2.2)

A_{em} = \frac{A_{0,rec} + A_{1,rec} + A_{2,rec} + A_{3,rec}}{4} \qquad (2.3)

\varphi = \arctan\left(\frac{A_{0} - A_{2}}{A_{1} - A_{3}}\right) \qquad (2.4)


Figure 2.2: Principle of the phase shift measurement

Thus, from Equations (2.4) and (2.1), the distance value D can be derived as follows:

D = \frac{c}{4\pi f_{mod}}\,\varphi \qquad (2.5)

The measurement range of ToF vision systems depends on the modulation frequency: a high modulation frequency leads to a short measuring range and vice versa.
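To make the phase-shift pipeline concrete, the short Python sketch below evaluates Equations (2.1)-(2.5) for a single pixel. The sample values and the helper name are hypothetical; only the 29 MHz modulation frequency is taken from Table 2.1, and `atan2` is used instead of a plain arctangent so that the phase is recovered in the correct quadrant.

```python
import math

C = 299_792_458.0   # speed of light [m/s]
F_MOD = 29e6        # SR4000 modulation frequency [Hz], see Table 2.1

def phase_shift_distance(a0, a1, a2, a3):
    """Amplitude, offset, phase and distance from the four samples
    of the returned modulated signal (Equations 2.2-2.5)."""
    a_rec = math.sqrt((a0 - a2) ** 2 + (a1 - a3) ** 2) / 2.0   # Eq. (2.2)
    a_em = (a0 + a1 + a2 + a3) / 4.0                           # Eq. (2.3), intensity offset
    phi = math.atan2(a0 - a2, a1 - a3) % (2.0 * math.pi)       # Eq. (2.4), wrapped to [0, 2*pi)
    d_max = C / (2.0 * F_MOD)                                  # Eq. (2.1), one full cycle
    d = d_max * phi / (2.0 * math.pi)                          # Eq. (2.5)
    return a_rec, a_em, phi, d

# Hypothetical samples for one pixel: a phase of pi/2 gives roughly 1.3 m
print(phase_shift_distance(120.0, 80.0, 40.0, 80.0))
```

Note that c / (2 f_{mod}) evaluates to about 5.17 m, which is consistent with the non-ambiguity range of 5.0 m quoted in Table 2.1.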

2.1.2 Specifications and Characteristics

The SR4000 (Figure 2.1) is manufactured by MESA Imaging¹ and belongs to the 4th generation of Time-of-Flight cameras designed by the company.

The SR4000 sensor illuminates the observed scene with modulated near-infrared light and then computes the distance at every pixel by measuring the phase shift of the returned modulated light. For every pixel in the field of view, the amplitude value (or the intensity, in the case of conversion to grayscale² images) and the depth information are available. Naturally, this entails at least two output images and a full frame depth map which is used to produce the pointclouds.

With respect to the 2D imaging information that this sensor provides, the amplitude, grayscale and range images are illustrated in Fig. 2.3. The resolution is low (176x144) and the radial distortion of the lens is obvious in this sensor.

Apart from the 2D images, the SR4000 has the ability to output range information as well. In Fig. 2.4 the available types of depth maps, in the form of pointclouds, depict the same scene. The pointclouds combine both the depth information and the 2D image information. In that sense, the two pictures in the first row seem to be quite similar; however, they are not. The reflected illumination of an object decreases with the square of the distance from the camera to the object. Therefore, the amplitude signal is much lower for more distant objects, as is clearly illustrated in the first picture, Fig. 2.4a. As previously mentioned, the SR4000 comes with a property to generate a grayscale image based on the amplitude signal. Moreover, the distance image contains distance values for each pixel; the raw distance information as calculated in Equation (2.5) is depicted in Fig. 2.4c.

¹MESA Imaging: http://www.mesa-imaging.ch/index.php

²The amplitude A_rec (see Equation 2.2) may additionally be used as a measure of quality of the distance measurement.


Figure 2.3: SR4000 image outputs: (a) amplitude image, (b) intensity (grayscale) image, (c) range image.

Finally, the last picture, Fig. 2.4d, is a representation of the confidence map, which is also available as output information from the camera.

By using its property to generate a “Grayscale Image”, the SR4000 overcomes this problem of displaying the dynamic range. The effect can be removed by multiplying the amplitude by a factor which is proportional to the square of the distance. The result is a “Grayscale Image” (Fig. 2.3b) with similar apparent illumination of near and distant objects (near and distant objects have similar brightness). This representation of the amplitude signal is more convenient to display (compared with the raw amplitude data) simply because it is visually similar to an image taken by a normal camera, only at much lower resolution.
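A minimal sketch of this distance-compensated grayscale conversion is given below; the function name and the normalisation to 8 bits are assumptions for illustration, not the camera's internal implementation.

```python
import numpy as np

def amplitude_to_grayscale(amplitude, distance):
    """Compensate the 1/d^2 fall-off of the reflected illumination by
    multiplying the amplitude by the squared distance, then normalise
    to 8 bits for display."""
    gray = amplitude.astype(float) * distance.astype(float) ** 2
    gray = 255.0 * gray / max(float(gray.max()), 1e-9)
    return gray.astype(np.uint8)

# `amplitude` and `distance` would be 176x144 arrays delivered by the SR4000 driver.
```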

Table 2.1 lists some of the basic specifications of the camera. The image resolution is low (176x144 pixels), but the fact that it produces near-normal video with full frame range measurements offers an advantage compared to other 3D imaging devices such as lidars.

Table 2.1: SR4000 SPECIFICATIONS

General Characteristics

Resolution: 176x144

Frame rate: up to 54 fps

Color mode: Monochromatic

Connection: USB

Operating Environment: Indoor

Optical Interface

Pixel Array Size: 176x144

Pixel pitch: 40 µm

Angular Resolution: 0.23°

Field of view: 43.6° (h) x 36.4° (v)

Mechanical Interface

Dimensions: H: 50.6 mm, W: 50.6 mm, L: 50 mm

Mass: 265 g

Other

Modulation Frequency: 29 MHz

Non Ambiguity Range: 5.0 m


Figure 2.4: SR4000 available output types (images and depth maps): (a) amplitude, (b) intensity (grayscale), (c) raw distance, (d) confidence map.

2.2 DFK 41F02

One of the standard sensors used on mobile robots to support vision tasks is a digital camera. A camera provides rich information about the environment. Due to the low resolution of the range finder, it is important to use a high resolution camera rather than one of normal resolution. For this reason, the DFK 41F02 (Figure 2.5) was selected. It is a digital FireWire color RGB camera which employs a charge coupled device (CCD) as its image sensor.


2.2.1 Specifications and Characteristics

Manufactured by The Imaging Source³, it is equipped with a charge coupled device (CCD) as image sensor and is able to produce high resolution digital outputs with a frame rate of up to 15 images/sec.

It is controlled via the FireWire bus using the standard DCAM protocol (IEEE 1394). DCAM determines the way the image data stream is structured as well as the set of characteristics that define the camera's performance (e.g. brightness, shutter, white balance, etc.). Adjusting these parameters can maximize the quality of the image, and is thus important for several purposes such as visualization or image processing.

From a purely technical point of view, the color mode of the CCD sensor is based on bilinear interpolation. In practice this means that the color of each pixel is based on the mean values of neighboring pixels. It provides 256 gray levels for every pixel and, alternately, 256 color values for every fourth pixel. However, it allows the color interpolation to be switched off, giving the user the freedom to simply select the color format that suits their needs.

It is worth mentioning that this model, due to the use of a CCD chip, is sensitive to near infrared light. The DFK 41F02 makes use of an IR cut filter so as to correct the predominance of red color. The following table (Table 2.2) lists some of the basic specifications of the camera.

Table 2.2: DFK 41F02 SPECIFICATIONS

General Characteristics

Resolution: 1280x960

Framerate: up to 15 fps

Color mode: RGB

Connection: FireWire (IEEE1394)

Operating Environment: Indoor & Outdoor

Optical Interface Sensor specification: 1/2 " CCD Pixel size: H: 4.65 ¸tm, V: 4.65 ¸tm Mechanical Interface Dimensions: H: 65 mm, W: 65 mm, L: 68 mm Mass: 470 g

2.3 Physical Setup and Output Relations

A binocular camera setup is employed by mounting the standard 2D RGB camera DFK 41F02 next to the ToF SR4000 range finder. Using these two cameras, a sensory system able to deliver 3D range data and high resolution visual outputs is set up. However, in order to deploy these two diverse sensors as one working system, it is necessary to build a common reference framework. Therefore, detecting and understanding their relations in terms of functionality and output data becomes crucial.

There are several physical differences between the two cameras' outputs that need to be considered. Parameters such as image resolution, color mode and output format, which reveal the relations between the sensors, are important when fusing data.

³The Imaging Source: http://www.theimagingsource.com


As depicted in Fig. 2.6, the difference in resolution is large. If the color information and the physical sensor outputs are also taken into account, the result is a biform sensory system which, on the one hand, exploits the ToF sensor producing monochromatic images of 176x144 pixels and, on the other hand, the RGB camera producing high resolution color images of 1280x960 pixels. The diversity is significant.

In this sense, the underlying idea of fusing the data of these two sensors is to enhance the low resolution monochromatic Time-of-Flight camera with high resolution color information in order to achieve the final goal, which is to obtain visually appealing and accurate 3D spatial representations of the indoor environment.

Figure 2.6: Sensory system and sensors' output relation: the SR4000 provides a 176x144 amplitude image and a 176x144 point cloud, while the DFK 41F02 provides a 1280x960 color image.
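The fusion itself is treated in Chapter 7, but the basic idea of bridging this resolution gap can already be sketched: project every ToF point into the high resolution RGB image and sample a colour there. The sketch below assumes that the extrinsic parameters R, t and the RGB intrinsic matrix K_rgb are available from the system calibration of Chapter 5; all names are hypothetical.

```python
import numpy as np

def color_pointcloud(points_tof, K_rgb, R, t, rgb_image):
    """Project 3D points from the ToF frame into the RGB image and
    look up a colour for each point (nearest-pixel sampling).

    points_tof : (N, 3) array in the ToF camera frame [m]
    K_rgb      : (3, 3) intrinsic matrix of the RGB camera
    R, t       : rotation (3x3) and translation (3,) from ToF to RGB frame
    rgb_image  : (960, 1280, 3) undistorted colour image
    """
    p_rgb = points_tof @ R.T + t                 # transform into the RGB frame
    uvw = p_rgb @ K_rgb.T                        # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]                # normalise by depth
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    h, w = rgb_image.shape[:2]
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (p_rgb[:, 2] > 0)
    colors = np.zeros((points_tof.shape[0], 3), dtype=np.uint8)
    colors[valid] = rgb_image[v[valid], u[valid]]
    return colors, valid
```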


Chapter 3

Related Work

Various studies concerning range sensing have been published that apply to several disciplines of interest. In the past few years, most research work was related to investigating the principal functionalities of ToF sensors, which is natural considering that it is a new technology. Besides hardware and sensor calibration, most recent publications focus on geometric calibration and data fusion. Several configuration setups are reported, from monostatic to biform multi-sensor approaches. Both principal paths are important for this work, and therefore a brief report on past work in these fields is given in the rest of this section.

ToF Calibration As shown in many previous works, past calibration approaches concentrated their efforts on identifying the errors and their parameters in order to attain a metric performance of ToF cameras. They tried to model the errors as a necessary precursor to approximating the real world more precisely. In that sense, there are approaches reported in the literature that deal with the determination of the intrinsic and extrinsic parameters, but the focus is still mainly on the distance related errors, which seem to be more difficult to handle.

Lindner and Kolb [7], [8] presented a two step approach for lateral and depth calibration which essentially used uniform B-splines as a global correction function and a linear function for a per-pixel adjustment so as to model the distance related error. Robbins et al. [9] also performed a two-step calibration procedure in their studies of the SwissRanger SR3000.

Kahlmann et al. [10] performed distance calibration for a single, central pixel of the range camera sensor array over a high-accuracy baseline. Additionally, they showed that the integration time can cause a systematic shift in the range measurements, based on an investigation of the influence of internal and external temperature. In later work, Kahlmann et al. [11] focused on the geometric calibration of range imaging sensors and developed a tracker of moving people based on a recursive Bayesian filter.

Fuchs and May [12] presented a method for surface reconstruction with ToF cameras in which they described a two-step calibration method. Instead of a checkerboard pattern, they used a planar object to calibrate their range measurements for cyclic errors and signal delays.

Fuchs and Hirzinger [13] used a robotic arm in order to position the range camera. They also based their work on the use of higher-accuracy laser scanner measurements so as to calibrate their range measurements.

Finally, Lindner et al. [14] introduced a calibration approach which combines different demodulation techniques for the ToF camera's reference signal. In this paper they showed that the combined demodulation approach cannot keep up with the B-spline approach they presented in [7], but, in contrast, it is independent of the number of reference images.


Image Sensor Fusion ToF sensor technology is quite recent, hence not much has been done in the field of sensor fusion. So far, there exist only a few notable approaches that combine ToF distance measurements with high resolution RGB image information. In most cases there are distinguishable differences in setups, hardware and software implementation.

Guan and Pollefeys [15] proposed a unified calibration technique for a heterogeneous sensor network of two video camcorders and two Time-of-Flight cameras. They based their calibration on the use of a spherical target which was moved around the commonly observed scene. In this way they managed to extract the sphere centers in the captured images and to recover the geometric extrinsics for both sensors.

Prasad et al. [16] presented a procedure to increase the image resolution of the 3D sensor by combining a CCD and a 3D sensor in a monocular device using an optical splitter. Due to the monocular setup, no special color mapping transformation between the two image outputs was necessary, as both captured the same view.

Guomundsson et al. [17] suggested a sensor fusion approach which employs stereo vision and Time-of-Flight imaging techniques. In their work, they linked the three dimensional ToF depth data to the image pairs by mapping the ToF depth measurements to stereo disparities.

Lindner et al. [18] proposed an algorithmic approach that combines high resolution RGB images and low resolution PMD distance data. The fusion is based on projective texture maps and relies on parallel Graphics Processing Unit (GPU) resources.

Reulke [19] described another approach for fusing 2D and 3D imaging sensors based on orthophoto generation.

Finally, Kim et al. [20], [21] proposed to utilize more than one synchronized imaging system (multiple color cameras and multiple ToF sensors) to gain multiple views for the reconstruction of dynamic scenes.

SR4000 With respect to the specific model used in this work, which follows the Time-of-Flight principle, the following papers are available in the literature. Both previous categories are considered. However, the available information is not deep, so it is crucial to proceed with a detailed investigation of this camera's performance.

Lichti et al. [22] proposed a method for the self-calibration by bundle adjustment of range imaging sensors which allows a simultaneous calibration concerning the spatial distortions and the ranging system.

Cang Ye et al. [23] used SIFT (Scale-Invariant Feature Transform) features on the intensity data, along with the range data produced by the ToF sensor, in order to estimate the camera's ego-motion.

Chiabrando et al. [5] performed a two-step calibration procedure (photogrammetric and distance calibration). By evaluating the camera's out-of-the-box functions, they demonstrated that there is a negligible variation of the distance measurement precision with the surface inclination.

The next chapters present the calibration approach adopted for the needs of this project: a first attempt to obtain a custom made calibration toolkit for ToF cameras as a single sensor, as well as for a system that enables data fusion. The proposed calibration methodology does not require any special equipment such as a calibration baseline or a robotic arm.


Chapter 4

Errors and Error Handling

As this sensory system is highly dependent on the precision of both sensors' outputs, it has to be ensured that the observed information is correctly produced.

Independently of the imaging sensor type, errors due to the lens or to exposure time can occur, leading to inaccurate and unreliable results. To achieve a high level of accuracy, the first step is to detect the issues which can limit the cameras' performance. Once the errors are known, their adverse effects can be eliminated; it becomes a matter of correction through calibration or filtering techniques. These errors are the main subject of discussion in this section.

4.1 Errors in Digital cameras

A camera consists of an image plane and a lens which provides a transformation between the 3D world space and the 2D image space. Under theoretical assumptions it is possible to define perfect lenses, but in practice lenses introduce distortions. Thus this transformation cannot be perfectly described. Points of the real world are not captured with absolute correspondence in image space, yielding optical deficiencies called optical aberrations. These distortions can be modeled by approximating the real relationship.

Aberrations can be categorized into two classes: monochromatic and chromatic. Monochromatic aberrations appear as a result of the geometry of the lens and occur both when the light is reflected and when it is refracted. This type is not associated with the captured color information; it appears even when using monochromatic light. In contrast, chromatic aberrations are entirely related to color. They are caused by dispersion, which means that not all colors come to focus in exactly the same place. The main effects are reduced image sharpness and the appearance of color fringes at bright edges. There are several lens aberrations commonly found in simple lenses (see Table 4.1). In this context, only image distortion will be examined, as it is the most commonly encountered and it affects the geometry of the image, thereby playing an important role in this project.

Two main lens distortions will be examined: radial and tangential. Radial distortion is a deficiency in the transmission of straight lines and arises as a result of the shape of the lens, whereas tangential distortion arises from the assembly process of the camera as a whole. In contrast to tangential distortion, radial distortion in some cases has a rather significant impact on the geometry of the image.


Table 4.1: OPTIC ABERRATIONS

Monochromatic Aberrations:
◃ Defocus
◃ Spherical aberration
◃ Coma
◃ Astigmatism
◃ Field curvature
◃ Image distortion

Chromatic Aberrations:
◃ Axial
◃ Lateral

4.1.1 Image Distortion

Radial Distortion

The effect of radial distortion on images can be characterized by a positive or a negative displacement. The positive displacement, also known as pincushion distortion [24], occurs when points are displaced further away from the optical axis. On the other hand, the negative displacement, also known as barrel distortion, occurs when points are moved from their correct position towards the center of the image plane. Fig. 4.1 shows the two displacement types of radial distortion and gives some intuition about the effect of this error on the image's geometry.

Figure 4.1: Radial Distortion

At the optical center the distortion is zero. In general, the radial location of a point on the image plane can be rescaled according to the following equations [25]:

x_{corrected} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \qquad (4.1)

y_{corrected} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \qquad (4.2)

where x, y are the coordinates of a point in the world space, x_{corrected}, y_{corrected} are the corresponding points in the image plane after the correction, and k_1, k_2, k_3 are the first three terms of a Taylor series expansion around r = 0. The coefficient k_3 is typically used for highly distorting optics such as fish-eye lenses, and r^2, r^4 and r^6 are the corresponding powers of the radius. In this case, we want to expand the distortion function as a polynomial in the neighborhood of r = 0. This polynomial takes the general form f(r) = a_0 + a_1 r + a_2 r^2 + ..., but the fact that f(r) = 0 at r = 0 implies a_0 = 0. Similarly, because the function must be symmetric in r, only the coefficients of even powers of r will be nonzero. For these reasons, the only parameters necessary for characterizing radial distortion are the coefficients of r^2, r^4 and sometimes r^6 [25].

Tangential Distortion

The second main common aberration is tangential distortion. The principal source of this type of distortion lies in the camera's assembly. More specifically, it appears due to manufacturing imperfections (e.g. the lens not being exactly parallel to the imaging plane). This defect is illustrated in Fig. 4.2. The points exhibit an elliptical displacement as a function of location and radius [25].

Figure 4.2: Tangential Distortion

This error is expressed by two additional parameters, t_1 and t_2, such that:

x_{corrected} = x + [\,2 t_1 x y + t_2 (r^2 + 2x^2)\,] \qquad (4.3)

y_{corrected} = y + [\,t_1 (r^2 + 2y^2) + 2 t_2 x y\,] \qquad (4.4)

Therefore, there are in total five principal distortion coefficients that should be determined in order to achieve a more accurate visual output from the sensor. These are summarized in Table 4.2. In a later section, the way these are used to correct the geometrical dependencies between the world space and the image space will be discussed (see Photogrammetric Camera Calibration).

Table 4.2: LENS DISTORTION PARAMETERS

Radial Distortion: k_1, k_2, k_3

Tangential Distortion: t_1, t_2
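As an illustration, the five coefficients of Table 4.2 are exactly what OpenCV's undistortion routine expects, in the order (k1, k2, t1, t2, k3). The following sketch uses the ToF intrinsics later reported in Table 5.1; the file names are hypothetical.

```python
import numpy as np
import cv2

# Intrinsics and distortion coefficients of the SR4000 (see Table 5.1).
K = np.array([[719.206, 0.0, 255.700],
              [0.0, 716.542, 226.797],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.953, 2.364, -0.011, 0.009, -7.761])  # k1, k2, t1, t2, k3

img = cv2.imread("sr4000_amplitude.png")       # hypothetical input frame
undistorted = cv2.undistort(img, K, dist)      # remaps the image so that straight lines stay straight
cv2.imwrite("sr4000_amplitude_undistorted.png", undistorted)
```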

4.2 Errors in Time-of-Flight cameras

ToF cameras, as systems, might seem to have a technological advantage, but they still come with several limitations which have to be overcome in order to extract precise and reliable 3D metric information about the environment.

They use active illumination in order to capture the 3D image in their field of view: the emitted light is reflected off objects and sensed by a pixel array. Let us recall Equation 2.1, D = c / (2 f_{mod}), which expresses the main working principle of Time-of-Flight cameras and underlies the calculation of the distance to the observed object at every single pixel.

Because a ToF camera contains both an imaging sensor array (e.g. CCD) and an active illumination source, we can primarily distinguish two classes of errors, which from now on will be called “random errors” and “systematic errors”.

Within the following sections, this error model and all its parameters are introduced so as to better understand the need for enhancing the working performance of such sensors.

4.2.1 Random Errors

These are the errors of a statistical nature that occur when sensing with ToF cameras. In the literature they are also commonly found under the name “non-systematic errors”. They can be characterized as random statistical fluctuations, i.e. “noise”, in the 3D image output. A list of such errors and their effects follows in this section.

Signal-to-Noise Ratio

The signal-to-noise ratio, usually denoted SNR, is a measurement that quantifies the degree to which the signal is corrupted by noise. Images with low noise tend to have a high SNR and, conversely, images with a higher level of noise have a lower SNR [25]. Typically, a ratio higher than 1:1 indicates more signal than noise.

In signal processing, there are several different noise sources, such as photocharge conversion noise, quantization noise, photon shot noise [26] and dark current noise. All of these noise sources introduce distortions into the measurements that cannot be entirely suppressed; however, they can be reduced by different signal processing and filtering techniques.

Photocharge Conversion Noise

This is the noise added in the process of converting the incident light information into an analog signal.

Quantization Noise

When a sensed image is converted into digital form, the number of bits used to represent the measurement determines the maximum possible signal-to-noise ratio. This is because the minimum possible noise level is the error caused by the quantization of the signal, sometimes called quantization noise. Quantization noise can be expressed as an analog error signal summed with the signal before quantization.
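For reference, a commonly quoted rule of thumb (not derived in this thesis) gives the upper bound on the SNR imposed by quantization alone, for an ideal N-bit converter driven by a full-scale sinusoid:

\mathrm{SNR}_{\max} \approx 6.02\,N + 1.76\ \mathrm{dB}

For a 16-bit digitization this amounts to roughly 98 dB; every additional noise source can only lower the effective value.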

Photon Shot Noise

Photon shot noise, commonly called “shot noise” or “Poisson noise”, is a basic physical characteristic of every light source. It is a type of electronic noise that occurs when the number of photons is small enough for the statistical fluctuations in a measurement to become noticeable.

The number of photons that a detector collects is not stable; it varies and can be described by a Poisson distribution [26]. Shot noise is mostly a problem at small currents or low light intensities, but it also dominates all other noise sources and hence limits the effective signal-to-noise ratio at higher illumination levels. In terms of signal-to-noise ratio, the best a camera can do is to approach the shot-noise limit: when the average number of photons is very large, the signal-to-noise ratio is very large as well. This type of noise becomes more evident when the number of collected photons is small.


Dark Current Noise

Electron-hole pairs are generated inside the sensor (e.g. a CCD) regardless of the lighting conditions, i.e. even when light is absent. The generation rate depends exponentially on temperature, and the thermally generated electrons cannot be separated from the photo-generated ones. Like photon shot noise, dark current noise also obeys a Poisson distribution.

Multiple Reflection


Multiple reflection, in which the light travels both direct and indirect paths, is quite common when measuring objects arranged in a concave configuration (e.g. when the scene is set in a corner, between two walls). In such a situation, a large amount of infrared light is reflected from one surface to the other and then back to the camera. As a result, the distance measurements are over-estimated, as illustrated in Fig. 4.3. The overestimation is largest in the region where the multiple reflection paths are greatest in both number and intensity.

Figure 4.3: Multiple Reflection

So far there is no technique that solves this problem to any significant degree. Instead, in order to avoid the multiple reflection error as much as possible, attention is restricted to the setup of the camera: the camera should be placed away from reflective objects and, ideally, only the objects that have to be measured should be included in the camera's field of view.

Light Scattering

Range finders measure the distance to the object simultaneously in each pixel. This means that every pixel is mapped to the intensity information of the corresponding point on the object's surface. The problem arises because dependencies between the pixels are not considered in this simple model [27].

Objects with a weak reflected signal may have false distance data because of interference from adjacent objects which return a stronger signal. For instance, nearby bright objects superpose measurements from the background. This is caused by a small amount of the light from the bright object being scattered to surrounding pixels. Hence, a pixel may also contain information which belongs to neighboring pixels, leading to incorrect amplitude values and consequently invalid distance measurements.


A possible solution to avoid the effects of this phenomenon is to reposition the camera or to threshold the amplitude.

Jump Edges

In point clouds, the structure of the scene usually includes a type of error that is commonly referred to as “jump edges”.

Jump edges are defined as discontinuities in depth values. In the point cloud they appear as spurious smooth transitions between shapes; such edges “separate” one object from the other. They can be eliminated by filtering in depth.
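A minimal jump-edge filter of the kind described above can be sketched as follows. It is a simplified depth-threshold variant for illustration, not the exact filter used in this work, and the 0.15 m threshold is an arbitrary assumption.

```python
import numpy as np

def remove_jump_edges(depth, max_jump=0.15):
    """Mark pixels whose depth differs from any 4-neighbour by more than
    `max_jump` metres as invalid (NaN); such points typically lie on the
    spurious 'smooth' transitions between foreground and background."""
    d = depth.astype(float).copy()
    jump = np.zeros_like(d, dtype=bool)
    # compare with the neighbour in each of the four directions
    jump[:, 1:] |= np.abs(d[:, 1:] - d[:, :-1]) > max_jump
    jump[:, :-1] |= np.abs(d[:, :-1] - d[:, 1:]) > max_jump
    jump[1:, :] |= np.abs(d[1:, :] - d[:-1, :]) > max_jump
    jump[:-1, :] |= np.abs(d[:-1, :] - d[1:, :]) > max_jump
    d[jump] = np.nan
    return d
```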

4.2.2 Systematic Errors

Systematic errors appear as offsets from the true value of a parameter. Once they are detected and known, a correction process can be applied to eliminate their effect. High-accuracy corrections are usually accompanied by an analysis of the sources of error. A list of several common types of such errors is presented in the remainder of this section.

Circular Distance Error

The circular distance error, commonly called the “wiggling effect”, is a systematic error resulting from non-harmonic correlation functions. The phase shift measurement assumes perfectly sinusoidal (harmonic) signals, yet in reality deviations in the modulation of the LED signal (optical and reference signal) provoke a phase delay and, correspondingly, a periodic depth error.

The wiggling effect increases with the discrepancy between the reference and the real signal. This distance related error can be corrected if the response of the LED is known. Given that the LED signal is reproducible, the error is deterministic and thus it can be compensated via a look-up table [13].
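A look-up-table compensation of this kind might look as sketched below. The table values are purely hypothetical; in practice they would be obtained by comparing camera readings against known reference distances.

```python
import numpy as np

# Hypothetical calibration table: measured distance [m] -> depth offset [m].
LUT_DIST = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
LUT_OFFSET = np.array([0.010, -0.004, 0.008, -0.006, 0.005,
                       -0.007, 0.006, -0.003, 0.004, -0.002])

def correct_wiggling(measured):
    """Subtract the interpolated periodic (wiggling) error from the
    measured distances."""
    offset = np.interp(measured, LUT_DIST, LUT_OFFSET)
    return measured - offset
```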

Amplitude Related Error

Due to nonlinearities of the semiconductor and its imperfect separation properties, a different number of incident photons at a constant distance causes different distance measurements. Furthermore, this amplitude-related error also changes with the distance [13],[28].

Latency Related Error

The phase delay originates partially from latencies on the sensor due to signal propagation delays and the semiconductor's properties. Additionally, considering that the remitted and emitted signals are correlated directly on the sensor array (pixels), it is important to consider different latencies for every pixel [28].

Fixed Pattern Noise

This error is related to latencies and material properties in each sensor. Its effect is very similar to the fixed pattern noise found in 2D cameras. In ToF cameras, the effect arises because each pixel has a capacitance: the reference signal needed for the on-chip correlation is delayed by the charge time of the capacitor in each pixel. This delay is equivalent to shifting the reference signal by another small phase shift, which results in shifting the whole correlation function by this value [29].

Since the pattern doesn’t depend on exposure time or amplitude, this effect can be eliminated easily with all the known techniques for removing fixed pattern noise in 2D cameras.


Non-Ambiguity Range Error

Also known as the “back-folding effect”, this error appears due to the periodicity of the signal that is used for the depth measurement. If objects in the field of view are present in the scene at distances whose difference is larger than the distance corresponding to a full modulation period, then the measurement of their positions is ambiguous.

When the background is unlimited, the folding back of far and bright objects might cause errors. Based on the assumption that far objects (in the background) have significantly weaker reflective properties than the measured objects in the foreground, the solution is to filter those values out by setting an amplitude threshold.
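Such an amplitude threshold can be sketched in a few lines; the function name and the choice of threshold are assumptions.

```python
import numpy as np

def amplitude_threshold(depth, amplitude, min_amp):
    """Invalidate depth readings whose amplitude falls below `min_amp`;
    weakly reflecting, distant background objects (prone to back-folding
    and scattering artefacts) are filtered out this way."""
    filtered = depth.astype(float).copy()
    filtered[amplitude < min_amp] = np.nan
    return filtered
```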


Table 4.3: ToF ERRORS AND ERROR HANDLING

RANDOM ERRORS

1. Bad Signal-to-Noise Ratio: creates distortions in the measurements that cannot be entirely suppressed. Solution: increase the exposure time and amplify the illumination accordingly; filter out low amplitudes.

2. Multiple Reflection: the light travels by direct and indirect paths (multi-path), so the distance measurements are over-estimated. Occurs also in concave structured objects like corners or hollows, making them look round. Solution: N/A - consider the physical setup.

3. Light Scattering (close objects): influences the depth measurements of pixels which belong to far objects; near bright objects superpose measurements from the background. Solution: N/A

4. Jump Edges: multimodal measurements which appear as smooth point transitions between the objects' edges. Solution: filtering.

5. Non-ambiguity Range: when the background is not limited, the folding back of far and bright objects might cause problems. Solution: amplitude thresholding.

SYSTEMATIC ERRORS

1. Circular Distance Error: the asymmetric NIR-LED signal response causes a non-harmonic sinusoidal illumination of the scene. Solution: given that the LED signal is known, and thus reproducible, the error can be compensated by a look-up table.

2. Amplitude Related Error: due to non-linearities of the semiconductor and its imperfect separation properties, objects with different reflectivity positioned at a constant distance provide different depth values. Solution: amplitude filtering.

3. Latency Related Error: latencies on the sensor propagate a phase shift due to delays in signal propagation and the semiconductor's properties. Solution: since the emitted and remitted signals are correlated on the sensor array, consider different latencies for each pixel.

4. Fixed Pattern Noise: error related to latencies and material properties, similar to the fixed pattern noise found in 2D cameras. Solution: filtering.


Chapter 5

System Calibration and

Error Handling

The overall performance of the sensory system is strongly affected by several factors. Essentially, all measurements are subject to various sources of error, exhibiting random or systematic fluctuations, as discussed in the previous chapter. One common technique to eliminate such issues is to calibrate the sensors [16].

By definition, calibration is the process of determining the performance parame-ters of an artifact, instrument, or system by comparing it with measurement standards. Adjustment may be a part of a calibration, but not necessarily. A calibration assures that a device or a system will produce results which meet or exceed some defined criteria with a specified degree of confidence [30].

The goal of this chapter is to introduce and give an insight into the calibration model which was used for the Time-of-Flight camera accompanied by the RGB imaging sensor. An overview of the methodology will be described in a later section. Then, each calibration procedure will be discussed before presenting the overall results.

5.1 System's Calibration Process

The considered system is composed of two cameras (a multi-camera system): a low resolution ToF camera and a high resolution standard RGB camera. Due to the great number of influencing parameters that should be explored, the calibration procedure is divided into two basic steps: a single camera calibration and a system calibration. In more detail, based on the fact that the overall procedure involves amplitude information, raw distance data and pixel coordinates, it can be expressed through three calibration types: photogrammetric, distance and stereo vision calibration. The schema in Fig. 5.1 gives an overview of the calibration approach.

As a first step, each single sensor is calibrated by itself. For standard cameras, the pinhole camera model, extended with corrections for the systematically distorted image coordinates, is adopted in order to determine the internal geometric and optical characteristics, also known as the intrinsic parameters, of an optical sensor (photogrammetric calibration).

The pinhole model is based on the collinearity principle, where each point in the world space is projected by a straight line through the projection center onto the image plane [25]. Therefore, despite the nonidentical functional characteristics with the standard camera, the same model is adopted for the ToF camera too, based on the similarity of their projection processes. However, this does not mean that the same process is implied for both sensors when referring to single camera calibration; the calibration of the ToF camera follows a more articulated line.


Figure 5.1: Overview of the calibration procedure (Step 1: single camera calibration - photogrammetric calibration, intrinsic parameters, radial/tangential distortion compensation, and, for the ToF camera, distance calibration with random and systematic error correction; Step 2: system calibration - stereo vision calibration, extrinsic parameters).

Apart from the photogrammetric calibration, it needs to compensate for the random and systematic fluctuations that affect the distance information.

Once the calibration of the two single cameras is completed, a common reference frame for the cameras needs to be created so that they can function as a system. The relation between the two sensors can be expressed through the extrinsic camera parameters, which hold information such as the cameras' relative 3D position and orientation in space (rotation and translation).

The second step of the calibration provides access to this relation through a stereo vision approach. By mounting the two cameras side by side on top of a tripod in a binocular setup, a stereo pair can be formed. It is then straightforward to follow a stereo calibration and rectification procedure: the calibration computes the geometrical relationship between the two cameras in space, and the rectification achieves a row alignment of the two imaging planes. A minimal code sketch of this step is given below.
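The sketch below illustrates this step with OpenCV, which is the library used in this work; it is a minimal example, not the actual toolbox, and it assumes that matching chessboard detections and the single-camera intrinsics are already available and that both images have been brought to a common size (e.g. by scaling the ToF frames).

```python
import cv2

# obj_points: list of (N, 3) chessboard corner coordinates in the board frame;
# img_points_tof / img_points_rgb: matching (N, 2) detections in each camera;
# K_tof, dist_tof, K_rgb, dist_rgb: intrinsics from the single-camera step.
def stereo_step(obj_points, img_points_tof, img_points_rgb,
                K_tof, dist_tof, K_rgb, dist_rgb, image_size):
    ret, _, _, _, _, R, T, E, F = cv2.stereoCalibrate(
        obj_points, img_points_tof, img_points_rgb,
        K_tof, dist_tof, K_rgb, dist_rgb, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)          # keep the single-camera intrinsics fixed
    # Rectification aligns the two image planes row by row.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(
        K_tof, dist_tof, K_rgb, dist_rgb, image_size, R, T)
    return R, T, R1, R2, P1, P2, Q
```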

With the completion of the second step, the overall calibration of the sensory system is concluded. The sensors are then expected to perform with precision and accuracy, both individually and as a system. It should not be forgotten that sensor fusion, by means of information exchange and correlation, is very much dependent on the quality of the overall calibration: the better the sensors are calibrated, the better the sensor fusion that can be attained.

It is important to mention here that all the required intrinsic parameters, distortion coefficients and extrinsic parameters have been computed using Intel's computer vision library OpenCV [25].

5.2 Photogrammetric Camera Calibration

The principal concept underlying this calibration type is to target the camera towards a known structure which has many evident and individual points (e.g. a pattern). By viewing this structure from a variety of angles and distances, it is possible to calculate the geometric location and orientation of the camera for each captured frame, as well as the intrinsic parameters. Mainly four sets of parameters have to be determined (a minimal code sketch of this step follows the list below):

1. horizontal and vertical focal lengths (f_x, f_y)

2. image center of projection, known as the principal point (c_x, c_y)

3. radial distortion parameters (k_1, k_2, k_3)

4. tangential distortion parameters (t_1, t_2) - needed when the centers of curvature of the lens surfaces are not collinear
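A minimal OpenCV sketch of this photogrammetric step is given below. The image paths are hypothetical and the board coordinates are expressed in square units; the returned distortion vector contains exactly the five coefficients listed above.

```python
import glob
import cv2
import numpy as np

PATTERN = (8, 6)                                   # inner corners of the 6x8 board
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_points, img_points, image_size = [], [], None
for path in glob.glob("calib/*.png"):              # hypothetical image files
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]                  # (width, height)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (5, 5), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print("RMS reprojection error:", rms)
print("intrinsics:\n", K)
print("distortion (k1, k2, t1, t2, k3):", dist.ravel())
```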

5.2.1 Calibration Dataset

The calibration object was chosen to be a chessboard, a pattern of black and white squares. The dimensions of the board are 1.5 x 1.0 meters with 6x8 inner corners.

Several sets of images were used for the calibration. One such dataset is composed of fourteen views of the chessboard, pictured in several rotational and translational positions as depicted in Fig. 5.2.

Figure 5.2: Time-of-Flight and Standard RGB camera Calibration Datasets (panels (a) to (l))

SR4000

Because of the low image resolution, the calibration algorithm could not detect the chessboard corners properly and the output images were not as expected. Therefore a scaling of the amplitude image was necessary. After testing, and in order to obtain the best calibration result, the appropriate scaling factor was set to x3. Fig. 5.3 depicts the first attempt to calibrate the sensor using raw data in pictures (5.3a) and (5.3b), as well as the final undistorted output when scaling the grayscale image by x3 in pictures (5.3c) and (5.3d).
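A minimal sketch of this upscaling step is shown below, assuming the SR4000 amplitude frame is available as an 8-bit image file; the file name and interpolation choice are assumptions. Note that intrinsics estimated on the upscaled image refer to the upscaled pixel grid rather than to the native 176x144 resolution.

```python
import cv2

# Upscale the low-resolution ToF amplitude image before corner detection.
amp = cv2.imread("sr4000_amplitude.png", cv2.IMREAD_GRAYSCALE)   # placeholder file
scaled = cv2.resize(amp, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
found, corners = cv2.findChessboardCorners(scaled, (8, 6))
```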


Figure 5.3: SR4000 Distortion Correction. (a) Raw image with corner detection; (b) incorrect undistorted output image; (c) scaled image with corner detection; (d) scaled undistorted output image.

The relevant calibration parameters are listed in the following table (Table 5.1).

Table 5.1: ToF INTRINSIC PARAMETERS

    Coefficient                       Value
    focal length fx                   719.206
    focal length fy                   716.542
    principal point cx                255.700
    principal point cy                226.797
    radial distortion k1              -0.953
    radial distortion k2               2.364
    radial distortion k3              -7.761
    tangential distortion t1          -0.011
    tangential distortion t2           0.009

DFK 41F02

For this camera, the procedure was straightforward. Running the photogrammetric calibration algorithm produced the expected output. Fig. 5.4 illustrates the calibration results.

The RGB standard camera’s calibration parameters are listed in the following table (Table 5.2).


Figure 5.4: DFK 41F02 Distortion Correction. (a) Raw image with corner detection; (b) undistorted output image.

Table 5.2: DFK 41F02 INTRINSIC PARAMETERS

    Coefficient                       Value
    focal length fx                   629.536
    focal length fy                   628.038
    principal point cx                263.998
    principal point cy                226.533
    radial distortion k1              -0.385
    radial distortion k2               2.655
    radial distortion k3              -13.247
    tangential distortion t1           0.004
    tangential distortion t2          -0.002

5.3 Distance Calibration

This section handles the calibration of the Time-of-Flight camera over its distance measurements. Although the photogrammetric calibration reduces or even removes possible aberrations and fluctuations of the data in two dimensions, the third dimension remains almost untouched. Random and systematic errors depend on the distance measurement, so the distance needs to be calibrated as well.

The calibration method mainly focuses on depth correction parameters. The calibration results indicated below are part of the investigation into the accuracy of this specific sensor, which primarily follows the findings covered in the literature; they will be evaluated in the next chapter. During the calibration it became evident that some of the systematic errors were not present, or that the camera already handles them by giving the user access to a variety of built-in functions [6].

5.3.1 Ambient Light

As the camera is used for an indoor application, the ambient light from sources like the sun is automatically limited. The calibration procedure showed, though, that the camera's performance can be affected even by variations of the lighting conditions during the daytime.

It was noticed that when experiments were performed during the night, a period in which the only light illuminating the scene is artificial and hence stable, the stability of the ToF measurements increased.


Figure 5.5: Effect of ambient light. (a) Grayscale pointcloud affected by daylight; (b) grayscale pointcloud during the night; (c) grayscale pointcloud partially affected by sunlight; (d) calibrated grayscale pointcloud.

5.3.2 Amplitude Thresholding

Amplitude thresholding is an error correction technique which can cover several types of deviation. Thresholding primarily discards data resulting from objects at larger distances or with low reflectivity. It can also be applied when objects located in the peripheral area of the measurement scene interfere with the camera's performance [31].

The SR4000 comes with a confidence map. The confidence map is calculated using the amplitude of the modulated signal, the distance data, and their spatial and temporal variation [6]. It represents a measure of probability, or "confidence", that the distance measurement of each pixel is correct. It can thus filter out incorrect points, but this of course affects the number of points that remain available for representing the captured scene; if not properly used, the available output data can be greatly reduced, which means less information. In this setup the confidence map was not used; instead it was preferred to threshold the incoming amplitude signal, such that even the "back-folding" phenomenon is controlled to some extent.
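The thresholding itself reduces to a simple per-point test. The sketch below is an illustrative assumption of how it can be applied to the SR4000 output; the threshold value and the array layout are not taken from this thesis.

```python
import numpy as np

def threshold_by_amplitude(points, amplitude, amp_min=100.0):
    """Keep only points whose raw amplitude reaches a minimum value.

    points:    (N, 3) array of Cartesian coordinates from the ToF camera
    amplitude: (N,) array of raw amplitude values for the same points
    amp_min:   illustrative threshold; in practice tuned per scene
    """
    keep = amplitude >= amp_min
    return points[keep], amplitude[keep]
```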

5.3.3 Light Scattering

In the SR4000, the user has the option of using a conversion-to-grayscale module, which converts the raw amplitude into a value that is independent of distance and position in the image array. The grayscale image is produced by multiplying the amplitude data by a factor proportional to the square of the distance. The result is that near and distant objects have similar brightness, and the scattering effect is limited to some degree.
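In code, this compensation amounts to a per-pixel multiplication. The sketch below only illustrates the relation stated above; the scale factor k is an arbitrary display constant and an assumption, not the camera's internal value.

```python
import numpy as np

def amplitude_to_grayscale(amplitude, distance, k=1.0):
    # Compensate the falloff of the returned signal with range: scale the
    # amplitude by a factor proportional to the squared distance.
    return k * amplitude * np.square(distance)
```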


5.3.4 Jump Edges

Jump edges are an error type involving multimodal measurements, which appear as smooth point transitions between object edges (where the transition from one object to another should appear disconnected due to occlusion), as can be seen in Fig. 5.7. One characteristic to consider when thinking about object or shape separation is that the distance between the shapes exhibits a sudden change, and this is exactly what the ToF camera cannot detect. In order to correct this error, an approach based on Fuchs et al. [12], [28] was adopted. The methodology can be described as follows:

Assuming a set of 3D points $P = \{p_i \mid p_i \in \mathbb{R}^3,\; i = 1, \ldots, N_p\}$, we can easily compute by means of triangulation the angles and distances that every point $p_i$ creates with each of its eight adjacent points $P_n = \{p_{i,n} \mid n = 1, \ldots, 8\}$ (see Fig. 5.6).

Figure 5.6: Jump edge filter main functionality

The jump edges can be eliminated in a double-check procedure:

1. If $\xi_{i,n}$ is the angle of the triangle spanned by the points $p_i$, $p_{i,n}$ and the focal point $f = 0$, the jump edges can be filtered by comparing the angle to a threshold value set by the user:

$$\xi_i = \max_n \arcsin\!\left(\frac{\|p_{i,n}\|}{\|p_{i,n} - p_i\|} \cdot \sin\varphi\right) \qquad (5.1)$$

where $\varphi$ is the apex angle between two neighboring pixels, and

$$JE = \{\, p_i \mid \xi_i > \xi_{\text{threshold}} \,\} \qquad (5.2)$$

2. If the magnitude of the distance $\|p_i - p_{i,n}\| = d$ is larger than a threshold predefined by the user, the point is filtered out as well.

The effect of this error can be seen in Fig. 5.7a, along with the filtering results in Fig. 5.7b.
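A minimal sketch of this filter, transcribing Eqs. (5.1) and (5.2) together with the distance check, is given below. It assumes an organized point cloud with the sensor at the origin; the handling of the image border and the threshold values are assumptions to be tuned, not values from this thesis.

```python
import numpy as np

def jump_edge_mask(points, apex_angle, xi_threshold, dist_threshold):
    """Return a boolean (H, W) mask marking points classified as jump edges.

    points: (H, W, 3) organized point cloud in sensor coordinates (the focal
    point at the origin); apex_angle is the angle between neighboring pixel rays.
    """
    xi_max = np.zeros(points.shape[:2])
    d_max = np.zeros(points.shape[:2])
    # Eight-neighborhood; np.roll wraps around the image border, which is a
    # simplification acceptable for a sketch.
    for dy, dx in [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]:
        neigh = np.roll(np.roll(points, dy, axis=0), dx, axis=1)
        d = np.linalg.norm(neigh - points, axis=2)
        # Eq. (5.1): angle in the triangle (f, p_i, p_{i,n}) via the law of sines
        ratio = np.clip(np.linalg.norm(neigh, axis=2) / np.maximum(d, 1e-9)
                        * np.sin(apex_angle), -1.0, 1.0)
        xi_max = np.maximum(xi_max, np.arcsin(ratio))
        d_max = np.maximum(d_max, d)
    # Eq. (5.2) combined with the distance criterion of step 2
    return (xi_max > xi_threshold) | (d_max > dist_threshold)
```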


Figure 5.7: Jump Edges Correction. (a) Raw amplitude pointcloud; (b) corrected pointcloud.

5.3.5 Depth Refinement and Noise Reduction

As already mentioned, ToF cameras produce noisy outputs: smooth surfaces appear rough and thick.

Therefore, it was considered reasonable to apply a smoothing filter over the range image and later convert this output back to the radial coordinate system, so that the new values are available for use in the pointcloud. Such an example is depicted in Fig. 5.8. Several smoothing filters were applied (a sketch of this step follows the list):

1. Gaussian Filter (3x3)

2. Median Filter (3x3)

3. Bilateral Filter (3, 3, 20, 5) [32]
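The sketch below illustrates this filtering step with OpenCV, assuming the range image is available as a float32 array in meters. The placeholder input and the mapping of the quoted bilateral parameters onto OpenCV's (d, sigmaColor, sigmaSpace) arguments are assumptions.

```python
import cv2
import numpy as np

# Placeholder range image at the SR4000 resolution (176x144), in meters.
depth = np.random.uniform(0.8, 5.0, (144, 176)).astype(np.float32)

gaussian  = cv2.GaussianBlur(depth, (3, 3), 0)
median    = cv2.medianBlur(depth, 3)
bilateral = cv2.bilateralFilter(depth, d=3, sigmaColor=20, sigmaSpace=5)
# The filtered radial distances are then projected back along the pixel rays
# to rebuild the 3D pointcloud.
```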

Each of these will later be evaluated against the raw depth information. However, based on a preliminary visual estimation, it is evident that all three filters positively affect, to some degree, the way the points are produced: the sets of points that represent flat surfaces are better structured and the noise level is decreased.

Furthermore, according to the manufacturer, the SR4000 offers additional refinement modules by default, such as a hardware-implemented noise filter. This filter is on by default and combines amplitude and distance information in a 5x5 neighborhood in order to reduce the noise; most importantly, it preserves details such as edges and small structures.

The results will be evaluated with respect to their accuracy and precision for the overall task in the next chapter. For the moment, this first assessment is based on visual inspection.

5.4 Stereo Calibration

With the single camera calibration complete, the next step is to relate the ToF camera (right) to the standard RGB camera (left). For this purpose the two cameras are considered as a stereo pair and a stereo calibration approach is employed.

Stereo vision refers to a methodology which uses multiple images in order to extract 3D information and reconstruct a scene. Features in two (or more) images, each acquired from a different viewpoint in space, are matched based on the corresponding features in both image outputs. The differences can furthermore be analyzed to yield depth information in the form of a disparity map, which is inversely proportional to the distance to the object [25]; this, however, is not used in this work.


Figure 5.8: Depth Filtering. (a) Raw pointcloud; (b) median filter pointcloud; (c) Gaussian filter pointcloud; (d) bilateral filter pointcloud.

5.4.1 Stereo Vision

The principal task of this procedure is to find correspondences between points seen by one imaging sensor and the corresponding points seen by the other [26].

In practice, binocular stereo vision typically involves four steps towards the calculation of the system's extrinsic parameters, as depicted in Fig. 5.9 (we do not consider the reprojection step that outputs a depth map):

1. Calculation of intrinsic camera parameters

2. Distortion compensation (radial, tangential)

3. Image rectification

4. Image correspondence

The first two steps have already been discussed in the previous sections. The rectification step deals with adjusting the rotation and translation between the two cameras so that their image planes become row-aligned. The final step, a process known as correspondence, is responsible for matching the same features in the two cameras' image planes.

To accomplish this task, the standard OpenCV stereo calibration is used, deploying Bouguet's algorithm for the stereo rectification step [25]. The procedure is simple and straightforward. However, the difference between the two cameras' outputs is a limitation: as already mentioned, the low-resolution ToF images do not yield good calibration results. Therefore, a new image size was adopted for both cameras. The ToF images were upscaled three times, whereas the RGB images were cropped and downscaled to match in size (525x432).

Once again, fourteen pairs of chessboard views were used to run the stereo calibration. The intrinsic parameters from the single camera calibration were loaded into Bouguet's algorithm in order to calculate the rotation and translation matrices (R, T). These two matrices are very important because through them the relation between the two sensors can be estimated.
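A minimal sketch of this step with OpenCV is given below. It assumes the per-camera corner lists and intrinsics from the previous sections are passed in under hypothetical names (obj_points, img_points_rgb, img_points_tof, K_rgb, dist_rgb, K_tof, dist_tof), and that both image streams share the common 525x432 size described above.

```python
import cv2

def calibrate_stereo_pair(obj_points, img_points_rgb, img_points_tof,
                          K_rgb, dist_rgb, K_tof, dist_tof,
                          image_size=(525, 432)):
    # Keep the previously estimated intrinsics fixed; estimate only R and T.
    flags = cv2.CALIB_FIX_INTRINSIC
    rms, K_rgb, dist_rgb, K_tof, dist_tof, R, T, E, F = cv2.stereoCalibrate(
        obj_points, img_points_rgb, img_points_tof,
        K_rgb, dist_rgb, K_tof, dist_tof, image_size, flags=flags)
    # Bouguet rectification: R1/R2 rotate the image planes into row alignment,
    # P1/P2 are the rectified projection matrices, roi1/roi2 the valid regions.
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
        K_rgb, dist_rgb, K_tof, dist_tof, image_size, R, T)
    return R, T, R1, R2, P1, P2, Q, roi1, roi2
```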

The system's extrinsic parameters are listed in the following table (Table 5.3). The values seem correct with respect to the baseline and the positioning of the two cameras. Fig. 5.10 presents the output for one pair from the dataset when stereo calibration and rectification are applied. After rectification, the epipolar lines are drawn on the two images as a group of yellow rows which depict the matching locations in the right and left images. The red rectangles are the two regions of interest (roi1 and roi2) which contain only the valid pixels. The epipolar error is less than a pixel, 0.426345 to be precise, distributed over all the images of the dataset used for the calibration.

Figure 5.9: The four-step procedure towards Stereo Vision

Figure 5.10: Left and Right Images, after rectification (with horizontal epipolar lines)
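Applying the computed rectification to live images then amounts to a per-camera remap. The sketch below shows one hedged way to do this with OpenCV, continuing from the hypothetical outputs of the previous sketch.

```python
import cv2

def rectify_pair(img_rgb, img_tof, K_rgb, dist_rgb, K_tof, dist_tof,
                 R1, R2, P1, P2, image_size=(525, 432)):
    # Precompute the undistortion/rectification lookup tables for each camera
    # and warp both images so that corresponding points share the same row.
    map1_rgb, map2_rgb = cv2.initUndistortRectifyMap(
        K_rgb, dist_rgb, R1, P1, image_size, cv2.CV_32FC1)
    map1_tof, map2_tof = cv2.initUndistortRectifyMap(
        K_tof, dist_tof, R2, P2, image_size, cv2.CV_32FC1)
    rect_rgb = cv2.remap(img_rgb, map1_rgb, map2_rgb, cv2.INTER_LINEAR)
    rect_tof = cv2.remap(img_tof, map1_tof, map2_tof, cv2.INTER_LINEAR)
    return rect_rgb, rect_tof
```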

5.5 Discussion

The goal of this chapter was to investigate and give an insight into the calibration of the sensory system. With respect to digital cameras, one could argue that the calibration problems are more or less solved. In contrast, for 3D vision there are several issues, even common errors, which are still unsolved and have become a major topic of research interest.

The calibration procedure has set the necessary prerequisites for accurate and effective sensor fusion. The remaining step involves the evaluation of the results with respect to the system's accuracy and precision.

References
