
Full frame 3D snapshot

Possibilities and limitations of 3D image acquisition without scanning

Master's thesis in Computer Engineering

by

Björn Möller

LITH-ISY-EX--05/3734--SE

Linköping 2005


Full frame 3D snapshot

Possibilities and limitations of 3D image acquisition without scanning

Master's thesis in Computer Engineering
at Linköping Institute of Technology

by

Björn Möller

LITH-ISY-EX--05/3734--SE

Supervisors: Mattias Johannesson, Henrik Turbell

Examiner: Dake Liu


Division, Department: Institutionen för systemteknik, 581 83 Linköping
Date: 2005-03-24
Language: English
Report category: Examensarbete (Master's thesis)
ISRN: LITH-ISY-EX--05/3734--SE
URL for electronic version: http://www.ep.liu.se/exjobb/isy/2005/3734/
Title (Swedish): Helbilds 3D-avbildning
Title: Full frame 3D snapshot - Possibilities and limitations of 3D image acquisition without scanning
Author: Björn Möller


Keywords: Snapshot 3D imaging, depth measurement, triangulation, interferometry, time-of-flight, CMOS image sensors


Abstract

An investigation was initiated, targeting snapshot 3D image sensors, with the objective to match the speed and resolution of a scanning sheet-of-light system, without using a scanning motion. The goal was a system capable of acquiring 25 snapshot images per second from a quadratic scene with a side from 50 mm to 1000 mm, sampled in 512×512 height measurement points, and with a depth resolution of 1µm and beyond.

A wide search of information about existing 3D measurement techniques resulted in a list of possible schemes, each presented with its advantages and disadvantages. No single scheme proved successful in meeting all the requirements. Pulse modulated time-of-flight is the only scheme capable of depth imaging by using only one exposure. However, a resolution of 1 µm corresponds to a pulse edge detection accuracy of 6.67 fs when visible light or other electromagnetic waves are used. Sequentially coded light projections require a logarithmic number of exposures. By projecting several patterns at the same time, using for instance light of different colours, the required number of exposures is reduced even further. The patterns are, however, not as well focused as a laser sheet-of-light can be.

Using powerful architectural concepts such as matrix array picture processing (MAPP) and near-sensor image processing (NSIP) a sensor proposal was presented, designed to give as much support as possible to a large number of 3D imaging schemes. It allows for delayed decisions about details in the future implementation.

It is necessary to relax at least one of the demands for this project in order to realise a working 3D imaging scheme using current technology. One of the candidates for relaxation is the most obvious demand of snapshot behaviour. Furthermore, there are a number of decisions to make before designing an actual system using the recommendations presented in this thesis. The ongoing development of electronics, optics, and imaging schemes might be able to meet the 3D snapshot demands in the near future. The details of light sensing electronics must be carefully evaluated, and the optical components such as lenses, projectors, and fibres should be studied in detail.


Acknowledgement

I would like to acknowledge the following people for helping me forward on this bumpy road.

• Thanks to professor Dake Liu who allocated time to examine this work.

• My supervisors at SICK IVP, dr Mattias Johannesson and dr Henrik Turbell are acknowledged for their valuable guidance through my first fumbling and wavering steps in this vast jungle.

• SICK IVP and dr Mats Gökstorp made the project possible in the first place.

• Johan Melander, manager of the sensor design at SICK IVP, always supported me and protected the time given to me for this project.

• Thanks to my opponent Robert Johansson for his support on the tools front, for his deep knowledge in languages and CMOS sensors, and for being extremely patient with me.

• Thanks to dr Annika Rantzer, Håkan Thorngren, Leif Lindgren, and everybody else at SICK IVP for valuable input during discussions at the coffee table.

• Thanks to Frank Blöhbaum at SICK AG for valuable input on the PMD sensor and ToF issues.


Table of contents

Abstract
Acknowledgement

1 Definitions and abbreviations
1.1 Concept definitions
1.2 Abbreviations

I Overview

2 Contents
2.1 Background
2.2 Conditions for the project
2.3 Working methodology

3 Goals for the project
3.1 Main project goal
3.2 Snapshot behaviour
3.3 Speed
3.4 Z resolution
3.5 X×Y resolution
3.6 Communication requirements
3.7 Field of view
3.8 Production feasibility
3.9 Object surface dependence
3.10 Detection of surface properties

II 3D imaging schemes

4 General 3D image acquisition
4.1 Triangulation schemes
4.1.1 Focusing schemes
4.1.2 Schemes using structured light
4.1.3 Passive schemes
4.2 Interferometric schemes
4.2.1 Phase-shifting
4.3 Time-of-flight schemes

5 3D image acquisition schemes
5.1 Beam-of-light triangulation
5.2 Sheet-of-light triangulation
5.3 Confocal microscopy
5.4 Self-adjusting focus
5.5 Depth from defocus
5.6 Moiré
5.7 Sequentially coded light
5.7.1 Colour coding
5.8 Stereo vision
5.8.1 Several 2D cameras in known relative geometry
5.8.2 Several 2D cameras with dynamic self-adjustment
5.8.3 Single self-adjusting 2D camera taking several images
5.8.4 Motion parallax
5.9 Shape from shading
5.10 Holographic interferometry
5.11 Multiple wavelength interferometry
5.12 Speckle interferometry
5.13 White light interferometry
5.14 Pulse modulated time-of-flight
5.15 CW modulated time-of-flight
5.16 Pseudo-noise modulated time-of-flight

6 Overview of schemes

7 Implementation issues
7.1 Smart sensor architectures
7.1.1 MAPP
7.1.2 NSIP
7.1.3 Global electronic shutter sensor
7.1.4 Demodulator pixel
7.1.5 Pvlsar2.2
7.1.6 Digital Pixel Sensor
7.1.7 SIMPil
7.2 Thin film detectors
7.3.1 VIP
7.4 Inter-chip communications
7.5 Projectors
7.5.1 Liquid crystal projectors
7.5.2 Slide projectors
7.5.3 Digital micromirror devices
7.6 Microlenses
7.7 Philips liquid lens
7.8 Background light suppression
7.9 Controlling units
7.9.1 Off-sensor program execution controller
7.9.2 Integrated program execution controller
7.9.3 Off-camera communication control
7.10 Number representation
7.11 Multiplication and division
7.12 Nonlinear unary functions

III A case study

8 Components in the sensor chip
8.1 A pixel
8.2 A suitable ALU

9 Off-sensorchip camera components
9.1 Data path look-up tables
9.2 Off-chip controller
9.3 Off-camera communication

IV Conclusion

10 Conclusion
10.1 Future work

11 Discussion
11.1 Implementability
11.2 Architectures
11.3 Optical environment
11.4 Literature search
11.5 Optically optimized processes
11.6 Imaging components for mass markets

Appendices

B SBCGen
B.1 Stripe Boundary Code results
B.2 Header file NodeClass.h
B.3 Source file NodeClass.cpp
B.4 Main file SBCGen.cpp

C BiGen
C.1 Header file BiGen.h
C.2 Source file BiGen.cpp
C.3 Main file BigMain.cpp
C.4 Global definitions file globals.h

D BOOL optimization files

References

Index


List of figures

1.1 The coordinate system X×Y×Z.
1.2 An example of Graycode.
3.1 Complexity of communication between pixels.
3.2 Separate DFT processor.
4.1 Overview of 3D imaging schemes.
4.2 The principles of triangulation.
4.3 The triangulation schemes.
4.4 The principles of focusing.
4.5 Stereo depth perception.
4.6 Stereo camera with two image sensors.
4.7 Interferometric schemes.
4.8 A typical interferometry setup.
4.9 Time-of-flight schemes.
5.1 The principles of beam-of-light triangulation.
5.2 The principles of sheet-of-light profiling.
5.3 Point spreading due to image defocus.
5.4 The principles of Moiré.
5.5 A Graycoded light sequence.
5.6 Typical colour response using a) photon absorption depth and b) filter.
5.7 A single-chip stereo sensor.
5.8 Impact from angular inaccuracies dependent on separation.
5.9 The principles of motion parallax.
5.10 A fibre optic speckle interferometer.
5.11 Edge deterioration.
5.12 Extrapolation of time-of-flight from two integration times.
5.13 Time-of-flight from interrupted integration.
5.14 Autocorrelation of a pseudo-noise sequence.
7.1 Simplified MAPP column schematics.
7.2 Simplified NSIP datapath pixel schematics.
7.3 A shutter pixel by Johansson.
7.4 A demodulator pixel by Kimachi and Ando.
7.5 Off-sensor program execution controller and sensor instruction cache.
7.6 Sensor integrated program execution controller and off-chip sensor instruction cache.
7.7 Off-camera communication control.
7.8 A simple integer-to-float converter.
7.9 A serial/parallel integer multiplier.
7.10 A serial/parallel division circuit.
8.1 An active pixel with several analogue memories.
8.2 An ALU proposal using specialized units.
8.3 An ALU proposal using programmable units.
9.1 FPGA and LUT in the data path.
A.1 An example NMOS NOR ROM 6×8 b cell schematic.
A.2 An example NMOS NOR ROM 6×8 b cell layout.
A.3 An example 2-input NOR cell layout.
A.4 An example 2-input NOR cell schematic.
A.5 A 2 b binary to 4 b one-hot predecoder.


List of tables

1.1 Abbreviations used in this thesis.
6.1 Summary of measurement scheme properties.
B.1 Some stripe boundary code results.
B.2 A stripe boundary code example.


CHAPTER 1

Definitions and abbreviations

1.1 Concept definitions

Some concepts used in this document may differ from the Reader’s interpretation. They are listed below in order to clarify the author’s intentions.

An image containing a two-dimensional (2D) array of depth values is considered to be a three-dimensional (3D) image. A height image or depth map z(x, y) is in its true sense not a 3D image but rather two-dimensional information, or 2½D (see for example Justen [1] pp. 17–18), depending on the beholder’s conception. Images not containing reflectance data for all points within a volume would by some be considered as 2D or possibly two-and-a-half-dimensional (2½D) images, but in this text they are called 3D images. In this document there is no distinction between ‘different levels of 3D’ and because the concept of 2½D is ambiguously defined in some texts it will not be used here.

What is in this text called an absolute height measurement result is a value of the height relative to placement of the sensor.1 A relative measure on the other hand is for instance ‘two microns above the nearest fringe to my left’, which implies the need for integration or summation of a number of measurements in order to acquire measurements relative to a coordinate system fixed on the object.

Active image acquisition schemes, like Graycoded light sequence techniques described in section 5.7, depend not only on the presence of a general light source but on a specific type of light source, such as an image projector, in order to work. Whereas a passive scheme like stereo vision, described in section 5.8, works in normal daylight or using an arbitrary light bulb (see for example Johannesson [2] p. I-3). A passive scheme enhanced by projection of light patterns is still categorised as a passive scheme even though active lighting is used. An active scheme on the other hand does not work properly without the active lighting, so there should be no ambiguities as to whether or not it is an active scheme. Bradshaw [3] among others defines active schemes as those that need to know the pattern of projected light. Accordingly, passive schemes may use projected patterns without knowing them beforehand. This definition is very similar to the one used here. For instance, stereo schemes using images from multiple viewpoints and projected (temporally stationary) patterns are in both systems considered as passive. There might, however, be occasions when the two definitions collide.

1 All things are relative. Absolute means relative to the origin of a coordinate system.

According to Stallings [4], the architectural aspects of a computer system are those that affect the programmer’s model. Design decisions that only affect the latency of an instruction or area efficiency of the hardware are instead called organizational aspects. However, in this text all discussions about computational hardware include the unseen properties as the key feature, and software programming aspects become of less importance.

Figure 1.1: The coordinate system X×Y×Z (image sensor and sheet laser source above a scanned object on a conveyor belt).

• Two units are used to represent the size of digital information in the thesis. The smaller unit 1 b is one bit, that is, two-valued information. The larger unit 1 B ≡ 8 b is a byte, which can hold one of 2⁸ = 256 values.

The coordinate system X × Y × Z seen in figure 1.1 is an orthogonal base defined from an imaginary scanning sheet-of-light rig (see section 5.2) as follows:

X The X coordinate is the one measuring the width of the conveyor belt. It also corresponds to enumeration of columns in the sensor and the resulting image or range profile.

Y A height difference in the surface of an object corresponds to a shift in the Y direction of a light stripe caused by intersection between the laser sheet and the object surface. The Y coordinate axis describes row positions in the sensor and in the resulting image. It is parallel to the direction of motion for the conveyor belt.

Z The sought height of a scanned object is measured in the Z direction, which is parallel to the axis through the sensor optics in the fixed-geometry measurement setup shown in figure 1.1. The Z coordinate is normal to the conveyor belt.

Figure 1.2: An example of Graycode (the 1st to 6th exposures of a Graycoded projection sequence).

Throughout this document the word Gray always relates to Graycode, also called mirror code since the sequence for a bit of weight 2ⁿ is mirrored or folded after 2ⁿ⁺¹ sequential values. In contrast, a straightforward binary coded sequence is instead repeated after 2ⁿ⁺¹ sequential values. The main advantage of Graycode over its conventional alternative is that the difference between two successive values always constitutes a change in exactly one position. Figure 1.2 illustrates a set of Graycoded light volume projections.
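As an illustration of the single-bit-change property, the standard binary-to-Gray conversion and its inverse can be written in a few lines of C++ (a minimal sketch, unrelated to the code generators in the appendices):

#include <cstdio>
#include <cstdint>

// Convert a binary number to its Graycode representation.
// Two successive inputs always differ in exactly one output bit.
static uint32_t binaryToGray(uint32_t b) { return b ^ (b >> 1); }

// Invert the conversion by XOR-folding all shifted copies of the Graycode.
static uint32_t grayToBinary(uint32_t g) {
    uint32_t b = 0;
    for (; g != 0; g >>= 1) b ^= g;
    return b;
}

int main() {
    for (uint32_t i = 0; i < 8; ++i)
        std::printf("%u -> Gray %u -> back %u\n", i, binaryToGray(i), grayToBinary(binaryToGray(i)));
    return 0;
}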

The word grey is the British spelling of a shade somewhere on a straight line between black and white. Grey image data is usually thought of as binary coded values representing light intensity, and grey values may be but are usually not Graycoded.

• The scalable length unit λ is taken from Kang and Leblebici [5]. It is used in scalable complementary metal oxide semiconductor (CMOS) design rules and relates all lengths in a layout to half the minimum drawn gate length allowed in a certain process. The examples in this document use MOSIS scalable CMOS design rules revision 7.2 with additional π/4 rad angles. These rules do not hold in deep-submicron processes but are good enough for area approximations. For example, in a 0.8 µm process 1 λ translates to 0.4 µm.

Figures A.2 and A.3 show example CMOS layouts. For simplicity no substrate or well contacts are explicitly shown and n- or p-type select layers are not drawn. In figure A.2 all diffusions are of n-type and in figure A.3 all p-type channel metal oxide semiconductor (PMOS) transistors are drawn within an n-well.

Everywhere it occurs in the text, the phrase visible light represents electromagnetic radiation residing roughly in the wavelength span λ = c/ν ∈ [400 nm, 700 nm], corresponding approximately to frequencies in the interval ν = c/λ ∈ [430 THz, 750 THz]. When light is said to be infrared in this document, mainly wavelengths in the interval λ ∈ ]700 nm, 1100 nm] are intended. Correspondingly, ultraviolet light exhibits wavelengths shorter than 400 nm. This categorisation is a combination of human colour perception and the sensitivity of a silicon based electrooptical device such as the CMOS sensor.

The abbreviations MAPP and NSIP describe certain types of column parallel and pixel parallel image processing architectures, respectively. Both are architectures fulfilling the criteria for near-sensor processing described later. MAPP and NSIP are described in more detail in subsections 7.1.1 and 7.1.2, respectively.

In this document the expression near-sensor processing is used to describe data processing directly within an image sensor array or with at least one-dimensional² communication channels between the sensor elements and the data processor. Near-sensor processing is not necessarily synonymous with applying the NSIP architecture.³

Phase-shifting, phase-stepping, fringe-shifting, and fringe-stepping are four different expressions describing spatial phase movement of a structured light projection of some sort, all indicating different step sizes. They are not treated separately in this text; instead, when step size is critical it is mentioned by value.

In the following text where the term smart sensor is used it refers to the definition stated by Johannesson in [2] p. I-8:

“A smart sensor is a design in which data reduction and/or processing is performed on the sensor chip.”

Or, in a more general form, within the sensor hardware. For a more in-depth study of smart image sensors see for instance Åström [6].

The terms snap behaviour and snapshot images are in this document used for describing an event that occurs during a period of time that is small enough so that the action can be treated as instantaneous. The concept is therefore relative to issues such as motion. A snapshot image of a scenery is an image taken with such a short integration time that features in the image are not blurred significantly by motion in the scenery, as described by the resolution demands for the image.

Laser speckle is temporally static ‘noise’ in the image that arises when coherent light hits optically rough surfaces. It can be thought of as an interferometric pattern of bright and dark areas. For more in-depth information regarding speckle and its countermeasures, refer to for example Gabor [7].

A temporal or sequential code is a pattern that changes in time. The Graycoded light sequence described in section 5.7 is a good example of such a code.

A boundary code is a pattern that occurs between two different patterns, such as the line between dark and bright areas in the Graycoded light sequence described in section 5.7.

Consequently, a code that occurs on the borderline between two temporally changing patterns of light is called a temporal boundary code.

2 That is, column or row parallel.


1.2 Abbreviations

2D Two-Dimensional
3D Three-Dimensional
a-Si:H Hydrogenated amorphous Silicon
A/D Analogue-to-Digital
ALU Arithmetic Logic Unit
AM Amplitude Modulation
AND logical AND
ASIC Application Specific Integrated Circuit
BGA Ball Grid Array
BiCMOS Bipolar Complementary Metal Oxide Semiconductor
BoL Beam-of-Light
CCD Charge Coupled Device
CDS Correlated Double Sampling
CMOS Complementary Metal Oxide Semiconductor
CPU Central Processing Unit
CW Continuous Wave
DC Direct Current
DFT Discrete Fourier Transform
DMD Digital Micromirror Device
DSP Digital Signal Processor
FM Frequency Modulation
FoV Field of View
FPGA Field Programmable Gate Array
GLU Global Logic Unit
GOR Global OR
LCD Liquid Crystal Display
LED Light Emitting Diode
LUT Look-Up Table
LVDS Low Voltage Differential Signalling
MAC Multiply ACcumulate
MAPP Matrix Array Picture Processing
NAND inverting logical AND
NESSLA NEar-Sensor Sheet-of-Light Architecture
NLU Neighbourhood Logic Unit
NMOS n-type channel Metal Oxide Semiconductor
NOR inverting logical OR
NSIP Near-Sensor Image Processing
OR logical OR
PE Processing Element
PLD Programmable Logic Device
PLU Point Logic Unit
p-i-n Diode junction with an intermediate layer of intrinsic silicon
p-n Diode junction in doped silicon
PM Pulse Modulation
PMOS p-type channel Metal Oxide Semiconductor
PN Pseudo Noise (modulation)
Pvlsar Programmable and Versatile Large Size Artificial Retina
PWM Pulse Width Modulation
ROM Read-Only Memory
SAR Synthetic Aperture Radar
SICK IVP SICK IVP Integrated Vision Products AB
SIMD Single Instruction, Multiple Data
SIMPil SIMD Pixel processor
SNR Signal-to-Noise Ratio
SoL Sheet-of-Light
SRAM Static Random Access Memory
TFA Thin Film on ASIC
ToF Time-of-Flight
UV Ultra-Violet (light)
VIP Video Image Processor
XOR logical eXclusive OR


Part I

Overview


CHAPTER 2

Contents

This document describes the efforts made to find a new way of solving an old problem in the fields of automation and robot vision, namely, the perception of depth. The work has been directed by a set of guidelines presented in section 2.2 that originate from discussions within the company financing this study, SICK IVP Integrated Vision Products AB.¹ The first part of this document provides an overview of 3D imaging schemes. Each scheme is then treated in a bit more detail in part II, leading forward to an identification of some core issues regarding the choice of path to follow when suggesting solutions to some of these issues. In part III an idea is conceived on how to meet the demands in a decent manner, and part IV contains a conclusion of the results, some directions for future work, and a discussion about how this study was implemented.

2.1 Background

SICK IVP Integrated Vision Products AB started in 1985 as a spin-off from Linköping University, where an idea was developed about integration of high-performance image processing power into a CMOS image sensor. The development of the MAPP architecture described in subsection 7.1.1 led to a system for extremely fast acquisition of laser sheet 3D profiles (See for instance Johannesson [2] pp. I-53–I-78). Today, the technology from SICK IVP is based on improvements and enhancements of the original MAPP sensor architecture combined with laser sheet-of-light scanning for 3D image acquisition. The latest sensor, the M12 sensor described by Johansson et alii [8] and Lindgren et alii [9], and used for example in the products Ranger M40/M50/C40/C50/C55 [10, 11, 12] of SICK IVP, is capable of delivering more than 20000 triangulation profiles per second, each 1536 values wide.

There are some fundamental limits to the scanning sheet-of-light triangulation technique when striving for higher speed and higher resolution, for instance, realizing extremely accurate motion control during scanning of very small geometries. Another problem with laser sheet-of-light measurement is that speckle becomes an issue when coherent light sources are used on very small geometries, for example when examining solder bumps on ball grid array (BGA) components. Laser speckle is temporally static, and a small movement or variation in the measurement geometry blurs the speckle patterns, thereby decreasing the effects. Non-scanning snapshot measurements will, however, sample the speckle with almost no movement and hence suffer more from speckle than, for instance, scanning measurement schemes.

1 In this text called SICK IVP.

The vision regarding a new line of SICK IVP products is to be able to acquire a full 3D image snapshot extremely fast and with sufficient resolution, with maintained accuracy and robustness of the measurement while avoiding demands on object movement. This vision is why the snapshot 3D project was launched.

2.2 Conditions for the project

Find a way to implement a full-image ‘3D snap’ algorithm in a sensor architecture that fulfills the following criteria:

• Enough data for a standard PC to be able to calculate a 3D image should be acquired in short-enough time for the system to be accurately modeled as a single-image snapshot system. Multiple images may be used if acquired fast enough.

• There should be no or minimum scanning motion involved.

• Depth resolution should be at least 1 µm, and spatial resolution of the sensor of at least 512 × 512 pixels.

• Field-of-view range between 50 mm × 50 mm and 1000 mm × 1000 mm.

• The sensor should be economically feasible to produce in volume, which means that a standard CMOS process of 0.35 µm or possibly down to 0.18 µm minimum channel length should be sufficient, possibly also a-Si techniques.

• Throughput of at least 25 3D images per second with a reasonable amount of computational work left to do for an external PC.

2.3 Working methodology

The project is intended as a Master’s Thesis project of 20 weeks, including the completion of this document. Responsible for the work is the author, Björn Möller, supported at SICK IVP by Mattias Johannesson and Henrik Turbell regarding suitable imaging algorithms. Support has also been available from Johan Melander regarding sensor issues. Examiner of the Master’s Thesis at Linköping University is Dake Liu.

The initial idea was to study existing techniques for 3D imaging as well as suitable circuitry and architectures for image acquisition and image processing. Combined with my knowledge in CMOS design and advanced image sensors, the expected result was a proposal for 3D snapshot imaging equipment that minimizes the deviation from the project goals. Sources used to collect information include various Internet-based databases, library catalogues, patent databases, and journals on related subjects. Also, my supervisors and other skilled employees at SICK IVP have been very helpful in finding information on the subject.

Given a successful 3D imaging scheme, the second task would be to find a way of altering the sensor system, and in particular the architecture of SICK IVP CMOS sensors, in such a way that the behaviour of a new system can come as close as possible to the 3D snap goals. Lacking a suitable 3D imaging scheme, more effort should be applied to identifying and solving some problems related to the most suitable schemes. Decisions during the work process have made the adaptation and producibility parts less important, and instead more energy has been given to studying the schemes and finding general solutions to their weaknesses.


CHAPTER 3

Goals for the project

Before digging into the details of existing 3D imaging schemes, a second thought on the criteria listed in section 2.2 might be appropriate.

3.1 Main project goal

• Find a way to implement a full-image ‘3D snap’ algorithm in a sensor architecture that fulfills the following criteria.

The goal is not to develop image acquisition algorithms but rather to evaluate the possibility of, and if possible a way of, developing hardware that can support such algorithms.

3.2 Snapshot behaviour

Snapshot behaviour, the capability of processing an entire scene within a limited period of time sampled as if totally immobile, is a cornerstone of this work. If image acquisition, or rather light integration, takes for example 4 ms, a movement in the object scenery of only 250 µm/s will result in a Y direction inaccuracy of 1 µm. Light integration time is limited by the power of the light source and the sensitivity of the photodetector device, which in turn depends on light absorbing area and parasitic capacitance, among other things. In the extreme case, integration time is limited by the transient behaviour of the photosensitive device, a fact that is obvious in the case of non-integrating devices.
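To make the relation between integration time and motion blur concrete, the short C++ sketch below reproduces the numbers quoted above (the program and its variable names are illustrative only, not part of the original design):

#include <cstdio>

int main() {
    // Example figures from the text: 4 ms light integration and a 1 um
    // tolerated Y inaccuracy give the maximum allowed scene velocity.
    const double t_int     = 4e-3;   // light integration time [s]
    const double max_error = 1e-6;   // allowed Y direction inaccuracy [m]
    const double v_max     = max_error / t_int;                    // [m/s]
    std::printf("max scene velocity = %.0f um/s\n", v_max * 1e6);  // prints 250 um/s
    return 0;
}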

• Enough data for a standard PC to be able to calculate a 3D image should be acquired in short-enough time for the system to be accurately modelled as a single-image snapshot system. Multiple images may be used if acquired fast enough.

• There should be no or minimum scanning motion involved.

Of all the 3D schemes listed in this chapter, only the pulse modulated time-of-flight scheme in section 5.14 gives the impression of having the ability to acquire a full 3D image in real-time out of one image. Pulse modulation (PM) Time-of-Flight (ToF) is on the other hand severely limited regarding Z resolution. Defocusing and interferometric schemes look rather promising from a snapshot behaviour point of view, since they require only two or three images. Unfortunately they do require mechanical movement to manipulate the focusing distance of the imaging system, which is in violation of the demand for no scanning motion and may limit the maximum acquisition speed. Control of this movement is probably a bit easier than constant-speed scanning though, since the only important value to control is an end position rather than the velocity of a continuous motion.

3.3 Speed

High-speed applications are the most important niche for SICK IVP technology, motivating customers to turn to rather expensive and complex solutions when inexpensive high-quality low-speed alternatives exist.

• Throughput of at least 25 3D images per second with a reasonable amount of computational work left to do for an external PC.

In combination with the other goals this translates into slightly less than 6.6 · 10⁶ resulting measurement points per second. Reasonable computational work means in practice less than or approximately equal to one floating-point multiplication per pixel value. This means that height data may be linearly rescaled before decisions are made; however, there is not enough time for spatial filters, correlation, rotation, or other more or less complex calculations without the aid of extra, specialized hardware.

Throughput and snap behaviour both put constraints on light integration time, which in turn sets demands on light power. For instance, to resolve an image into 512 steps of depth without subpixling efforts requires at least 11 images taken using sequentially coded light described in section 5.7; one dark and one totally illuminated image for threshold calculation, and nine sequential Graycoded images allowing for 2⁹ = 512 steps of height data. This gives each image capture about 3.6 ms available for light integration and data processing without violating the 25 Hz frame rate. In order to achieve good enough accuracy it may be necessary to project a couple of inverted Gray images too. Also, the entire time of 40 ms available for measurement in terms of throughput demands is not available for image capturing in order for the measurement to be accurately modelled by snapshot behaviour.
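The exposure budget above can be checked with a few lines of C++ (a sketch; the 11-exposure Graycode sequence and the 25 Hz frame rate are taken from the text, everything else is illustrative):

#include <cstdio>
#include <cmath>

int main() {
    const double frame_rate  = 25.0;                      // 3D images per second
    const int    depth_steps = 512;                       // wanted number of height steps
    const int    gray_images = (int)std::ceil(std::log2((double)depth_steps));  // 9
    const int    exposures   = gray_images + 2;           // plus one dark and one lit threshold image
    const double t_exposure  = 1.0 / (frame_rate * exposures);     // ~3.6 ms per exposure
    const double points_per_s = frame_rate * 512.0 * 512.0;        // ~6.6e6 measurement points/s
    std::printf("%d exposures, %.1f ms each, %.2e points per second\n",
                exposures, t_exposure * 1e3, points_per_s);
    return 0;
}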

3.4 Z resolution

Resolution of measured height is the key feature of 3D acquisition schemes.

• Depth resolution should be at least 1µm.

This goes for the smaller field of view (FoV) of (50 mm)², where the interesting depth-of-view is around 5 mm.¹ In applications using the larger FoV, interesting depth resolution is around 1 mm, and depth-of-view is about 500 mm. In terms of PM ToF (section 5.14), 1 µm is equivalent to a global timing accuracy of 6.67 fs or better, which is not possible in practice in a 512² pixel array, as stated in the X×Y resolution section. In reality, all PM ToF schemes using more than one pixel will be limited to an accuracy in the neighbourhood of a few centimetres, mainly due to signal skew in the pixel array. Interferometric schemes in section 4.2 and Moiré interferometry in section 5.6 are known to produce high depth resolution over small variations of somewhat smooth surface structures. They suffer instead from phase ambiguity when the surface shows discontinuities or steep changes, thus violating the demand on surface quality independence.

1 This results in around 5000 steps of depth, or 13 b integer representation. Existing products from SICK
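The 6.67 fs figure above follows directly from the round-trip nature of pulsed time-of-flight; a minimal C++ check (illustrative only):

#include <cstdio>

int main() {
    const double c  = 2.998e8;   // speed of light [m/s]
    const double dz = 1e-6;      // wanted depth resolution [m]
    // Pulsed ToF measures the round trip, so a depth step dz corresponds
    // to a pulse-edge timing step of 2*dz/c.
    const double dt = 2.0 * dz / c;
    std::printf("required timing accuracy = %.2f fs\n", dt * 1e15);  // ~6.67 fs
    return 0;
}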

3.5 X×Y resolution

As always in image acquisition and processing the core of the device is a two-dimensional array. In physical terms each element must be kept small and simple since its impact on the total system is multiplied by the square of the 1D resolution.² Data communication within a sensor chip and out of it, as well as storage of temporal information in the sensor, also scales in this quadratic manner. For instance, 8 tiny bits of memory for each pixel in a 512² pixel sensor equals a huge 2 Mb of total memory. Such a size is perhaps an acceptable quantity on a dedicated memory chip, but not on a chip shared with an image sensor, since the memory will have to coexist with the 262144 photosensitive cells as well as with analogue readout circuitry, analogue-to-digital (A/D) converters, digital communication, and control logic.

• Spatial resolution of the sensor of at least 512 × 512 pixels.

This requirement limits the available area per pixel, the complexity of each pixel, data distribution, and global timing. A maximum total chip area of for example (22 mm)², which is near the maximal area available in a recently used 0.35 µm process, sets the theoretical pixel size maximum to about (40 µm)², leaving a 760 µm wide border for pads and out-of-pixel electronics. For instance, if omitting photosensitive devices altogether the available area allows less than 20 × (9 µm)² cells of static random access memory (SRAM) per pixel, excluding read/write and addressing circuitry.³ A large chip also demands good optics in order to project the scenery onto a diagonal of 30 mm (≈ 13/1600) with minimal distortion. In addition, the yield of a large chip is generally lower than for smaller ones, due to the distribution of defects in silicon wafers.
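The pixel pitch and memory figures above are simple scaling arithmetic; the sketch below redoes the calculation in C++ (the 22 mm chip edge and 760 µm border are the example values from the text; the program itself is illustrative):

#include <cstdio>

int main() {
    const int    pixels    = 512;       // pixels per row and per column
    const double chip_edge = 22e-3;     // example total chip edge [m]
    const double border    = 760e-6;    // border for pads and out-of-pixel electronics [m]
    const double pitch     = (chip_edge - 2.0 * border) / pixels;   // ~40 um pixel pitch
    const long   mem_bits  = 8L * pixels * pixels;                  // 8 b of memory per pixel
    std::printf("pixel pitch ~ %.0f um, total pixel memory = %ld Mb\n",
                pitch * 1e6, mem_bits / (1024L * 1024L));           // 40 um, 2 Mb
    return 0;
}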

A conventional CMOS sensor makes use of p-n junctions near the surface of a silicon wafer. The same surface is also used for active components such as transistors. Also, routing of signals and power is made with opaque metals at higher layers placed on top of the silicon. Having electronic circuitry such as memories, A/D converters, comparators, amplifiers, etcetera within the photosensitive area interferes with imaging functionality in a number of ways. For instance, metal wires occlude photo devices and p-n junctions in the vicinity attract photon induced charges, thus preventing detection in the photo device as well as inducing noise in the interfering circuitry. These effects combined are considered in the so-called fill factor, which is the equivalent percentage of the total pixel area detecting light in a sensor. The higher the fill factor, the higher the sensitivity of the sensor. Sensitivity is one limiting factor of spatial resolution, since it sets a practical minimum energy of light. Putting photosensitive devices on top of the metal layers using the technology described in section 7.2 and by Rantzer [13], Lulé et alii [14] among others, enables a fill factor close to 100 % even when the sensor array contains complex electronic circuitry. The use of microlens structures as described in section 7.6 on top of the device is another way of improving the optical fill factor.

2 Image sensor arrays often use quadratic or rectangular arrays of around 1000 rows or columns, and the number is expected to increase.

3.6 Communication requirements

Figure 3.1: Complexity of communication between pixels: a. 4-pixel neighbourhood, b. 8-pixel neighbourhood, c. 12-pixel neighbourhood.

The requirement on X×Y resolution discussed previously indirectly means that there is no way of implementing complex global one-to-one communication between arbitrary pixels. What can be achieved is communication within a neighbourhood of around four pixels as shown in figure 3.1, or a wired-gate structure of one bit per pixel forming primitive global operations such as global OR, along with column- or row-parallel readout. Also, an asynchronous rippling global-context operation like the NSIP global logic unit (GLU) (see subsection 7.1.2) is possible.⁴ Complex calculations on global data such as Fourier transformations and stereo image correlation seem unlikely to be completely solved using only near-sensor processing capacity, due to the demands on memory and communication (see for example Johannesson [2] p. II-67). They are instead rather well suited for architectures using separate image sensors and a stand-alone digital signal processor (DSP).

Figure 3.2: Separate DFT processor (image sensor, external DFT processor, external memory, output).

4 Time must be given for such an operation to complete since gate depth is in the order of k · (X + Y), where k is the gate depth per pixel. Acceleration techniques are not applicable due to the regular and highly area optimized array structure.


3.7 Field of view

This requirement mainly affects the choice of optics and light source. There are economical limitations regarding optics and light sources that must be examined when developing a commercially acceptable product.

• Field-of-view range between 50 mm × 50 mm and 1000 mm × 1000 mm.

The smaller field-of-view, with 5 mm depth range, is intended for use in the field of electronics where depth resolution of a single µm is used, whereas the larger one is typically used in wood processing applications with a depth of 500 mm resolved into 1 mm steps. In other words, a field-of-view between (512×512×5000) × (100×100×1) µm³ and (512×512×500) × (2×2×1) mm³.

The choice of light source and projecting optics is highly dependent on the field-of-view of the application. The greater the depth, the higher the demand on power of the light source. Also, a scenery near the sensor combined with a wide object makes the angles for projection and viewing large, measured from the optical axis. This requires nonlinear models and makes aberrations in the lenses more severe and occlusion more pronounced when using active illumination. In addition, light is attenuated in the viewing lens depending on the angle of the incident light ρ as cos⁴ρ, according to Horn [15] p. 208 and Johannesson [2] p. I-14.
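The cos⁴ρ attenuation quoted from Horn can be tabulated with a short C++ loop (purely illustrative):

#include <cstdio>
#include <cmath>

int main() {
    const double pi = 3.14159265358979;
    // Relative irradiance at the sensor versus the angle rho between the
    // incident ray and the optical axis, following the cos^4 law.
    for (double deg = 0.0; deg <= 40.0; deg += 10.0) {
        const double rho = deg * pi / 180.0;
        std::printf("rho = %2.0f deg -> relative irradiance %.2f\n",
                    deg, std::pow(std::cos(rho), 4.0));
    }
    return 0;
}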

3.8 Production feasibility

• The sensor should be economically feasible to produce in volume, which means that a standard CMOS process of 0.35 µm or possibly down to 0.18 µm minimum channel length should be sufficient. The possibility of using diodes in a-Si technique also exists, although this cannot be considered economically feasible today.

This requirement limits very strongly what can be done within one pixel in a 512² pixel array. Utilization of a-Si is described in section 7.2. At present it seems far from an economically feasible solution, but given time and research efforts it may very well be a realistic alternative in the near future. Due to the long-term nature of the development efforts involved in forming an entirely new product branch, both present and probable near-future options must be considered.

Traditionally, volume production at SICK IVP AB has been around 10² – 10⁴ units per year. In general, large volumes of CMOS circuits are in the order of 10⁶ – 10⁸ units per year. An example of products that are not produced in volume is equipment for production test of a specific unit, where a very small number of at most 10 units are built. The significance of production volume is in the trade-off between design and development costs and costs of fabrication and manufacturing. When considering a high-volume product, inexpensive components and highly automated manufacturing are more important for the price per unit than the design cost. When building five units, small resources are spent on automation and the materials of one unit can be a large part of the total cost. Production of some thousand units per year is regarded as moderate volumes, where per-unit costs as well as per-design costs are significant.


Regarding CMOS process dimensions, the cost of masks and fabrication usually increases with decreasing feature sizes. On the other hand, circuits become smaller and therefore each produced wafer results in more chips. Also, as mentioned in section 3.5, due to the distribution of defects in CMOS chips smaller circuits contain fewer defects. In total, the most productive strategy is often to use as small feature sizes as possible. Good image sensor processes⁵ have usually been one or two generations behind the smallest commercially available channel lengths for digital application specific integrated circuits (ASICs). The ground for analogue and mixed-signal design seems somewhat unstable for geometries smaller than 0.18 µm (see for instance Paillet et alii [16] p. 4, Kleinfelder et alii [17], Wong [18]), since leakage currents and other issues tend to sabotage the intended analogue behaviour of circuits. Some sources even state that analogue designs do not scale at all with minimum channel lengths below 0.18 µm. Also, the higher doping levels needed in newer processes constitute obstacles for photo detection, since the depletion regions of the p-n junctions normally used for photo detection decrease with increasing doping levels. This decreases the volume where photon generated charges are detected and also increases both dark currents and parasitic capacitances.

Devices like the Photonic Mixer Device, Photomischdetektor, or PMD, described by Justen [1], Luan [19], and Schwarte et alii [20], among many others, probably need specially optimized fabrication lines in order to produce good quality units. The whole idea of the CMOS image sensor business is to make use of normal electronic device processes in order to reduce fabrication costs and provide a means for integrating electronics on the sensor chip, contrary to the charge coupled device (CCD) business, where process optimization is dedicated entirely to imaging properties of the device.

3.9 Object surface dependence

All techniques require some optical properties of the studied objects. Requirements like ‘enough light must be reflected back to the sensor’ are difficult to avoid since the sensor is dependent on photon flux; fortunately, they are rather easy to obey since a simple light bulb often solves the problem.⁶ However, some schemes demand properties like smoothness in shape, colour, or reflectance. A commercially acceptable product cannot rely mainly on such properties in order to work; otherwise the field of applications will be too narrow. Interferometric and Moiré schemes cannot solve phase ambiguities single-handedly if a step or crack splits a surface, because the measured height at a point is given relative to neighbouring measurements with a rather short unambiguity length. Hence, such techniques need to be combined with a more robust, absolute measuring one in order to give accurate and absolute measurements, like in Wolf [21] and Wolf [22] where phase-shifting fringe projection is combined with a Graycoded light sequence in order to overcome problems regarding relative measurement. Focusing, passive, and shading schemes depend highly on surface properties. The first two require the presence of recognizable objects and the latter assumes that the reflectance properties are constant over the surface. Use of individual thresholds as described by Trobina in [23] strongly suppresses the surface dependence of a measurement scheme.

5 That is, CMOS processes that have been used for photoelectric applications of some sort and/or are somewhat characterized in terms of photosensitivity. Rantzer [13] (paper IV, pp. 77–82) shows an example of what is in image sensors clearly a defect but has never been considered, not even known, in other CMOS applications.

6 The number of photons detected in the sensor is a product of light source intensity, reflectance and direction of reflected light from the surface of the object, aperture area of viewing optics, fill factor and quantum efficiency of the pixels, and time of exposure or integration time.

3.10 Detection of surface properties

In previous SICK IVP sensors the ability to produce 2D greyscale values has been more or less ‘in the package’, since grey values of some resolution, from one bit up to eight bits, have been used in determining the position of a laser stripe. Colour filters have also been available in some applications. This may not be the case in a future technology using extremely specialized 3D snap sensors. In its true sense, a height image z(x, y) is not a 3D image but rather two-dimensional information about the third dimension. Whether 2D greyscale images are so important as to motivate implementation of specific hardware for the sole purpose of acquiring them must be answered in consideration of possible applications. The answer yes implies the further questions listed below.

1. Is being able to choose between acquisition of depth information and of greyscale images a sufficient solution, or is so-called registered image acquisition necessary as defined by Justen in [1] p. 17?

2. Do we need eight bits of resolution in the 2D image?

3. How fast should 2D images be output?

For the needs of SICK IVP regarding 3D snap imaging, greyscale images are not needed for other reasons than perhaps validation of range data. A two-dimensional ‘depth map’ z(x, y) is sufficient for this project, although the possibility of greyscale image acquisition


Part II

3D imaging schemes


CHAPTER 4

General 3D image acquisition

This chapter provides an overview of 3D measurement schemes. In order to achieve a clear view of what is out there in the whole field of 3D imaging I have accepted almost any source of information as long as I can understand the language.¹ The initial categorisation of schemes, as exemplified in figure 4.1, is taken almost entirely from Schwarte [24], where all 3D imaging measurements are divided into triangulation described in section 4.1, interferometry in section 4.2, and time-of-flight type methods in section 4.3. The subdivision of schemes in the three categories into the two types active and passive schemes is also inherited from Schwarte [24].

Figure 4.1: Overview of 3D imaging schemes: triangulation (focus, structured light, passive, shading), interferometry (multi-wavelength, holographic, speckle, white light), and time-of-flight (pulse, CW, PN).

4.1 Triangulation schemes

Triangulation makes use of 2D projection of one plane of light onto an object surface seen in another, non-parallel plane to determine distance. The principle of triangulation is described for example by Luan [19] pp. 5–7, by Johannesson [2] pp. I-3–I-8, and in figure 4.2. Knowing the values of α, β, and b, the distance h can be calculated as

h = b · sin α · sin β / sin(α + β)    (4.1)

1 In practice this means that sources written in other languages than Swedish, Norwegian, Danish, English, or German are unavailable.


Figure 4.2: The principles of triangulation, showing the baseline b, the angles α and β with their ranges ∆α and ∆β, the distance h, the object, a sensor at one end of the baseline, and a light source or a second sensor at the other.

The values of α, β, and b are chosen to meet the requirements on field-of-view ∆α, ∆β and depth range ∆h while maintaining the sensitivities dh/dα and dh/dβ. Johannesson [2] pp. I-9–I-26 reveals some of the details involved in designing the optical set-up.
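Equation (4.1) translates directly into code; the sketch below evaluates h and, by central differences, the sensitivities dh/dα and dh/dβ for one example geometry (the numerical values are arbitrary examples, not taken from the thesis):

#include <cstdio>
#include <cmath>

// Equation (4.1): distance h from the baseline to the object point.
static double height(double alpha, double beta, double b) {
    return b * std::sin(alpha) * std::sin(beta) / std::sin(alpha + beta);
}

int main() {
    const double deg   = 3.14159265358979 / 180.0;
    const double alpha = 60.0 * deg;   // example geometry, arbitrary values
    const double beta  = 45.0 * deg;
    const double b     = 0.5;          // baseline [m]
    const double eps   = 1e-6;
    const double h     = height(alpha, beta, b);
    // Central differences give numerical estimates of dh/dalpha and dh/dbeta.
    const double dh_da = (height(alpha + eps, beta, b) - height(alpha - eps, beta, b)) / (2.0 * eps);
    const double dh_db = (height(alpha, beta + eps, b) - height(alpha, beta - eps, b)) / (2.0 * eps);
    std::printf("h = %.4f m, dh/dalpha = %.4f m/rad, dh/dbeta = %.4f m/rad\n", h, dh_da, dh_db);
    return 0;
}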

Figure 4.3: The triangulation schemes: focusing schemes (confocal, autofocus, defocus), structured light (laser triangulation, light volume), passive schemes, and shading.

Shown in figure 4.3, Schwarte [24] subdivides triangulation schemes into focusing schemes in subsection 4.1.1, schemes using structured light in section 4.1.2, passive schemes in subsection 4.1.3, theodolite systems not described in this text, and shading schemes described in section 5.9.

4.1.1 Focusing schemes

Focusing schemes use information about the focal distance of the sensor optics to determine the distance to an object. A sketch of a focusing lens system is shown in figure 4.4. Confocal microscopy described in section 5.3, self-adjusting focus or autofocus in section 5.4, and depth from defocus measurement seen in section 5.5 are in [24] classified by Schwarte as the main branches of focusing schemes. There are passive focusing schemes as well as schemes using structured light described in the literature. Passive schemes can be used only if the surface contains visible objects to focus on, whereas for active schemes focal measurements can be done on projected structures. Focusing and defocusing 3D schemes rely on small lens aberrations, well defined² natural or projected target objects, and accuracy of the optical model to achieve high resolution and low noise.


Figure 4.4: The principles of focusing (thin lens on the optical axis with focal length f, object surface, focused object image, sensor out of focus, defocus).

For the basic principles of focusing in camera systems, refer to for instance Pedrotti and Pedrotti [25], section 6.3 pp. 125–129.

4.1.2 Schemes using structured light

3D imaging using structured light means that a pattern of some sort is projected onto the object surface by some kind of light source. The projection is then captured as an image and analysed to extract the wanted information. A number of schemes are presented by Schwarte [24] as triangulation schemes based on structured light, for example laser triangulation, Moiré, and Graycoded light. Other schemes listed under focusing schemes (subsection 4.1.1) and passive schemes (subsection 4.1.3) also benefit from the use of structured lighting. Three different types of structured light are sorted out depending on the dimensions of light. They are beam-of-light (see section 5.1), sheet-of-light (section 5.2), and light volume techniques. The first two types use point-type and line-type laser beams, respectively.3 The third category is interesting from a snap point of view since it uses projection of two-dimensional light patterns onto the object surface. This means that the entire scene is illuminated by the structured light at the same time, in contrast to the first two kinds where only a very small portion at a time is lit by the laser source.

One interesting light volume scheme is the so-called sequentially coded light scheme in section 5.7, where a sequence of different patterns is projected onto the target and sampled by an image sensor. Attempting to reduce the sequence of projections one could turn to colour coding where instead of projecting different patterns in sequence a number of illumination stages can be put into one single frame by using different colours of light. The sensor must then be able to separate colours well enough to extract the needed information. Also, the transfer function from colour pattern projector via the studied object and through the colour sensor must preserve the projected colour.
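A hypothetical count of how colour coding shortens the sequence, assuming three well-separated colour channels per exposure (the channel count and the program itself are illustrative assumptions, not from the thesis):

#include <cstdio>
#include <cmath>

int main() {
    const int depth_steps = 512;
    const int patterns = (int)std::ceil(std::log2((double)depth_steps));  // 9 Graycoded patterns
    const int channels = 3;                               // assumed separable colour channels
    const int exposures_mono   = patterns;                // one pattern per exposure
    const int exposures_colour = (patterns + channels - 1) / channels;    // three patterns per exposure
    std::printf("%d patterns: %d monochrome exposures vs %d colour coded exposures\n",
                patterns, exposures_mono, exposures_colour);
    return 0;
}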


Different scenes put different restrictions on the type of pattern used in structured light approaches. Hall-Holt and Rusinkiewicz [26] list three types of necessary assumptions: reflectivity assumptions like preserving colour information, assumptions on spatial coherence meaning that spatial neighbourhoods of image elements consist of neighbouring parts in the used light pattern as well as in the physical surface, and temporal coherence assumptions about maximum velocity of objects in the scene.

4.1.3 Passive schemes

Figure 4.5: Stereo depth perception (a 3D scene with depths Z0–Z3 projected onto two 2D images).

Three different schemes of 3D imaging are listed in the category of passive schemes by Schwarte [24]. All three are stereo based photogrammetry schemes using either several 2D cameras in a known relative geometry (subsection 5.8.1), several 2D cameras with dynamic self-adjustment (subsection 5.8.2), or just one self-adjusting 2D camera taking several images in a scan-type manner (see subsection 5.8.3). Also, Faubert [27] describes the motion parallax phenomenon (subsection 5.8.4). Common to all four schemes is the basic problem of comparing and correlating objects seen in different images from different points of view. There does not seem to be a method to locally solve this problem without using sequences of structured light. Stereo based schemes benefit greatly from the use of structured light, which in turn creates good criteria for solving the much simpler problems of light volume techniques described in subsection 4.1.2.

3D imaging using stereo based sensors requires global correlation of details either naturally present in the object surface or projected onto it. A sensor with column parallel or pixel parallel processing of local properties is inadequate for completely solving the 3D measurement using general stereo vision, but in section 5.8 a number of schemes are presented that solve the stereo problem confined to rows4 or even single pixels. As shown by Petersson [28], an NSIP sensor might be useful as a first step to process the scenery since it enables fast feature extraction. Still, a separate processing unit such as a DSP is required to correlate features in the separate images and calculate depth. Such a system, illustrated in figure 4.6, working with standard image sensors would need to handle two parallel streams of say 53 Mb/s, calculated using 25 images per second with 512×512 pixels of 8 b image data. In section 7.4 the communication problem is discussed. A reasonable amount of memory for the DSP would be able to store a few entire images, that is, in the order of 1 MB.

Figure 4.6: Stereo camera with two image sensors. Two feature extracting image sensors view the object, and feature correlation is performed on a standard PC.
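
For reference, the stream rate quoted above follows directly from the frame format:

\[
512 \times 512\;\text{pixels} \times 8\;\text{b} \times 25\;\text{frames/s}
= 52\,428\,800\;\text{b/s} \approx 52.4\;\text{Mb/s per sensor.}
\]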

4.2 Interferometric schemes

Figure 4.7: Interferometric schemes. Within the 3D imaging taxonomy (triangulation, interferometry, time-of-flight), the interferometry branch divides into multi-wavelength, holographic, speckle, and white light interferometry.

There is a wide range of interferometry applications, from microscopes described for instance by Notni et alii [29], via synthetic aperture radar (SAR) by for example Chapman [30], to deep-space interferometry by for instance JPL [31]. Of course, in a typical SICK IVP application the first is the most relevant area.


Figure 4.8: A typical interferometry setup: a laser source, a beam splitter, the object, a tuneable attenuator, a variable phase stepper, a beam mixer/collimator, and the sensor.

The principle of optical interferometry is to make two light beams interfere, of which one has travelled a fixed distance while the other is led along a path whose length depends on the shape of the studied object surface. When the two beams are superimposed a fringe pattern occurs, which depends on the wavelength and on the phase difference between the two light rays. For details on the basics of interferometry, refer to for example Pedrotti and Pedrotti [25], chapter 11, pp. 224–244.
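
The textbook two-beam relation (see for instance Pedrotti and Pedrotti [25]) makes the distance dependence explicit. With beam intensities I_1 and I_2 and a path-length difference ΔL, the detected intensity is

\[
I = I_1 + I_2 + 2\sqrt{I_1 I_2}\,\cos\Delta\varphi,
\qquad
\Delta\varphi = \frac{2\pi}{\lambda}\,\Delta L,
\]

and in a reflection geometry a surface height change Δz changes the path length by ΔL = 2Δz, so one fringe period corresponds to a height change of λ/2.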

There are numerous different interferometer set-ups: Fabry-Perot, Fizeau, Keck, Linnik, Mach-Zehnder, Michelson, Mirau, Twyman-Green, Young, etcetera. The choice of interferometer set-up is an issue of system design and is not treated in this project. Common to all interferometers, including the Moiré types described in section 5.6, is that one needs at least three images with different phase in order to measure absolute distance unambiguously, using some sort of phase manipulation as described in subsection 4.2.1. Interferometers using multiple wavelengths (see section 5.11) and white light (section 5.13) solve the ambiguity problem, in the first case by producing a longer synthetic wavelength and in the second case by scanning the scene in the Z direction. Luan [19] pp. 7–8 and Heinol [32] pp. 8–9 classify optical interferometry as a ‘coherent time-of-flight’ method. Interferometry using light of wavelength λ can yield very high depth accuracy, in the order of λ/100.
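
For the multiple-wavelength case the unambiguous range is extended by the synthetic (beat) wavelength obtained from two laser wavelengths λ_1 and λ_2,

\[
\Lambda = \frac{\lambda_1 \lambda_2}{|\lambda_1 - \lambda_2|},
\]

so the phase is evaluated as if the measurement had been made at the much longer wavelength Λ, at the cost of correspondingly reduced sensitivity.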

4.2.1 Phase-shifting

Phase or fringe shifting [21, 22, 33, 34] consists of moving a grating or light beam by fractions of a period in order to achieve more accurate measurements between fringes. Shifting the projection between at least three positions also determines the local order of fringes; that is, it is determined without ambiguity whether a slope is downhill or uphill.5



In the case of sinusoidally changing grey levels of light, phase shifting enables calculation of exact phase in a point, regardless of the surface properties in that point.
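
As an illustration of the sinusoidal case, one common three-step formulation (a textbook variant; the works cited above may use other step counts) records intensities I_k = A + B cos(φ + δ_k) at phase steps δ_k = −2π/3, 0, +2π/3 and recovers the phase as

\[
\varphi = \arctan\!\left(\frac{\sqrt{3}\,(I_1 - I_3)}{2I_2 - I_1 - I_3}\right),
\]

where the background level A and the modulation depth B cancel out, which is why the result is independent of the local surface reflectance.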

4.3 Time-of-flight schemes

Figure 4.9: Time-of-flight schemes. Within the 3D imaging taxonomy, the time-of-flight branch divides into pulse detection, continuous-wave (CW) modulation, and pseudo-noise modulation.

Time-of-flight based 3D measurement makes use of the time it takes for light to travel the distance from a light source to an object surface and reflect back onto a two-dimensional detector array.6 As shown in figure 4.9, three main types of time-of-flight schemes are mentioned in this document.7 All three use the time for light to travel the distance from a source to a sensor but rely on three different schemes for detection. The most straightforward type, pulse modulated ToF measurement described in section 5.14, simply turns on a light source at the same time as a ‘stop watch’ is started. This stop watch is then read as soon as the detector senses an increase in intensity of the incoming light. The second type, continuous-wave modulated ToF in section 5.15, measures the phase difference between the continuous modulating signal and the modulation of the received light. The third type, in section 5.16, uses pseudo-noise modulation.
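
The underlying relations are simple (standard time-of-flight expressions, with f_mod denoting the modulation frequency of the CW case):

\[
d = \frac{c\,t}{2},
\qquad
d = \frac{c\,\Delta\varphi}{4\pi f_{\mathrm{mod}}},
\]

so for pulse modulation a depth resolution of Δz requires a timing resolution of Δt = 2Δz/c, which for Δz = 1 µm is about 6.7 fs.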

6Of course a point detector can be used in combination with scanning of some sort, at the risk of compromising snapshot behaviour.


CHAPTER 5

3D image acquisition schemes

This chapter consists of a list of possible 3D acquisition schemes. Each scheme is described in short, and then its properties are weighed together into a conclusion about the scheme.

5.1 Beam-of-light triangulation

Figure 5.1: The principles of beam-of-light triangulation. A laser beam source and a sensor view the object from different directions, and the object height appears as a projected height displacement on the sensor.

Beam-of-light imaging is performed by locating a laser point on an object surface, originating from a source at a fixed distance from the camera, as shown in figure 5.1. The laser beam and the optical axis of the camera are non-parallel and in-plane, with a known angle between them. The resulting displacement of the laser dot due to height can be transformed into a measurement of this height, and if the height is small enough compared to the distance to the sensor this calculation can be modelled accurately by a simple linear equation. The laser position can be measured using a CMOS or CCD line sensor or a position sensitive device, described by for instance Johannesson [2] pp. I-37–I-38 and Fujita and Idesawa [35]. The signal-to-noise ratio (SNR) of the measuring system can be improved by the choice of light wavelength1 and the use of corresponding optical bandpass filters. Also, one can use background adapted thresholds to find a laser dot or stripe.
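
As an illustration of the linear model, consider one simple geometry (an assumption for this example only, not a description of any particular SICK IVP set-up): the laser beam is directed along the height axis and the camera axis forms the angle θ with the beam. A height change Δz then moves the laser spot by Δz along the beam, and its image by approximately

\[
\Delta x \approx M\,\Delta z \sin\theta,
\]

where M is the lens magnification, so for small Δz the height is recovered from the measured spot displacement by a single scale factor.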

Beam-of-light triangulation requires a 2D scan, and when an entire scene is to be measured as if sampled instantaneously the scheme is entirely inferior to the sheet-of-light triangulation described in section 5.2, which is the SICK IVP standard today. Scan time can be limited, though, if only a small portion of the scene is to be measured. Also, the 1D detection principle allows for simpler detector structures than 2D arrays. Since the concept of beam-of-light triangulation is similar to sheet-of-light, no further discussion of this scheme will take place.

5.2 Sheet-of-light triangulation

Figure 5.2: The principles of sheet-of-light profiling. A laser sheet source and a camera view the object; the camera view of the laser stripe encodes the object height profile along the laser plane.

The sheet-of-light projection 3D measurement scheme is currently used by SICK IVP [2, 11, 36]. A fixed geometry is preferred, where the angle between the optical axis of a 2D sensor and the direction of a laser plane is known, for example π/6 rad. The angle is a trade-off between sensitivity and maximum depth of the FoV. The stripe projected over the object surface is detected, and its Y position yields a height profile of the cut between the laser sheet and the object surface, as seen in figure 5.2. To view a full 3D scene, a scanning movement is required in the Y direction, either by changing the laser angle with a rotating mirror or, in the case of fixed geometry, by moving the target. Using fixed geometries simplifies calculation efforts when the measured raw data is translated into world coordinates.
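
A minimal sketch of the stripe detection follows (one common approach: a centre-of-gravity around the intensity maximum of each column; thresholding, outlier handling, and the translation into world coordinates are left out):

```python
import numpy as np

def stripe_positions(frame, window=3):
    """Sub-pixel Y position of the brightest (laser) row in every column.

    frame  : 2D array of grey levels, shape (rows, cols)
    window : half-width of the centre-of-gravity neighbourhood
    """
    rows, cols = frame.shape
    y = np.empty(cols)
    for c in range(cols):
        column = frame[:, c].astype(np.float64)
        peak = int(np.argmax(column))                      # brightest row in this column
        lo, hi = max(0, peak - window), min(rows, peak + window + 1)
        weights = column[lo:hi]
        y[c] = np.sum(np.arange(lo, hi) * weights) / (np.sum(weights) + 1e-12)
    return y  # one height-profile sample per column, in pixel units

# Example: a synthetic 512x512 frame with a bright horizontal stripe at rows 200-202.
frame = np.zeros((512, 512))
frame[200:203, :] = 255.0
profile = stripe_positions(frame)   # approximately 201.0 in every column
```

The centre-of-gravity step gives sub-pixel resolution along the sensor columns; in a real system this raw profile is then mapped to world coordinates using the calibrated geometry mentioned above.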

1A CMOS image sensor responds to different wavelengths with varying efficiency and spatial accuracy. Also, there may be laws regulating the power of lasers at certain wavelengths, and the price of a laser source differs between wavelengths. Furthermore, structured light is more easily detected if it differs in colour from the main background source, since an optical filter can then effectively attenuate the main noise source.

References
