
Institutionen för systemteknik

Department of Electrical Engineering

Examensarbete (Master's thesis)

Programming of Microcontroller and/or FPGA for

Wafer-Level Applications - Display Control, Simple

Stereo Processing, Simple Image Recognition

Master's thesis carried out in Electronics Systems at the Institute of Technology at Linköping University

by

Himani Raj Pakalapati

LiTH-ISY-EX--13/4656--SE

Linköping 2013

Department of Electrical Engineering, Linköpings tekniska högskola, Linköpings universitet, SE-581 83 Linköping, Sweden


Programming of Microcontroller and/or FPGA for

Wafer-Level Applications - Display Control, Simple

Stereo Processing, Simple Image Recognition

Master's thesis carried out in Electronics Systems
at the Institute of Technology in Linköping

by

Himani Raj Pakalapati

LiTH-ISY-EX--13/4656--SE

Supervisors: Dr. Helmuth Eggers

Sensorik-Komponenten, Daimler AG

Dr. Armin Huerland

Sensorik-Komponenten, Daimler AG

Examiner: Dr. Kent Palmkvist

ISY, Linköpings universitet


Avdelning, Institution

Division, Department

Division of Electronics Systems, Department of Electrical Engineering, Linköpings universitet

SE-581 83 Linköping, Sweden

Datum / Date: 2013-02-18
Språk / Language: Engelska / English
Rapporttyp / Report category: Examensarbete

URL för elektronisk version: http://www.es.isy.liu.se

ISRN: LiTH-ISY-EX--13/4656--SE

Serietitel och serienummer

Title of series, numbering

ISSN

Titel

Title: Programming of Microcontroller and/or FPGA for Wafer-Level Applications - Display Control, Simple Stereo Processing, Simple Image Recognition

Författare

Author

Himani Raj Pakalapati

Sammanfattning

Abstract

In this work the use of a WLC (Wafer Level Camera) for ensuring road safety is presented. A prototype of a WLC along with the Aptina MT9M114 stereo board has been used for this project. The basic idea is to observe the movements of the driver. By doing so, an understanding of whether the driver is concentrating on the road can be achieved.

For this project the required scene is captured with a wafer-level camera pair. Using the image pairs, stereo processing is performed to obtain the real depth of the objects in the scene. Image recognition is used to separate the object from the background. This ultimately leads to concentrating only on the object, which in the present context is the driver.

Nyckelord


Abstract

In this work the use of a WLC (Wafer Level Camera) for ensuring road safety is presented. A prototype of a WLC along with the Aptina MT9M114 stereo board has been used for this project. The basic idea is to observe the movements of the driver. By doing so, an understanding of whether the driver is concentrating on the road can be achieved.

For this project the required scene is captured with a wafer-level camera pair. Using the image pairs, stereo processing is performed to obtain the real depth of the objects in the scene. Image recognition is used to separate the object from the background. This ultimately leads to concentrating only on the object, which in the present context is the driver.


Acknowledgments

The present thesis documents the work done at the ‘Environment Perception’ department (Group Research and Advanced Engineering) of Daimler AG.

I would like to offer my sincere gratitude to Dr. Helmuth Eggers for accepting me for a thesis position in his department. It has been an honour to work under him and Dr. Armin Huerland. I was able to improve my knowledge of image processing and wafer-level cameras under their supervision.

I would like to thank Dr. Kent Palmkvist, for his guidance during my thesis work.

I am grateful to my husband and my parents who supported me all through my Master studies.


Contents

1 Introduction . . . 3
1.1 Motive . . . 3
1.2 Report Outline . . . 4

2 Theoretical Concepts . . . 5
2.1 Wafer Level Camera (WLC) . . . 5
2.1.1 Wafer Level Packaging of an Image Sensor . . . 5
2.1.2 Wafer Level Optics . . . 6
2.1.3 Wafer Level Integration of Optics . . . 6
2.2 Noise in CMOS Image Sensors . . . 7
2.3 Computer Vision . . . 8
2.3.1 Basics of Stereo Vision . . . 8
2.3.2 Stereo Matching . . . 10
2.3.3 Semi Global Matching (SGM) . . . 11
2.4 Basics of Camera Calibration . . . 12
2.5 Image Processing and Image Thresholding . . . 13
2.6 Median Filtering . . . 14

3 System Design and Proposed Algorithm . . . 15
3.1 System Design . . . 15
3.2 Wafer-Level Applications - Display Control, Simple Stereo Processing, Simple Image Recognition . . . 18

4 Results . . . 21
4.1 Median Filtering . . . 28
4.2 Temporal Noise in CMOS Image Sensors . . . 29

5 Conclusions . . . 33

6 Future Work . . . 35

Bibliography . . . 37


List of Figures

2.1 Shellcase MVP (Micro Via Pad) cross section view ([1]) . . . 6

2.2 Wafer-level camera fabrication process ([2]) . . . 7

2.3 Active Pixel Sensor (APS) photodiode ([13]) . . . 8

2.4 A simple stereo imaging setup ([4]) . . . 9

2.5 A scene showcasing the stereo view of two cameras mounted on a common baseline . . . 11

2.6 One of the orientations of the patterned grid . . . 13

2.7 Median filtering example ([11]) . . . 14

3.1 Block diagram . . . 15

3.2 MT9M114 block diagram ([17]) . . . 16

3.3 Bayer pattern ([16]) . . . 17

3.4 Thresholding algorithm . . . 19

4.1 Left and right camera image . . . 22

4.2 Disparity map of the above scene . . . 22

4.3 Left camera image, Right camera image, Disparity map, Binary thresholded image . . . 23

4.4 Scene without the object . . . 23

4.5 Object detected . . . 24

4.6 Image with dark areas . . . 25

4.7 Image with some dark areas . . . 25

4.8 Disparity mean with dark areas . . . 26

4.9 Disparity mean with some dark areas . . . 27

4.10 Threshold values with dark areas . . . 27

4.11 Threshold values with some dark areas . . . 28

4.12 Output image after median filtering (top right) . . . 29

4.13 SNR image illuminated scene . . . 30

4.14 SNR image lowlight scene . . . 30

4.15 Mean vs SD illuminated scene . . . 31


Chapter 1

Introduction

In recent years an immense change in the human-machine interface employed in automobiles has been observed. Current-generation automobiles are fitted with smart driver-assistance systems such as in-vehicle navigation systems, adaptive cruise control, lane departure warning systems, night vision, traffic signal recognition and driver drowsiness detection, to name a few. These ultimately lead to enhanced road safety for the driver and the pedestrians.

The above-mentioned systems generally make use of sensors for collecting environmental information. CMOS (Complementary Metal Oxide Semiconductor) camera sensors are employed for LDWS (Lane Departure Warning Systems), infrared sensors are employed for night vision, and adaptive cruise control typically uses radar. Each application differs from the others. However, the processing is usually a three-stage process: data capture, pre-processing and post-processing.

In the data capture stage, the information needed for processing is acquired. In the pre-processing stage, functions are applied to the full image and are therefore data-intensive and regular in structure. These functions include transformation of the image, signal and feature enhancement, noise reduction, and motion analysis. The post-processing stage consists of feature tracking, interpretation of the scene, system control and decision-making.

1.1 Motive

In this work the use of a WLC (Wafer Level Camera) for ensuring road safety is presented. A prototype of a WLC along with the Aptina MT9M114 stereo board has been used for this project. The basic idea is to observe the movements of the driver. By doing so, an understanding of whether the driver is concentrating on the road can be achieved.

For this project the required scene is captured with a wafer-level camera pair. Using the image pairs, stereo processing is performed to obtain the real depth of the objects in the scene. Image recognition is used to separate the object from the background. This ultimately leads to concentrating only on the object, which in the present context is the driver.

1.2 Report Outline

The general structure of this thesis report is as follows.

In chapter 2, the basic theoretical concepts needed to understand the project are presented.

Chapter 3 contains the system design and proposed algorithm for achieving the wafer level applications.

In chapter 4, the results are discussed. Chapter 5 concludes the thesis report and chapter 6 outlines future work.


Chapter 2

Theoretical Concepts

2.1 Wafer Level Camera (WLC)

The cost of the camera module has been decreasing rapidly due to increased demand for integration of camera modules into electronic devices. Fields that require reliable, precisely integrated functionality and low-cost digital camera modules include mobile electronics, medical applications, the automotive field, security and so on.

For a digital camera, the image sensor is a crucial component used for image capture. It converts an optical image into an electronic signal. A WLC consists of two main components: the image sensor and the camera optics.

Implementation of WLC technology [1] has led to: wafer-level packaging of the image sensor, wafer-level optics (WLO) and wafer-level integration of optics.

2.1.1 Wafer Level Packaging of an Image Sensor

A set of requirements must be fulfilled to achieve wafer-level packaging of an image sensor. First, to allow the light from the desired scene to reach the sensor, the package should have an optically transparent window. Secondly, the package has to efficiently protect the die from accumulation of dirt and dust on the optically sensitive areas. Compatibility of the package with both orientations of the image sensor, i.e. front-side illuminated and back-side illuminated, is required.

The initial phase of processing employs wafer bonding technology for encapsulating and protecting the image sensor wafer with a glass wafer. Thinning of the sensor wafer is performed after bonding. A ball grid array and redistribution layer are placed on the underside of the image sensor wafer. Finally, wafer-scale testing of the packaged sensor is performed. Figure 2.1 shows the Shellcase MVP wafer-level chip scale packaging (WLCSP).

Figure 2.1. Shellcase MVP (Micro Via Pad) cross section view ([1])

2.1.2 Wafer Level Optics

In general, the lens optics used in mobile phone cameras are created by injection moulding techniques (moulded glass lenses and moulded plastic lenses). Typically, moulded glass lenses are much better than plastic lenses in thermal performance, but the cost is higher. To reduce costs, fabrication of thousands of lenses on a single glass wafer is possible using wafer-level optics (WLO) technology. The infrared cut filter and the optical apertures are monolithically integrated into the glass substrate. A wafer stack is formed by bonding multiple lens wafers after the lens formation. Individual dies are cut from the stacked lens wafer, hence creating the WLO.

2.1.3 Wafer Level Integration of Optics

The camera module is created by the wafer level integration between the optical element and the packaged image sensor.


Figure 2.2. Wafer-level camera fabrication process ([2])

2.2 Noise in CMOS Image Sensors

Generally, CMOS designs are built around APS (Active Pixel Sensor) technology. In APS technology, each imaging element or pixel contains a photodiode and three transistors for converting the accumulated charge into a measurable voltage, resetting the photodiode and transferring the voltage to a column bus (as shown in Figure 2.3).

A major issue in CMOS image sensors is the high amount of noise. Some of the noise forms present in CMOS image sensors are photon shot noise, reset noise, dark current and thermal noise [13][14].

Photon shot noise is clearly evident in the captured images, where it appears as a random pattern. It is caused by temporal fluctuations of the output signal, which in turn are due to fluctuations in the illumination.

Dark current is due to artifacts which produce electrons (signal charge) when there is no illumination. This type of noise is sensitive to temperature and is hard to eliminate.


Figure 2.3. Active Pixel Sensor (APS) photodiode ([13])

Lag can propagate through the array if the reset condition is not fully achieved, thereby increasing the reset noise.

The thermal noise in CMOS image sensors is generated by the transistors, capacitors and buses connected amid the photosensitive areas of the photodiode array (pixels).

2.3 Computer Vision

Computer vision is a field concerned with analyzing and processing digital images. It generally includes methods used in image processing. Computer vision is used for extracting 3D information from the given image data [5].

One of the major problems in computer vision is stereo matching: the estimation of depth from a pair of images in order to reconstruct a 3D model of the corresponding object.

2.3.1 Basics of Stereo Vision

The meaning of ‘stereo’ is to have depth or three dimensions. An environment created by two inputs, which are combined to create a single unified perception of 3D space is referred to as stereo.

Stereo vision depends on the phenomenon of parallax [3]. According to this phenomenon, when a common object is observed from two locations along the same baseline, the object seems to have some displacement. This can easily be experienced just by observing an object before you with the right eye closed and then observing the same object with the left eye closed. A change in the object's position can be noticed. This effect is due to the change in the observation angle of the eye for the object. This apparent position shift of the object depends on the distance of the object from the baseline of observation.

Similarly, two cameras mounted on a common baseline may be used to observe a common target. The displacement of the object between the two images, i.e. the right camera image and the left camera image, is useful for computing the distance of the target from the baseline. This displacement is generally referred to as disparity. The setup for a stereo observation can be seen in Figure 2.4.


Consider that the coordinates $(x, y, z)$ represent a point on an object [4]. Let $(x'_r, y'_r)$ and $(x'_l, y'_l)$ be the coordinates in the image planes of the respective cameras and $f$ the focal length of the cameras. By the property of similar triangles,

\[ \frac{x'_l}{f} = \frac{x + d/2}{z} \quad (2.1) \]
\[ \frac{x'_r}{f} = \frac{x - d/2}{z} \quad (2.2) \]
\[ \frac{y'_l}{f} = \frac{y'_r}{f} = \frac{y}{z} \quad (2.3) \]

Solving for (x, y, z) gives

\[ x = \frac{d\,(x'_l + x'_r)}{2\,(x'_l - x'_r)} \quad (2.4) \]
\[ y = \frac{d\,(y'_l + y'_r)}{2\,(x'_l - x'_r)} \quad (2.5) \]
\[ z = \frac{d\,f}{x'_l - x'_r} \quad (2.6) \]

The difference $(x'_l - x'_r)$ is known as the disparity. The disparity value for an

object which is near to the camera pair is larger when compared to an object which is far away from the camera pair. Disparity is inversely proportional to the corresponding scene depth.

The baseline d is a known quantity, the focal length is also known for a given camera, and the disparity can be measured in pixels from the image plane coordinates. From these values z, the distance of the target from the baseline, can be computed.

From the image pair in Figure 2.5, the differences in the images are clearly evident. These differential values ultimately give the 3D depth or the real distance.
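As a small illustration of equation (2.6), the following Python sketch converts a disparity map (in pixels) into depth values. The function name is illustrative, and it assumes the focal length is expressed in pixels so that the depth comes out in the same unit as the baseline.

```python
import numpy as np

def depth_from_disparity(disparity_px, baseline, focal_px):
    """Depth z from disparity, following Eq. (2.6): z = d*f / (x'_l - x'_r)."""
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    z = np.full_like(disparity_px, np.inf)   # zero disparity: treat as "very far"
    valid = disparity_px > 0                 # negative disparities mark occlusions
    z[valid] = baseline * focal_px / disparity_px[valid]
    return z
```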

2.3.2 Stereo Matching

Stereo correspondence, or matching, is an important aspect of stereo vision. Stereo matching is used to determine the positions, in both the left and the right image, of the projections of the same point in space.

Stereo correspondence is a difficult task. One of the factors which makes it difficult is partial occlusion of the 3D geometry, where a part of one image is not visible in the other image. Homogeneous regions and textured regions are other problems making stereo correspondence difficult [3]. Homogeneous areas pose a problem because a sufficient amount of information is not available for establishing unique correspondence values: the matching of an individual pixel is uncertain when a pixel-by-pixel dense correspondence is required. Textured areas have a similar effect.


Figure 2.5. A scene showcasing the stereo view of two cameras mounted on a common

baseline

2.3.3 Semi Global Matching (SGM)

For stereo matching, the SGM algorithm has been used in this thesis. SGM combines both local and global stereo methods for pixelwise matching [6][7]. SGM makes use of Mutual Information and a global smoothness constraint to perform the pixelwise matching.

The algorithm comprises the following processing steps, given a pair of images with known extrinsic and intrinsic orientation.

Pixelwise cost calculation

The imaging characteristics of the camera, such as the vignetting effect and different exposure times, and attributes of the scene, like a change of the light source, can produce radiometric differences. The matching cost, which is the measure of the dissimilarity between corresponding pixels, has to deal with these radiometric differences. For this algorithm, MI (Mutual Information) is used as the matching cost.

Pathwise Aggregation

Sometimes the pixelwise cost calculation may lead to erroneously low costs due to noise. An additional constraint that supports smoothness, by penalizing changes in the neighbouring disparities, is therefore added. The energy that depends on the disparity image is defined by the pixelwise cost and the smoothness constraint.

The energy comprises three terms: a data term for photo-consistency, a smoothness energy term for slanted surfaces that change the disparity only slightly, and an energy term that represents depth discontinuities. Obtaining a disparity map that minimizes this energy is the aim of SGM.


Disparity Calculation

For every pixel, the disparity with the minimum cost is selected to determine the disparity map of the corresponding base image, as is generally done in local stereo methods.

Multi-Baseline Matching

From the correspondences between the images, i.e. the base image and the match images, the pixelwise matching cost can be calculated. This is referred to as multi-baseline matching, which is an extension of the algorithm.

Complexity and Implementation

To reduce the complexity for larger images and various disparity ranges, the input image is split into tiles. These tiles can be processed individually.
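For readers who want to experiment with semi-global matching, OpenCV ships a related semi-global block matching implementation (StereoSGBM). It uses a different matching cost than the Mutual Information cost described above, so the sketch below is only an approximation of the processing used in this thesis, and the file names and parameter values are placeholders.

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # file names are placeholders
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Parameter values below are illustrative defaults, not the thesis settings.
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,   # disparity search range in pixels, must be a multiple of 16
    blockSize=5,
    P1=8 * 5 * 5,        # penalty for small disparity changes (smoothness)
    P2=32 * 5 * 5,       # larger penalty for depth discontinuities, P2 > P1
)

# compute() returns 16-bit fixed-point disparities scaled by 16.
disparity = sgbm.compute(left, right).astype("float32") / 16.0
```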

2.4 Basics of Camera Calibration

Camera calibration is another crucial task in computer vision. Its main purpose is to determine the relation between 3D world coordinates of a point and 2D coordinates of its respective image in the camera. Estimating the intrinsic and extrinsic parameters of the camera is referred to as ‘Camera Calibration’ [8].

Let $m = [u\ v]^T$ represent a 2D point and $M = [X\ Y\ Z]^T$ a 3D point. Let $\hat{m} = [u\ v\ 1]^T$ and $\hat{M} = [X\ Y\ Z\ 1]^T$ represent the corresponding augmented vectors [8]. The camera is modelled as a pin-hole camera. The relation between a 3D point $M$ and its 2D image projection $m$ is given as

\[ s\,\hat{m} = A\,[R\ \ t]\,\hat{M} \quad (2.7) \]
\[ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = A\,[R\ \ t] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \quad (2.8) \]

where ‘s’ is the scale factor.

[R t] are the camera extrinsic parameters; R and t denote the rotation and translation which relate the 3D world coordinates to the camera coordinate system.

\[ A = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \quad (2.9) \]


$\alpha$ and $\beta$ are the scale factors along the image axes, and $\gamma$ represents the skew coefficient of the two image axes.

Figure 2.6. One of the orientations of the patterned grid

Experimental techniques for camera calibration require a photograph of an object with known geometry, taken with the camera that is to be calibrated. The basic idea is to establish correspondences between 3D coordinate points with known distances and their image coordinates in the camera. Every world and pixel coordinate pair results in an equation relating the camera parameters. The set of equations is solved for the best solution.

A pattern of known geometry is used for camera calibration. The patterned grid is placed before the camera pair in various orientations and pictures are taken (as shown in Figure 2.6). Each photographed orientation gives an equation for the camera parameters. The relation between a specific point on the patterned grid and the respective coordinates of its image in the camera are described by the equation. This is done to estimate the intrinsic and extrinsic parameters of the camera.
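The projection model of equations (2.7)-(2.9) can be written down directly; the following sketch (with made-up intrinsic values) shows how a 3D world point is mapped to pixel coordinates. In practice the parameters themselves would be estimated from such correspondences, for example with a calibration routine like OpenCV's calibrateCamera.

```python
import numpy as np

def project_point(A, R, t, M):
    """Project a 3D world point M to pixel coordinates via Eq. (2.7)."""
    M = np.asarray(M, dtype=np.float64).reshape(3)
    t = np.asarray(t, dtype=np.float64).reshape(3)
    p = A @ (R @ M + t)        # equals s * [u, v, 1]^T
    return p[:2] / p[2]        # divide out the scale factor s

# Example with an identity rotation and a purely axial translation
# (all numbers are made up for illustration).
A = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
uv = project_point(A, np.eye(3), [0.0, 0.0, 0.5], [0.1, 0.05, 2.0])
```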

2.5 Image Processing and Image Thresholding

Image processing refers to the digital conversion of an image and the performing of operations on it. This is generally done to enhance the image or to extract valuable information from it.

Image segmentation is the process of clustering pixels into unique image regions which represent individual surfaces or objects. Thresholding is one of the simplest forms of image segmentation, based on pixel intensity.

(26)

14 Theoretical Concepts

Thresholding can be used to obtain binary images from grayscale images [9]. This process involves classifying each image pixel as either ‘object’ or ‘background’. If the intensity of the pixel is greater than some threshold, it is categorized as an ‘object’ pixel; otherwise it is a ‘background’ pixel.

Let ‘T’ be a global threshold, then t (x, y) the thresholded image of f (x, y) is defined by the below equation.

\[ t(x, y) = \begin{cases} 1 & \text{if } f(x, y) \geq T \\ 0 & \text{if } f(x, y) < T \end{cases} \]

Pixels having a value of ‘1’ represent the object and pixels having a value of ‘0’ represent background.

Thresholding with a constant threshold is referred to as global thresholding. Dynamically changing the threshold value over the image contours is, on the other hand, referred to as adaptive thresholding. Generally, adaptive thresholding is useful when the image or the scene is non-uniformly illuminated [10].
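A minimal sketch of both variants is given below: the global rule follows the equation above, while the adaptive variant compares each pixel with its local mean, which is one common realisation of adaptive thresholding (cf. [10]). Window size and offset are placeholder values, and the columnwise rule actually used in this project is described in chapter 3.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def global_threshold(f, T):
    """t(x, y) = 1 if f(x, y) >= T, else 0 (the rule given above)."""
    return (np.asarray(f) >= T).astype(np.uint8)

def adaptive_threshold(f, size=31, offset=5):
    """Compare each pixel with the mean of its local neighbourhood."""
    f = np.asarray(f, dtype=np.float64)
    local_mean = uniform_filter(f, size=size)   # local mean over a size x size window
    return (f >= local_mean - offset).astype(np.uint8)
```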

2.6 Median Filtering

Figure 2.7. Median filtering example ([11])

The median filtering technique is widely used in digital image processing. It is a nonlinear digital filtering method that helps reduce the noise in an image [12].

As shown in Figure 2.7, the pixels in the neighbourhood are ranked in accordance with their intensity. The central pixel value is then replaced with the median value. This type of median filtering helps remove impulse or shot noise, where some pixels take on extreme values.
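The ranking scheme of Figure 2.7 can be written as a direct (unoptimised) sketch; in practice a library routine such as scipy.ndimage.median_filter would be used instead.

```python
import numpy as np

def median_filter_3x3(img):
    """Replace each pixel by the median of its 3x3 neighbourhood."""
    img = np.asarray(img)
    padded = np.pad(img, 1, mode="edge")      # replicate the border pixels
    out = np.empty_like(img)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            window = padded[r:r + 3, c:c + 3]
            out[r, c] = np.median(window)     # rank the nine values, take the middle one
    return out
```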


Chapter 3

System Design and Proposed Algorithm

3.1 System Design

Figure 3.1. Block diagram

A prototype of a WLC along with the Aptina MT9M114 stereo board has been used for this project. Figure 3.1 shows the block diagram of the experimental setup. Aptina's MT9M114 is a CMOS digital image sensor with an active-pixel array of 1.26 Mp [15]. It supports several camera functions such as automatic white balance, exposure control, black level control, flicker avoidance and so on.

An advanced camera system is integrated into the digital image sensor. The camera system consists of a microcontroller (MCU), an image flow processor (IFP), a mobile industry processor interface (MIPI) and parallel output ports. The camera system functions are managed by the microcontroller. The MCU sets the parameters for the sensor core to optimize the raw image data which enters the IFP. The IFP takes care of the image processing and enhancement.

Figure 3.2. MT9M114 block diagram ([17])

MT9M114 can be operated in default mode where the output image size is 720p at 30 fps. Using the parallel port (output), it outputs 8-bit data. MT9M114 can also be programmed for exposure, frame size and other parameters.

Internal registers of the sensor core and the IFP can be controlled by the user. In normal operation mode, most of the operations are managed by the integrated microcontroller. The parallel or MIPI interface is used for transmitting the processed image data to the host system.


Sensor Core

The color image sensor of the MT9M114 has a Bayer color filter arrangement and an active-pixel array of 1.2 Mp with an Electronic Rolling Shutter (ERS). The sensor core, which is a progressive scan sensor, generates pixel data at a steady frame rate. Readout from the sensor core is 10 bits per pixel. The sensor core output is a Bayer pattern (shown in Figure 3.3) where alternate rows are either a chain of green and red pixels or blue and green pixels.

Figure 3.3. Bayer pattern ([16])
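The row structure described above can be illustrated by splitting a raw frame into its colour planes. The sketch assumes a GRBG phase, which matches the description of alternating green/red and blue/green rows but is otherwise an assumption; the actual colour-filter phase of the MT9M114 should be taken from its datasheet.

```python
import numpy as np

def split_bayer(raw):
    """Split a raw Bayer frame into its colour planes (assumed GRBG layout)."""
    g1 = raw[0::2, 0::2]   # green pixels on the green/red rows
    r = raw[0::2, 1::2]    # red pixels
    b = raw[1::2, 0::2]    # blue pixels
    g2 = raw[1::2, 1::2]   # green pixels on the blue/green rows
    return r, g1, g2, b
```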

Image Flow Processor and Microcontroller Unit

The image sensor performance is optimized and enhanced by the IFP. The IFP hardware logic executes the image control processing part of the MT9M114. Built-in algorithms such as black level conditioning, defect correction, edge detection, aperture correction, image formatting and so on enable the MT9M114 to operate as an adaptable and fully automatic SOC for various camera systems.

An internal bus interface is used by the microcontroller for its communication with other functional blocks. The registers in both the sensor core and the IFP are configured by the MCU.

System Control and Output Interface

The MT9M114 contains a PLL oscillator which generates the internal sensor clock from the system clock. The PLL can adjust the input clock frequency which allows the MT9M114 to be run at any desired frame rate and resolution.

Power-conserving features such as a soft standby mode are provided by the MT9M114. Read and write access to the internal registers and variables of the MT9M114 is enabled by a 2-wire serial bus interface. The sensor core, the output interface and the color pipeline are controlled by the internal registers. The variables positioned in the MCU RAM memory configure and control the algorithms and the camera control functions.

Raw or processed data can be selected by the output interface block. An 8-bit parallel port or a serial MIPI port can be used to output the image data to the host system. The parallel output port provides either 10-bit Bayer data or 8-bit RGB data.

3.2 Wafer-Level Applications - Display Control, Simple Stereo Processing, Simple Image Recognition

A wafer level camera (WLC) pair is used to capture the required scene. Calibration of the camera pair is performed by taking photographs of a scene in different orientations, as explained earlier in section 2.4. Once the calibration of the camera pair is done, the next step is to proceed with the stereo processing. Stereo processing of the image pairs is done to obtain the disparity values, which are useful for obtaining the real depth of the objects in the scene. The disparity values obtained from stereo processing are used for image thresholding, which has been used for object detection in this project. Generally, image thresholding is performed to convert a grayscale image into a binary image; in the present context, the disparity values are used to calculate the threshold limit applied for thresholding. As the illumination of the scene is uneven, adaptive thresholding has been used.

The algorithm for performing thresholding has been shown in Figure 3.4. This ultimately leads to object detection.

Images are captured with the WLC camera pair. Stereo processing of the images is done using the stereo algorithm SGM (Semi Global Matching). The disparity values are obtained which can be used for the threshold calculation as explained below.

The threshold value is calculated for every column in the image. For a 640×480 image, 640 threshold values are calculated and columnwise thresholding is applied for that particular image.

It is calculated from the columnwise mean, variance and standard deviation of the disparity values. Performing thresholding dynamically helps to take into account the spatial variations in the illumination of the scene.

The following equations show the calculation of the threshold. Here $x_n$ denotes the $n$-th disparity value in a column and $N$ the number of values in that column.


Figure 3.4. Thresholding algorithm

\[ m = \frac{1}{N} \sum_{n=1}^{N} x_n \quad (3.1) \]
\[ v = \frac{1}{N} \sum_{n=1}^{N} (x_n - m)^2 \quad (3.2) \]
\[ v = \frac{1}{N} \sum_{n=1}^{N} x_n^2 - \left( \frac{1}{N} \sum_{n=1}^{N} x_n \right)^2 \quad (3.3) \]


\[ \sigma = \sqrt{v} \quad (3.4) \]

\[ t = m_n + \mathit{factor} \cdot \sigma_n \quad (3.5) \]

The output pixel value is set either to ‘0’ or to ‘256’, depending on the corresponding disparity value. If the disparity value at a particular pixel is above the calculated column threshold, the output pixel is set to ‘256’, otherwise it is set to ‘0’. Thresholding is applied to every image, but the threshold calculation is done once every 10 images: disparity information is gathered over 10 images and a new threshold value is calculated. Only when the object is found, i.e. there is a reasonable number of pixels which are valid or above the threshold value, is the threshold calculation stopped. When there is no object, the threshold calculation starts again.
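As a concrete illustration of the columnwise rule, the following NumPy sketch computes one threshold per column from the disparity statistics (equation 3.5) and produces the 0/256 output image. The accumulation over ten images and the start/stop logic of the algorithm in Figure 3.4 are omitted, and the function and parameter names are illustrative.

```python
import numpy as np

def columnwise_threshold(disparity, factor=1.0):
    """Columnwise adaptive thresholding of a disparity map (Eq. 3.5)."""
    # Ignore invalid (zero or negative) disparities in the statistics.
    d = np.where(disparity > 0, disparity, np.nan).astype(np.float64)

    m = np.nanmean(d, axis=0)          # m_n: mean of each column
    sigma = np.nanstd(d, axis=0)       # sigma_n: standard deviation of each column
    t = m + factor * sigma             # Eq. (3.5), one threshold per column

    # Pixels above their column threshold become 256 ("object"), the rest 0.
    out = np.where(disparity > t[np.newaxis, :], 256, 0).astype(np.uint16)
    return out, t
```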

The calculated stereo constant (k = bf/x) for the given camera setup is 125.79.

\[ r = \frac{b f}{N x} \quad (3.6) \]

where

‘r’ - range of the object,
‘b’ - baseline,
‘f’ - focal length of the image sensor,
‘x’ - pixel size of the image sensor,
‘N’ - maximum disparity value.

For the camera module used in this project the values are: b = 20 cm, f = 2.38 mm, x = 1.9 µm.

For the present camera setup, the minimum distance from the camera for stereo visualization is 0.99 m (nearly 1 m from the camera) and the maximum distance is 111.81 m.


Chapter 4

Results

The WLC camera pair is calibrated before proceeding with stereo processing. The disparity map shown in Figure 4.2 is computed using SGM. The stereo processing of the images is done at a rate of approximately 7 fps (frames per second). The left camera image is the reference for the disparity map. The colors encode the disparity values from green (far) to red (close).

A value of zero disparity means the object is too far from the camera. A negative disparity value represents occlusions (a point seen by one camera is not visible to the other camera). Occlusions are one of the reasons for holes in the disparity map. When the images in Figure 4.1 are examined, it is evident that on both the left and right borders there is a lack of information, leading to disparity holes.

Stereo processing becomes difficult for structureless or plain surfaces (the cupboard in the left camera image of Figure 4.1). This is easily visible in the disparity map (shown in Figure 4.2). For this reason the wall was given some structure to avoid holes in the disparity map. Dark areas in the scene should be illuminated well, as dark areas tend to have a higher disparity value irrespective of the distance from the camera.

As explained earlier, image thresholding is used for object detection. Figures 4.3, 4.4 and 4.5 show the object detection. The images in the upper portion represent the left and right camera images. The images in the lower portion represent the respective disparity map (bottom left) and the respective thresholded image (bottom right).

The threshold calculation is performed once every 10 images for the following reasons: the stereo processing of the images is done at a rate of approximately 7 fps, and enough disparity information has to be available when the object moves.

The disparity information gathered from the ten images is used for the threshold calculation and the threshold value is set (the threshold calculation is explained in section 3.2).


Figure 4.1. Left and right camera image


Figure 4.3. Left camera image, Right camera image, Disparity map, Binary thresholded image


Figure 4.5. Object detected

Different cases are explained below.

For the first ten images, the threshold value is set to 256 (the maximum value for the binary image) and thresholding is performed, while the disparity information is gathered for the threshold calculation used from the eleventh image onwards. This case is shown in Figure 4.3.

The threshold calculation is done columnwise which accounts for the variations in the illumination of the scene. When there is no object in the scene as shown in Figure 4.4, the output (bottom right) shows very few pixels above the threshold value. These suggest that there are some abnormal disparity values which are explained later.

When there is an object in the scene, the main aim is to distinguish the object from the background. A new value of the threshold is calculated. The number of pixels above the threshold are more compared to the above cases (shown in Figure 4.5).

Object recognition has been implemented (shown in Figure 4.5); however, it needs some improvement. During the threshold calculations some abnormal disparity values have been observed. Some of the reasons for abnormal values are occlusions, reflections at glass surfaces, structureless surfaces and dark areas.

Mean and threshold values have been plotted to observe the effect of dark areas on the stereo disparity. Figures 4.6 and 4.7 are the test images for which the corresponding mean and threshold values have been plotted.


Figure 4.6. Image with dark areas


The disparity mean and threshold values of the test images are shown in Figures 4.8, 4.9, 4.10 and 4.11.

As discussed earlier, disparity is inversely proportional to the distance from the camera pair. However, this is not the case when there are dark areas in the scene. This is evident from columns 200 to 300 in Figure 4.8: the disparity in the dark regions (columns 200-300) has a high value, suggesting that the region is very near to the camera.

When some dark surfaces were removed from the scene, the mean plot showed an improvement, as shown in Figure 4.9. The distances of the objects in the scene were valid: objects near the camera pair had a higher disparity than objects far away.

The plots in Figures 4.10 and 4.11 show the threshold values for both cases, with dark areas and with some dark areas removed. The threshold values should be even for objects which are at an equal distance from the camera pair. Columns 200-300 in Figure 4.10 show peak values compared to Figure 4.11, even though the objects in the scene are equally spaced from the camera pair. The threshold values in Figure 4.11 produce a better differentiation of the object from the background.


Figure 4.9. Disparity mean with some dark areas


Figure 4.11. Threshold values with some dark areas

This issue can be compensated for by using a strong illumination source. For this purpose, ‘infrared’ illumination can be used inside the car.

4.1 Median Filtering

Median filtering was applied to further improve the thresholded output image. Every two images were added together before the median filter was applied. As the thresholded output is a binary image, the following rule was applied for median filtering: for a 4×4 window, the number of pixels with a value of ‘0’ and with a value of ‘256’ is counted. If the 0s outnumber the 256s, the complete window is replaced by 0, otherwise by 256. If both counts are equal, the window remains unchanged.

A median filter of size 4×4 which slides through the columns and jumps over the rows was implemented. From the thresholded image in Figure 4.12 (bottom right), it is evident that the number of 0s is greater than the number of 256s in the upper rows. If the median filter instead slid through the rows, the output would be a blank image. For this reason, sliding through the rows is not preferred.
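One possible reading of this scheme is sketched below: the window steps one column at a time, jumps four rows at a time, and each 4×4 block is replaced wholesale by the majority value. This is an illustrative reconstruction, not the exact implementation used in the thesis.

```python
import numpy as np

def binary_median_4x4(img):
    """Majority rule over 4x4 windows of a 0/256 thresholded image."""
    out = img.copy()
    rows, cols = img.shape
    for r in range(0, rows - 3, 4):          # jump over the rows
        for c in range(0, cols - 3):         # slide through the columns
            block = img[r:r + 4, c:c + 4]
            zeros = np.count_nonzero(block == 0)
            highs = np.count_nonzero(block == 256)
            if zeros > highs:
                out[r:r + 4, c:c + 4] = 0
            elif highs > zeros:
                out[r:r + 4, c:c + 4] = 256
            # equal counts: leave the window unchanged
    return out
```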

Figure 4.12 shows the output thresholded image after the application of the 4×4 median filter (top right) to the thresholded image at the bottom right.


Figure 4.12. Output image after median filtering (top right)

This clearly shows some improvement of the output thresholded image. It is possible to ignore some unnecessary pixel values.

4.2 Temporal Noise in CMOS Image Sensors

Temporal noise refers to time-dependent/temporal variations in the pixel output values under a constant level of illumination. Signal averaging can be done to reduce temporal noise.

Temporal noise tends to increase when the illumination is stronger. However, the noise results were different with the prototype: the lowlight image (minimum amount of illumination) was noisier than the illuminated image (maximum amount of illumination), as shown in Figures 4.13 and 4.14. For this experiment, 250 images were used to estimate the noise effects.

The SNR (Signal-to-Noise Ratio) value is given by the ratio of the mean to the standard deviation of a signal (pixel intensity),

\[ \mathrm{SNR} = \frac{\mu}{\sigma} \]
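The per-pixel temporal SNR can be estimated from a stack of frames of a static scene, as in the following sketch; the function name and the handling of constant pixels are illustrative choices.

```python
import numpy as np

def temporal_snr(frames):
    """Per-pixel temporal SNR from a stack of frames of a static scene.

    `frames` has shape (num_frames, height, width), e.g. the 250 images
    mentioned above; SNR is the temporal mean divided by the temporal
    standard deviation at every pixel.
    """
    frames = np.asarray(frames, dtype=np.float64)
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    # Pixels with no temporal variation are reported as infinite SNR.
    snr = np.where(std > 0, mean / np.maximum(std, 1e-12), np.inf)
    return snr, mean, std
```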


Figure 4.13. SNR image illuminated scene


The plots in Figures 4.15 and 4.16 represent the mean versus standard deviation (calculated from pixel intensity values of 250 images) of the illuminated scene and the lowlight scene. The plots show that values increase linearly only for a short period. This suggests that dark current noise is more pronounced in the present prototype. Dark current noise is difficult to eliminate.


Chapter 5

Conclusions

• A prototype of a WLC along with the Aptina MT9M114 stereo board has been used for performing the wafer-level applications.

• Stereo processing of the image pairs captured with the WLC pair has been performed. The generated disparity maps have been examined.

• The system design, the proposed algorithm and the results have been presented.

• Object detection has been implemented using image thresholding. Image thresholding has been applied columnwise, to account for the spatial variations of the illumination. To improve the object detection, a median filter of size 4×4 has been used. The necessity of a proper illumination of the scene has been discussed.

• The noise effects on the images have been discussed. The dark current noise, which cannot be eliminated, was more pronounced in this setup.


Chapter 6

Future Work

• Further improvement in image thresholding and median filtering used for this experimental setup can be achieved.

• The system can be further improved to identify simple gestures.

• Optical flow can be used for recognizing the movements of an object such as moving forward or backward.


Bibliography

[1] Hongtao Han, Moshe Kriman, Mark Boomgarden. Wafer Level Camera Technology - from Wafer Level Packaging to Wafer Level Integration. In International Conference on Electronic Packaging Technology and High Density Packaging (ICEPT-HDP), pages 121-124, 2010.
[2] Photonics. http://www.photonics.com/Article.aspx?AID=30459, viewed in September 2012.
[3] Alan C. Bovik. Handbook of Image and Video Processing, 2005.
[4] A.D. Marshall, R.R. Martin. Computer Vision, Models, and Inspection, 1992.
[5] E.R. Davies. Computer and Machine Vision: Theory, Algorithms, Practicalities, Fourth Edition, 2012.
[6] H. Hirschmuller. Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 807-814, 2005.
[7] S.K. Gehrig, C. Rabe. Real-Time Semi-Global Matching on the CPU. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 85-92, 2010.
[8] Z. Zhang. A flexible new technique for camera calibration. In IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1330-1334, 2000.
[9] R.C. Gonzalez, R.E. Woods, S.L. Eddins. Digital Image Processing Using Matlab, 2004.
[10] D. Bradley, G. Roth. Adaptive Thresholding Using the Integral Image. In Journal of Graphics, GPU, and Game Tools, pages 13-21, 2007.
[11] http://www.scimedia.com/fis/support/download/bva/ver0703/RelNote-V0703.html, viewed in September 2012.
[12] http://medim.sth.kth.se/6l2872/F/F7-1.pdf, viewed in September 2012.
[13] Olympus. http://www.olympusmicro.com/primer/digitalimaging/cmosimagesensors.html, viewed in July 2012.
[14] Imatest. http://www.imatest.com/docs/noise/, viewed in July 2012.
[15] MT9M114: 1/6-Inch 720p High-Definition (HD) System-on-Chip (SOC) Digital Image Sensor datasheet.
[16] http://scien.stanford.edu/pages/labsite/2007/psych221/projects/07/demosaicing/introduction.htm, viewed in September 2012.
[17] http://www.aptina.com/products/soc/mt9m114/, viewed in December 2012.
