
Mälardalen University, Västerås, Sweden

Thesis for the Degree of Master of Science with Specialization in Embedded Systems - 30.0 credits

TOWARDS HARDWARE ACCELERATED RECTIFICATION OF HIGH SPEED STEREO IMAGE STREAMS

Sudhangathan Bankarusamy

sby14001@student.mdh.se

Examiner: Mikael Ekström, Mälardalen University, Västerås, Sweden

Supervisor: Carl Ahlberg, Mälardalen University, Västerås, Sweden


Abstract

The process of combining two views of a scene in order to obtain depth information is called stereo vision; when it is done using a computer, it is called computer stereo vision. Stereo vision is used in robotic applications where the depth of an object plays a role. Two cameras mounted on a rig form a stereo camera system. Such a system can capture two views and enable robotic applications to use the depth information to complete tasks. Anomalies are bound to occur in such a stereo rig when the two cameras are not parallel to each other, since accurately mounting the cameras on a rig has physical alignment limitations. Images taken from such a rig carry inaccurate depth information and have to be rectified; rectification is therefore a prerequisite to computer stereo vision. The stereo rig used in this thesis is the GIMME2 stereo camera system. The system has two 10 mega-pixel cameras along with an on-board FPGA, RAM, a processor running the Linux operating system, multiple Ethernet ports and an SD card slot, amongst other features. Stereo rectification on memory-constrained hardware is a challenging task, as the process itself requires both images to be stored in memory. The FPGA on the GIMME2 system must be used in order to achieve the best possible speed. Programming a system that has no display and is used for a specific purpose is called embedded programming; the purpose of this system is distance estimation, and working with such a system falls within the Embedded Systems program. This thesis presents a method that moves rectification a step ahead for this particular system. The functionality of the algorithm is demonstrated in MATLAB and VHDL, and is compared to available tools and systems.


Table of Contents

1 Introduction
  1.1 Camera parameters
2 Problem Formulation
3 Related Work
  3.1 Advantages and Disadvantages
4 Proposed Method
  4.1 Project Work-flow
5 Stereo Camera Calibration
6 MATLAB Implementation
  6.1 Rectification
  6.2 Analytics
  6.3 Conclusion
7 VHDL Implementation
  7.1 Conclusion
8 Results
9 Conclusion
10 Future Work
References
Appendices


1 Introduction

In stereo vision systems, extracting depth information accurately is a common and challenging task. Stereo vision has many applications in the autonomous domain, including navigation, object detection, surveillance, medical applications, virtual reality and so on. An important preprocessing step for stereo matching is rectification. Rectification consists of aligning the image points in both the left and right images to a common global plane. The geometry of stereo vision is called epipolar geometry. Figure 1 illustrates the most important terminology of epipolar geometry. The lines formed by the intersection of the two camera image projection planes with the epipolar planes are called the epipolar lines. Figure 2 shows image rectification terminology. In stereo rectification the images are transformed so that the epipolar lines coincide with the horizontal scan lines of the image.

Rectified images are easier to process in stereo vision applications [1, 2] than unrectified images, since the applications then only have to search laterally to find the matching points and estimate the disparity, from which the distance to the object can be calculated. The distance formula based on the disparity is:

Z = \frac{f \cdot B}{d}    (1)

where Z is the distance from the base line along the camera axis in meters, f is the focal length in meters, B is the distance between the two cameras in meters, and d is the disparity in meters.
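As a quick numeric illustration of equation (1), here is a minimal MATLAB sketch with assumed values (only the roughly 70 mm baseline echoes the calibration output in section 5; the rest are hypothetical):

f = 0.005;       % focal length in meters (assumed value)
B = 0.070;       % baseline between the cameras in meters (assumed, ~70 mm)
d = 35e-6;       % disparity measured on the sensor in meters (assumed)
Z = f * B / d;   % distance along the camera axis: 10 meters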

In stereo camera systems, sources of error arise from the fixed-parameter design of the camera geometry. Parameters such as the rotation of the cameras, the centers of the lenses and the focal lengths give rise to errors. Good stereo vision becomes increasingly difficult if multiple sources of error have to be handled by software applications. In modern systems that include FPGAs (Field Programmable Gate Arrays), it is a natural solution to implement the most processing-intensive and crucial tasks in the FPGA itself, if possible, before passing the rectified images to the applications.

FPGAs are re-programmable silicon chips built from programmable routing resources and pre-built logic blocks. Custom hardware functionality can be implemented without writing processor instructions or rigging up hard-wired circuits. An FPGA is configured by writing software and compiling it to a bitstream, which describes how the components in the FPGA should be wired. FPGAs are fully re-configurable and take on a new form of circuitry based on the purpose. FPGAs are parallel in nature, unlike processors, meaning that tasks do not have to compete for resources: each task is assigned a dedicated part of the chip and can therefore work without interference from other blocks. In view of these features, FPGA adoption is on the rise across many industries. In this thesis the FPGA is important for maintaining very low latency between the flow of input and output pixels.

The undistortion process is a crucial step in stereo image rectification, especially where fish-eye lenses are used or the distortion is high. Figure 4 gives an idea of distorted and undistorted images. Distortion parameters like skew, rotation, and radial and tangential effects are considered in the undistortion process. The GIMME2 [3] board, which is used in this thesis, has low-distortion lenses, which means there is less pixel movement between the input and output images, reducing the complexity of handling large amounts of data.

Working on an FPGA-based system, and programming a system made for a specific purpose and without a display, fits the definition of an embedded system. The GIMME2 board has no on-board display and its purpose is stereo vision and distance estimation; therefore the GIMME2 board is an embedded system. Programming such a system requires that memory, speed, timing issues and utilization of logic resources are part of the design, but speed and timing are not considered here as they are beyond the scope of this thesis. A system design with these characteristics fits into the Embedded Systems program.


Figure 1: Epipolar Geometry [4]

1.1 Camera parameters

When dealing with epipolar geometry, or the geometry of stereo vision, the camera parameters, which mathematically define the camera behaviour and the position of the camera with respect to the outside world, become very important. The variables that define the camera behaviour are called the intrinsic parameters. The variables that define the position of the camera and the direction of view are called the extrinsic parameters. These parameters are explained below.

Intrinsic Parameters:

K = \begin{bmatrix} a_x & s & u_0 \\ 0 & a_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}    (2)

The intrinsic matrix K (2) has five parameters, which describe how the camera captures images. The variables a_x and a_y represent the focal length in terms of pixels. The variable s is the skew factor, which relates the length and breadth of the pixels on the camera's capture plane. The variables u_0 and v_0 indicate the position, in pixels, of the principal point on the captured image, which is ideally the center of the image.

Extrinsic Parameters:

The position of the camera with respect to the world is described by R, the rotation matrix, and T, the translation vector. Together, the two matrices indicate the direction and location of one camera. Matrix (3) represents the transition matrix, which can be used to obtain camera co-ordinates from world co-ordinates; equation (4) shows the corresponding mathematical expression.

[R \; T] = \begin{bmatrix} r_{1,1} & r_{1,2} & r_{1,3} & t_1 \\ r_{2,1} & r_{2,2} & r_{2,3} & t_2 \\ r_{3,1} & r_{3,2} & r_{3,3} & t_3 \end{bmatrix}    (3)

\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K [R \; T] \begin{bmatrix} U \\ V \\ W \\ 1 \end{bmatrix}    (4)


Figure 2: Projections of rectified images [1]

where U, V, W represent the 3D world co-ordinates and u, v represent the 2D camera co-ordinates. For image rectification and depth estimation purposes the absolute world co-ordinates serve no purpose: unit projection can be assumed, and the camera can be assumed to sit at the origin. Therefore, with T = [0] and W = 1, equation (4) becomes equation (5).

\begin{bmatrix} u_u \\ v_u \\ 1 \end{bmatrix} = K R \begin{bmatrix} U_c \\ V_c \\ 1 \end{bmatrix}    (5)

The variables U_c and V_c are now the 2D co-ordinates of the picture taken by the camera, and the variables u_u and v_u are the undistorted image co-ordinates. The intrinsic and extrinsic parameters together are called the calibration parameters; obtaining the calibration parameters using a checker board, and the rectification itself, are explained in section 4. A matrix that transforms an image is called a homography matrix, H. Matrix H is also known as the calibration matrix. In the above case H = K[R], so equation (5) becomes equation (6):

\begin{bmatrix} u_u \\ v_u \\ 1 \end{bmatrix} = H \begin{bmatrix} U_c \\ V_c \\ 1 \end{bmatrix}    (6)
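As a small MATLAB sketch of equation (6) (illustrative only; the intrinsic values echo the left-camera calibration in section 5, and R is taken as the identity for simplicity):

K = [4971.9 0 1987.4; 0 4959.6 1335.8; 0 0 1];  % intrinsic matrix, cf. section 5
R = eye(3);                                     % identity rotation, for illustration
H = K * R;                                      % homography H = K[R]
p = H * [100; 200; 1];                          % transform one coordinate [Uc; Vc; 1]
uu = p(1) / p(3);                               % de-homogenise by the third entry
vu = p(2) / p(3);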

From epipolar geometry, the matrix F is called the fundamental matrix. The fundamental matrix relates two corresponding points in the left and right images: if x and x' are corresponding points from the left and right images, they are related as

x'^T F x = 0    (7)

An extension of the fundamental matrix is the essential matrix. The essential matrix E relates two calibrated cameras as:

E = K'^T \cdot F \cdot K    (8)

where K and K’ are the intrinsic matrices of the two calibrated cameras.
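A minimal MATLAB sketch of relations (7) and (8) follows; note that F here is an arbitrary placeholder, since with real data F would come from calibration or from point correspondences:

K1 = [4971.9 0 1987.4; 0 4959.6 1335.8; 0 0 1];  % left intrinsics (section 5)
K2 = [4956.5 0 2033.3; 0 4946.2 1332.1; 0 0 1];  % right intrinsics (section 5)
F  = [0 -1e-7 1e-4; 1e-7 0 -1e-3; -1e-4 1e-3 0]; % placeholder fundamental matrix
x  = [1200; 800; 1];                             % point in the left image
xp = [1150; 800; 1];                             % candidate match in the right image
residual = xp' * F * x;  % near zero only for a true correspondence, eq. (7)
E = K2' * F * K1;        % essential matrix from the intrinsics, eq. (8)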

In the next section the problem to be solved is explained, followed by the related work section, which covers a few methods from the literature. The most suitable method is then explained, along with some potential modifications to enhance the implementation and to handle the high-resolution images used. The project details are then explained in two parts, followed by suggestions for further implementation. Finally, the outcomes of the work are presented, followed by the conclusion.


2 Problem Formulation

In vision-related work there are multiple problems, which depend on the application and the processing platform. The issues can be latency, handling high volumes of data, memory shortage, accuracy, frequency of operation, resolution of the sensors, design of algorithms and optimization for the platform, and so on. The focus of this thesis is the design of algorithms and the handling of huge volumes of data. The author of [2] states that there is a 'super-linear' increase in processing complexity as the resolution increases. It should be noted that the two image sensors (Micron Aptina MT9J003) on board the GIMME2 can deliver 10 mega-pixel images at up to 15 fps.

The GIMME2 board can be used for efficient video processing applications if the sensor information is pre-processed. Pre-processing includes rectification (including undistortion), colour correction, etc. For the purpose of calculating distance to objects, undistortion and rectification are important steps. The GIMME2 board has two cameras for stereo vision. The volume of data to be handled is high compared to other, similar rectification implementations; the related work in section 3 covers the camera resolutions of other FPGA-based implementations. Currently the GIMME2 board does not have any rectification in place and thus performs poorly: the applications used on this system assume that the raw images are aligned correctly. This introduces errors in distance calculation applications, deteriorating the value of the results produced, and thereby gives rise to the need for rectification on this platform.

The major tasks are:

• Familiarization with the key concepts of stereo geometry, and implementation in MATLAB

• Coding in VHDL, and estimating the handling capacity of the proposed algorithm with respect to the hardware

3 Related Work

The work of rectification itself is not new. A paper concentrating on rectification appeared as early as 1996 [5], using warped image sampling. Implementation of rectification using FPGAs also started at least 12 years ago, when the authors of [6] used one FPGA to rectify 640 x 480 stereo images at 30 fps. All literature found implements only up to HD resolution (1080p) or lower [1, 2, 7].

In [8, 9] look-up table based rectification systems are implemented, where large sets of offline-rectified image co-ordinates are stored in external memories and accessed when needed. These systems typically need large external RAMs. Specifically, [8] uses two extra 64 MB SDRAMs, one for storing the co-ordinates and the other for storing the rectified image. This system rectifies 640 x 512, 8-bit grey-level pixels at about 85 fps, using bilinear interpolation. Bilinear interpolation finds a required missing pixel value by evaluating a weighted average of the four surrounding known pixels. [9] does similar work and reaches 35 fps for 1024 x 1024 stereo image rectification. Although GIMME2 has an on-board SDRAM, utilizing an external RAM component along with 20 mega-pixels would introduce additional, undesired latency. Though this could be the only other solution, it is not explored in this thesis as more basic challenges remain.
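As a minimal MATLAB sketch of bilinear interpolation (illustrative only; img is a stand-in greyscale image and the fractional coordinate is arbitrary):

img = double(magic(64));            % stand-in greyscale image
r = 10.3; c = 20.7;                 % fractional lookup position
r0 = floor(r); c0 = floor(c);       % top-left of the four surrounding pixels
dr = r - r0; dc = c - c0;           % fractional offsets
p = (1-dr)*(1-dc)*img(r0,c0)   + (1-dr)*dc*img(r0,c0+1) ...
  +    dr *(1-dc)*img(r0+1,c0) +    dr *dc*img(r0+1,c0+1);  % weighted average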

There are also implementations based on compressed look-up tables (CLUT). The CLUT method is used in cases where a look-up table is preferred over an equation-based method, but the table is made to occupy less memory, either by ignoring pixel locations that have similar values in the input and output images or by using piece-wise characteristics. This works especially well where the distortion is low and a piece-wise characteristic can be adopted, thereby reducing the look-ups. In the piece-wise method the image is split into many parts, for example low, medium and high distortion areas: for the low-distortion areas fewer image points are considered, and for the high-distortion areas more, or all, pixel points are considered. This in turn reduces the memory requirement and increases the speed of operation. In [10] the look-up coordinates are iteratively reduced until a threshold depth-estimation error is met; by doing so, the requirement for external memory was removed. This method achieved 30 fps at a resolution of 1024 x 768.


[11] derives equations for the rectification of stereo images from the fundamental matrix. The fundamental matrix F is a 3 x 3 matrix that relates two views of a scene: two corresponding points from the same scene generate an equation x^T F x' = 0, where x and x' are the co-ordinates of the same point in the two scenes/pictures. Using seven or more such equations, the fundamental matrix for such scenes can be estimated. Note that camera parameters are not required to estimate the fundamental matrix. Equations for mapping the projections onto the rectified image plane are given in the article [11], from which inverse look-ups can be done. This method has been widely used in the literature [1, 2, 12].

The work in [1] uses the MATLAB Camera Calibration toolbox to get the calibration parameters, which include undistortion and rectification parameters. The work is done in such a way that no external RAM is necessary for the resolution considered, 1280 x 720 (0.9 MP, 120 fps): a number of pixel rows, as suggested by the MATLAB toolbox, are continuously remapped on the fly.

The implementation in [2] is a complete image processing pipeline that takes into consideration colour correction, radial undistortion, rectification and finally disparity estimation. The rectification is done using homography matrices; these matrices are 3 x 3 and perform a projective transformation on the coordinates of the images. Once the co-ordinates have been transformed, the pixels can be looked up from the raw image buffers. They propose warping arithmetic, based on the fundamental matrix, which they implement in the FPGA. The work in [7] performs rectification of 720p stereo images at 45 fps.

In another implementation [13], rectification is done without epipolar geometry. The author makes use of a known form or pattern in the pair of rectified stereo images. Though the epipolar geometry is not used in evaluating the required projections, matrix calculations are still used.

The literature study has shown that rectification has been done in different ways. In summary, two methods are popular: look-up table based (also called memory-mapped) implementations, and implementations based on the fundamental matrix, which is derived from the epipolar geometry. In the look-up table based method the offline computation can use any available technique, so the algorithm is of least importance. In the case of on-the-fly computation the algorithm must be given due importance, as its design will affect the quality of the implementation on the FPGA.

3.1 Advantages and Disadvantages

From the viewpoint of FPGAs, a few advantages and disadvantages can be identified, which helps in understanding the methods and selecting the best one to improve on. Here the two popular methods from the literature are considered, namely the look-up table and the epipolar geometry based matrix equations.

Look-up table based

Advantages:

• No computation complexities
• Compressed LUT possible, with some processing overhead

Disadvantages:

• High memory usage; use of external RAM unavoidable
• Not a general implementation; varies between each board
• High data transfer bandwidth, which might increase latency in the case of 10 MP x 2 lenses

On-the-fly, equation based

Advantages:

• External RAM can be sparsely used for storing the image rows
• A well designed hardware architecture can reduce overall latency
• Extensible to higher resolution with a linear increase in memory usage

Disadvantages:

• Higher design complexity

4 Proposed Method

The implementation needed here emphasizes rectification. As the GIMME2 board uses low-distortion lenses, less effort is put into undistortion methods. Therefore a two-step method is used: an offline camera calibration step in MATLAB, followed by an on-board rectification process, similar to [7].

The most suitable and extensible architecture for the FPGA design is the one proposed by Zicari [1]. Figure 3 shows the hardware design and data flow. This design is a pipelined parallel architecture with the fewest bottlenecks; in a look-up table based architecture, a major bottleneck is the external RAM access times.


This method focuses less on undistortion techniques and more on rectification. External tools are used to calibrate the cameras with a known checker board, and the resulting parameters are then used in the FPGA to do the online rectification. The mapper block shown in Figure 3 performs the undistortion and rectification of the pixel coordinates, projecting the image onto the rectified frame. Once the pixel coordinate values are calculated, the raw image buffer is looked up and the pixel colours are filled into the rectified frame.

Figure 4: Mapping and inverse pixel look-up [2]

The process shown in Figure 4 summarises the image transformation when the above method is applied. The rectification takes place between the raw image acquisition and the disparity estimation, so a continuous flow of pixels has to be ensured. The current method buffers only the minimum required rows of the image in memory before passing rows out to the next level of processing. This minimum number of rows depends on the lens distortion, and is also suggested by MATLAB, as reported in [1].

The pixel reconstruction block shown in Figure 3 performs bilinear interpolation, and the mathematics associated with it comes at the cost of increased complexity. Instead, it is simpler to use the nearest neighbour among the raw pixels, due to its lower complexity. The rectified image can then be passed on to the next level in the stereo matching process, usually the disparity estimator.

4.1 Project Work-flow

The project is carried out in two parts. The first part constitutes rectification using image files in MATLAB, for which the calibration parameters of the stereo rig are required; details of the calibration are explained further in section 5. The advantage of using MATLAB is that it allows concentrating more on the theoretical part and less on the coding part, considering that coding in MATLAB is easier. The first part of the implementation is therefore intended for a better understanding of handling image frame co-ordinates and image frame pixels according to the rules of stereo vision geometry; the MATLAB implementation forms the simulation test bench. The second part consists of coding in VHDL, where more effort is put into the hardware design and the handling of co-ordinates and pixels from register to register. This effort is important because the number of pixels taken into account is very high compared to previous works in the literature, and the FPGA resources are limited. For coding, Xilinx Vivado is used. The synthesis process outputs information related to the chosen FPGA, i.e. the resources utilized, whether the design is implementable, etc., using which the VHDL code can be re-designed or modified to meet the constraints.


5 Stereo Camera Calibration

In order to undistort and rectify given stereo images, the stereo camera properties, called the calibration parameters, are needed. The calibration parameters of a stereo camera can be obtained in a couple of ways:

• Directly from the given stereo images

• Using the stereo rig to find out the intrinsic and extrinsic parameters with a known checker board

The use of the first type for undistortion is prone to errors because the intrinsic parameters are unknown; it is therefore only suitable in cases where the images are already undistorted. The author of [1] uses the second type, where a set of images of a checker board with known dimensions is taken and then processed to obtain the stereo parameters of the rig and the undistortion parameters of each individual camera, using the Camera Calibration Toolbox for MATLAB [14]. The toolbox web page describes in detail the process for capturing the images and gives step-by-step instructions for using the toolbox.

Figure 5: Checker board image pairs from stereo rig

A few pairs of images from the calibration process are shown in Figure 5. These pairs of images are input to the Camera Calibration Toolbox for MATLAB. The GUI of the stereo toolbox is shown in Figure 6.

Figure 6: The Stereo Toolbox

The stereo parameters are listed below:


Intrinsic parameters of left camera:

Focal Length:       fc_left = [4971.90898  4959.64945] ± [26.08809  25.62520]
Principal point:    cc_left = [1987.44946  1335.81625] ± [15.57647  16.07361]
Skew:               alpha_c_left = [0.00000] ± [0.00000] => angle of pixel axes = 90.00000 ± 0.00000 degrees
Distortion:         kc_left = [-0.08327  0.15031  -0.00216  0.00478  0.00000]

Intrinsic parameters of right camera:

Focal Length:       fc_right = [4956.49668  4946.20917] ± [26.04261  25.58336]
Principal point:    cc_right = [2033.32934  1332.11231] ± [15.64901  16.10223]
Skew:               alpha_c_right = [0.00000] ± [0.00000] => angle of pixel axes = 90.00000 ± 0.00000 degrees
Distortion:         kc_right = [-0.07384  0.08421  -0.00156  0.00514  0.00000]

Extrinsic parameters (position of right camera wrt left camera):

Rotation vector:    om = [-0.00155  -0.00124  0.00012] ± [0.00211  0.00324  0.00013]
Translation vector: T = [70.22575  -0.51746  0.89119] ± [0.24586  0.24315  1.45477]

MATLAB also has its own Stereo Camera Calibration toolbox, which is not used for calibration in this thesis: its output parameters are in a different format, and the reference co-ordinate frame for the stereo camera setup differs from that of Caltech's Camera Calibration Toolbox for MATLAB. A specialty of MATLAB's toolbox is that the post-calibration data includes a parameter called the re-projection error, which tells how good the pictures are with respect to the sharpness of the images and, in turn, of the squares on the checker board. This toolbox was therefore applied to all sets of captured calibration images, and the set with the least re-projection error was used for further processing.

Setting up the GIMME2 board is explained in [15]; when starting from scratch, the whole of chapter III and chapter IV up to section F are to be carried out. In section F, the Linux application must be made to initialize the sensors, capture a picture from both sensors, and finally save the two images to the SD card. The pair of pictures is copied from the SD card to the host machine for calibration. Capturing pictures is repeated until various angles of the checker board have been obtained. In the initialization phase the program must be set to the right resolution; for this thesis the maximum possible size, i.e. 3840 x 2748 pixels, is used.

6 MATLAB Implementation

The simulation of stereo image rectification is important for several reasons, namely:

• Clearing up the theoretical aspects of the stereo geometry used for rectification.

• Post-rectification analysis, like observing pixel movement, finding the effective rectified image size, and so on.

• Finding the minimum buffer size required for storing pixels.

• Trying out various parameters for undistortion and rectification in order to optimize the resulting output image.

6.1 Rectification

The simulation architecture is as shown in Figure 3. The first part is called the Image Coordinate Scanner. Here a 3-dimensional array with dimensions 2748 x 3840 x 3 is generated, in which the row (r) and column (c) coordinates are stored; the third index selects whether the r or the c value is accessed, taking the value 1 for r and 2 for c. The rc coordinates are stored in this matrix in linear order with this code:

for m = 1:rows
    for n = 1:cols
        rc(m,n,1) = m;
        rc(m,n,2) = n;
    end
end

In the mapper block there are four sub-blocks, and the mapper block needs the parameters from the calibration. The first sub-block is projection 1, which uses the rotation matrix and the inverse of the camera matrix. This projection translates from image co-ordinates to camera co-ordinates. The image is now affine transformed (a transformation where the image size ratio and collinearity are maintained, i.e. points on a line remain on a line before and after the transformation) by the following code snippet:

for m = 1:rows
    for n = 1:cols
        res = Rt * (inv_kk * [rc(m,n,1); rc(m,n,2); 1]);
        rc_rot(m,n,1) = res(1) / res(3);
        rc_rot(m,n,2) = res(2) / res(3);
    end
end

After rotation, the next sub-block is undistortion, for which the 'kc' parameter is required. 'kc' is a 5 x 1 vector storing the radial and tangential distortion coefficients. In order to reduce the computational complexity, only the first two coefficients could be used; this is explained in more detail in the result section. Following the undistortion equations, the undistortion is achieved by this code snippet:

for m = 1:rows
    for n = 1:cols
        rnewSq = (rc_rot(m,n,1) * rc_rot(m,n,1));
        qSq = rnewSq + (rc_rot(m,n,2) * rc_rot(m,n,2));
        qP4 = qSq * qSq;
        qP6 = qP4 * qSq;
        var1 = (1 + kc(1)*qSq + kc(2)*qP4 + kc(5)*qP6);
        rc_dist(m,n,1) = var1 * rc_rot(m,n,1) ...
            + 2 * kc(3) * rc_rot(m,n,1) * rc_rot(m,n,2) ...
            + kc(4) * (qSq + 2*rnewSq);
        rc_dist(m,n,2) = var1 * rc_rot(m,n,2) ...
            + kc(3) * (qSq + 2*rc_rot(m,n,2)) + 2 * kc(4) * ...
            rc_rot(m,n,1) * rc_rot(m,n,2);
    end
end
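For reference, the snippet above follows the standard radial and tangential distortion model used by the Caltech toolbox (restated here for clarity, not an additional step), with x, y the rotated co-ordinates and q^2 = x^2 + y^2:

x_d = x \, (1 + k_1 q^2 + k_2 q^4 + k_5 q^6) + 2 k_3 x y + k_4 (q^2 + 2x^2)

y_d = y \, (1 + k_1 q^2 + k_2 q^4 + k_5 q^6) + k_3 (q^2 + 2y^2) + 2 k_4 x y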

The projection 2 sub-block translates the coordinates from the camera frame back to the image frame. This transformation uses the final rectified image coordinate frame via the KK_new matrix. This is achieved by the following snippet:

for m = 1:rows
    for n = 1:cols
        res2 = KK * [rc_dist(m,n,1) rc_dist(m,n,2) 1]';
        rc_dash(m,n,1) = res2(1);
        rc_dash(m,n,2) = res2(2);
        rc_dash(m,n,3) = res2(3);
    end
end

The last sub-block, called the validator, checks whether the newly generated coordinates are within the original image bounds, i.e. within 3840 x 2748. For look-ups outside this range, either a distinct colour or just black can be assigned, indicating that no information is available for these image parts after rectification.

In the MATLAB simulation the raw image buffer is not required, since at every stage all calculated coordinates are stored in different variables and are free to be accessed at any time; as such there are no memory constraints involved here. The raw image buffer needs more attention in the VHDL implementation, where more details are given.

Now that the rectified image coordinates are calculated and ready to be used, an inverse pixel lookup as depicted in Figure 4 is needed. During pixel lookup there can be cases where the lookup falls in between the actual available pixels in the original image, which gives rise to the need for interpolation. In order to keep computation and lookup simple, the coordinates are rounded to the nearest pixel:

for m = 1:rows
    for n = 1:cols
        a = round(rc_shift(m,n,1));
        b = round(rc_shift(m,n,2));
        img_dash(m,n,:) = img(a,b,:);
    end
end

The new rectified image is stored in the 'img_dash' variable and can be viewed with suitable commands in MATLAB.

6.2 Analytics

Before moving on to the VHDL part, some analytics using MATLAB will make the further implementation better.

Finding the minimum buffer size:

The following equations give the maximum number of lines a pixel has moved vertically in the image.

\text{MaxLines} = \text{max\_upper\_offset} + \text{max\_lower\_offset}

\text{max\_upper\_offset} = \max_{1 \le i,\; r'_{int} \le W} \left( r'_{int} - r \right) \text{ if } r'_{int} \ge r

\text{max\_lower\_offset} = \max_{1 \le i,\; r'_{int} \le W} \left( r - r'_{int} \right) \text{ if } r'_{int} < r    (9)

where r is the current row number and r_int' is the new row coordinate obtained from projection 2. These equations are used together with the pixel lookup and are implemented in the following code:

for m = 1:rows
    for n = 1:cols
        a = round(rc_shift(m,n,1));
        b = round(rc_shift(m,n,2));
        if a >= m
            offsetU = a - m;
            if offsetU > maxUpperOffset
                maxUpperOffset = offsetU;
            end
        else
            offsetL = m - a;
            if offsetL > maxLowerOffset
                maxLowerOffset = offsetL;
            end
        end
    end
end

Finding effective size of rectified image:

The effective rectified image is the rectangular part of the image excluding the curved regions and the border regions where there is no pixel information. Figure 7 is a representational image explaining the effective rectified image: the green square represents the effective rectified image size. It can be found in the pixel lookup code with some additional tweaks:

Figure 7: Representational diagram showing effective image size

for m = 1:rows
    for n = 1:cols
        a = round(rc_shift(m,n,1));
        b = round(rc_shift(m,n,2));
        if a < 1 || b < 1
            img_dash(m,n,:) = [0 0 255];   % blue
            if innerm < m && a < 1
                innerm = m;
            end
            if innern < n && b < 1
                innern = n;
            end
        elseif a > rows || b > cols
            img_dash(m,n,:) = [0 255 0];   % green
            if outerm > m && a > rows
                outerm = m;
            end
            if outern > n && b > cols
                outern = n;
            end
        else
            img_dash(m,n,:) = img(a,b,:);
        end
    end
end

Observing pixel movements:

The rectified image can be observed to find out how many pixels have moved more than a certain limit, say more than 50 lines, between the input and output image. Using this, the effective and useful rectified image area can be estimated, with which the required buffer size can be reduced. Figure 8 shows pixel movements of more than 50 and 60 rows respectively. Knowing this, the buffer requirement can be balanced against the cost of loss in image size.

Figure 8: Pixel movement in rectified images: areas marked in yellow are pixels that moved more than 50 rows

6.3 Conclusion

The purpose of the MATLAB simulation was to become familiar with the algorithm, to know what forms of the image should be given as inputs, to note the output, and to observe the range of parameters that can be given as inputs during rectification against the time taken. A fine-tuning parameter such as the rectified image size is a very useful input to the VHDL code, as explained in the Analytics section. Further tuning, such as adjusting the image size based on pixel movement, was also found. The pixel movement, which is found to have a huge impact on the buffer size, is yet another very useful and important input to the VHDL code; this buffer is used by the 'inverse look-up' part of the VHDL code.

The working of the algorithm is confirmed by forming an anaglyph image and viewing it through anaglyph glasses: a 3-dimensional image should be seen. The output is also visually compared with the output of MATLAB's built-in stereo camera calibration toolbox, and they are found to be similar. Further analytical outputs are presented in the result section. The output from this MATLAB simulation can now be used for comparison with the VHDL implementation output: differences in the VHDL output can be resolved to either algorithmic issues or other VHDL coding issues, based on the learning from the MATLAB simulation.

7 VHDL Implementation

VHDL is the VHSIC Hardware Description Language (VHSIC stands for Very High Speed Integrated Circuit). It is much unlike a regular programming language: in a regular programming language one writes instructions that a processor executes, whereas in VHDL one describes electronic circuits in terms of gates and logic signals. The implementation therefore differs considerably from its MATLAB counterpart, as more attention is needed for details like the data flow from register to register, the number of signals used, and the constrained memory usage. To start with, a test bench is needed, against which the rectification hardware (the design unit) can be developed and tested. A test bench is a piece of software that provides stimuli simulating the hardware signals in the same way as the actual hardware, so the input and output can be tested without the system. In this case the test bench reads an input image from a file stored on disk and provides the data in the same way as the component decoding image data on the hardware.

In the VHDL implementation, two instances of the same component are used for the left and right image rectification; the right image output is obtained by altering the input parameters of the same VHDL code used for the left image. To keep the VHDL coding process simple, one image is considered.

The interface of the testbench is described by a VHDL component:

component rect is
    port (
        clk     : std_logic;
        reset_n : std_logic;
        d       : in  image_type;
        q       : out image_type
    );
end component rect;

The design under test has the same interface, described by an entity, where image_type is a VHDL record:

type image_type is record
    data_valid : std_logic;
    sof        : std_logic;                          -- Start of frame
    eol        : std_logic;                          -- End of line
    data       : unsigned(3*DATA_SIZE-1 downto 0);   -- pixel data, 24 bit
end record;

The VHDL test bench reads a BMP image (bit-map format) from file. After reading the header information and obtaining the image dimensions from the input file, the test bench starts to read the pixel data in 24-bit format. Each pixel is made available at the data field for one clock cycle. The sof, eol and data_valid signals are controlled suitably and behave the same way as on the hardware. The test bench also comprises a 'save' block that reads the information at the q port and writes the pixel data to a file on the file system. The 'save' block looks for a pixel every clock cycle until it has found all the pixel information, as known from the image dimensions read earlier. These pixels are stored in a file named 'output.bmp'.

The image coordinate scanner is a simple counter that increments by one every clock cycle; the start of the count is synchronised with the data_valid signal. The row is incremented once every column has been counted to the end. The mapper block, consisting of the four sub-blocks, is described so as to generate a new coordinate every clock cycle. In a separate process, also synchronised with the data_valid signal, a buffer is started. This process records the pixel information in a cyclic row buffer, see Figure 9. The mapper block starts working only when the cyclic row buffer has reached half its maximum row count. In this way, as soon as the new coordinates are generated and the pixel lookup is started, the lookup happens right at the middle of the buffer, so all surrounding pixels are accessible.

Figure 9: Cyclic row buffer. Top: buffer filling up for the first time. Bottom: pixel read-out during successive buffer refilling

The row number at which the required pixel falls in the cyclic buffer can be found with the mod operator, since the cyclic buffer starts to fill at the same time as the mapper block starts. The following code snippet finds the correct row number in the cyclic buffer:

r_mod := (r_int mod row_buffer_size);
if r_mod = 0 then
    r_mod := row_buffer_size;
end if;

where r_int is the row output by the mapper module and r_mod is the row number in the cyclic buffer. The effective delay between input and output is the time it takes to fill half the cyclic row buffer. After this wait time, for every clock cycle and input pixel, an output pixel is ready. The output pixel is held for one cycle at the q data port.
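A small worked example of this index mapping, mirrored in MATLAB for illustration (the values are assumed):

row_buffer_size = 140;                 % rows held on chip (assumed)
r_int = 300;                           % absolute row produced by the mapper
r_mod = mod(r_int, row_buffer_size);   % 300 mod 140 = 20: buffer slot 20
if r_mod == 0
    r_mod = row_buffer_size;           % buffer rows are 1-based, so 0 maps to 140
end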

In order to represent decimal numbers in VHDL, the fixed_pkg package is needed: ieee.fixed_pkg provides libraries that can be compiled for simulation, and ieee_proposed.fixed_pkg libraries that can be compiled for synthesis. The cost of the package is 40 bits per fixed-point number, split as 16 bits to the left and 24 bits to the right of the radix point. With such a split, all co-ordinates up to 3840 can be represented, with 24 bits of fractional accuracy; this amounts to about 360 bits for every matrix used. Using fixed_pkg, decimals can be denoted as shown:

subtype sfixed_rc is sfixed(l_rad - 1 downto r_rad);
constant kc1L : sfixed_rc := to_sfixed(-0.08382967, l_rad - 1, r_rad);
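As a quick sanity check of this 16/24 split (an observation, not from the thesis): the integer part covers signed values up to 2^15 - 1 = 32767, comfortably above the 3840-pixel coordinate range, while the fraction gives a resolution of 2^-24 ≈ 6 × 10^-8, far finer than the sub-pixel accuracy rectification needs.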

7.1 Conclusion

The proposed rectification algorithm is implemented in VHDL and the code snippets are presented. All the blocks shown in the architecture of Figure 3 are implemented in the same order. The buffer is implemented as a cyclic row buffer, and each co-ordinate is stored as a fixed type which uses 40 bits per co-ordinate per pixel per image, amounting to about 80 bits for the x and y co-ordinates per pixel per image. Although 40 bits is high, these variables are used in the undistortion part until the fraction is truncated before performing the inverse lookup. Further analytical comparisons of the proposed algorithm implemented in VHDL can be found in the result section.

In order to check whether the VHDL implementation is synthesizable, the code and the input image were modified to rectify images of smaller size, incremented in small steps. The code was synthesizable for one image of size 615 x 440 with 22 rows of buffering; the image was scaled proportionately and the buffer size was calculated in proportion to the number of rows. The utilization factor for this image size is 457% for the device in consideration and for one camera. The device used on the GIMME2 board is the Zynq7020. The largest synthesizable image size for which the utilization is less than 100% is 160 x 115 with 6 lines of buffering. Therefore changes in the implementation are required, especially in the row buffer storage area; other major changes could be reductions in pixel size (e.g. greyscale) and image size. Further comparative discussion is found in the main conclusion section. The VHDL code for rectification can be found in Appendix B of this report.

8 Results

Rectification is done using four different methods:

• MATLAB, where two different versions are used:
  - Built-in version, the Stereo Camera Calibrator app
  - CALTECH version, the CALTECH Camera Calibration Tool (used only for obtaining the camera calibration parameters)

• MATLAB implementation of the proposed algorithm

• VHDL implementation of the proposed algorithm

The CALTECH version of rectification is easier to understand and interpret than the built-in version, because its source code is visible and can be used to extract valuable information (method, resolution, time). Factors like the rectified image resolution and the timing of the scripts cannot be analyzed with the built-in version.

For the purpose of comparison, the built-in MATLAB app is used along with the two implementations, namely the MATLAB and VHDL implementations. All the methods are illustrated below. Since all rectification methods use the calibration parameters of the same set of input images, the outputs from all methods should be the same; a comparative study is done to analyze the outputs and quantify the differences mathematically.

Rectification of stereo images using the Stereo Camera Calibrator app (built-in version) of MATLAB is shown first. Rectified images can be assessed by drawing epipolar lines on the images. The input and the output images are shown in Figures 10 and 11.

Drawing epipolar lines: Multiple epipolar lines drawn over the rectified image indicate the effect of rectification. In rectified images the corresponding epipolar lines from the left and right images must be parallel to each other, and each line must also be parallel to the stereo camera's base line (refer to Figure 1, where O_l O_r represents the base line). From epipolar geometry, the epipolar line is l = F·x, where F is the fundamental matrix and x is a point on the image. The fundamental matrix is obtained from the calibration parameters in MATLAB. The epipolar lines are drawn through the same two points on the left and right images, and the lines are compared by placing the images side by side. Corresponding points in the left and right images are called inliers; inliers are found using built-in functions in MATLAB, and these inliers are the points x over which the epipolar lines are drawn.

Rectification of stereo images using the MATLAB implementation is shown in Figures 12, 13 and 14.

Rectification of stereo images using the VHDL implementation is shown in Figure 15, with the corresponding anaglyph image in Figure 16.


Figure 10: The input images from the Camera Calibrator app

Figure 11: The rectified output images with epipolar lines from the Camera Calibrator app

The artifacts that can be seen in the plain white regions of the images are due to MATLAB's resize tool, and are not related to the implemented algorithms or the checker board. The images were reduced in size to fit this report, and the resizing introduces aliasing artifacts, visible as stair-step patterns.

The purpose of the MATLAB implementation is to get a better understanding of the rectification process, which makes the implementation in VHDL more straightforward. This thesis focuses on the VHDL implementation, and therefore not much attention is given to the speed of rectification in MATLAB. The MATLAB app is faster than the implemented design because it pre-computes the rectified co-ordinates, after which only a look-up is done to obtain the rectified images. The MATLAB app method is comparable to the LUT method discussed earlier, and is not suitable for the VHDL implementation due to severe memory constraints and the increased access times of external memory.

Observation of the output images of the different methods reveals that the results are very similar: the epipolar lines match and the 3-dimensional anaglyph images reproduce a correct 3D view. There can be minute differences between the outputs of the various methods, which are difficult to notice; the only observable differences are in image size. The original image size is 3840 x 2748. The output images can be larger or smaller than the input, but they are all cropped to a maximum of 3840 x 2748, as the templates (including the allocated memory space) are of the original size. The output images vary by up to 5% of the original size. This depends on the accuracy of the decimal numbers used in the undistortion part of the rectification, as this part involves mathematical calculations of image co-ordinates with powers of multiple orders, which influence the final output. During the inverse look-up, if there is no pixel information in the original image for the calculated co-ordinates, a distinct colour (e.g. green or blue) is filled into those regions of the output images. These regions can be cropped for better rendering of the output,


Figure 12: The input images with epipolar lines

Figure 13: The rectified images with epipolar lines

but here this is not done, so that the effect can be observed. The variation in image sizes also depends on factors like how the algorithm defines the origin of the image and the alignments for vertical adjustments.

For the analytical comparison of the rectification outputs, image subtractions are shown in Figure 17 and Figure 18. Figure 17 shows the difference between MATLAB's built-in tool output and the implemented algorithm's output. The difference in terms of black and white pixels, after applying a threshold that counts greyscale values of 128 to 255 as white, is 2.2%: that is, 2.2% of the pixels vary from MATLAB's built-in tool output, with a root-mean-square error value of 6.059. Figure 18 shows the difference between MATLAB's built-in tool output and the VHDL output. Here the difference percentage is 0.2%, with a root-mean-square error value of 6.002, indicating that the VHDL output is closer to MATLAB's built-in tool output than the MATLAB implementation is. This can be read as an error percentage with a 50% greyscale threshold, i.e. grey-scale pixel values of 128 to 255, out of the full range of 0 to 255, are counted as white pixels. Although this error calculation depends on the image considered, it gives an idea of how good or bad the performance of the algorithm is comparatively. The root-mean-square error (RMSE) value is a good general number for comparison across image inputs: the differences between corresponding pixel values of MATLAB's built-in tool output and the output under evaluation are squared, the squares are averaged, and the square root of this average gives the RMSE. In summary, the more the pixels differ, the higher the RMSE. For comparison with other works in the literature, the authors of [16] compare three methods, one of which is their own; the article uses root-mean-square error values to compare the rectification and calls it the rectification mean-square error. When compared to the three methods in [16] (table 4), the proposed algorithm stands similar to Zhang's method and their own robust method.
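The comparison above can be reproduced with a short MATLAB sketch (the file names are hypothetical; A is the reference output and B the output under test):

A = double(rgb2gray(imread('ref.bmp')));        % reference output (hypothetical file)
B = double(rgb2gray(imread('test.bmp')));       % output under test (hypothetical file)
D = abs(A - B);                                 % per-pixel absolute difference
diff_percent = 100 * nnz(D >= 128) / numel(D);  % pixels above the 50% threshold
rmse = sqrt(mean(D(:).^2));                     % root-mean-square error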


Figure 14: Anaglyph images from MATLAB output

Further, for a comparison of only the distortion effects, the pixel counts of both the MATLAB and VHDL outputs are considered. The pixel count of an image is proportional to its area, which is found using standard image tools. The numbers in the table below indicate that the implemented MATLAB algorithm is closer to the built-in tool.

Image          Pixels     Variation %  Distortion %
Input          10552320   -            -
MATLAB output  10307702   97.77        2.22
VHDL output    10276623   97.38        2.62

The rectification error calculations encompass the distortion effects, which can be represented as: rectification = distortion correction effects + vertical shift effects. The error, and the variation in the error between the two outputs, can be attributed to the decimal accuracy used by MATLAB's built-in tool and by the proposed algorithm. Since pixel co-ordinates range up to several thousand and the undistortion algorithm uses a complex formula involving powers of multiple orders, the errors multiply too. The formula also involves a choice of how many coefficients to use, which can vary between algorithms, and the number of coefficients used also contributes to the error difference.


Figure 15: The rectified images with epipolar lines from VHDL output

Figure 16: Anaglyph image from VHDL output

Figure 17: Output showing difference between MATLAB tool output and Implemented MATLAB output

Figure 18: Output showing difference between MATLAB tool output and Implemented VHDL output


9 Conclusion

This thesis aims mainly at bringing rectification on the given hardware closer to reality, compared to the previous state with no rectification. Stereo images were captured from the GIMME2 board and used to extract calibration parameters in MATLAB; the CALTECH Camera Calibration Toolbox was used to obtain the parameters. These parameters are used in the further implementation so that the calibration itself need not be done on the GIMME2 board, given that the lens system is fixed for a particular setup. Stereo image rectification was performed using MATLAB and VHDL. In MATLAB, all the rectification blocks in Figure 3 were implemented in succession without regard to resource usage, and the results are shown by drawing epipolar lines on the input and output images. The code parts are explained in detail, which removes theoretical ambiguities for future research. For VHDL, the blocks in Figure 3 are implemented in such a way that the continuous flow of data keeps the pixel output constant with fixed latency: the time taken to fill half the buffer is the only delay, after which for every clock cycle and input pixel there is one output pixel. The outputs thus show the rectification using different methods, without much observable difference.

The different levels of implementation in MATLAB and VHDL have increased the understanding, and have also exposed the difficulties involved in rectification of stereo images using an FPGA. The algorithm for rectification is implemented first in MATLAB and then in VHDL. The VHDL-rectified images are compared with the output of MATLAB's Camera Calibrator tool, which demonstrates the working of the implemented VHDL code. The VHDL code in its current form is not implementable on the FPGA; this could be because the algorithm is too complicated to be resolved to the available logic resources. During synthesis of the VHDL code, the VIVADO synthesizer runs forever without any information or output.

Highlights of implementation demands vs availability:

• Resource demands for an image size of 615 x 440 with 22 lines of buffering are 457%.

• The memory (BRAM) demand for two 3840 x 2748 pixel images is 25.8 Mb, against an availability of 4.9 Mb.

Considerations:

• To undistort and rectify left and right images of size 3840 x 2748 pixels, with the 140-row buffer needed for rectification

• Rectification to be done with the on-the-fly method, using data from offline calibration; the contending look-up table method needs very high memory

• To perform rectification without using external RAM, as it would introduce delays

In order to check the implementability, the image size was lowered to find a synthesizable image rectification size. For the given device, considering the resources available, rectification of a 160-column (width) x 115-line (rows) image with 6 lines of buffering, without using the BRAM, was synthesizable with a utilization factor of 91%. From section 7.1, the resources required for synthesizing the reduced image rectification of size 615 x 440 with 22 lines of buffering were 457% of the Zynq7020. The corresponding device within the same family that would accommodate image rectification of 615 x 440 with a 22-row buffer is the Zynq7100. The Zynq7020 has a BRAM capacity of 4.9 Mb, which is not sufficient for full-size image rectification; moreover its maximum supported data address width is 18 bits, against the required 32 bits. Existing implementations such as [1] rectify an image size of 1280 x 720 and use 64 blocks of BRAM (576 Kbits) to buffer 50 rows. Although this thesis builds on related work done on similar hardware, the hardware-specific design considerations are not reported there, which makes the task not very straightforward. The Zynq7100 FPGA has a BRAM size of 26 Mb. With the Zynq7100, a full-size (3840 x 2748) rectification can be done for both the left and right images with a row buffer of 140 rows. The total number of bits to be stored is 3840 (width) * 24 (bits per pixel) * 140 (row buffer size) * 2 (left and right) = 25.8 Mb, which is less than the availability in the Zynq7100 and makes the utilization of BRAM very effective. Therefore, theoretically, the Zynq7100 FPGA would be the ideal device for full-scale image rectification. The data buffering can be further reduced by optimizations such as reducing the input image size, computing homographies, and implementing multiple dual-port memory blocks, which are mentioned in the Future Work section. Using the external RAM is not desirable due to the increased access times, which would defeat the purpose of this thesis by increasing latency, but it is probably the only solution for making it work on the Zynq7020 FPGA.
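The buffer-size arithmetic can be checked in one line (a restatement of the figure in the text, in MATLAB):

bits = 3840 * 24 * 140 * 2;  % width x bits per pixel x buffered rows x two cameras
Mb = bits / 1e6;             % 25.8 Mb, versus 4.9 Mb (Zynq7020) and 26 Mb (Zynq7100)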

10 Future Work

Further VHDL implementation can be considered by changing parameters that reduce the image buffer size. Factors like the input image size, or 8-bit (grey) pixels instead of 24-bit (colour, RGB) ones, could play an important role in reducing the buffer and in turn make the design implementable on the FPGA hardware. The input image size could be reduced to the effective rectified image size, which would eliminate the unusable area of the input image and thereby remove unnecessary computation. Computing homographies, and thereby combining multiple mathematical operations, should be considered so as to use the hardware effectively. The cyclic buffer could be changed to multiple dual-port RAM blocks, which reduces the number of connections made in the hardware and could improve the synthesizer response during synthesis. Alternatively, with more advanced FPGAs the task could be simplified: the Zynq7100 FPGA comes with enough BRAM to store the required image buffer, which would enable an implementation on the FPGA, but would leave few resources for other major tasks such as stereo matching and distance estimation. Therefore it is advisable to estimate the resources needed for the other tasks and then survey for the best-fit FPGA.


References

[1] P. Zicari, “Efficient and high performance fpga-based rectification architecture for stereo vision,” Microprocessors and Microsystems, vol. 37, no. 8, Part D, pp. 1144 – 1154, 2013. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0141933113001282

[2] P. Greisen, S. Heinzle, M. Gross, and A. P. Burg, “An FPGA-based processing pipeline for high-definition stereo video,” EURASIP Journal on Image and Video Processing, vol. 2011, no. 1, pp. 1–13, 2011.

[3] “Gimme2 - product brief v1.1.” [Online]. Available: http://www.af-inventions.de/

[4] G. Gimelfarb, “Course slides fundamental matrix / image rectification.” [Online]. Available: https://www.cs.auckland.ac.nz/courses/compsci773s1t/lectures/773-GGpdfs/773GG-FundMatrix-A.pdf

[5] V. Papadimitriou and T. Dennis, “Epipolar line estimation and rectification for stereo image pairs,” Image Processing, IEEE Transactions on, vol. 5, no. 4, pp. 672–676, Apr 1996.

[6] Y. Jia, X. Zhang, M. Li, and L. An, “A miniature stereo vision machine (MSVM-III) for dense disparity mapping,” in Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol. 1, Aug 2004, pp. 728–731.

[7] J. Mun and J. Kim, “Real-time FPGA rectification implementation combined with stereo camera,” in Consumer Electronics (ISCE), 2015 IEEE International Symposium on, June 2015, pp. 1–2.

[8] C. Vancea and S. Nedevschi, “LUT-based image rectification module implemented in FPGA,” in Intelligent Computer Communication and Processing, 2007 IEEE International Conference on, Sept 2007, pp. 147–154.

[9] E. Staudinger, M. Humenberger, and W. Kubinger, FPGA-based rectification and lens undistortion for a real-time embedded stereo vision sensor. na, 2008.

[10] K. Jawed, J. Morris, T. Khan, and G. Gimel’farb, “Real time rectification for stereo correspondence,” in Computational Science and Engineering, 2009. CSE ’09. International Conference on, vol. 2, Aug 2009, pp. 277–284.

[11] J. Mallon and P. F. Whelan, “Projective rectification from the fundamental matrix,” Image and Vision Computing, vol. 23, no. 7, pp. 643–650, 2005.

[12] H. Su and B. He, “A simple rectification method of stereo image pairs with calibrated cameras,” in Information Engineering and Computer Science (ICIECS), 2010 2nd International Conference on, Dec 2010, pp. 1–4.

[13] F. Isgro and E. Trucco, “Projective rectification without epipolar geometry,” in Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on., vol. 1, 1999, p. 99.

[14] J.-Y. Bouguet, “Camera calibration toolbox for matlab.” [Online]. Available: https://www.vision.caltech.edu/bouguetj/

[15] C. Ahlberg, “Gimme2 - a getting started guide,” Tech. Rep., November 2015. [Online]. Available: http://www.es.mdh.se/publications/4142-GIMME2_a_getting_started_guide

[16] R. Guerchouche and F. Coldefy, “Camera calibration methods evaluation procedure for images rectification and 3d reconstruction,” 2008.


Appendix A: MATLAB Rectification code

to_rectify = 1;
lookup = 1;
fn0 = 'cam0_5.bmp';
fn1 = 'cam1_5.bmp';
% fn0 = 'left12.jpg';
% fn1 = 'right12.jpg';
opfn0 = 'cam0_ma_rect_5.bmp'; %ma = my algorithm
opfn1 = 'cam1_ma_rect_5.bmp';
anaglyph_fn = 'cam_ma_5.jpg';
pad = 0; %on each side
padding = 0;
%image size
[rows, cols, ~] = size(imread(fn0)); % 2748 X 3840
bpp = 3; %rows = 2748; cols = 3840; %standard gimme2 image output size
%whole canvas size
%rows = rows + padding;
%cols = cols + padding;

if to_rectify == 1
    %Loading parameters from calibration as mat structures
    rectif_left = load('Calib_Results_left.mat');
    rectif_right = load('Calib_Results_right.mat');
    rectif_stereo = load('Calib_Results_stereo.mat');
    rectif_st_rectif = load('Calib_Results_stereo_rectified.mat');

    %loop algorithm for left and right images, 0 for left and 1 for right
    for lr = 0:1
        if lr == 0 %left image
            KK_new = rectif_st_rectif.KK_left_new;
            %KK_new = rectif_stereo.KK_left;
            Rt = eye(3); %unity matrix %rectif_stereo.R;
            %kc = rectif_stereo.kc_left; %kc(3) = 0; kc(4) = 0; kc(5) = 0;
            kc = rectif_left.kc; %kc(3) = 0; kc(4) = 0; kc(5) = 0;
            KK = rectif_left.KK; %rectif_st_rectif.KK_left_new;
            shiftmat = eye(3);
        elseif lr == 1 %right image
            KK_new = rectif_st_rectif.KK_right_new;
            %KK_new = rectif_stereo.KK_right;
            Rt = rectif_stereo.R;
            %kc = rectif_stereo.kc_right; %kc(3) = 0; kc(4) = 0; kc(5) = 0;
            kc = rectif_right.kc; %kc(3) = 0; kc(4) = 0; kc(5) = 0;
            KK = rectif_right.KK; %rectif_st_rectif.KK_right_new;
            shiftmat = [1 0 0; 0 1 0; 0 0 1]; %85 and 213 % |up |left
        end

        %%%formatting image co-ordinates%%%
        rc = zeros(rows, cols, bpp); %3948 X 5040
        rc(:,:,3) = 1;
        %co-ordinate generator
        for m = 1:rows
            for n = 1:cols
                rc(m,n,1) = m;
                rc(m,n,2) = n;
            end
        end

        %%%projection 1%%%
        invkk = inv(KK_new);
        %Rt = rodrigues(om);
        rcrot = zeros(rows, cols, bpp); %onto aligned camera frame
        for m = 1:rows
            for n = 1:cols
                res = Rt * (invkk * [rc(m,n,1); rc(m,n,2); 1]);
                rcrot(m,n,1) = res(1)/res(3);
                rcrot(m,n,2) = res(2)/res(3);
                if m == 29 && n == 569
                    %error('Stopped')
                end
            end
        end
        %res = 0;

        %%%undistortion%%%
        rcdist = zeros(rows, cols, bpp);
        rcdist(:,:,3) = 1;
        %the equation
        for m = 1:rows
            for n = 1:cols
                rnewSq = (rcrot(m,n,1) * rcrot(m,n,1));
                qSq = rnewSq + (rcrot(m,n,2) * rcrot(m,n,2));
                qP4 = qSq * qSq;
                qP6 = qP4 * qSq;
                var1 = (1 + kc(1)*qSq + kc(2)*qP4 + kc(5)*qP6);
                rcdist(m,n,1) = var1 * rcrot(m,n,1) ...
                    + 2*kc(3)*rcrot(m,n,1)*rcrot(m,n,2) + kc(4)*(qSq + 2*rnewSq);
                rcdist(m,n,2) = var1 * rcrot(m,n,2) ...
                    + kc(3)*(qSq + 2*rcrot(m,n,2)*rcrot(m,n,2)) ... %y squared in the tangential term
                    + 2*kc(4)*rcrot(m,n,1)*rcrot(m,n,2);
            end
        end
        if lr == 0
            rcdist_l = rcdist;
        end

        %%%Projection 2%%%
        %init
        rcdash = zeros(rows, cols, bpp);
        rcdash(:,:,3) = 1;
        %onto real camera reference frame
        for m = 1:rows
            for n = 1:cols
                res2 = KK * [rcdist(m,n,1) rcdist(m,n,2) 1]';
                rcdash(m,n,1) = res2(1);
                rcdash(m,n,2) = res2(2);
                rcdash(m,n,3) = res2(3);
            end
        end

        %%%shifting%%%
        %for horizontally matching the epipolar lines
        rcshift = zeros(rows, cols, bpp);
        rcshift(:,:,3) = 1;
        for m = 1:rows
            for n = 1:cols
                res3 = shiftmat * [rcdash(m,n,1) rcdash(m,n,2) 1]';
                rcshift(m,n,1) = res3(1);
                rcshift(m,n,2) = res3(2);
            end
        end
        if lr == 0
            A_rcshift = rcshift;
        elseif lr == 1
            B_rcshift = rcshift;
        end
    %lr 'for loop' end
    end
%if 'to_rectify' end
end

if lookup == 1
    %%%pixel lookup%%%
    %init
    imgdash = uint8(zeros(rows, cols, bpp));
    for lr = 0:1
        %--
        innerm = 0;
        innern = 0;
        outerm = rows;
        outern = cols;
        maxUpperOffset = 0;
        maxLowerOffset = 0;
        offsetU = 0;
        offsetL = 0;
        %--
        %image selection and read
        if lr == 0
            img = imread(fn0);
            img0 = img;
            rcshift = A_rcshift;
        elseif lr == 1
            img = imread(fn1);
            img1 = img;
            rcshift = B_rcshift;
        end
        %%Padimg = wextend(2, 'zpd', img, pad); %Aimg 3948 X 5040
        for m = 1:rows
            for n = 1:cols
                a = round(rcshift(m,n,1));
                b = round(rcshift(m,n,2));
                if a >= m
                    offsetU = a - m;
                    if offsetU > maxUpperOffset
                        maxUpperOffset = offsetU;
                    end
                else
                    offsetL = m - a;
                    if offsetL > maxLowerOffset
                        maxLowerOffset = offsetL;
                    end
                end
                if a < 1 || b < 1
                    imgdash(m,n,:) = [0 0 255]; %blue
                    if innerm < m && a < 1
                        innerm = m;
                    end
                    if innern < n && b < 1
                        innern = n;
                    end
                elseif a > rows || b > cols
                    imgdash(m,n,:) = [0 255 0]; %green
                    if outerm > m && a > rows
                        outerm = m;
                    end
                    if outern > n && b > cols
                        outern = n;
                    end
                else
                    %if offsetU > 50 || offsetL > 50
                    %    imgdash(m,n,:) = [255 247 26]; %from gimp
                    %else
                    imgdash(m,n,:) = img(a,b,:);
                    %end
                end
                %end
                %finding offset
                %if a >= m
                %    offsetU = a - m;
                %    if offsetU > maxUpperOffset
                %        maxUpperOffset = offsetU;
                %    end
                %else
                %    offsetL = m - a;
                %    if offsetL > maxLowerOffset
                %        maxLowerOffset = offsetL;
                %    end
                %end
            end %n loop
        end
        [innerm, innern]
        [outerm, outern]
        maxUpperOffset
        maxLowerOffset
        if lr == 0
            img0dash = imgdash;
        elseif lr == 1
            img1dash = imgdash;
        end
    end
    %figure, imshowpair(img0dash, img1dash, 'montage')
    imwrite(img0dash, opfn0);
    imwrite(img1dash, opfn1);
    %imshow(img0dash);
    %figure, imshow(img1dash);
    %anaglyph
    img_ag = stereoAnaglyph(img1dash, img0dash);
    figure, imshow(img_ag);
    imwrite(img_ag, anaglyph_fn);
%end lookup 'if' end
end
%clear m n o a b res;

Appendix B: VHDL Rectification code

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.std_logic_unsigned.ALL;
use IEEE.NUMERIC_STD.ALL;

use work.rect_pack.all;
use work.rect_const.all;
use work.rect_func.all;

-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
library IEEE_PROPOSED;
--use IEEE_PROPOSED.fixed_pkg.all;
use ieee.fixed_pkg.all;

-- Uncomment the following library declaration if instantiating
-- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity rect is
    Port ( clk     : std_logic;
           reset_n : std_logic;
           d       : in  image_type;
           q       : out image_type);
end rect;

architecture Behavioral of rect is

    constant l_rad : integer := 16;
    constant r_rad : integer := -24;
    subtype sfixed_rc is sfixed(l_rad - 1 downto r_rad);
    signal dq : sfixed(l_rad*2 - 1 downto r_rad*2);
    signal r, c : sfixed_rc := to_sfixed(1, l_rad - 1, r_rad);
    --shared
    shared variable cols : natural := 0;
    shared variable rows : natural := 1;
    shared variable pixbuf : pixbuf_type;
    shared variable precount_rows, precount_cols : natural := 1;
    type rowkeeper_type is array (1 to row_buffer_size) of integer;
    shared variable rowkeeper : rowkeeper_type;
    shared variable rowkeeper_cnt : integer := 1;
    --constant mat11, mat12;
    --shared constant mat11, mat22;

begin

    --q <= d;
    --clk <= not clk after 5 ns;
    --dq <= mat11 * mat12;

    rc : process(clk) is
        --variable precount_rows, precount_cols : natural := 1;
        variable rows : natural := 1;
        variable eol_flag, not_first_frame : boolean := false;
        --image size and buffer size
        variable colcount, rowcount : natural := 0;
        --constant row_buffer_size : natural := 8;
        variable row_buffer_size_by_2 : natural := 0;
        --projection 1
        variable pr1_r, pr1_c : sfixed_rc := to_sfixed(0, l_rad - 1, r_rad);
            --projection 1 r & c
        variable sd, se : t22 := (others => (others => '0')); --3x1 matrix
        variable mulout, mul2out : t33 := (others => (others => '0'));
            --sfixed(63 downto -32)
        --sfixed variable d, e, f
        --undistortion
        variable rsq, csq : sfixed_rc := to_sfixed(0, l_rad - 1, r_rad);
            --r square, c square
        variable rpcsq, qsqsq : sfixed_rc := to_sfixed(0, l_rad - 1, r_rad);
            --(r square) plus (c square)
        variable undist_p1, undist_p2, undist_p3 : sfixed_rc := to_sfixed(1, l_rad - 1, r_rad);
            --for the 3 parts of (1 + kc1*q^2 + kc2*q^4),
            --see report or paper. Initialised to 1.
        variable undist : sfixed_rc := to_sfixed(0, l_rad - 1, r_rad);
            --first sub-part of undistortion equation; undistortion co-efficient
        variable undist_r, undist_c : sfixed_rc := to_sfixed(0, l_rad - 1, r_rad);
            --undistorted camera frame co-ordinates
        --projection 2
        variable pr2_r, pr2_c : sfixed_rc := to_sfixed(0, l_rad - 1, r_rad);
            --projection 2 r & c
        --look-up
