

Linköpings universitet
Institutionen för datavetenskap (Department of Computer Science)

Examensarbete (Master's Thesis)

Implementing Object and Feature Detection Without Compromising the Performance

by

Jonas Gerling

LIU-IDA/LITH-EX-A--16/010--SE

2016-05-08

Supervisor (Handledare): Aseel Berglund
Examiner (Examinator): Henrik Eriksson


Abstract

This thesis covers how some computationally heavy algorithms used in digital image processing and computer vision are implemented with WebGL and computed on the graphics processing unit by utilizing GLSL shaders. The thesis is based on an already implemented motion detection plug-in used in web-based games. This plug-in is enhanced with new features, and some already implemented algorithms are improved. The motion detection is based on image subtraction and uses the delta image from previous frames to determine motion.

The plug-in is used in web-based games, so performance is of utmost importance since bad performance leads to frustration and less immersion for the players.

Techniques brought up are edge detection, the Gaussian filter, features from accelerated segment test (FAST) and Harris corner detection. These techniques are implemented by utilizing the parallel structure of the GPU. Both Harris corner detection and features from accelerated segment test can be run in real time, but the result of the Harris corner detection is the better of the two. The thesis also covers different color spaces, how they are implemented and why they were implemented.


Contents

Abstract
Contents
List of Figures
1 Introduction
  1.1 Motivation
  1.2 Background
  1.3 Aim
  1.4 Research Questions
  1.5 Delimitations
2 Theory
  2.1 Pre-Processing
  2.2 Color image segmentation
  2.3 Pixel Labeling
  2.4 Shape detection
  2.5 Feature Detection
3 Method
  3.1 Feasibility Study
  3.2 Implementation
  3.3 Object Detection and Recognition
  3.4 Feature Detection
4 Results
  4.1 Feasibility Study
  4.2 Implementation
  4.3 Pixel Labeling and Shape Recognition
  4.4 Feature Detection
5 Discussion
  5.1 Results
  5.2 Method
  5.3 The work in a wider context
6 Conclusion


List of Figures

2.1  The figure illustrates the first row's right to left scan and left to right scan. It shows that in the first right scan no labels are set, since the right scan only copies labels from the upper neighbour or, secondly, the left neighbour's value. The left scan of the first row sets new labels to the object pixels.
2.2  In the second row's scans the right to left scan starts to copy the neighbours, as illustrated in the figure. Note that in the left scan it becomes obvious that the object pixels with labels C and B are part of the same object. The algorithm recognizes this and label C is set to point at B.
2.3  The third row scan continues in the same fashion, and adjacent pixels within the same object with different labels are noted.
2.4  A figure displaying the steps of the shape detection algorithm.
2.5  An image taken from the original report that illustrates the idea behind the FAST corner detector [13]. If twelve contiguous pixels on the circle of radius three are all brighter or darker than the candidate pixel, it is classified as a corner.
2.6  The figures (a), (b) and (c) represent the different attributes a pixel can have. The red arrows indicate the directions of intensity shifts. A pixel is classified as a corner when there are intensity shifts in multiple directions. In figure (a) the window does not have any shift in intensity and is therefore classified as flat, or surface. In figure (b) the window shifts intensity along the x-axis and can therefore be classified as an edge, and in figure (c) the window has intensity shifts in all directions and can be classified as a corner.
3.1  The image to the left displays the filter used to create a horizontal gradient, the intensity shift in the x direction. The right-hand image is used to get the vertical gradient, the intensity shift in the y direction.
4.1  Figure (a) is the original image with no alterations made to it. Figure (b) is the result of applying Gaussian blur to the original image. The Gaussian blur uses a variance of 1.8. Figure (c) shows the result when the overall luminosity is increased by 40 percent.
4.2  Both results are from the web camera stream in an office environment with normal lighting. The two images represent the result of detecting pink in the texture and subtracting the background. The color pink was chosen because it was not as simple to detect as colors such as blue or red.
4.3  The figures are of me holding a paper containing circles with different colors. The image to the left is the old implementation using chromacity space and the figure to the right is the new implementation using the L*a*b color space.
4.4  Figure (a) is a test image created to help illustrate the result. Figure (b) is the result when the background was subtracted from the original image. Figure (c) illustrates the different labels achieved when labeling the pixels; the labels are color coded to properly exhibit the result of the pixel labeling. The background is seen as an undefined object.
4.5  The result from a blurred gray-scale image with FAST applied. The Gaussian blur used on the image uses a variance of 1.8.
4.6  The result when the FAST algorithm is applied to an unaltered image of high quality.
4.7  The results are from when the image is held in front of the video camera. The image is printed on regular paper and the camera resolution used is 320x240. Image (a) uses the same threshold as the results previously shown for the FAST detector, 0.04. Image (b) uses a threshold of 0.1.
4.8  Figure (a) is the unaltered original image that the Harris algorithm will be applied to in the end. Figure (b) illustrates the gray-scaled image, which is the second step in the algorithm. Figure (c) presents the result of the edge detection filter that was applied to the gray-scale result. The edge detection filter used was the Sobel filters described in the method.
4.9  The result of the Harris corner detector. Pixels classified as potential corner pixels are presented in blue.
4.10 The result of the Harris corner detector. Pixels classified as potential corner pixels are presented in green and the eliminated corner candidates are presented in dark blue.
4.11 The result from the Harris corner detector on the web camera stream. The image uses non-maxima suppression but the eliminated corner candidates are not presented in the image.


Chapter 1

Introduction

1.1 Motivation

The game industry is growing and more people are starting to recognize the positive effects of gaming. A lot of studies are carried out about the benefits and drawbacks of playing video games. A study carried out at Michigan State University indicates that children who play video games are more creative than children who do not play video games [8]; the study was based on the Torrance tests of creativity [16]. Depending on the game genre the increase in creativity varies, but no matter what games were played they all increased the users' creativity and decision-making ability. Recent studies show that dyslexic children improved their reading abilities drastically from playing action-based video games [5], and these are just a few examples of why it is interesting to develop and study video games.

One of the main concerns about playing video games is that the participants are stationary and sit still, but by using body interaction as user input the player gets activated in a physical way. This was first approached in modern-day gaming by Nintendo with their Wii Remote controller for the Wii console. One of the Wii Remote's features is the use of motion sensing to let the user interact with game objects via gestures. It also requires the user to point the controller towards a receiver. Microsoft later introduced the Kinect sensor for the Xbox 360, a full-body interaction device that lets players use their whole body as a controller. The downside to this is that it is expensive to buy a console and the Kinect sensor.

A new trend in video games is to collect real-life items such as collectable figurines that can be used to unlock certain parts of a game. Examples of these are Nintendo's Amiibo figurines, which unlock content within games or save character data, enabling users to bring game content to other locations. This is one way to improve the sense of being a part of the game, which increases immersion. Immersion is an important aspect to consider when creating a game; it helps players to concentrate on the task at hand.

This thesis will cover some digital image processing techniques used to enhance a plug-in. Some techniques used by modern self-driving cars will be used to detect objects in a video stream. These techniques will be implemented on the graphics processing unit by utilizing GLSL shaders.

1.2 Background

O. Havsvik explains in his master's thesis at Linköping University how techniques used in surveillance systems can be used to create a motion detection application from the web camera's video stream, and how to use these detected motions as controller input in web-based games [7]. His master's thesis is the basis of this thesis; the plug-in he created will be enhanced and new features will be added. The motion detection he created works, simply described, by checking the difference between the previous and the current frame. The data obtained is used to detect interaction with various game objects. The plug-in also consists of a color detection used to detect a certain color in the video stream.


This plug-in has a similar effect as the Xbox 360's Kinect sensor but it is more convenient, since most modern households own a computer and a web camera. One aspect to consider when making an application for a wider audience is that even though most households do contain a computer, not all of these are expensive high-performing gaming computers. It is important to keep the performance cost down as much as possible, and this can be done by moving as many calculations as possible to the graphics processing unit (GPU) on the graphics card.

1.3 Aim

The main purpose of this thesis is to test different techniques to enhance an already implemented motion detection plug-in for web-based games. This will be done by increasing the motion detection accuracy without decreasing the performance, and by implementing new features to increase immersion.

1.4 Research Questions

How can a motion capture detection system for a WebGL-based game be improved to enhance interaction and enable game symbol detection?

1.5 Delimitations

The application does not work offline; a WebGL-compatible internet browser connected to the internet is a necessity. It is possible to host a virtual server, but then the user needs access to the plug-in's source code. Another delimitation is that the shape detection only focuses on simple geometries; this was chosen because there was no need for more complex geometries to be determined.


Chapter 2

Theory

Object detection and image matching are well-known problems in the field of computer vision. This chapter explains a few well-known algorithms that can be used to solve these problems, as well as some theory of color spaces and pre-processing techniques used to enhance the motion detection effectiveness.

2.1 Pre-Processing

2.1.1 Gaussian Filter

Gaussian blur is a well-known convolution filter used in computer graphics to blur an image. D. Rákos explains how the Gaussian blur works and the use of it []. Gaussian blur uses neighbour information and calculates how much a neighbouring pixel should contribute to the blurring process. The weights are calculated with equation (2.1).

G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}    (2.1)

The σ² in the equation is the variance, and x and y are the current pixel coordinates. The Gaussian function above is a separable two-dimensional filter and can be separated into two one-dimensional convolution filters: vertical smoothing, seen in equation (2.3), and horizontal smoothing, seen in equation (2.2).

G(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}    (2.2)

G(y) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{y^2}{2\sigma^2}}    (2.3)

By separating the function the computations become more efficient, since the number of texture pixel fetches is reduced.

To further improve the efficiency, D. Rákos presents a technique that uses linear sampling by utilizing the GPU's bi-linear texture filtering hardware. This leads to even fewer texture pixel fetches, because information from two texture pixels can be fetched at the same time. This performance increase is achieved by calculating an offset and a weight for the neighbour information fetches instead of using a neighbouring pixel's texture position directly. The weight is calculated with equation (2.4) and the offset is calculated with equation (2.5).

weight_L(p_1, p_2) = weight_D(p_1) + weight_D(p_2)    (2.4)

offset_L(p_1, p_2) = \frac{offset_D(p_1) \cdot weight_D(p_1) + offset_D(p_2) \cdot weight_D(p_2)}{weight_L(p_1, p_2)}    (2.5)
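As an illustration of how a separable blur pass with linear sampling can look in a fragment shader, the following is a minimal sketch of a horizontal pass; the uniform names, the number of taps and the use of precomputed weights and offsets (equations 2.4 and 2.5) are assumptions rather than the plug-in's actual code. The vertical pass is identical with the offset applied along y instead.

precision mediump float;

uniform sampler2D uTexture;   // input texture, e.g. the camera frame
uniform float uTexWidth;      // texture width in pixels
uniform float uWeight[3];     // combined weights (equation 2.4); uWeight[0] is the centre weight
uniform float uOffset[3];     // combined offsets in pixels (equation 2.5); uOffset[0] = 0.0

varying vec2 vUv;

void main() {
    vec3 sum = texture2D(uTexture, vUv).rgb * uWeight[0];
    for (int i = 1; i < 3; i++) {
        vec2 off = vec2(uOffset[i] / uTexWidth, 0.0);
        // Each fetch lands between two texels, so the bi-linear filtering
        // hardware reads two neighbours with a single texture access.
        sum += texture2D(uTexture, vUv + off).rgb * uWeight[i];
        sum += texture2D(uTexture, vUv - off).rgb * uWeight[i];
    }
    gl_FragColor = vec4(sum, 1.0);
}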


2.2 Color image segmentation

One crucial aspect to consider when doing color image segmentation is what color space to work in. The base color space in today's technology is the RGB model. The RGB model is a three-dimensional additive color space where a color is defined by how much red, green and blue is used. Thinking of colors in RGB is not intuitive for humans: to get a darker shade of a color in RGB all three color components need to be modified, while to us humans it is more natural to think of it as needing less light.

In the 1970s the L*a*b color space became an international standard for a perceptually uniform color space, created by the Commission Internationale de l'Eclairage (CIE) [3]; perceptually uniform means that the Euclidean distance between two colors corresponds accurately to how differently humans perceive them. In L*a*b, L is the luminosity component and goes from 0 to 100, a is the chromaticity component representing the red/green value and b is the chromaticity component representing the yellow/blue value.

2.2.1 Converting RGB to L*a*b

Converting RGB to L*a*b consists of two main steps: first the RGB components are converted to XYZ space, and then XYZ is converted to L*a*b space. To convert RGB to XYZ space the first step is to normalize the RGB components [4]. If a normalized color component exceeds a threshold of 0.04045 the color component is calculated with equation (2.6).

C = \left(\frac{C + 0.055}{1.055}\right)^{2.4}    (2.6)

If the normalized color component does not exceed the threshold it is calculated with equation (2.7).

C = \frac{C}{12.92}    (2.7)

The color components are then multiplied by 100, and the last step is the conversion of the color components to XYZ space, which is done with the conversion matrix seen in equation (2.8).

\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}    (2.8)

The next step in the conversion algorithm is to convert XYZ to L*a*b. First the XYZ components are divided by the tristimulus values of the reference white and then transformed by equation (2.9).

C = \begin{cases} C^{1/3} & C > 0.008856 \\ 7.787 \cdot C + \frac{16}{116} & \text{otherwise} \end{cases}    (2.9)

The final step is to calculate the L*a*b components with the new X, Y and Z values. This is done with the following equations:

L = 116 \cdot Y - 16    (2.10)

a = 500 \cdot (X - Y)    (2.11)

b = 200 \cdot (Y - Z)    (2.12)
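The conversion above can be collected into a small GLSL helper, sketched below. The function names and the D65 reference-white values are assumptions made for illustration, not the plug-in's actual code.

float linearize(float v) {
    // sRGB linearization (equations 2.6 and 2.7).
    return (v > 0.04045) ? pow((v + 0.055) / 1.055, 2.4) : v / 12.92;
}

float labF(float t) {
    // Equation 2.9.
    return (t > 0.008856) ? pow(t, 1.0 / 3.0) : 7.787 * t + 16.0 / 116.0;
}

vec3 rgb2lab(vec3 c) {
    vec3 lin = 100.0 * vec3(linearize(c.r), linearize(c.g), linearize(c.b));
    // XYZ conversion matrix (equation 2.8); GLSL matrices are column-major,
    // so the matrix rows appear here as columns.
    mat3 m = mat3(0.412453, 0.212671, 0.019334,
                  0.357580, 0.715160, 0.119193,
                  0.180423, 0.072169, 0.950227);
    vec3 xyz = m * lin;
    // Normalize by an assumed D65 reference white before applying equation 2.9.
    vec3 n = xyz / vec3(95.047, 100.0, 108.883);
    vec3 f = vec3(labF(n.x), labF(n.y), labF(n.z));
    // Equations 2.10-2.12.
    return vec3(116.0 * f.y - 16.0, 500.0 * (f.x - f.y), 200.0 * (f.y - f.z));
}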


2.3 Pixel Labeling

Pixel labeling is used to separate the different objects in an image.

2.3.1 Using the Pixel's Neighbourhood

P. Narkhede and A. Gokhale present a way to label pixels in order to identify objects in an image that uses the L*a*b color space [11]. It starts by calculating the mean of the a and b components for each pixel within a 3x3 neighbourhood, which is then used to calculate the difference between the mean and the original value.

\Delta a(i, j) = a_{mean}(i, j) - a(i, j)    (2.13)

\Delta b(i, j) = b_{mean}(i, j) - b(i, j)    (2.14)

The difference values for a and b are used to calculate the Euclidean distance, delta E, between the two color components.

\Delta E(i, j) = \sqrt{\Delta a(i, j)^2 + \Delta b(i, j)^2}    (2.15)

The edge threshold T_E is calculated with the standard and average deviation of \Delta E. If \Delta E is bigger than the threshold the pixel is an edge pixel. The other pixels, non-edge pixels, are considered initial seeds. Initial seeds with similar mean values are assigned the same label.

T_E = \begin{cases} avg_E - 0.7 \cdot std_E & avg_E - 0.7 \cdot std_E > 0 \\ avg_E & \text{otherwise} \end{cases}    (2.16)

Here avg_E and std_E are the average and standard deviation of \Delta E.

At this point the only pixels without a label are the edge pixels. By calculating the Euclidean distance to its four closest neighbours, an edge pixel gets the same label as the closest neighbour. At this stage the image is over-segmented, and the aim is to merge neighbouring regions if the distance between their mean colors is less than a given threshold. In the last step the goal is to merge smaller regions with bigger regions with similar mean values.

2.3.2 Pixel by Pixel Image Scan

The image is seen as a two-dimensional array and each row of the image is scanned twice, first from left to right, then from right to left. If the pixel value is larger than zero it is considered an object pixel. In the right scan, if the current pixel is an object pixel its label value is set to the upper neighbour's label. If the upper neighbour does not have a label the object pixel takes the left neighbour's label, and if neither the upper nor the left neighbour has a label the object pixel is left unchanged.

In the left scan the object pixel copies the right neighbour's label if it has one; if it does not have a label the object pixel sets a new label. If the pixel is a background pixel it is left unchanged. If the object pixel has a label and the right neighbour also has a label, the object pixel's label is set to point at the right neighbour's label.


Figure 2.1: The figure illustrates the first row's right to left scan and left to right scan. It shows that in the first right scan no labels are set, since the right scan only copies labels from the upper neighbour or, secondly, the left neighbour's value. The left scan of the first row sets new labels to the object pixels.

Figure 2.2: In the second row's scans the right to left scan starts to copy the neighbours, as illustrated in the figure. Note that in the left scan it becomes obvious that the object pixels with labels C and B are part of the same object. The algorithm recognizes this and label C is set to point at B.

Figure 2.3: The third row scan continues in the same fashion, and adjacent pixels within the same object with different labels are noted.


2.4 Shape detection

S. Rege et al. suggest a method for shape detection of simple geometries that uses the difference between an object's area and the area of the surrounding bounding box to determine the geometry of the object [12]. Figure 2.4 below shows the flow chart used to determine what shape a detected object has.

Figure 2.4: A figure displaying the steps of the shape detection algorithm

2.4.1 Converting RGB to Gray-scale

There are three well-known methods to convert a color image into a gray-scale image: average, lightness and luminosity. The most intuitive method is the average method (Eq. 2.17), which adds the color components together and divides the sum by the number of components. The lightness method (Eq. 2.18) takes the highest and lowest color component values, adds them together and divides the sum by two. The luminosity method (Eq. 2.19) is the most sophisticated and takes human color perception into account. Humans are more sensitive to green than to red and blue, so in the luminosity method the tristimulus components are weighted according to human perception.

\frac{R + G + B}{3}    (2.17)

\frac{\max(R, G, B) + \min(R, G, B)}{2}    (2.18)

0.21 \cdot R + 0.72 \cdot G + 0.07 \cdot B    (2.19)


In the equations above the R is the red channel, the G is the green channel and the B is the blue channel. max takes the biggest value and min takes the smallest value.
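As a small illustration, the three conversions can be written as GLSL helper functions like the sketch below; the function names are assumptions, and the luminosity weights are the ones used later in section 3.4.2.

float grayAverage(vec3 c) {
    // Average method (equation 2.17).
    return (c.r + c.g + c.b) / 3.0;
}

float grayLightness(vec3 c) {
    // Lightness method (equation 2.18).
    return (max(max(c.r, c.g), c.b) + min(min(c.r, c.g), c.b)) / 2.0;
}

float grayLuminosity(vec3 c) {
    // Luminosity method (equation 2.19), weighted by human perception.
    return 0.21 * c.r + 0.72 * c.g + 0.07 * c.b;
}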

2.4.2 Binary Image

In a binary image the pixels are either black or white. To achieve this, a technique called thresholding is used: each pixel's luminosity value is compared to a given threshold value, and if the pixel value is lower than the threshold the pixel is set to black, otherwise it is set to white [14]. When the objects are identified a bounding box is calculated and a check is made to see how many object pixels are located inside the bounding box. A decision tree is used to determine what shape the object has.
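A minimal thresholding fragment shader could look like the sketch below; the uniform names are assumptions and the luminosity weights are the same as in equation (2.19).

precision mediump float;

uniform sampler2D uTexture;
uniform float uThreshold;
varying vec2 vUv;

void main() {
    vec3 c = texture2D(uTexture, vUv).rgb;
    float lum = 0.21 * c.r + 0.72 * c.g + 0.07 * c.b;
    // step() returns 0.0 below the threshold and 1.0 at or above it,
    // giving a black or white output pixel.
    float bin = step(uThreshold, lum);
    gl_FragColor = vec4(vec3(bin), 1.0);
}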

2.4.3 Morphological Operations

Morphological operations change the local structure of an image by applying a structuring element, and return an output image of the same size. The structuring element is described by a matrix H(i, j) \in [0, 1]. Morphological operations can be used for different tasks; they can be used to reduce noise in an image, but they are mostly known for closing and opening in binary images. A morphological operation uses a pixel's neighbourhood and determines the output pixel's value by comparing it with the values of the neighbouring pixels. The two most basic morphological operations are dilation and erosion.

By applying dilation on a binary image the boundaries of the objects in the image expand. The size of the expansion depends on the size of the structuring element. W. Burger and M. J. Burger describe the dilation operation as [1]:

I \oplus H = \{\, p + q \mid p \in I, \ q \in H \,\}    (2.20)

Erosion is the exact opposite of dilation, shrinks the objects' boundaries, and is defined as:

I \ominus H = \{\, p \in \mathbb{Z}^2 \mid (p + q) \in I \text{ for every } q \in H \,\}    (2.21)

The closing operation is done by applying dilation on an image and then applying erosion on the output image of the dilation. The result of this is that holes in objects are minimized or removed. Opening is done the opposite way: erosion is applied on an input image and the result is dilated. This results in reduced boundaries without changing the size of the object, and it also removes objects smaller than the structuring element. Worth noticing, though it might be obvious, is that both operations need to use the same size of H.

2.5 Feature Detection

Feature detection is used to find points of interest in an image that can be used in later processing steps.

2.5.1 Features from Accelerated Segment Test (FAST)

Features from Accelerated Segment Test (FAST) corner detection with machine learning was created by E. Rosten and T. Drummond to be usable in real-time applications. There were already plenty of good corner detection algorithms available, such as SUSAN [15], SIFT [10] and Harris, but E. Rosten and T. Drummond felt the former corner detectors were too computationally heavy and not suited for real-time applications of any complexity, so they created FAST with machine learning [13].

The segment test works by looking at a circle with a radius of three, which becomes a circle of sixteen pixels around the corner candidate pixel p. If there is a set of n contiguous pixels within the sixteen circle pixels that are all brighter than pixel p plus a given threshold, or if there is a set of n which are all darker than pixel p minus the given threshold, the candidate pixel p is classified as a corner. First a high-speed test is done on candidate pixel p which looks at four pixels in the circle. If three of these pixels are darker or brighter than candidate pixel p it could be a corner, and if so the full segment of the circle is checked; if twelve or more contiguous pixels are darker or brighter than the candidate pixel, the candidate pixel is classified as a corner. There are some weaknesses to this detector according to Rosten and Drummond:

1. The high-speed test does not generalise well for n < 12.

2. The choice of ordering of the test pixels contains implicit assumptions about the distribution of feature appearance.

3. Knowledge from the first four tests is discarded.

4. Multiple features are detected adjacent to one another.

Figure 2.5: An image taken from the original report that illustrates the idea behind the FAST corner detector [13]. If twelve contiguous pixels on the circle of radius three are all brighter or darker than the candidate pixel, it is classified as a corner.

2.5.1.1 Machine Learning a Corner Detector

The machine learning part addresses the first three points and consists of two parts; the last point is addressed using non-maximal suppression. The first part is the corner detection. The corner detection algorithm explained above is used on a set of training images. For a candidate pixel p, the circle pixels are stored in a vector P, and each position in the circle can be in one of three states:

S_{p \to x} = \begin{cases} d, & I_{p \to x} \le I_p - t & \text{(darker)} \\ s, & I_p - t < I_{p \to x} < I_p + t & \text{(similar)} \\ b, & I_p + t \le I_{p \to x} & \text{(brighter)} \end{cases}    (2.22)

Depending on the states of the circle pixels the vector P is divided into three subsets, P_d, P_s and P_b. Each subset is queried in an ID3 decision tree classifier, which is then used for fast detection in other images.

2.5.1.2 Non-maximal Suppression

The problem with detecting multiple feature points in adjacent locations is solved with non-maximal suppression. First a score function is calculated for all the detected feature points, based on the absolute differences between the candidate pixel and the circle pixels. Adjacent feature points are then compared and the feature point with the lowest score value is discarded.


2.5.2 Harris Corner Detection

The Harris corner detector works by looking at intensity variations within a sliding window w(x, y) with displacements u and v. These intensity variations are gradient information, which can be obtained through a variety of different edge detection algorithms such as Sobel and Prewitt [6].

E(u, v) = \sum_{x,y} w(x, y) \, [I(x + u, y + v) - I(x, y)]^2    (2.23)

Here w(x, y) is the weighted window function, I(x, y) is the intensity at position (x, y), and I(x + u, y + v) is the intensity within the moved window. Figure 2.6 below illustrates the different pixel attributes that can be determined by looking at the variations of the intensity.

(a) Flat (b) Edge (c) Corner

Figure 2.6: The figures (a), (b) and (c) represent the different attributes a pixel can have. The red arrows indicate the directions of intensity shifts. A pixel is classified as a corner when there are intensity shifts in multiple directions. In figure (a) the window does not have any shift in intensity and is therefore classified as flat, or surface. In figure (b) the window shifts intensity along the x-axis and can therefore be classified as an edge, and in figure (c) the window has intensity shifts in all directions and can be classified as a corner.

The aim is then to maximize E(u, v); by applying a Taylor expansion the equation can be expanded and expressed with a matrix.

Taylor expansion:

E(u, v) \approx \sum_{x,y} [I(x, y) + u I_x + v I_y - I(x, y)]^2    (2.24)

Expanding:

E(u, v) \approx \sum_{x,y} u^2 I_x^2 + 2 u v I_x I_y + v^2 I_y^2    (2.25)

Matrix form:

E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} \left( \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \right) \begin{bmatrix} u \\ v \end{bmatrix}    (2.26)

The final result is a matrix that can be used to calculate the Harris response.

M = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}    (2.27)


2.5.2.1 The Harris Response

The Harris response R, describes the "cornerness" of a point of interest and is calculated with equation 2.28.

R = \det(M) - k \, (\mathrm{trace}(M))^2    (2.28)

The k is a given constant threshold between 0.04 and 0.06. The size of the calculated R describes the pixel's attribute. The three attributes are flat, an edge or a corner.


Chapter 3

Method

This chapter will cover how the different digital image processing techniques were implemented. Most of the algorithms described in the theory are slow and computationally heavy and are not meant to be used in real-time applications. The goal was to make these algorithms run in real time by utilizing the parallelism available on the GPU, carrying out the calculations with shaders.

3.1 Feasibility Study

The first thing that was done in the process of implementing suitable algorithms to enhance an already existing plug-in was a literature study. The literature study was carried out to find which algorithms and filters would work best and which were most suitable for the task at hand. There are many different ways to solve most problems, and some old ideas can be modified to be used in modern applications. Not only was there a need to find which algorithms to use, but also to be able to motivate why this work might be needed and of use.

One of the most important things to consider when searching for information in a field is to know what to search for and to check whether the source of information is reliable. In this case "computer vision" and "digital image processing" were key phrases, combined with the specific task of interest. Since it was known that the target was a WebGL-based plug-in, that was also taken into account when choosing algorithms and filters. Shaders are very powerful when it comes to doing calculations in parallel on a large number of fragments, but a fragment shader lacks knowledge of the other fragments and it is not possible to decide in which order the fragment calculations are carried out. This leads to choosing algorithms that mostly use neighbour information and/or temporal information.

A test program was created for two purposes: to gain a deeper knowledge of how the plug-in worked and how it was set up, and to create a test environment with more flexibility and mobility in which to test and implement the algorithms found suitable in the literature study.


Table 3.1: The specifications of the equipment used in this thesis.

Type               Specification
Computer Model     HP Z220 CMT WorkStation
Processor          Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz
RAM                32.0 GB
System Type        64-bit Operating System
Display Adapter    Intel(R) HD Graphics 4000
Camera             USB Camera-B4.09.24.1
Camera Resolution  800x600 pixels
Canvas Resolution  800x600 pixels

3.1.1 Performance and Quality Analysis

The performance and quality analysis was empirically evaluated. The feedback from the motion detection was judged with the human eye. The necessity of some methods and results was also empirically evaluated. It is worth mentioning that some results chosen in this report might not always be the best; this thesis is based on reducing the information to process in later stages for faster results, and sometimes more information gives an advantage in finding better results.

3.1.2 Frameworks

Web Graphics Library (WebGL) is integrated into the standard web browsers available and is a JavaScript API used for rendering interactive computer graphics. The GPU-accelerated calculations available in WebGL make it suitable for image processing and physics simulations that can become computationally heavy. WebGL 1.0 was released on 3 March 2011 by the non-profit Khronos Group and is based on OpenGL ES 2.0 [9].

In WebGL there are two types of shaders in the programmable pipeline: the vertex shader and the fragment shader. The vertex shader used in this thesis is a normal pass-through shader that sends the data of interest on to the fragment shader. The vertex shader can be used to manipulate vertex properties such as orientation, color and texture coordinates. The vertex shader is not able to duplicate or create more vertices; this can be done in a geometry shader in standard OpenGL, but it is not yet available in WebGL. The fragment shader, on the other hand, is a useful tool for digital image processing with WebGL since it performs calculations on each fragment of a texture, where a fragment can be seen as a pixel. By using the texture coordinates and the texture's height and width, an offset can be calculated which enables access to a fragment's neighbourhood information. Neighbour information can be used in various digital image processing techniques such as edge detection and averaging filters to blur and reduce image noise.

In shaders there are three different variable declarations: uniforms, attributes and varyings. Uniforms do not change across the pipeline and are sent to both the vertex and the fragment shader. Attributes have a one-to-one relationship with vertices and are only sent to the vertex shader; an attribute is a value applied to individual vertices. The last variable declaration is varyings, which are values first sent to the vertex shader and then passed on to the fragment shader. Varyings can be altered in the vertex shader before they are sent forward in the graphics pipeline.
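To make the three declaration types concrete, the following is a minimal pass-through vertex shader of the kind described above. The attribute and uniform names follow the ones Three.js normally injects, but the sketch is illustrative rather than the plug-in's actual shader.

attribute vec3 position;        // attribute: per-vertex data
attribute vec2 uv;              // attribute: per-vertex texture coordinate
uniform mat4 projectionMatrix;  // uniform: constant for the whole draw call
uniform mat4 modelViewMatrix;   // uniform: constant for the whole draw call
varying vec2 vUv;               // varying: passed on to the fragment shader

void main() {
    vUv = uv;   // hand the texture coordinate to the fragment shader
    gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);
}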


3.2 Implementation

This section will cover how some of the different algorithms were implemented and how the implementations differ from the way the theory behind the algorithms describes them. The plug-in uses a pre-processing technique which will also be described, since it has been modified.

Three.js was used to generate geometry and to render the scene [2]. To render anything with Three.js a scene, a camera and a renderer are needed, so these were initialized first. Geometries are added to the scene, and the camera and scene are passed to the renderer. In 3D applications lights are also added to the scene, but in this 2D application no lights are needed. To be able to capture the video stream from the camera with navigator.getUserMedia, a video element and a canvas element were created. It was important to take into account that different browsers handle the web camera's video stream in different ways. To make sure the application works in all compatible web browsers, a check was made to see in what way the video stream should be handled; browsers that do not use the raw video stream require window.URL.createObjectURL(stream).

The video stream was rendered to a Three.js texture. This texture needs to be constantly updated in the render loop with the drawImage function. For the drawImage function to be called, a readyState flag had to be set to true, and the readyState was set to true when the video stream contained enough data.

To accomplish multiple shader passes in as maintainable and simple a way as possible, the ThreeRTT API was used. With ThreeRTT it is possible to keep a buffer of previous frames, which is needed in some of the image processing techniques used in this thesis.

3.2.1 Pre-Processing

The plug-in always uses Gaussian blur to make the motion detection as accurate as possible thanks to its ability to reduce noise. The Gaussian blur gives the best empirical result and is one of the fastest blur techniques when done on the graphics processing unit, according to O. Havsvik [7]. The Gaussian filter used in the plug-in uses two different shader passes, one for horizontal blur and one for vertical blur. The difference between the two shaders is the offsets sent to them: the horizontal shader uses pre-computed horizontal offsets and the vertical shader uses pre-computed vertical offsets. These offsets are sent to the shaders as array uniforms. Both shaders use the same σ, which is sent in as a float uniform.

The amount of blur achieved with Gaussian blur is defined by the size of the Gaussian weights. The problem with Gaussian blur is that the amount of blur also defines how much darker the resulting image becomes. In dark environments the motion detection is not as effective as in well-lit environments.

To compensate for the darkening of the resulting image, another shader pass was added to the general pre-processing stage, before the Gaussian shaders are applied. The shader utilizes the luminosity component in L*a*b space by first converting the camera texture to L*a*b space and then increasing the luminosity component L, depending on the Gaussian weights. Before the result is passed to the next shader, the L*a*b components are converted back to RGB space. The resulting image is sent in to the vertical Gaussian shader.
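A sketch of what such a luminosity-compensation pass could look like is given below. It assumes the rgb2lab helper sketched in section 2.2.1 and a corresponding lab2rgb inverse are concatenated into the same shader source, and the gain uniform name is illustrative (a value of 1.4 would correspond to the 40 percent increase shown in figure 4.1).

precision mediump float;

uniform sampler2D uTexture;
uniform float uLuminosityGain;
varying vec2 vUv;

vec3 rgb2lab(vec3 rgb);   // assumed helper, see the sketch in section 2.2.1
vec3 lab2rgb(vec3 lab);   // assumed inverse helper

void main() {
    vec3 lab = rgb2lab(texture2D(uTexture, vUv).rgb);
    lab.x = min(lab.x * uLuminosityGain, 100.0);   // boost L and clamp to its [0, 100] range
    gl_FragColor = vec4(lab2rgb(lab), 1.0);
}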

3.3 Object Detection and Recognition

3.3.1 Color Detection and Background Subtraction

An array of predefined L*a*b representations of the colors that are of interest is sent to a shader, along with the texture containing the pre-processed web camera stream. The shader consists of two functions: one that transforms a three-component RGB vector to L*a*b space and another that inverse transforms a three-component L*a*b vector back to RGB space. The computations used to do these transformations are described in the theory part of the thesis. The first thing that is done in the shader is to save the texture fragment to a three-component RGB vector with the function texture2D.

vec3 rgb=texture2D(texture, uv).xyz;

The texture is the web camera's stream and uv are the texture coordinates. texture2D returns a four-component RGBA vector, where the fourth component is the alpha channel; by adding .xyz it returns a three-component RGB vector instead.

The shader then calculates the Euclidean distance between the texture fragment and the different colors in the color array. If the distance is less than a given threshold the texture fragment keeps its initial RGB values, and if the distance exceeds the threshold the texture fragment is set to black and can be seen as background. The web camera perceives some colors more correctly than others, which leads to the use of varying thresholds for the different colors of interest.
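Put together, the pass could look like the sketch below. The uniform names, the fixed array size and the rgb2lab helper (sketched in section 2.2.1 and assumed to be part of the same shader source) are assumptions rather than the plug-in's actual code.

precision mediump float;

const int NUM_COLORS = 4;              // assumed number of colors of interest
uniform sampler2D uTexture;            // pre-processed camera texture
uniform vec3 uLabColors[NUM_COLORS];   // predefined L*a*b colors
uniform float uThresholds[NUM_COLORS]; // per-color distance thresholds
varying vec2 vUv;

vec3 rgb2lab(vec3 rgb);                // assumed helper, see section 2.2.1

void main() {
    vec3 rgb = texture2D(uTexture, vUv).xyz;
    vec3 lab = rgb2lab(rgb);
    bool keep = false;
    for (int i = 0; i < NUM_COLORS; i++) {
        // The Euclidean distance in L*a*b space approximates perceived color difference.
        if (distance(lab, uLabColors[i]) < uThresholds[i]) {
            keep = true;
        }
    }
    // Matching fragments keep their RGB values; everything else becomes black background.
    gl_FragColor = keep ? vec4(rgb, 1.0) : vec4(0.0, 0.0, 0.0, 1.0);
}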

3.3.2 Pixel Labeling

To compensate for the web camera's poor video quality, the result from the color detection needs to be "closed" to fill in holes that can occur in the detection and subtraction stage. These holes occur due to spectral noise from the web camera's video stream, which leads to inaccurate pixel values across the object's surface. The closing described in the theory is based on handling binary images, but in this application a color image is closed. The principle is the same, but instead of looking for a pixel value of one within the current pixel's neighbourhood, the pixel gets the neighbourhood's max value. The max value of the neighbourhood is obtained through an already defined shader function, max(). The closing operation is done with two different shader passes: the first shader pass dilates the image and the result is sent to the second shader pass, which erodes the image.
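The dilation pass can be sketched as below: each fragment takes the per-channel max of its 3x3 neighbourhood, and the erosion pass is identical with min() instead of max(). The uniform names are assumptions.

precision mediump float;

uniform sampler2D uTexture;
uniform vec2 uTexelSize;   // (1/width, 1/height)
varying vec2 vUv;

void main() {
    vec3 result = vec3(0.0);
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            vec2 offset = vec2(float(dx), float(dy)) * uTexelSize;
            // Per-channel max over the neighbourhood expands the colored regions.
            result = max(result, texture2D(uTexture, vUv + offset).rgb);
        }
    }
    gl_FragColor = vec4(result, 1.0);
}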

To fetch the image information from the shader that performs the closing operation, the function glReadPixels is used to store the scene in a one-dimensional array that can be accessed on the CPU. This has to be done since fragment shaders perform the calculations per fragment, which makes it impossible to perform the scans in the correct order. To determine how many objects of interest are rendered in the scene, a simple object detection algorithm is used, described in the theory, where all rows of pixels are scanned from right to left and from left to right.

An object is a group of pixels with the same label. For each object the length, height and area are calculated. Due to noise there might be small artificial objects that should not be classified as objects of interest. To determine if an object is of interest it has to pass a certain criterion: its area has to exceed a certain threshold, which is dependent on the size of the texture. If the object is of interest, a JavaScript object is created with information about the object's height, length, area and position.

3.3.3 Shape

The shape detection is done similarly to the method described in the theory section. The biggest difference is that the image is converted neither to gray-scale nor into a binary image. The reason for this is that it would only induce a higher computational cost without improving the end result. All three methods for converting an RGB image to gray-scale mentioned in the theory calculate an average of the three color channels and store the result in a one-dimensional array. To accomplish this, two different ways were considered: it could have been done on the GPU by doing the calculations in the fragment shader and storing the gray value in the alpha channel, or it could have been done directly on the CPU. If it had been done on the GPU the computational cost would not have been too immense but, as stated, it would not have had any greater impact on the end result. The CPU implementation would have been very costly since the CPU lacks the parallel efficiency the GPU possesses. The function that determines an object's shape takes in an array containing JavaScript objects. For each object a bounding box is calculated. All pixels within a bounding box are checked and all the object pixels are counted. The counted object pixels are compared to the area of the bounding box, and the ratio between the counted object pixels and the area of the bounding box determines what shape the object has.

3.4 Feature Detection

3.4.1 Features from Accelerated Segment Test (FAST)

Instead of doing the machine learning part mentioned in the theory to create a fast corner detection suitable for real-time applications, the calculations and neighbour information fetches are done in a shader. The FAST corner detection shader does not process a pre-processed texture; instead it uses the texture at as high a quality as possible.

The FAST shader takes in a gray-scale image. The basic structure is explained in the theory part of the thesis. It looks at neighbour information on a circle of radius three around the current pixel. These neighbours are extracted within the shader by offsetting the texture coordinates and fetching the pixel values at this offset, as explained earlier.

vec4 rgba = texture2D(texture, uv + vec2(w, h));    (3.1)

In equation (3.1), w is the horizontal offset, calculated by dividing one by the input texture's width, and h is the vertical offset, calculated by dividing one by the height of the input texture. texture2D returns a four-component vector with the RGBA values of the pixel at that offset.
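The following is a minimal sketch of how such a segment test can be written in a fragment shader. It approximates the sixteen Bresenham circle pixels with sixteen evenly spaced samples on a ring of radius three, and all uniform names are assumptions; it illustrates the technique rather than reproducing the plug-in's actual shader.

precision mediump float;

uniform sampler2D uGrayTexture;  // gray-scale input, intensity stored in the red channel
uniform vec2 uTexelSize;         // (1/width, 1/height)
uniform float uThreshold;        // e.g. 0.04
varying vec2 vUv;

void main() {
    float center = texture2D(uGrayTexture, vUv).r;
    int runBright = 0;
    int runDark = 0;
    int best = 0;
    // Walk the ring twice (32 steps) so contiguous runs that wrap around
    // the starting point are still counted.
    for (int i = 0; i < 32; i++) {
        float a = float(i) * 6.2831853 / 16.0;
        vec2 offset = 3.0 * vec2(cos(a), sin(a)) * uTexelSize;
        float v = texture2D(uGrayTexture, vUv + offset).r;
        runBright = (v > center + uThreshold) ? runBright + 1 : 0;
        runDark   = (v < center - uThreshold) ? runDark + 1 : 0;
        if (runBright > best) best = runBright;
        if (runDark > best) best = runDark;
    }
    // Twelve or more contiguous brighter or darker pixels classify the pixel as a corner.
    gl_FragColor = (best >= 12) ? vec4(0.0, 0.0, 1.0, 1.0) : vec4(vec3(center), 1.0);
}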

3.4.2 Harris Corner Detection

The Harris corner detection can be achieved on the GPU with a few shader passes. This method does not use the pre-processed texture explained earlier in this chapter; the shaders that the pre-process contains are still used, but at a later point in this method. This is because the Gaussian blur is essential for the Harris corner detector, which works better with blur applied at a certain point in the process. Depending on the amount of blur, the threshold needs to be adjusted when calculating the Harris response to get accurate results.

The first part of the Harris corner detection is to convert the texture from RGB to gray-scale. This is done with the luminosity method mentioned in the theory, equation (2.19); any of the other methods for converting RGB to gray-scale could be used. The red channel's output is simply multiplied by 0.21, the green channel's output is multiplied by 0.72 and the blue channel's output is multiplied by 0.07.

The resulting texture from the gray-scale shader is passed to an edge detection shader. The edge detection method used is the Sobel operator, which detects intensity shifts. There is one Sobel filter to detect vertical intensity changes and one filter to detect horizontal intensity changes; they can be seen in figure 3.1 below.


Figure 3.1: The image to the left displays the filter to create a horizontal gradient, the intensity shift in the x direction. The right hand image is used to get the vertical gradient, the intensity shift in the y direction.

This shader also calculates the product of the intensities. The x-gradient is sent out of the shader in the green channel, the y-gradient is put in the red channel and the product of the gradients is put in the blue channel.
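A sketch of such a gradient shader is given below. It packs the squared gradients and their product into the output channels; the particular channel assignment, the uniform names and the remapping note are assumptions intended only to illustrate the technique.

precision mediump float;

uniform sampler2D uGrayTexture;
uniform vec2 uTexelSize;   // (1/width, 1/height)
varying vec2 vUv;

float lum(vec2 offset) {
    return texture2D(uGrayTexture, vUv + offset * uTexelSize).r;
}

void main() {
    // 3x3 neighbourhood samples.
    float tl = lum(vec2(-1.0,  1.0));
    float t  = lum(vec2( 0.0,  1.0));
    float tr = lum(vec2( 1.0,  1.0));
    float l  = lum(vec2(-1.0,  0.0));
    float r  = lum(vec2( 1.0,  0.0));
    float bl = lum(vec2(-1.0, -1.0));
    float b  = lum(vec2( 0.0, -1.0));
    float br = lum(vec2( 1.0, -1.0));
    // Sobel filters from figure 3.1.
    float ix = (tr + 2.0 * r + br) - (tl + 2.0 * l + bl);
    float iy = (tl + 2.0 * t + tr) - (bl + 2.0 * b + br);
    // Note: ix * iy can be negative, so with unsigned byte render targets the
    // values may need to be remapped into [0, 1] before being written out.
    gl_FragColor = vec4(ix * ix, iy * iy, ix * iy, 1.0);
}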

The next step in the Harris corner detection is to blur the resulting texture from the edge detection shader. This is done with the same shaders as the pre-process stage, explained earlier in the chapter.

With the information obtained from the blurred texture, the Harris response is calculated. This method does not use the same equation to calculate the Harris response as mentioned in the theory; a slight modification was made to get a better result. The modified equation (3.2) can be seen below.

R = \frac{\det(M)}{k \, (\mathrm{trace}(M))^2}    (3.2)

The resulting equation for R after expanding can be seen in equation (3.3).

R = \frac{I_x^2 \cdot I_y^2 - I_{xy}^2}{k \, (I_x^2 + I_y^2)^2}    (3.3)

Here I_x^2 is the gradient information calculated in the edge detection stage that was sent in the red channel, I_y^2 is the y-gradient sent in the green channel and I_{xy} is the product of the gradients sent in the blue channel. k is the threshold and was set to 0.055.

If the Harris response R was bigger than a chosen threshold the pixel was classified as a corner pixel; in that case the Harris response was normalized and put in the alpha channel of the RGBA output vector from the shader. If the pixel was not classified as a corner pixel, the output vector was set to the camera video texture.
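A sketch of the Harris response pass is given below. The input texture is assumed to hold the blurred gradient products packed as in the gradient sketch above, and the uniform names, the epsilon term and the clamp-based normalization are assumptions.

precision mediump float;

uniform sampler2D uGradients;    // blurred Ix^2, Iy^2 and Ix*Iy
uniform sampler2D uVideo;        // camera video texture
uniform float uK;                // e.g. 0.055
uniform float uCornerThreshold;  // e.g. 0.48825
varying vec2 vUv;

void main() {
    vec3 g = texture2D(uGradients, vUv).rgb;   // (Ix^2, Iy^2, Ix*Iy)
    float det = g.x * g.y - g.z * g.z;
    float trace = g.x + g.y;
    // Equations (3.2)/(3.3); a small epsilon avoids division by zero on flat regions.
    float response = det / (uK * trace * trace + 1.0e-6);
    vec3 video = texture2D(uVideo, vUv).rgb;
    if (response > uCornerThreshold) {
        // Corner candidate: store the normalized response in the alpha channel.
        gl_FragColor = vec4(video, clamp(response, 0.0, 1.0));
    } else {
        gl_FragColor = vec4(video, 0.0);
    }
}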

The last step of the implementation of the corner detector was to use non-maximum suppression to eliminate classified corners adjacent to each other; only the corner point with the highest Harris response value is wanted. For each corner candidate a check was made within a five by five neighbourhood. If any of the neighbours had an alpha value bigger than the current pixel's, the current pixel value was set to the web camera texture and the alpha value was set to zero.
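The suppression pass can be sketched as below, where the input is assumed to be the output of the Harris response pass with the response stored in the alpha channel; the uniform names are assumptions.

precision mediump float;

uniform sampler2D uHarris;   // RGB = video, A = Harris response (0 for non-corners)
uniform vec2 uTexelSize;     // (1/width, 1/height)
varying vec2 vUv;

void main() {
    vec4 current = texture2D(uHarris, vUv);
    float maxNeighbour = 0.0;
    // Scan the 5x5 neighbourhood for a stronger corner candidate.
    for (int dy = -2; dy <= 2; dy++) {
        for (int dx = -2; dx <= 2; dx++) {
            vec2 offset = vec2(float(dx), float(dy)) * uTexelSize;
            maxNeighbour = max(maxNeighbour, texture2D(uHarris, vUv + offset).a);
        }
    }
    if (current.a > 0.0 && current.a >= maxNeighbour) {
        gl_FragColor = current;                  // locally strongest candidate survives
    } else {
        gl_FragColor = vec4(current.rgb, 0.0);   // suppressed or not a corner
    }
}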


Chapter 4

Results

In this chapter the results acquired will be illustrated with images of the resulting outputs. The original unaltered images will also be presented to simplify the comparison with the final result.

4.1 Feasibility Study

After the literature study was carried out, the chosen algorithms were implemented in the plug-in. The color space was changed for methods that need color subtraction and color information. A shape detection algorithm was implemented to locate simple geometries in a scene; this is done by locating points of interest by looking for color clusters. Dilation and erosion were used to close holes and remove isolated pixels. Features from Accelerated Segment Test (FAST) was implemented on the GPU, and Harris corner detection was also implemented with the calculations carried out on the GPU.

4.2 Implementation

4.2.1 Pre-process

When adding Gaussian blur to an image the overall luminosity is decreased and the image becomes darker. The result of the overall luminosity increase described in the method can be seen in figure 4.1 below.

(a) Original image (b) Blurred image (c) Color corrected

Figure 4.1: Figure (a) is the original image with no alterations made to it. Figure (b) is the result of applying Gaussian blur to the original image. The Gaussian blur uses a variance of 1.8. Figure (c) shows the result when the overall luminosity is increased by 40 percent.

4.2.2 Color detection and background subtraction

The following images present the result of the improved color detection and background subtraction. The result of the new color detection algorithm is compared to the previously implemented method. The biggest difference is the choice of color space, but both methods use a pre-defined threshold and the Euclidean distance to determine if the color is within range.

(a) Old implementation (b) New implementation

Figure 4.2: Both results are from the web camera stream in an office environment with normal lighting. The two images represent the result of detecting pink in the texture and subtracting the background. The color pink was chosen because it was not as simple to detect as colors such as blue or red.

In figure 4.3 below another result from the color detection is illustrated. The image is of me holding a paper with differently colored circles. The shadow from the paper made my face disappear completely compared to figure 4.2.

(a) Old implementation (b) New implementation

Figure 4.3: The figures are of me holding a paper containing circles with different colors. The image to the left is the old implementation using chromacity space and the figure to the right is the new implementation using the L*a*b color space.


4.3 Pixel Labeling and Shape Recognition

The result from labeling the pixels can be seen in figure 4.4.

(a) Original image (b) Background subtraction (c) Color coded labels

Figure 4.4: Figure (a) is a test image created to help illustrate the result. Figure (b) is the result when the background was subtracted from the original image. Figure (c) illustrates the different labels achieved when labeling the pixels; the labels are color coded to properly exhibit the result of the pixel labeling. The background is seen as an undefined object.

The result printed in the console log can be seen in table 4.1 below.

Table 4.1: Table displaying the information output in the console log.

Position  Shape     Length  Height  Ratio
0         NaN       NaN     NaN     NaN
1         Circle    151     151     0.7911
2         Circle    175     175     0.7918
3         Triangle  109     110     0.5427
4         Circle    175     175     0.7915
5         Circle    141     142     0.7913
6         Square    99      99      1.0000

The NaN in the table means it is the background and is not defined as an object.

4.4 Feature Detection

4.4.0.1 Features from Accelerated Segment Test (FAST)

The texture used to help illustrate the result of the FAST feature detection is the same as in the pre-process stage. The two figures below show the result of the FAST algorithm. Neither of the results uses non-maxima suppression and they both use the same threshold value of 0.04. Figure 4.5 exhibits the result of the algorithm when blur is added to the texture and figure 4.6 illustrates the result from an unaltered texture. Pixels classified as corners are painted blue.


Figure 4.5: The result from a blurred gray-scale image with FAST applied. The Gaussian blur used on the image uses a variance of 1.8.

Figure 4.6: The result when the FAST algorithm is applied to an unaltered image of high quality.

The figure 4.7 below illustrates the result from when the image is held up in front of the web camera. The web camera is not the same as the one mentioned in the method. The native resolution of the camera used to capture the result is 320x240.


(a) Threshold of 0.04. (b) Threshold of 0.1.

Figure 4.7: The results are from when the image is held in front of the video camera. The image is printed on regular paper and the camera resolution used is 320x240. Image (a) uses the same threshold as the results previously shown for the FAST detector, 0.04. Image (b) uses a threshold of 0.1.

4.4.0.2 Harris Corner Detection

Figure 4.8 below illustrates the results of the different filters used before the Harris algorithm is applied. The unaltered starting image and the resulting gray-scale image can be seen. The result of the edge detection is exhibited but the result of the Gaussian filter applied to the edge detection result will not be presented.

(a) Original image (b) Gray scale (c) Edge detection

Figure 4.8: Figure (a) is the unaltered original image that the Harris algorithm will be applied to in the end. Figure (b) illustrates the gray-scaled image, which is the second step in the algorithm. Figure (c) presents the result of the edge detection filter that was applied to the gray-scale result. The edge detection filter used was the Sobel filters described in the method.

The result of the Harris detector applied on the blurred edge detection result can be seen in figure 4.9. The threshold used in the Harris response calculations is 0.055, and for a pixel to be classified as a corner candidate the Harris response must be greater than 0.48825. Pixels classified as corner candidates are painted blue.


Figure 4.9: The result of the Harris corner detector. Pixels classified as potential corner pixels are presented with blue.


In figure 4.10 the final result of the Harris detector is presented. It uses a non-maxima suppression with a five by five neighbourhood. Pixels classified as corners are presented with green and the dark blue pixels are the eliminated corner candidates.

Figure 4.10: The result of the Harris corner detector. Pixels classified as potential corner pixels are presented with green and the eliminated corner candidates are presented with dark blue.


The figure 4.11 below presents the result of the Harris corner detector when the image is captured in the web camera.

Figure 4.11: The result from the Harris corner detector on the web camera stream. The image uses non-maxima suppression but the eliminated corner candidates are not presented in the image.


Chapter 5

Discussion

This chapter will critically discuss the results achieved and the methods that were used.

5.1 Results

First of all, the overall result of the implemented algorithms is satisfactory. The result corresponds well to the literature study, and thus the results were expected.

When it comes to images and video, lighting will always be a factor which needs to be taken into account. The quality of the camera equipment correlates with the need for good lighting in the environment: the poorer the equipment, the more light is needed to achieve satisfying image quality.

The result from color correcting the darkened texture produced by the Gaussian filters, by utilizing the luminosity component in L*a*b space, was good. The improvement of the color detection and background subtraction was also satisfactory. In some cases it leads to incorrect color representations. This is not a problem, though: the motion detection uses frame differences and the color reproduction is not an issue. The point of the solution is to make it easier to detect differences in the image, which it does.

The object detection and shape recognition is the weakest part of the implemented functions, since it needs to get the current context from the graphics processing unit and a lot of computations are done on the CPU, which leads to performance loss. The performance loss corresponds to the number of objects detected in the image: the fewer the objects, the better the algorithm works.

The Features from Accelerated Segment Test implementation gave the expected result according to the literature [17]. It requires a high-quality texture and is not really suited for this type of application, which uses a web camera stream as a texture. In the results it can be seen that the lower the resolution and quality of the image, the worse the result gets. We can also see that by modifying the threshold used in the algorithm the result can be improved.

The Harris corner detector’s result was good, as expected[6]. It is considered a robust but slow algorithm but by utilizing the graphics processing unit it is a simple task to make it run in real time. The result from using the web camera stream is better than expected. As stated it is a robust method but I did not expect it to perform so well with the camera’s poor quality input.

Both Harris and FAST share one problem: both methods use pre-defined thresholds. The problem with thresholds is that they need to be adjusted according to the user's device. For the Harris implementation this is not a big problem, but for the FAST implementation it is a big issue. In the end it is easy to see that the Harris implementation gives a much better result.


5.2 Method

Doing digital image processing with WebGL and shaders is extremely efficient and is definitely a great way to make computationally heavy algorithms run in real time. It is not all fun and games, though: WebGL is based on a quite old version of the original OpenGL and the output from shaders is very restricted, since textures in WebGL are unsigned byte RGBA textures. By using unsigned byte textures there will in some cases be loss of data, due to the fact that unsigned bytes only hold values from 0 to 255. To acquire data from the current context the pre-defined WebGL function gl.readPixels() is used. gl.readPixels() is a slow process since the CPU needs to wait until all computations done on the GPU are completed. In the end I had big problems with getting correct information from the shaders, due to the rounding errors that occurred when converting the float outputs of the shaders to WebGL unsigned byte textures.

The use of Three.js reduces the complexity of WebGL and lets the user create a scene and geometry with ease compared to a pure WebGL implementation [9][2]. One problem to consider is that it loads a lot of unused functionality, which can compromise the performance. ThreeRTT.js made using multiple shader passes simple. Overall, even when considering that there might be some performance loss from using external libraries, the amount of time saved by doing so is worth considerably more; that time can be used to create better filters and to optimize other implemented functions. In much bigger applications, where all parts need to be optimized to perfection, it might be better to use pure WebGL.

5.3 The work in a wider context

It is safe to say that not all people are comfortable in front of a camera. Cameras in all their forms make some people feel insecure and uneasy, and having the camera constantly on may make people feel like they are being watched. Another concern is that the application is online-based, and even though most people use the internet daily, some still feel uneasy about using it. The combination of being online and at the same time having a camera constantly watching your every move might affect some of the users, which makes it important both to make the users feel safe and to keep them safe, by having a secure platform.


Chapter 6

Conclusion

The two main purposes of this thesis were to enhance an already implemented plug-in and to achieve image comparison and recognition without reducing the application's performance.

One of the best results in this thesis is the enhancement of the color detection algorithm that was already implemented. The previously implemented color detection, which O. Havsvik mentions in his report and refers to as chromaticity space, is mostly known as the CIE XYZ color space [7]. The algorithm implemented in this thesis is based on the CIE XYZ color space. CIE XYZ was created in 1931, at the same time as the RGB space (their formal names are the CIE 1931 RGB color space and the CIE 1931 XYZ color space), with the aim of being more accurate to how people perceive color. The CIE L*a*b* color space implemented in this thesis extends the XYZ space; it is the result of the improvements made in the 1970s, formally the CIE 1976 (L*, a*, b*) color space, and it is more sophisticated and more accurate than its predecessor. The improvement from the change of color space is clearly shown in the results chapter (chapter 4), and the result fits the theory quite well. The change of color space led to a better way to extract shadows and to improvements in the pre-processing stage. It also enabled changing the luminosity in the scene, which led to more accurate motion detection in environments with less light.

The image comparison and recognition part was only partly achieved. The feature detection created here is the cornerstone of image recognition. The thesis focused a lot on finding the best algorithms to achieve the best possible feature detection without affecting the performance, and it was therefore important to find algorithms that could be implemented on the graphics processing unit. Both the Harris algorithm and features from accelerated segment test were eligible for GPU programming using WebGL shaders.

A few ideas of how to fully complete the image detection were considered. One idea was to use the eliminated corner candidates mentioned in the method (chapter 3) and use their Harris responses to get unique information about all the corners. The problem with this idea was the lack of information received from previous shader passes. Another idea for obtaining unique information for comparison was to take a point in the middle of the image of interest and, for each corner classified by the Harris detector, calculate a length and an angle between that point and the corner. This would lead to a solution robust to both scale and rotation, since the angles do not change with length. The idea occurred to me at the very end of the thesis work, which meant that I did not have time to implement it.
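
Purely as a sketch of how the unimplemented length-and-angle idea could look on the GPU, the fragment shader below assumes a hypothetical texture tCorners in which a previous Harris pass wrote 1.0 at every classified corner, and outputs each corner's distance and angle relative to the image centre.

    precision mediump float;

    uniform sampler2D tCorners;   // hypothetical texture: 1.0 where a corner was classified
    varying vec2 vUv;

    void main() {
        float isCorner = texture2D(tCorners, vUv).r;
        vec2 d = vUv - vec2(0.5);                   // vector from the image centre
        float len = length(d);                      // changes with scale
        float ang = (atan(d.y, d.x) + 3.14159265) / 6.28318531;   // angle mapped to [0, 1]
        gl_FragColor = vec4(len * isCorner, ang * isCorner, 0.0, 1.0);
    }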

In the end I am quite satisfied with the result, even though I generally do not like to leave work unfinished. I will most likely continue with this work when the time is right. I can clearly see the many possibilities this application has to offer, in games as well as in other areas. This thesis taught me just how powerful the graphics processing unit really is, and I look forward to the future development in this area.


Bibliography

[1] Wilhelm Burger and Mark J Burge. Digital image processing: an algorithmic introduction using Java. Springer Science & Business Media, 2009.

[2] Ricardo Cabello. Three.js. URL: http://threejs.org/.

[3] CIE. Commission Internationale de l’Eclairage @ONLINE. URL: http://www.cie.co.at/.

[4] Adrian Ford and Alan Roberts. Colour space conversions. August 11, 1998. 2010.

[5] Sandro Franceschini et al. “Action video games make dyslexic children read better”. In: Current Biology 23.6 (2013), pp. 462–466.

[6] Chris Harris and Mike Stephens. “A combined corner and edge detector.” In: Alvey vision conference. Vol. 15. Citeseer. 1988, p. 50.

[7] Oskar Havsvik. “Enhanced Full-body Motion Detection for Web Based Games using WebGL”. In: (2015).

[8] Linda A Jackson et al. “Information technology use and creativity: Findings from the Children and Technology Project”. In: Computers in human behavior 28.2 (2012), pp. 370–376.

[9] khronos. This is a test entry of type @ONLINE. June 2009. URL: http://www.test.org/doe/.

[10] David G Lowe. “Object recognition from local scale-invariant features”. In: Computer vision, 1999. The proceedings of the seventh IEEE international conference on. Vol. 2. Ieee. 1999, pp. 1150–1157.

[11] Prachi R Narkhede and Aniket V Gokhale. “Color image segmentation using edge detection and seeded region growing approach for CIELab and HSV color spaces”. In: Industrial Instrumentation and Control (ICIC), 2015 International Conference on. IEEE. 2015, pp. 1214–1218.

[12] Sanket Rege et al. “2D geometric shape and color recognition using digital image processing”. In: International journal of advanced research in electrical, electronics and instrumentation engineering 2.6 (2013), pp. 2479–2487.

[13] Edward Rosten and Tom Drummond. “Machine learning for high-speed corner detection”. In: Computer Vision–ECCV 2006. Springer, 2006, pp. 430–443.

[14] Mehmet Sezgin et al. “Survey over image thresholding techniques and quantitative performance evaluation”. In: Journal of Electronic imaging 13.1 (2004), pp. 146–168.

[15] Stephen M Smith and J Michael Brady. “SUSAN—A new approach to low level image processing”. In: International journal of computer vision 23.1 (1997), pp. 45–78.

[16] E Torrance. “Predictive Validity of the Torrance Tests of Creative Thinking*”. In: The Journal of Creative Behavior 6.4 (1972), pp. 236–262.

[17] Miroslav Trajković and Mark Hedley. “Fast corner detection”. In: Image and vision computing 16.2 (1998), pp. 75–87.



In English

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/
