
School of Education, Culture and Communication

Division of Applied Mathematics

BACHELOR THESIS IN MATHEMATICS / APPLIED MATHEMATICS

Mathematical Analysis of Intensity Based Segmentation Algorithms with Implementations on Finger Images in an Uncontrolled Environment

by

Lisa Svens


School of Education, Culture and Communication

Division of Applied Mathematics

Bachelor thesis in mathematics / applied mathematics

Date:

2019-05-29

Project name:

Mathematical Analysis of Intensity Based Segmentation Algorithms with Implementations on Finger Images in an Uncontrolled Environment

Author:

Lisa Svens

Supervisors:

Anders Olsson, Linus Carlsson

Reviewer: Masood Aryapoor

Examiner: Sergei Silvestrov

Comprising: 15 ECTS credits


Abstract

The main task of this thesis is to perform image segmentation on images of fingers, to partition each image into two parts: one with the fingers and one with all that is not fingers. First, we present the theory behind several widely used image segmentation methods, such as SNIC superpixels, the k-means algorithm, and the normalised cut algorithm. These have then been implemented and tested on images of fingers, and the results are shown. The implementations are unfortunately not stable and give segmentations of varying quality.


Contents

List of Figures

1 Introduction
1.1 Question/Goal
1.2 Previous works
1.3 Literature review

2 Preliminaries on my topic
2.1 Gaussian blurring
2.2 Gaussian unsharp masking
2.3 Active contour/region growing methods
2.4 Morphological operations
2.5 k-means clustering
2.6 Ncut

3 Results and Conclusions
3.1 Results
3.2 Alternative solutions
3.3 Conclusions

Bibliography

A k-means clustering
A.1 Algorithm
A.2 Convergence
A.2.1 Comments on mathematics in an article
A.3 k-means examples
A.3.1 Simple example
A.3.2 Image example

B Normalised Cut
B.1 Algorithm
B.2 Computing the partition
B.3 Normalised cut example

C Code
C.1 Segmentation
C.2 colourLabel
C.3 getW
C.4 myNcut
C.5 mySNIC
C.6 split
C.7 supMean
C.8 supMid


List of Figures

2.1 Example of Gaussian blurring on an image of an apple.

2.2 Example of Gaussian unsharp masking on an image of an apple.

2.3 Starting area for the active contours method.

2.4 Example of morphological operations on a binary image, with a disk-shaped structuring element with radius 15.

3.1 Segmentation based on region growing method, part one in sequence.

3.2 Segmentation based on region growing method, part two in sequence.

3.3 Segmentation based on region growing method, part three in sequence.

3.4 Segmentation based on normalised cut method, part one in sequence.

3.5 Segmentation based on normalised cut method, part two in sequence.

3.6 Segmentation based on normalised cut method, part three in sequence.

A.1 Initial setup.

A.2 First iteration.

A.3 Second iteration.

A.4 Third iteration.

A.5 Fourth iteration.

A.6 Example of k-means segmentation on an image of an apple.

B.1 Example of normalised cut segmentation on a greyscale image of an apple.


Chapter 1

Introduction

A digital image is defined as a set of many small squares called pixels. Giving each of these pixels a colour/intensity value creates a motif in the image. For my thesis, I want to partition these pixels into several segments. This is something we humans do automatically: we categorise regions of an image by, for example, colour or by belonging to the same object. We can even identify what objects are in an image. This task is, however, not trivial for a computer.

In this thesis we will use algorithms based on mathematical methods and proofs, applying them to greyscale images to perform image segmentation. Primarily we will use MATLAB for the implementation of these algorithms. This thesis is done in cooperation with the company Cap Wings Sweden AB (CapWings). The product they are developing is a small device containing greyscale cameras, meant to be worn one on each hand. The images from these cameras are then supposed to be segmented into two parts, one containing only the fingers and one containing all that is not fingers.

Unless something else is stated, all images and plots in this thesis have been created by the author.

1.1

Question/Goal

The goal for this thesis is to perform image segmentation on greyscale images of fingers and separate the fingers from all that is not fingers. The output image should also come with a probability that the segmentation is correct.

For this task there are some presumptions we can make and some requirements on the method to be implemented; these have been provided by CapWings:

• The fingers will always start at one edge of the image, the top.

• There will always be four fingers in the image, i.e. my implementation does not need to be able to handle any case where the image contains more or less than four fingers.


• The method that is to be implemented should be parallelisable and memory efficient, since it should be feasible to implement on an FPGA (Field-Programmable Gate Array).

• An iterative algorithm is not recommended, since it is complicated to parallelise.

• The size of the images I will be testing the implementations on is 240 × 320 pixels.

This project is a combination of mathematics and computer science, since image segmentation is part of the computer vision area, which utilises many different kinds of mathematics such as graph theory, optimisation, linear algebra, and geometry, to name a few. The mathematical requirements for this thesis will be satisfied by explanations of the mathematical methods used by the algorithms implemented in this thesis.

1.2

Previous works

The algorithm that CapWings are using at the moment divides the image into segments of a few rows each and examines each segment individually, using a Gaussian process to analyse the intensity in each few-row segment. These separate results are then pieced together and conclusions are drawn from them. This process is costly in terms of computational power and memory usage.

At the moment they are using colour cameras, which give colour images. In the future, these will be replaced with cameras that produce greyscale images, since greyscale images require less storage memory on an FPGA. This is why, for this thesis, the colour images will be converted to greyscale before an attempt at a good segmentation is made. CapWings could then use this finger segmentation to track the finger movements of the person using the devices and, in theory, create a new game console or assist in stroke rehabilitation.

1.3

Literature review

One tool for the task of creating a good segmentation is by using what is called superpixels. Xiaofeng Ren and Jitendra Malik [1] introduced the concept of superpixels, which in brief terms is a region consisting of several pixels in an area of the image that has roughly the same qualities in terms of texture, brightness, and/or contours.

Radhakrishna Achanta et al. [2] introduced a new algorithm called SLIC (Simple Linear Iterative Clustering) to generate superpixels. This algorithm was superior to earlier algorithms in terms of adherence to boundaries, memory efficiency, and speed of segmentation. It uses a k-means clustering approach to generate superpixels. A further development of this is the SNIC (Simple Non-Iterative Clustering) algorithm by Radhakrishna Achanta and Sabine Süsstrunk [3]. It not only keeps the desirable qualities of SLIC but improves on them, as well as being non-iterative, computationally cheaper, and more memory efficient.


After an image has been segmented into superpixels, we still need to find clusters of superpixels to deduce what motif the image holds, so we could use a clustering algorithm. There are several of these; one of them was created by Martin Ester et al. [4]. It uses a density-based notion to discover clusters of arbitrary shape, and it outperforms similar algorithms in terms of efficiency and of discovering clusters.

Another interesting approach to image segmentation is a split-and-merge method, as proposed by F. E. Correa-Tome and R. E. Sanchez-Yanez [5]. This method uses integral images to improve the execution time, allowing images to be segmented in real time.


Chapter 2

Preliminaries on my topic

In this chapter, I will present some methods that will later in the report prove useful for my segmentation implementations.

2.1

Gaussian blurring

Gaussian blurring is a method to make an image blurrier, to smooth it out; this is desired for example when removing unwanted noise and interference. It is done by convolving the image with a Gaussian kernel. That is, for each pixel in the image, we multiply the pixel and its surrounding area with a specific matrix of size n × n, where n is an odd number so that the current pixel can be at the center of the matrix being multiplied with the Gaussian kernel. Each value in the Gaussian kernel is given by the Gaussian distribution function in two dimensions:

G(x, y, σ) = (1 / (2πσ^2)) exp( -(x^2 + y^2) / (2σ^2) )

where x and y denote the coordinates in the kernel matrix and σ is the standard deviation [6, ch. 3.2]. Since this will be implemented using MATLAB we will be using their formula for the size of the kernel matrix, which is n = 2⌈2σ⌉ + 1 [7].


Figure 2.1: Example of Gaussian blurring on an image of an apple. (a) Original image; (b) Gaussian blur with σ = 3 and n = 13; (c) Gaussian blur with σ = 5 and n = 21.
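As a concrete illustration, here is a minimal MATLAB sketch of the blurring step; the file name apple.png is a stand-in for any test image.

img = rgb2gray(imread('apple.png'));   % hypothetical RGB test image, converted to greyscale
blur3 = imgaussfilt(img, 3);           % sigma = 3 gives a 13 x 13 kernel
blur5 = imgaussfilt(img, 5);           % sigma = 5 gives a 21 x 21 kernel
montage({img, blur3, blur5});          % compare original and blurred versions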

2.2

Gaussian unsharp masking

Sometimes, when we have a slightly blurry image, we want a way to sharpen it and make it clearer to look at. In some cases this also simplifies and improves the quality of the subsequent segmentation. One way to do this is Gaussian unsharp masking.

In Gaussian unsharp masking, we first apply a Gaussian filter to a duplicate of the image to get an even blurrier image. Since we can view an image as a matrix of intensity values, we can then subtract the blurry image from the original image to get the difference between the two. The highest values in the difference matrix are therefore located where the most blurring has been done, which is around edges and wherever there is a sharp change in intensity. Adding the difference image to the original image increases the intensity values around all edges in the original image and creates stronger borders [6, ch. 3.2]. If we at any point in this process get an intensity value below 0 or above 255, it is cut off there, since the range of intensities for a greyscale image is 0 to 255 and we cannot exceed that. An example is shown in Figure 2.2, using the same original image as in Figure 2.1 and the blur with σ = 3.

Figure 2.2: Example of Gaussian unsharp masking on an image of an apple. (a) Original image; (b) Gaussian unsharp masking with σ = 3 and n = 13.
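The procedure can be written in a few lines of MATLAB; this is a minimal sketch following the same grey − blurred + grey formulation as the code in Appendix C.1, with apple.png again a placeholder file name.

grey = rgb2gray(imread('apple.png'));  % hypothetical RGB test image
blurred = imgaussfilt(grey, 3);        % blur a duplicate of the image
difference = grey - blurred;           % largest values around the edges
sharpened = grey + difference;         % strengthen the borders
% uint8 arithmetic in MATLAB saturates at 0 and 255, which gives the
% cut-off of out-of-range intensities described in the text.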


2.3

Active contour/region growing methods

The idea of region growing methods is to start with an image of an area covering the object to be detected; the algorithm then expands and contracts the boundaries of the area to detect the borders of the object. With active contour methods in particular [8], the idea is that the curve of the area moves inward along its interior normal and stops at the border of the object to be found. This is the method we use in Chapter 3. For our implementation, we will use the image in Figure 2.3 as the starting area, since we are looking for four fingers in our segmentation.

Figure 2.3: Starting area for the active contours method.
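A hedged MATLAB sketch of this step is shown below; the mask coordinates follow the commented-out block in Appendix C.1, and the four bars correspond to the starting area in Figure 2.3.

mask = zeros(size(grey));        % grey is assumed to be a 240 x 320 greyscale image
mask(1:50, 1:end) = 1;           % the fingers always start at the top edge
mask(50:170, 20:60) = 1;         % four bars, one per expected finger
mask(50:170, 100:140) = 1;
mask(50:170, 180:220) = 1;
mask(50:170, 260:300) = 1;
ac = activecontour(grey, mask, 200);   % evolve the contour for 200 iterations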

2.4

Morphological operations

After an image has been converted to a binary image by a thresholding operation, we sometimes want to smooth some edges out to get rid of any roughness, or to remove small areas. For this task, morphological operations are useful. Morphological opening is a combination of morphological erosion and dilation. Erosion and dilation are performed by convolving a binary image with a structuring element. This structuring element is a binary matrix that is (almost always) smaller than the image to be convolved. It can have just about any shape, although a square or a disk-shaped structuring element is more common than, for example, a swirl or a line. To define these operations we first need to define a thresholding function:

θ(a, b) = 1 if a ≥ b, and 0 otherwise,

where a and b are integers. We will also let c denote the number of 1s inside the structuring element as it is being scanned over the image, and S the size of the structuring element in pixels, that is, the number of 1s in the structuring element matrix. We can now define erosion, dilation, and opening of the binary image f with the structuring element s:

dilate(f, s) = θ(c, 1)
erode(f, s) = θ(c, S)
opening(f, s) = dilate(erode(f, s), s)

In words: dilation returns a 1 if any pixel under the structuring element has value 1, erosion returns a 1 only if all pixels under the structuring element are 1s, and opening is dilation performed on an eroded image. Intuitively, dilation thickens objects and erosion shrinks them, so performing erosion first and then dilation should remove smaller holes and smooth edges.

Figure 2.4: Example of morphological operations on a binary image, with a disk-shaped structuring element with radius 15. (a) Original image; (b) dilation; (c) erosion; (d) opening; (e) structuring element.
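In MATLAB these operations are available directly; the following is a minimal sketch, assuming bw is a binary image like the one in Figure 2.4.

se = strel('disk', 15);       % disk-shaped structuring element with radius 15
dilated = imdilate(bw, se);   % 1 where any pixel under the element is 1
eroded = imerode(bw, se);     % 1 only where all pixels under the element are 1
opened = imopen(bw, se);      % same as imdilate(imerode(bw, se), se)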


2.5

k-means clustering

k-means clustering is a method for partitioning a number of points in R^n into k different clusters. It is an iterative algorithm that will run until some convergence criterion is reached. The inputs are a set of data points in R^n and a value k, where k is the number of desired clusters. The output is a set of k points. These are the so-called centers, the means, that minimise the squared distance between each data point and its closest center, i.e. the variance of the partition. The goal of this algorithm is to find these centers. It is not guaranteed that the k-means clustering algorithm will reach the global minimum, but rather some local minimum. A proof of this, as well as a mathematical definition of the method and some examples, can be found in Appendix A.
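As a quick illustration, here is a hedged MATLAB sketch running the built-in kmeans function on synthetic points in R^2; the data are made up for the example.

rng(1);                                % reproducible synthetic data
X = [randn(50, 2); randn(50, 2) + 5];  % two point clouds in R^2
[idx, C] = kmeans(X, 2);               % idx: cluster of each point, C: the 2 centers
gscatter(X(:, 1), X(:, 2), idx);       % visualise the partition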

2.6

Ncut

Normalised cut is a segmentation algorithm using a graph-theoretic approach to image segmentation. It seeks to partition a set of vertices into separate clusters. We want to maximise the value of association within a cluster while minimising the value of association between the clusters; this is what constitutes a good cut. The information in the rest of this section can be found in [9].

We have a graph G = (V, E) where V is the set of all vertices (nodes) and E is the set of all edges. Each edge has a weight, w(i, j), related to the dissimilarity in some measure between nodes i and j. We want to partition the vertices into two disjoint sets, A and B, so that A ∪ B = V and A ∩ B = ∅, by removing the edges connecting them. The sum of the weights of the removed edges is called the value of the cut. The optimal partition of a graph is one where the value of the cut is minimised. However, this tends to result in a corner or edge node being alone in a set, since it does not have many edges to other nodes in comparison with, for example, a node in the middle of a graph. To prevent this, we introduce the normalised cut (Ncut). The value of the normalised cut is computed by dividing the value of the cut by the total sum of weights from the set A to all nodes in the graph, doing the same thing for set B, and adding these two numbers.

Algorithm 1. This algorithm describes how a run of the normalised cut method works (a small numerical sketch of step 2 follows the algorithm).

1. Given an image, set up a graph G = (V, E) and compute the adjacency matrix W and the diagonal matrix D such that d(i) = Σ_j w(i, j).

2. Solve (D − W)x = λDx for the eigenvectors x associated with the smallest eigenvalues λ.

3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph by finding a threshold value t such that the value of the normalised cut is minimised. All pixels i such that x(i) > t belong to set A and the rest belong to set B.

4. Decide if we want to continue the partitioning.

5. Recursively repeat the segmentation on the separate sets if necessary.

For details on the normalised cut algorithm, see Appendix B.
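The following is a small numerical sketch of step 2, assuming the weight matrix W has already been built (for example by getW in Appendix C.3); it is an illustration, not the implementation used in this thesis.

d = sum(W, 2);                              % d(i) = sum over j of w(i, j)
D = diag(d);
[V, ~] = eigs(D - W, D, 2, 'smallestabs');  % the two smallest generalised eigenpairs
x = V(:, 2);                                % eigenvector of the second smallest eigenvalue
A = x > 0;                                  % bipartition with threshold t = 0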


Chapter 3

Results and Conclusions

In this thesis, I have implemented two different approaches to my segmentation problem. The first has its main focus on a region growing method, and the other on a superpixel segmentation in combination with the normalised cut algorithm. The results of both approaches are presented below. Some discussion and comparison of these results will then be provided, together with some discussion of alternative ways this problem could have been solved. If nothing else is stated, the implementations have been written in MATLAB.

As mentioned earlier, the first approach has its focus on a region growing method. This is presented in Algorithm 2.

Algorithm 2. The following methods have been utilised to get a segmentation focusing on a region growing method (a condensed sketch of the pipeline follows the list).

1. Image is converted to greyscale.

2. Gaussian unsharp masking is performed, using σ = 5 and a kernel of size 21 × 21.

3. Active contour / region growing.

4. Opening, using a disk-shaped structuring element with radius 15.

5. Area opening, which removes connected components of less than a specified number of pixels; in this case, we remove components of size less than 640 pixels.

This returns a binary image where the white pixels are considered to be part of a finger and the black pixels are not.
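Below is the condensed sketch of the pipeline, under the assumption that img is an RGB input image and mask is the starting area from Figure 2.3; the parameter values follow the list above.

grey = rgb2gray(img);
blurred = imgaussfilt(grey, 5);             % step 2: sigma = 5, 21 x 21 kernel
sharpened = grey - blurred + grey;
ac = activecontour(sharpened, mask, 200);   % step 3: active contour / region growing
se = strel('disk', 15);
opened = imopen(ac, se);                    % step 4: opening with a radius-15 disk
result = bwareaopen(opened, 640);           % step 5: drop components smaller than 640 pixels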

The central points of the second approach are a SNIC superpixel segmentation followed by the normalised cut algorithm. The steps followed to get this segmentation are presented in Algorithm 3.


Algorithm 3. The following methods have been utilised to get a segmentation focusing on a SNIC superpixel segmentation followed by the normalised cut algorithm (a small sketch of the weight function in step 5 follows the list).

1. Image is converted to greyscale.

2. Gaussian unsharp masking is performed, using σ = 5 and a kernel of size 21 × 21.

3. SNIC superpixel segmentation on 11 pixel rows at a time, with roughly 40 superpixels for each set of rows and a compactness factor of 20.

4. All pixels in the area of each superpixel are set to the mean intensity within that superpixel. This is done because the normalised cut algorithm cannot run on superpixels, and therefore runs on what looks like a superpixel image instead. We call this averaged image the coloured image.

5. Normalised cut algorithm with edge weight function

w(i, j) = exp( -|I(i)^2 - I(j)^2| / σ_I - ‖X(i) - X(j)‖_2^2 / σ_X ), if ‖X(i) - X(j)‖_2 < r, and w(i, j) = 0 otherwise,

where I(a) is the intensity of pixel a, X(a) is the spatial location of pixel a, r is the maximum distance between pixels for them to have an edge between them, and σ_I, σ_X are scaling factors.

This returns a labeled image. That is, a matrix of the same size as the image where each pixel in the original image corresponds to an entry in the labeled image, and all pixels belonging to the same segment have the same distinct number in the labeled image.
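The weight function in step 5 can be evaluated directly in MATLAB; a small sketch, where the numeric inputs are made up and the scaling factors are the ones used in getW (Appendix C.3):

sigmaI = 1000; sigmaX = 400; r = 25;   % scaling factors and max distance, as in getW
w = @(I1, I2, X1, X2) (norm(X1 - X2) < r) ...
    * exp(-abs(I1^2 - I2^2)/sigmaI - norm(X1 - X2)^2/sigmaX);
w(120, 135, [10 20], [14 22])          % weight between two nearby example pixels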

3.1

Results

Presented below are the segmentations of some different images, the segmentation having been done primarily by using an active contour method. The image to the left is the original image and the image to the right is the segmented image.

Figure 3.1: Segmentation based on region growing method, part one in sequence.

Figure 3.2: Segmentation based on region growing method, part two in sequence.

Figure 3.3: Segmentation based on region growing method, part three in sequence.

Figures 3.1-3.3 show three segmentations from a sequence of images, where I got good results for the first two images. The third, however, is not as good. The segmentation separates one of the fingers into two segments, which is not good, and it also mistakes some other parts of the image for being a finger, among others the screen stand. This suggests that this is not an optimal finger segmentation solution and also that the implementation is unstable.

In Figures 3.4-3.6 I present the segmentation results I got when using Algorithm 3. For simplicity of discussion and comparison of the two approaches, Figures 3.4-3.6 show the same sequence of images as for the region growing method. The left-most image is the original image, the center one is the superpixel segmentation with borders drawn on the greyscale version of the original, and the right-most is the normalised cut segmentation with borders drawn on the coloured image.

Figure 3.4: Segmentation based on normalised cut method, part one in sequence.

Figure 3.5: Segmentation based on normalised cut method, part two in sequence.

Figure 3.6: Segmentation based on normalised cut method, part three in sequence.

As we can see, the normalised cut is good at separating areas with different intensities; it nicely follows the borders of different objects in the image, such as the computer screen and the jacket. In the case of Figure 3.6, the gap between the index and the middle finger is its own region, which is very good. The goal of the superpixel segmentation was to partition the many pixels of the image into slightly larger regions, so-called superpixels, by their intensity values. This has been done to a satisfactory level, since the borders of the superpixels together cover most of the borders in the image.

The code for performing the SNIC superpixel segmentation is an implementation of the method described in [3]; it was written by Radhakrishna Achanta in 2017¹ and is used by permission from him. It is a combination of MATLAB and C++ code. The code used to draw the region boundaries was written by Peter Kovesi in 2013² and is used by permission from him. It is written in MATLAB. The code for the normalised cut was written by Timothee Cour, Stella Yu, and Jianbo Shi in 2004³ and is an implementation of the method described in [9]. The software was made public for research use only, which means I have permission to use it for my thesis. It is a combination of MATLAB and C++ code. All other code used was written by the author and can be found in Appendix C.

¹The code was found at https://ivrl.epfl.ch/research-2/research-current/research-superpixels/research-snic_superpixels/ and downloaded on 2019-01-23.

²The code was found at https://www.peterkovesi.com/projects/segmentation/ and downloaded on 2019-02-07.

³The code was found at http://www.timotheecour.com/software/ncut/ncut.html.

3.2

Alternative solutions

An interesting tool for this problem could have been a disparity map or some other kind of depth measurement, but that requires two images or some kind of stereo setup, so it was not applicable in my case [10, 11]. There has also been extensive research on performing image segmentation using some version of neural network [12, 13]. Another interesting approach to this problem could have been OpenPose [14], which has recently been developed and so far aims to perform multi-person 2D pose detection in images and videos.

3.3

Conclusions

The goal of this thesis was to successfully perform image segmentation on greyscale images and partition each image into two regions, one containing fingers and one containing all that is not fingers. In addition, I was supposed to provide the probability that the segmentation of an image was correct. For some of the images Algorithm 2 gave good results, but the algorithm was very unstable, so if I had provided a probability it would have been quite low because of the instability. I was not able to get Algorithm 3 to return a binary image as in Algorithm 2, but it showed promising potential for improvement and for creating a good segmentation.


Bibliography

[1] Xiaofeng Ren and Jitendra Malik. Learning a classification model for segmentation. In Proceedings Ninth IEEE International Conference on Computer Vision, pages 10–17 vol.1, Oct 2003.

[2] Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Süsstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2274–2282, Nov 2012.

[3] Radhakrishna Achanta and Sabine Süsstrunk. Superpixels and polygons using simple non-iterative clustering. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pages 4895–4904. IEEE, 2017.

[4] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226–231, 1996.

[5] Fernando E. Correa-Tome and Raul E. Sanchez-Yanez. Integral split-and-merge methodology for real-time image segmentation. Journal of Electronic Imaging, 24, 2015.

[6] Richard Szeliski. Computer Vision: Algorithms and Applications. Springer-Verlag London.

[7] Mathworks. https://se.mathworks.com/help/images/ref/imgaussfilt.html. Accessed: 2019-05-24.

[8] Tony F. Chan and Luminita A. Vese. Active contours without edges. IEEE Transactions on Image Processing, 10(2):266–277, 2001.

[9] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, Aug 2000.

[10] Zeng-Fu Wang and Zhi-Gang Zheng. A region based stereo matching algorithm using cooperative optimization. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2008.

[11] Michael Bleyer and Margrit Gelautz. A layered stereo matching algorithm using image segmentation and global visibility constraints. ISPRS Journal of Photogrammetry and Remote Sensing, 59(3):128–150, 2005.

[12] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.

[13] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[14] Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh. OpenPose: realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1812.08008, 2018.

[15] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. An efficient k-means clustering algorithm: analysis and implementation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):881–892, July 2002.

[16] Stuart Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, March 1982.

[17] O. J. Oyelade, O. O. Oladipupo, and I. C. Obagbuwa. Application of k-means clustering algorithm for prediction of students' academic performance. arXiv preprint arXiv:1002.2425, 2010.

[18] Hanaa M. Hussain, Khaled Benkrid, Huseyin Seker, and Ahmet T. Erdogan. FPGA implementation of k-means algorithm for bioinformatics application: An accelerated approach to clustering microarray data. In 2011 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), pages 248–255, June 2011.

[19] Shokri Z. Selim and M. A. Ismail. K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(1):81–87, Jan 1984.

[20] Leon Cooper. N-dimensional location models: An application to cluster analysis. Journal of Regional Science, 13(1):41–54, 1973.

[21] Dimitris Bertsimas and John N. Tsitsiklis. Introduction to Linear Optimization. Optimization and Neural Computation Series. Athena Scientific, 4th edition, 1997.

[23] Mathworks. https://se.mathworks.com/help/images/color-based-segmentation-using-k-means-clustering.html. Accessed: 2019-03-06.

[24] Kai-Xing Chen, Jin-Hai Wang, and Yu-Bo Yuan. Two stages adaptive normalized cuts and image segmentation. In 2017 International Conference on Machine Learning and Cybernetics (ICMLC), volume 1, pages 136–142, July 2017.

[25] Huang XianLou and Yu ShuangYuan. Image segmentation based on normalized cut and CUDA parallel implementation. In 5th IET International Conference on Wireless, Mobile and Multimedia Networks (ICWMMN 2013), pages 209–214, Nov 2013.

[26] Bojan Mohar. The Laplacian spectrum of graphs. In Graph Theory, Combinatorics, and Applications, volume 2, pages 871–898, 1991.


Appendix A

k-means clustering

There are several different ways of partitioning points in R^n into clusters; the best algorithm for the task depends on the intended application [15]. One of the most widely used is the k-means clustering algorithm. In most cases, when we talk about the k-means algorithm we mean Lloyd's algorithm; the original version of this algorithm [16] was proposed as a technique for pulse code modulation (digitally representing analogue signals). The algorithm has since been adapted for a range of different applications such as image segmentation [2], analysis [17], and bioinformatics [18]. We will present the algorithm, some proofs relating to it, and some examples of running it on points in 1D space as well as on an image.

A.1

Algorithm

This is an iterative algorithm that will run until some convergence criterion is reached. The inputs are a set of data points in R^n and a value k, where k is the number of desired clusters. The output is a set of k points. These are the so-called centers, the means, that minimise the squared distance between each data point and its closest center, i.e. the variance of the partition. The goal of this algorithm is to find these centers. Algorithm 4 describes the mathematical formulation of this minimisation problem [19].

Algorithm 4. Let x_1, x_2, ..., x_m ∈ R^n be a finite number of points. We want to partition these into k clusters, 2 ≤ k ≤ m. To do this, we consider the following optimisation problem P:

minimise   f(W, Z) = Σ_{i=1}^{k} Σ_{j=1}^{m} w_{ij} D(x_j, z_i)

subject to Σ_{i=1}^{k} w_{ij} = 1,  j = 1, 2, ..., m,
           w_{ij} = 0 or 1,  i = 1, 2, ..., k,  j = 1, 2, ..., m,   (A.1)

where

W = [w_{ij}] is a k × m real matrix where w_{ij} is 1 if point x_j is in cluster i, and 0 otherwise,
Z = [z_1, z_2, ..., z_k] ∈ R^{nk},
z_i ∈ R^n is the center of cluster i, i = 1, 2, ..., k,
D(x_j, z_i) is some distance measure between x_j and z_i, i = 1, 2, ..., k, j = 1, 2, ..., m.

The distance measure D(x_j, z_i) could for example be the Euclidean distance; this was proposed in [20].

A.2

Convergence

It is not guaranteed that Lloyd's algorithm will reach the global minimum, but rather some local minimum; we will see a proof of this further down. The following theorems and definitions in this section are from [19] and [21, p. 46].

Definition 1. The set S is defined as follows:

S = { W : Σ_{i=1}^{k} w_{ij} = 1, w_{nj} ≥ 0, n = 1, 2, ..., k, j = 1, 2, ..., m }.

Definition 2. An extreme point of a set is a point such that it is not a convex combination of any other points in that set.

Theorem 1. The extreme points of the set S satisfy the constraints of (A.1).

Proof. For points in S, all columns in W have to add up to 1 by definition, so either one element is 1 and the rest are 0s, or there are multiple numbers in the range [0, 1) that together add up to 1. In either case, the points in S cover a region in the shape of a polyhedron, and its extreme points are found in the corners, since all points except for the corners can be expressed as a convex combination of points in the set. It follows that the extreme points are where we only find 1s or 0s in W, i.e. the extreme points of S satisfy the constraints of (A.1).

Definition 3. A point (W0, Z0) that satisfies

f(W0, Z0) ≤ f(W, Z0) ∀W ∈ S, and f(W0, Z0) ≤ f(W0, Z) ∀Z ∈ R^{nk},

is called a partial optimal solution.


Definition 4. To reach a partial optimal solution we need to define two problems. Here f and S are once again defined by Problem P.

Problem P1: Given Ẑ ∈ R^{nk}, minimise f(W, Ẑ) subject to W ∈ S.
Problem P2: Given Ŵ ∈ S, minimise f(Ŵ, Z) subject to Z ∈ R^{nk}.

The following algorithm generates partially optimal solutions; it is essentially a rewrite of the k-means algorithm, but it will be easier to prove its convergence in this form.

Algorithm 5. In this algorithm, Z_0 and W_0 are initial solutions, r is the current iteration, and (W0, Z0) is a partial optimal solution.

1. Choose an initial point Z_0 ∈ R^{nk} and solve P1 with Ẑ = Z_0. Let W_0 be an optimal basic solution and set r = 0.

2. Solve P2 with Ŵ = W_r, and let the result be Z_{r+1}. If f(Ŵ, Z_{r+1}) = f(Ŵ, Z_r), set (W0, Z0) = (W_r, Z_{r+1}) and stop. If not, go to step 3.

3. Solve P1 with Ẑ = Z_{r+1}, and let the result be W_{r+1}. If f(W_{r+1}, Ẑ) = f(W_r, Ẑ), set (W0, Z0) = (W_{r+1}, Z_r) and stop. If not, increase r by 1 and go to step 2.

Theorem 2. Algorithm 5 converges to a partial optimal solution to problem P in a finite number of iterations.

Proof. First, we will show that no extreme point in S will be visited more than once by Algorithm 5. For this, we do a proof by contradiction. We assume the opposite, i.e. that W_{r1} = W_{r2} for some r1, r2 where r1 ≠ r2. Applying step 2 with Ŵ = W_{r1} and Ŵ = W_{r2}, we get Z_{r1+1} and Z_{r2+1} respectively. This gives:

f(W_{r1}, Z_{r1+1}) = f(W_{r2}, Z_{r1+1}),  since W_{r1} = W_{r2},
                    = f(W_{r2}, Z_{r2+1}),  since from step 2 we know that P2 will return Z_{r2+1} when applied to W_{r2}.   (A.2)

Since the sequence generated by f(·, ·) is strictly decreasing (with the exception of the last two numbers, at which point we have converged anyway), (A.2) can not be true, and W_{r1} ≠ W_{r2}. There is a finite number of extreme points in S, since there is a limit to the number of feasible W's that can hold the extreme points, and so we will reach a partial optimal solution in a finite number of iterations.


A.2.1

Comments on mathematics in an article

In section 7 of [19], Selim and Ismail state that they will be considering two metrics. However, the first of the two, the quadratic metric, does not satisfy the criteria for being a metric. Although this is most likely just a mistake, we will still show below why it is not a metric. From [22] we get that the third criterion for being a metric is satisfying the triangle inequality, that is:

d(x, y) ≤ d(x, z) + d(z, y)  ∀x, y, z ∈ R^n.

We will argue that the quadratic metric is not an actual metric by giving a counterexample to the triangle inequality. Selim and Ismail state the following definition of the quadratic metric:

D(X, Z) := (X − Z)^T (X − Z).

For X = [x, 0, ..., 0], Y = [y, 0, ..., 0], Z = [z, 0, ..., 0] in R^n, the triangle inequality becomes:

(x − y)^2 ≤ (x − z)^2 + (z − y)^2.

Let x = 0 and z = 1:

(−y)^2 ≤ (−1)^2 + (1 − y)^2
y^2 ≤ 1 + 1 − 2y + y^2
2y ≤ 2
y ≤ 1.

Since y should be able to take any value in R, the quadratic metric is not a metric.

A.3

k-means examples

The k-means algorithm can be used as a post-processing step to improve the results of almost any other clustering algorithm. This is more common than using it as a straightforward implementation, since the latter is very inefficient and slow because of the cost of computing the nearest neighbours in each iteration [15].

In the next two sections, we will present two examples of k-means clustering being performed. The first one will be in R, on 10 numbers partitioned into 3 clusters, and the second in R^2, on an image partitioned into 2 clusters.

A.3.1

Simple example

We start by generating 10 and 3 random numbers; these will be our X and initial Z respectively. For this example we have:

X = [0.3, 4.3, 3.8, 7.6, 7.9, 1.8, 4.8, 4.4, 6.4, 7.0],
Z = [4.9, 9.5, 3.4].

Figures A.1-A.5 show a sequence of figures visualising a run of the k-means algorithm. Here, we have used the Euclidean distance metric to calculate the distances between points.


Figure A.1: Initial setup.

Figure A.2: First iteration.

Figure A.3: Second iteration.

Figure A.4: Third iteration.

Figure A.5: Fourth iteration.

We see that nothing has changed between the third and the fourth iteration, i.e. the algorithm has converged and we have our final partition.
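The iterations in Figures A.1-A.5 can be reproduced with a short MATLAB sketch of Lloyd's algorithm in one dimension, using the X and Z above; this is an illustration only.

X = [0.3 4.3 3.8 7.6 7.9 1.8 4.8 4.4 6.4 7.0];
Z = [4.9 9.5 3.4];
for iter = 1:20
    [~, idx] = min(abs(X' - Z), [], 2);                  % assign each point to its nearest center
    Znew = arrayfun(@(c) mean(X(idx == c)), 1:numel(Z)); % recompute the centers
    if isequal(Znew, Z), break; end                      % converged: the partition is final
    Z = Znew;
end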


A.3.2

Image example

We use the same method in 2D-space to partition an image into several clusters using MATLAB. In partitioning an image, the distance function gives the distance between pixels in some colour space (using the Euclidean distance metric) instead of the spatial locations of the pixels [23]. This ensures that we can segment an image based on colour instead of the location of each pixel in the image. In this example we work in the RGB colour space to segment an image of an apple; see Figure A.6 (a small MATLAB sketch of the approach follows the figure).

Figure A.6: Example of k-means segmentation on an image of an apple. (a) Original image; (b) segmented image, with n = 2 and k = 2.
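A hedged MATLAB sketch of this image example, treating each pixel's RGB values as a point in colour space and clustering with kmeans as in [23]; apple.png is a placeholder file name.

rgb = imread('apple.png');                       % hypothetical input image
X = double(reshape(rgb, [], 3));                 % one row per pixel, RGB features
idx = kmeans(X, 2);                              % k = 2 clusters in colour space
seg = reshape(idx, size(rgb, 1), size(rgb, 2));  % back to image shape
imagesc(seg);                                    % display the two segments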


Appendix B

Normalised Cut

Normalised cut is a segmentation algorithm using a graph-theoretic approach to image segmentation. It seeks to partition a set of vertices into two separate clusters. We want to maximise the value of association within a cluster while minimising the value of association between the clusters; this is what constitutes a good cut. In 2017, the normalised cut algorithm was the most popular algorithm within image segmentation [24], and it has generated promising segmentations of images [9]. It can also be parallelised [25], which was one of the desired qualities for a method used in this thesis; hence it is used here.

Unless otherwise stated, the theory in the following two sections can be derived from [9], by Jianbo Shi and Jitendra Malik. We will describe their calculations in more detail and also present some examples for a higher level of understanding of the methods used when deriving the normalised cut algorithm.

B.1

Algorithm

We have a graph G = (V, E) where V is the set of all vertices (nodes) and E is the set of all edges. Each edge has a weight, w(i, j), related to the dissimilarity in some measure between nodes i and j. We want to partition the vertices into two disjoint sets, A and B, so that A ∪ B = V and A ∩ B = ∅, by removing the edges connecting them. The sum of the weights of the removed edges is called the value of the cut.

Definition 5. Cut:

cut(A, B) = Σ_{u∈A, v∈B} w(u, v).

The optimal partition of a graph is one where the value of the cut is minimised. However, this tends to result in a corner or edge node being alone in a set, since it does not have many connections to other nodes in comparison to, for example, a node in the middle of a grid. To prevent this, we introduce the normalised cut (Ncut). The value of the normalised cut is computed by dividing the value of the cut by the total sum of weights from the set A to all nodes in the graph, doing the same thing for set B, and adding these two numbers.


Definition 6. Normalised cut:

Ncut(A, B) = cut(A, B)/cut(A, V) + cut(A, B)/cut(B, V).

In a similar way we can define a normalised association (Nassoc), i.e. a ratio of the associativity within a partition to the rest of the graph.

Definition 7. Normalised association:

Nassoc(A, B) = cut(A, A)/cut(A, V) + cut(B, B)/cut(B, V),

where cut(A, A) is the sum of the weights of edges between nodes within a set. Both cut(A, A) and cut(A, V) hold essentially the same thing, with the exception that cut(A, V) also holds the weights of the edges from A to the rest of the graph. Therefore,

cut(A, B) = Σ_{u∈A, v∈B} w(u, v) = Σ_{u∈A, v∈V} w(u, v) − Σ_{u∈A, v∈A} w(u, v) = cut(A, V) − cut(A, A).

A similar argument can be made for set B. Using this we can rewrite the expression for Ncut:

Ncut(A, B) = cut(A, B)/cut(A, V) + cut(A, B)/cut(B, V)
= (cut(A, V) − cut(A, A))/cut(A, V) + (cut(B, V) − cut(B, B))/cut(B, V)
= cut(A, V)/cut(A, V) − cut(A, A)/cut(A, V) + cut(B, V)/cut(B, V) − cut(B, B)/cut(B, V)
= 2 − (cut(A, A)/cut(A, V) + cut(B, B)/cut(B, V))
= 2 − Nassoc(A, B).

Thus, we can simultaneously both minimise the association between the sets (Ncut) and maximise the association within each set (Nassoc). Therefore, the normalised cut will be used as the partition criterion for our segmentation. Finding the optimal partition is an NP-hard problem, though, which is why we will compute an approximation of the solution instead, since this can be done more efficiently.


B.2

Computing the partition

We want to partition the set of nodes V in a graph into two subsets, A and B. For this task we need an indicator vector, X, of length N = |V|, where x_i is 1 if node i is in A and −1 otherwise, i ∈ [1, N]. We also need a way to express the total value of the connections from one node to the rest of the graph, so we let d_i = d(i) where

d(i) = Σ_j w(i, j).

Additionally, we have a diagonal matrix, D, of size N × N with the d_i's on the diagonal, and a sparse symmetric matrix W, of size N × N, where w_{ij} = w(i, j). We also have an N × 1 vector of ones, 1, and a constant k:

k = (Σ_{x_i>0} d_i) / (Σ_i d_i),

i.e. the ratio between the sum of weights of edges connecting to nodes in A and all weights in the graph. Using this we can rewrite the expression for Ncut again:

Ncut(A, B) = cut(A, B)/cut(A, V) + cut(A, B)/cut(B, V)
= (Σ_{i∈A, j∈B} w_{ij}) / (Σ_{i∈A, j∈V} w_{ij}) + (Σ_{i∈A, j∈B} w_{ij}) / (Σ_{i∈B, j∈V} w_{ij})
= (Σ_{x_i>0, x_j<0} −w_{ij} x_i x_j) / (Σ_{x_i>0} d_i) + (Σ_{x_i<0, x_j>0} −w_{ij} x_i x_j) / (Σ_{x_i<0} d_i).   (B.1)

It may not seem logical to write −w_{ij} x_i x_j when w_{ij} would yield the same result, since −x_i x_j always evaluates to 1 under these conditions, but why we write it this way will be made clear further down in the report.

We have 1 as a vector of 1's and X as a vector of ±1's, so it follows that (1 + X)/2 and (1 − X)/2 hold 1's where node i is in A or B respectively, and 0's elsewhere. Thus, they are indicator vectors for x_i > 0 and x_i < 0 respectively. The expression in (B.1) can be rewritten in the following matrix form:

Ncut(A, B) = (1 + X)^T (D − W)(1 + X) / (4k · 1^T D 1) + (1 − X)^T (D − W)(1 − X) / (4(1 − k) · 1^T D 1)
= ((1 + X)^T / 2) · ((D − W) / (k · 1^T D 1)) · ((1 + X) / 2) + ((1 − X)^T / 2) · ((D − W) / ((1 − k) · 1^T D 1)) · ((1 − X) / 2).   (B.2)

Since we multiply (D − W) with the indicator vectors, we get the multiplication by −x_i x_j in (B.1) that might have seemed strange at that point; it was just to make it more similar to the expression in (B.2). The major difference between these two ways of writing is that the numerators in (B.1) add up the elements of W that are in A and B respectively, while (B.2) does the same through matrix products with the indicator vectors. As for the denominators, we can derive the denominator in (B.1) from the denominator in (B.2):

k · 1^T D 1 = ((Σ_{x_i>0} d_i) / (Σ_i d_i)) (d_1 + d_2 + ... + d_N) = ((Σ_{x_i>0} d_i) / (Σ_i d_i)) Σ_i d_i = Σ_{x_i>0} d_i,

(1 − k) · 1^T D 1 = (1 − (Σ_{x_i>0} d_i) / (Σ_i d_i)) (d_1 + d_2 + ... + d_N) = ((Σ_i d_i − Σ_{x_i>0} d_i) / (Σ_i d_i)) Σ_i d_i = ((Σ_{x_i<0} d_i) / (Σ_i d_i)) Σ_i d_i = Σ_{x_i<0} d_i.

Example 1 shows both of these ways of writing, just to make this part a little clearer before we continue.

Example 1. We have the following W and X as inputs:

W =
[ 0 2 5 4
  2 0 2 3
  5 2 0 1
  4 3 1 0 ],   X = [−1, 1, 1, −1]^T.

From this we can calculate the matrix D:

d_1 = Σ_j w_{1j} = w_{11} + w_{12} + w_{13} + w_{14} = 0 + 2 + 5 + 4 = 11
d_2 = Σ_j w_{2j} = w_{21} + w_{22} + w_{23} + w_{24} = 2 + 0 + 2 + 3 = 7
d_3 = Σ_j w_{3j} = w_{31} + w_{32} + w_{33} + w_{34} = 5 + 2 + 0 + 1 = 8
d_4 = Σ_j w_{4j} = w_{41} + w_{42} + w_{43} + w_{44} = 4 + 3 + 1 + 0 = 8

=> D = diag(11, 7, 8, 8).

Using the method in (B.1) we get the following result:

Ncut(A, B) = (Σ_{x_i>0, x_j<0} −w_{ij} x_i x_j) / (Σ_{x_i>0} d_i) + (Σ_{x_i<0, x_j>0} −w_{ij} x_i x_j) / (Σ_{x_i<0} d_i)
= ((−2)·1·(−1) + (−3)·1·(−1) + (−5)·1·(−1) + (−1)·1·(−1)) / (7 + 8) + ((−2)·(−1)·1 + (−5)·(−1)·1 + (−3)·(−1)·1 + (−1)·(−1)·1) / (11 + 8)
= 11/15 + 11/19 = 374/285.

Using the method in (B.2) we get the following result:

k = (Σ_{x_i>0} d_i) / (Σ_i d_i) = (7 + 8)/(11 + 7 + 8 + 8) = 15/34 ≈ 0.44,

Ncut(A, B) = ((1 + X)^T / 2)(D − W)((1 + X)/2) / (k · 1^T D 1) + ((1 − X)^T / 2)(D − W)((1 − X)/2) / ((1 − k) · 1^T D 1)
= [0 1 1 0] (D − W) [0 1 1 0]^T / 15 + [1 0 0 1] (D − W) [1 0 0 1]^T / 19,

where

D − W =
[ 11 −2 −5 −4
  −2  7 −2 −3
  −5 −2  8 −1
  −4 −3 −1  8 ],

which gives

Ncut(A, B) = 11/15 + 11/19 = 374/285.  OK!


Theorem 3. For a symmetric matrix S of size n × n and vectors u, v of size n × 1, we have that u^T S v = v^T S u.

Proof. We recall the rules in linear algebra which state that

(AB)^T = B^T A^T  and  (A^T)^T = A.

Since u^T S v is a scalar, it equals its own transpose, so

u^T S v = (u^T S v)^T = v^T S^T (u^T)^T = v^T S^T u.

Since S is a symmetric matrix we have S^T = S, so v^T S^T u = v^T S u.

To continue, from (B.2) we get that

4 · Ncut(A, B) = (1 + X)^T (D − W)(1 + X) / (k · 1^T D 1) + (1 − X)^T (D − W)(1 − X) / ((1 − k) · 1^T D 1).

By expanding the parentheses and using Theorem 3 we get

= (1^T (D − W) 1 + 2 · 1^T (D − W) X + X^T (D − W) X) / (k · 1^T D 1) + (1^T (D − W) 1 − 2 · 1^T (D − W) X + X^T (D − W) X) / ((1 − k) · 1^T D 1)

and, by merging the two fractions into one and cleaning up,

= (1^T (D − W) 1 + X^T (D − W) X + 2(1 − 2k) · 1^T (D − W) X) / (k(1 − k) · 1^T D 1).   (B.3)

To make further calculations easier to follow, let:

α(X) = X^T (D − W) X,
β(X) = 1^T (D − W) X,
γ = 1^T (D − W) 1,
M = 1^T D 1.


The row vector 1^T (D − W) holds the sum of each column of the matrix (D − W); this is a zero vector, and thus both β(X) and γ are equal to 0. Using these definitions we can rewrite (B.3) as follows:

(1^T (D − W) 1 + X^T (D − W) X + 2(1 − 2k) · 1^T (D − W) X) / (k(1 − k) · 1^T D 1)
= (α(X) + γ + 2(1 − 2k) β(X)) / (k(1 − k) M)

using substitution,

= (α(X) + γ + 2(1 − 2k) β(X)) / (k(1 − k) M) − 2(α(X) + γ)/M + 2α(X)/M + 2γ/M

by adding and removing 2(α(X) + γ)/M,

= (α(X) + γ + 2(1 − 2k) β(X) − 2k(1 − k)(α(X) + γ)) / (k(1 − k) M) + 2α(X)/M

by merging the first three fractions into one,

= ((1 − 2k + 2k^2)(α(X) + γ) + 2(1 − 2k) β(X)) / (k(1 − k) M) + 2α(X)/M

by reorganisation of the terms, and

= ( ((1 − 2k + 2k^2)/(1 − k)^2)(α(X) + γ) + (2(1 − 2k)/(1 − k)^2) β(X) ) / ((k/(1 − k)) M) + 2α(X)/M   (B.4)

by reducing the first fraction by (1 − k)^2. We let b = k/(1 − k), which means that 1 + b^2 = (1 − 2k + 2k^2)/(1 − k)^2 and 2(1 − b^2) = 2(1 − 2k)/(1 − k)^2. This means that we can rewrite (B.4) as:

= ((1 + b^2)(α(X) + γ) + 2(1 − b^2) β(X)) / (bM) + 2α(X)/M.   (B.5)

Here we will do another example for clarification; we will use the same numbers as in Example 1.


Example 2. We will compute 4 · Ncut(A, B) using the method in (B.5). This should be equal to 4 · 374/285 = 1496/285, since 374/285 was the result in Example 1 and our result here should be 4 times as big. In Example 1 we had the following values:

W =
[ 0 2 5 4
  2 0 2 3
  5 2 0 1
  4 3 1 0 ],   X = [−1, 1, 1, −1]^T,   D = diag(11, 7, 8, 8).

By definition, we have that:

α(X) = X^T (D − W) X = [−1 1 1 −1] (D − W) [−1 1 1 −1]^T = 44,
β(X) = 0,
M = 1^T D 1 = 11 + 7 + 8 + 8 = 34,
γ = 0,
b = k/(1 − k) = (15/34)/(1 − 15/34) = 15/19.

We get that:

4 · Ncut(A, B) = ((1 + b^2)(α(X) + γ) + 2(1 − b^2) β(X)) / (bM) + 2α(X)/M
= (1 + (15/19)^2) · 44 / ((15/19) · 34) + (2 · 44)/34
= 1496/285.  OK!

We now continue rewriting our expression for 4 · Ncut(A, B) from (B.5) and get:

((1 + b^2)(α(X) + γ) + 2(1 − b^2) β(X)) / (bM) + 2α(X)/M
= (1 + b^2)(α(X) + γ) / (bM) + 2(1 − b^2) β(X) / (bM) + 2bα(X) / (bM) − 2bγ / (bM)

by splitting the first fraction into two fractions and subtracting 2bγ/(bM). Since γ = 0, this fraction is equal to 0 and does not change the value of 4 · Ncut(A, B). Continuing, we get that

= (1 + b^2)(X^T (D − W) X + 1^T (D − W) 1) / (b · 1^T D 1) + 2(1 − b^2) · 1^T (D − W) X / (b · 1^T D 1) + 2b · X^T (D − W) X / (b · 1^T D 1) − 2b · 1^T (D − W) 1 / (b · 1^T D 1)


by re-substitution, and

= (1 + b^2 + 2b) X^T (D − W) X / (b · 1^T D 1) + (1 + b^2 − 2b) · 1^T (D − W) 1 / (b · 1^T D 1) + 2(1 − b^2) · 1^T (D − W) X / (b · 1^T D 1)   (B.6)

by reorganisation. We have that

(1 + X)^T (D − W)(1 + X) = X^T (D − W) X + 1^T (D − W) 1 + 2 · 1^T (D − W) X,
(1 − X)^T (D − W)(1 − X) = X^T (D − W) X + 1^T (D − W) 1 − 2 · 1^T (D − W) X,
(1 − X)^T (D − W)(1 + X) = −X^T (D − W) X + 1^T (D − W) 1.

Combining this with (B.6) gives us that

(1 + b^2 + 2b) X^T (D − W) X / (b · 1^T D 1) + (1 + b^2 − 2b) · 1^T (D − W) 1 / (b · 1^T D 1) + 2(1 − b^2) · 1^T (D − W) X / (b · 1^T D 1)
= ((1 + X)^T (D − W)(1 + X) + b^2 (1 − X)^T (D − W)(1 − X) − 2b(1 − X)^T (D − W)(1 + X)) / (b · 1^T D 1)
= [(1 + X) − b(1 − X)]^T (D − W) [(1 + X) − b(1 − X)] / (b · 1^T D 1)
= 4 · Ncut(A, B).   (B.7)

Let

y = ((1 + X) − b(1 − X)) / 2,

which is the same as saying that

y_i = 1 if i ∈ A, and y_i = −b otherwise.

We can then show that y^T D 1 = 0. First, we show that

y^T D 1 = Σ_{x_i>0} d_i − b Σ_{x_i<0} d_i.

Beginning with the LHS, y^T D 1 is a summation of all the elements of D where the elements belonging to set B are multiplied by −b, since that is how y was defined. The first term of the RHS, Σ_{x_i>0} d_i, is a summation of all d_i's that are in set A. The second term, Σ_{x_i<0} d_i, is a summation of all elements in set B; this is then multiplied by −b. Therefore,

Σ_{x_i>0} d_i − b Σ_{x_i<0} d_i = y^T D 1.

We can show that Σ_{x_i>0} d_i − b Σ_{x_i<0} d_i = 0 by

Σ_{x_i>0} d_i − b Σ_{x_i<0} d_i = Σ_{x_i>0} d_i − (k/(1 − k)) Σ_{x_i<0} d_i
= Σ_{x_i>0} d_i − ( ((Σ_{x_i>0} d_i) / (Σ_i d_i)) / (1 − (Σ_{x_i>0} d_i) / (Σ_i d_i)) ) Σ_{x_i<0} d_i

by re-substitution in two steps,

= Σ_{x_i>0} d_i − ( ((Σ_{x_i>0} d_i) / (Σ_i d_i)) / ((Σ_{x_i<0} d_i) / (Σ_i d_i)) ) Σ_{x_i<0} d_i
= Σ_{x_i>0} d_i − ((Σ_{x_i>0} d_i) / (Σ_{x_i<0} d_i)) Σ_{x_i<0} d_i
= Σ_{x_i>0} d_i − Σ_{x_i>0} d_i = 0.   (B.8)

So, y^T D 1 = 0. We also want to show that

y^T D y = b · 1^T D 1.

In y^T D y, the elements in set A are multiplied by 1 twice, and the rest are multiplied by −b twice. That gives us that

y^T D y = Σ_{x_i>0} d_i + b^2 Σ_{x_i<0} d_i.

From (B.8) we get that Σ_{x_i>0} d_i = b Σ_{x_i<0} d_i, so

y^T D y = Σ_{x_i>0} d_i + b^2 Σ_{x_i<0} d_i
= b Σ_{x_i<0} d_i + b^2 Σ_{x_i<0} d_i   by (B.8),
= b (Σ_{x_i<0} d_i + b Σ_{x_i<0} d_i)
= b · 1^T D 1   (B.9)

by breaking out b and using the same reasoning as in (B.8) for the last step. Combining (B.8) and (B.9) with (B.7) gives us that

4 · Ncut(A, B) = [(1 + X) − b(1 − X)]^T (D − W) [(1 + X) − b(1 − X)] / (b · 1^T D 1) = (2y)^T (D − W)(2y) / (y^T D y) = 4 · y^T (D − W) y / (y^T D y),

so that

Ncut(A, B) = y^T (D − W) y / (y^T D y).   (B.10)

So, minimising Ncut(A, B) is the same as minimising y^T (D − W) y / (y^T D y) under the conditions that y_i ∈ {1, −b} and y^T D 1 = 0; this comes from the definition of y. If y is relaxed to take on all real values, (B.10) can be minimised by solving the generalised eigenvalue system

(D − W) y = λ D y.   (B.11)


It can be shown that the second constraint, y^T D 1 = 0, is automatically satisfied by the solution of the generalised eigenvalue system. The expression in (B.11) can be rewritten as a standard eigenvalue system of the form Av = λv, where v is an eigenvector and λ is the corresponding eigenvalue, for which we can show that the second condition holds.

Definition 8. Taking a diagonal matrix to the power of n ∈ R is the same as taking each diagonal element to the power of n. For a diagonal matrix D of size m × m, that is:

D^n = diag(d_1^n, d_2^n, ..., d_m^n).

We rewrite (B.11):

(D − W) y = λ D y
⇒ D^{1/2} D^{−1/2} (D − W) y = D^{1/2} D^{1/2} λ y
⇒ D^{−1/2} (D − W) y = D^{1/2} λ y
⇒ D^{−1/2} (D − W) D^{−1/2} D^{1/2} y = λ D^{1/2} y.   (B.12)

Let z = D^{1/2} y; then (B.12) gives

D^{−1/2} (D − W) D^{−1/2} z = λ z,   (B.13)

which is in the standard form of an eigenvalue system.
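The rewriting in (B.11)-(B.13) can be checked numerically in MATLAB on the matrices from Example 1; this is a small sketch only.

W = [0 2 5 4; 2 0 2 3; 5 2 0 1; 4 3 1 0];
D = diag(sum(W, 2));                  % the d_i's on the diagonal
lamGen = eig(D - W, D);               % generalised system (B.11)
A = D^(-1/2) * (D - W) * D^(-1/2);    % standard form (B.13)
lamStd = eig(A);                      % same spectrum as (B.11)
% sort(lamGen) and sort(lamStd) agree, and the smallest eigenvalue is 0.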

Definition 9. Given a graph G, its Laplacian matrix can be defined as D − A, where D is the weighted degree matrix and A is the adjacency matrix of G.

Definition 10. A Hermitian matrix is a square matrix of complex numbers that is equal to its conjugate transpose. A square matrix containing only real numbers is Hermitian if it is symmetric.

Definition 11. For a non-zero complex column vector z with conjugate transpose z*, and a Hermitian matrix H, the matrix H is positive semi-definite if the scalar z*Hz is always non-negative.

Any Laplacian matrix is symmetric positive semi-definite, hence its smallest eigenvalue is 0 [26]. For our case, the matrix D^{−1/2}(D − W)D^{−1/2} is symmetric positive semi-definite, since (D − W), which is a Laplacian matrix, is symmetric positive semi-definite. Thus our smallest eigenvalue λ_0 is equal to 0. We can easily verify that the eigenvector corresponding to λ_0, i.e. the smallest eigenvector, is z_0 = D^{1/2} 1, since (D − W) 1 gives a zero vector and anything multiplied with that will give 0. The first eigenvector, z_0, corresponds to y_0 = D^{−1/2} z_0 = D^{−1/2} D^{1/2} 1 = 1. Minimising the expression in (B.10) is the same as minimising

z^T D^{−1/2} (D − W) D^{−1/2} z / (z^T z),

since this can be derived from (B.10). For our case, the vector that minimises this is the second smallest eigenvector, orthogonal to z_0; we call it z_1. The corresponding eigenvalue, λ_1, is the value of our cut [9]. We get that

z_1 = arg min_{z^T z_0 = 0} z^T D^{−1/2} (D − W) D^{−1/2} z / (z^T z),

where arg min gives the instance of z that minimises the expression and z^T z_0 = 0 is the condition on z. Rewriting this, we can similarly get that

y_1 = arg min_{y^T D 1 = 0} y^T (D − W) y / (y^T D y),

where

z^T z_0 = (D^{1/2} y)^T D^{1/2} y_0 = y^T D^{1/2} D^{1/2} 1 = y^T D 1,

so that y_1 is also a solution to the generalised eigenvalue system in (B.11). To summarise, y_1 is the real-valued solution to our normalised cut problem. The reason it is not necessarily the solution to our original problem is that it does not automatically satisfy the constraint on y that y_i can only take one of two specific values.

B.3

Normalised cut example

Figure B.1 is an example of the normalised cut algorithm segmenting an image into 2 and 5 regions respectively; this has been done using MATLAB. It is the same image as in Figure A.6, only in a greyscale version.

Figure B.1: Example of normalised cut segmentation on a greyscale image of an apple. (a) Original image; (b) segmented image, 2 regions; (c) segmented image, 5 regions.


Appendix C

Code

In the sections below I have presented all the code that I have written.

C.1

Segmentation

This is the main code we want to call to run either of the two segmentation approaches. Parts of the code are commented out, to switch between the region growing method and the normalised cut method. Currently the region growing method is commented out, so running this code generates a segmentation using the normalised cut algorithm.

function [img1, img2] = segmentation(img)
grey = rgb2gray(img);
blurred = imgaussfilt(grey, 5);
sharpened = grey - blurred + grey;

% do SNIC superpixel segmentation
labels = mySNIC(sharpened, 11, 40, 20);
img1 = drawregionboundaries(labels, sharpened, 255);

% "colour" the superpixel image
newImage = colourLabel(sharpened, labels);

% do Ncut algorithm
segLabel = myNcut(newImage, labels, 20);
img2 = drawregionboundaries(segLabel, newImage, 255);

% active contour / region growing
% mask(1:50, 1:end) = 1;
% mask(50:170, 20:60) = 1;
% mask(50:170, 100:140) = 1;
% mask(50:170, 180:220) = 1;
% mask(50:170, 260:300) = 1;
% ac = activecontour(sharpened, mask, 200);
% opening on ac images
% se = strel('disk', 15);
% img1 = imopen(ac, se);
end

C.2

colourLabel

This function "colours in" each superpixel in a superpixel segmentation with the average intensity value of that superpixel.

function [newImage] = colourLabel(image, labels)
% colours in the mean value of intensity in the entire region of a
% superpixel, only for greyscale images
newImage = double(zeros(size(image, 1), size(image, 2)));
meanArr = supMean(image, labels);
for i = 1:max(max(labels))
    temp = meanArr(i) * (labels == i);
    newImage = newImage + temp;
end
end

C.3

getW

Creates the adjacency matrix W for the normalised cut algorithm.

function W = getW(image, labels)
% Returns a similarity matrix W for the associations between the sp's in
% the image
% Inputs:  image  - the image we are currently working on
%          labels - labeled image matrix
% Output:  W      - similarity matrix
r = 25;         % max distance between sp centers
sigmaI = 1000;  % feature similarity factor
sigmaX = 400;   % spatial proximity term
n = max(max(labels));
W = zeros(n, n);
mean = supMean(image, labels);
mid = supMid(labels);
for i = 1:n
    for j = i:n
        dist = sqrt((mid(i, 1) - mid(j, 1))^2 + (mid(i, 2) - mid(j, 2))^2);
        if dist < r
            W(i, j) = exp(-abs(mean(i)^2 - mean(j)^2) / sigmaI - dist^2 / sigmaX);
        end
    end
end
W = W + W' - eye(n);
end

C.4

myNcut

Runs the normalised cut algorithm.

function segLabel = myNcut(grey, labels, numSegments)
W = getW(grey, labels);  % get adjacency matrix W
if nargin < 3
    numSegments = 10;
end
[NcutDiscrete, NcutEigenvectors, NcutEigenvalues] = ncutW(W, numSegments);

% generate segmentation label map
[nr, nc] = size(grey);  % nr = number of rows, nc = number of columns, in image
segLabel = zeros(nr, nc);
[nRows, nCols] = size(NcutDiscrete);  % nRows = length of eigenvectors, nCols = # eigenvectors
for i = 1:nRows
    sumDiscEigVec = 0;
    for j = 1:nCols
        sumDiscEigVec = sumDiscEigVec + (nCols + 1 - j) * NcutDiscrete(i, j);
    end
    temp = sumDiscEigVec * (labels == i);
    segLabel = segLabel + temp;
end
end

C.5

mySNIC

Runs the SNIC superpixel algorithm.

function multi = mySNIC(image, noRows, noSupPix, comp)
% Inputs:  image    - greyscale image to segment into SNIC superpixels
%          noRows   - number of rows to work on at a time, I have used 11
%          noSupPix - the number of superpixels you want each set of rows
%                     to be divided into (roughly), I have used >= 27 and
%                     >= 32 (for the last row to be included)
%          comp     - compactness factor [10, 40], I have used 20
% Outputs: multi    - a matrix of labels
imageArr = split(image, noRows);
len = length(imageArr);
columns = size(image, 2);

% do snic segmentation here on each small image
labels = zeros(noRows, columns, len);
numLabels = zeros(len, 1);
% do superpixel segmentation on each set of rows
for i = 1:len
    [labels(1:size(imageArr{i}, 1), :, i), numLabels(i)] = snic_mex(imageArr{i}, noSupPix, comp);
    % arg 2 = # superpixels (>= 27, >= 32 for the last row to be included)
    % arg 3 = compactness (10 - 40)
end

% assembling segmentations
dispSeg = cell(len, 1);
dispSeg{1} = labels(1:size(imageArr{1}, 1), :, 1) + 1;
for i = 2:len
    dispSeg{i} = max(max(dispSeg{i-1})) + labels(1:size(imageArr{i}, 1), :, i) + 1;
end
multi = cat(1, dispSeg{:});
end

C.6

split

Divides an image into parts consisting of nrows rows each.

function imageArr = split(image, nrows)
% Divides an image into parts consisting of nrows rows each.
[height, width] = size(image);
rows = floor(height / nrows);
imageArr = cell(rows + 1, 1);
for row = 1:1:rows
    imageArr(row, 1) = {image((row-1)*nrows + 1 : row*nrows, :)};
end
imageArr(rows + 1, 1) = {image(rows*nrows + 1 : end, :)};
end


C.7

supMean

Calculates the mean intensity of each superpixel in an image.

function [meanArr] = supMean(image, labels)
% Computes the mean intensity of each set in labels and stores it
% in meanArr. Will only work for greyscale images.
n = max(max(labels));   % the number of superpixels we have
meanArr = zeros(n, 1);  % initialisation
for i = 1:n
    temp = image(labels == i);
    meanArr(i) = sum(temp) / size(temp, 1);
end
end

C.8

supMid

Calculates the midpoint of each superpixel by taking the median of the included pixels' locations in both x- and y-directions.

function midpoints = supMid(labels)
n = max(max(labels));     % number of superpixels
midpoints = zeros(n, 2);  % initialisation
for i = 1:n
    [row, col] = find(labels == i);  % find locations of all elements equal to i in labels
    row = sort(row);
    midpoints(i, 1) = median(row);   % get row-median
    midpoints(i, 2) = median(col);   % get column-median
end
end

