Clustering Methods as a Recruitment Tool for Smaller Companies

Academic year: 2021
DEGREE PROJECT IN MATHEMATICS, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

Clustering Methods as a

Recruitment Tool for Smaller

Companies

LINNEA THORSTENSSON


Clustering Methods as a

Recruitment Tool for Smaller

Companies

LINNEA THORSTENSSON

Degree Projects in Mathematical Statistics (30 ECTS credits) Master's Programme in Applied and Computational Mathematics KTH Royal Institute of Technology year 2020


TRITA-SCI-GRU 2020:077 MAT-E 2020:040

Royal Institute of Technology

School of Engineering Sciences KTH SCI


Abstract


Sammanfattning (Klustermetoder som ett verktyg i rekrytering för mindre företag)

New technology has simplified the process of applying for a job. As a result, companies receive thousands of applications that they have to consider. To simplify and speed up the recruitment process, many large companies have started to use machine learning methods. Smaller companies, for example start-ups, do not have the same opportunities to digitalise their recruitment, as they usually do not have access to large amounts of historical application data. This thesis therefore uses topological data analysis to investigate how clustering methods can be used for recruitment on smaller data sets. It also analyses how the level of abstraction of the data affects the results. The methods turn out to work well for higher-level job positions but struggle with lower-level jobs. It also turns out that the choice of representation of candidates and jobs has a large impact on the results.

(8)
(9)

Contents

1 Introduction
   1.1 Problem formulation
2 Background
   2.1 Matching Algorithms
   2.2 Taxonomy
       2.2.1 JobTech
3 Theoretical Background
   3.1 Topological data analysis
       3.1.1 Graphs
       3.1.2 Distances
       3.1.3 Simplices
       3.1.4 Face
       3.1.5 Simplicial Complexes
       3.1.6 Mapper
   3.2 Clustering analysis
       3.2.1 K-means clustering
       3.2.2 Hierarchical clustering
   3.3 Dimensionality reduction
       3.3.1 Principal Component Analysis
4 Methods
   4.1 Data
   4.2 Dimensions
       4.2.1 All concepts
       4.2.2 Main categories
       4.2.3 Mapped to job
       4.2.4 PCA
   4.3 Analysis
       4.3.1 Analysis of personalities and occupation of interest
   4.4 Proposing candidates
5 Results
   5.1 Dendrograms
   5.2 Matching candidates to a job
       5.2.1 Job 1
       5.2.2 Job 2
       5.2.3 Job 3
       5.2.4 Job 4
       5.2.5 Job 5
   5.3 Occupation of interest and personality traits
   5.4 Proposed candidates
       5.4.1 Job 1
       5.4.2 Job 2
       5.4.3 Job 3
       5.4.4 Job 4
       5.4.5 Job 5
       5.4.6 Comment on the proposed candidates
6 Discussion
7 Conclusions
Bibliography


Chapter 1

Introduction

New online technology has made it much easier to apply for a job. Nowadays anyone can apply for a job anywhere in the world. While job advertisements now reach an ever-growing audience, this also results in companies getting thousands upon thousands of applications to consider. A study made in 2010 showed that regardless of the industry, on average only 1 out of 120 applicants gets selected [1]. A large number of applicants creates a burden in the recruitment process, as the recruiters need to spend a lot of time going through irrelevant candidates. As a result, it has become popular to outsource this part of the recruiting process to an external recruiting firm. Although this can save the company a lot of time, it can on the other hand also be very costly.

Human resource management is often called the backbone of the organisation, and an organisation's success is closely related to the type of individuals it employs [2]. Therefore it is of great importance to put the right person in the right job. Another problem that has arisen lately in the human resource management industry is the so-called recruitment bias problem. This refers to a recruiter making poor decisions due to prejudices or preconceived notions. This is a very relevant subject, since a poor decision in the recruitment process is a huge loss in both time and money. At the same time, choosing the right candidate for the job does not only save time and money but can also be a great investment for the company.

With the new technology and the rising attention and research on artificial intelligence and statistical learning methods, the digitalisation of human resource management, so-called e-recruitment, has become the new solution to


the problems mentioned before. Studies have shown that e-recruitment systems performed consistently compared to humans [3]. So by letting statistical algorithms take care of the early stages of a recruitment process, companies can not only save money and time but also avoid missing out on relevant candidates due to prejudices and biased human selection [4]. For larger companies with a lot of historical applicant data it is easier to train statistical algorithms to perform well, since models trained on small data sets tend to give fragile results [5]. Smaller and younger companies, for example start-ups, do not always have access to historical or big data sets of applicant information, due to confidentiality for example. This can result in problems when introducing statistical learning methods as a tool in their recruitment process. Also, the needs of smaller companies might differ from those of larger companies, for instance when it comes to desired candidates. Another problem that has arisen in e-recruitment is that the strategies used to categorise keywords into job categories tend to fail [6]. Let us for example look at three different candidates: one having the skill of the programming language Java, one having Python, and a third not having any programming skills. If we then have a job announcement for a Java developer and we only look at the skills of our candidates, only the first will be a match to that job. This means that the second candidate, who is also a developer, is considered as irrelevant as the third candidate, who is not a developer at all. But if we instead compare job categories, the first and second candidates will both be categorised as Developers. Now the first and second candidates will both be considered relevant for the job, and only the third candidate is classified as irrelevant. This is one example of how the categorisation and representation in e-recruitment can influence the results.
To make such categorisations, some kind of tool, for example a taxonomy database, is needed. This enables both finding relations between concepts and comparing the data on different levels of abstraction. In this thesis, a job taxonomy provided by the Swedish public employment service will be used; it is further explained in section 2.2.
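The skill-level versus category-level matching described above can be sketched in a few lines. This is an illustrative toy, not the thesis's implementation: the `TAXONOMY` dictionary, the candidate labels and the helper functions are all invented for the example.

```python
# Toy taxonomy mapping a skill to its job category (invented for illustration).
TAXONOMY = {"Java": "Developer", "Python": "Developer"}

candidates = {"A": ["Java"], "B": ["Python"], "C": []}
job_skills = ["Java"]  # the Java-developer announcement

def matches_on_skills(skills, required):
    # Raw skill comparison: a required skill must appear verbatim.
    return any(s in skills for s in required)

def matches_on_categories(skills, required):
    # Category comparison: map both sides through the taxonomy first.
    cand_cats = {TAXONOMY.get(s) for s in skills}
    job_cats = {TAXONOMY.get(s) for s in required}
    return bool((cand_cats & job_cats) - {None})

results = {name: (matches_on_skills(skills, job_skills),
                  matches_on_categories(skills, job_skills))
           for name, skills in candidates.items()}
# results == {"A": (True, True), "B": (False, True), "C": (False, False)}
```

On skills alone only candidate A matches; after mapping to categories, the Python developer B also matches, while the non-developer C is still filtered out, mirroring the example in the text.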

1.1 Problem formulation


This thesis studies how clustering methods can be used as a recruitment tool for small data sets, and how the choice of data representation impacts the results of the analysis. This results in the following research questions:

1. How can clustering methods be used as an e-recruitment tool for small data sets?


Chapter 2

Background

Job categorisation has been studied since the 1970s, and in the past decade the internet and technology have come to play a crucial role in recruitment processes [8][4]. The newest building block in recruitment is artificial intelligence, AI, which in 2018 was one of the most embraced trends among hiring professionals [9]. There are many ways of implementing AI in a recruitment process. Some examples are candidate engagement, scheduling, employee and career development, and screening of candidates [10]. The screening part of the recruitment process refers to the part where recruiters screen all applications and appoint candidates to move forward to the next step in the process. This part of the process, especially with a lot of applications, can be time-consuming and contains a lot of repetitive tasks for the recruiter. The main idea of using AI in recruiting is to support the recruiters, not replace them, and to increase the efficiency of the process. An e-recruitment system can process large volumes of applicants, and it has also been shown that it can speed up the process without losing quality [9]. Generally speaking, using e-recruitment in the screening process refers to letting a computer find the right candidates for the job.

2.1 Matching Algorithms

Matching algorithms are widely used today and are most common in commercial applications, such as product recommendations. For example, in 2006 Netflix released a data set and handed out a prize to the data scientists who could build a matching algorithm that performed better than the existing one [11]. Another area where the use of matching algorithms has exploded lately is online dating services. In the United States, 40 million out of 54


million singles are using online dating sites [12]. The most common matching methods are called recommendation systems and can be used both for item-user and for user-user matching, where the latter is the case in dating. But it has also been shown that recommendation systems can introduce a lot of bias into the matching, resulting in the popular items/users getting more and more recommendations [13]. This rule of popularity, however, does not apply to recruitment. In recruitment we want to consider all candidates, regardless of their historical matches. Because of this, other approaches have been suggested for recruitment, for example semantic approaches, genetic algorithms, decision trees and cluster analysis [14].

2.2 Taxonomy

Matching candidates and jobs is a complex task, and a common approach to handle this is by using categorisation and classification of jobs and concepts in the labour market [8]. In 2015 Malherbe, Cataldi, and Ballatore [6] proposed a data-driven similarity comparison for job categorisation, which showed several improvements leading to a more effective categorisation. Taxonomy is the science of classification and is commonly used in the sense of classifying things or concepts. Using a taxonomy scheme in the recruitment area is one way of handling the problem of classification and categorisation of jobs and concepts.

2.2.1 JobTech

JobTech is a digital platform provided by the Swedish public employment service, with the main purpose of improving the functioning of the labour market in Sweden. JobTech can be loosely described as an ecosystem for the different concepts in the labour market, and is a tool provided to simplify and increase digitalisation and data access regarding employment. Some examples of usage are job advertisements, informatics and statistics.


occupation classes, or to classify occupation names into different classes. This is interesting for us, since we want to study how the classification of concepts influences the results when matching candidates and jobs. The JobTech taxonomy enables this comparison of different abstraction levels.


Chapter 3

Theoretical Background

3.1 Topological data analysis

Topological Data Analysis (TDA) is a relatively new field in data analysis. It focuses on the shape of the data, determined by proximity relations between data points as embedded in a metric space. In order to extract the shape properties, TDA methods represent the data as simplicial complexes or as graphs, i.e. a set of vertices (which correspond to sets of data points) and edges between these vertices. It is a powerful tool for finding patterns in the structure of data sets [7]. In this thesis TDA will be used to analyse the data and the methods applied to the data. Moreover, the mapper algorithm in section 3.1.6 will be used as an explanatory tool, to give a better understanding of the data and the analysis. In this chapter some relevant definitions in the field of TDA are stated. This is intended to give a more fundamental understanding and description of the topological approach used in the data analysis.

3.1.1 Graphs

Definition 3.1.1. A graph G is defined by a set V and a subset E of V × V of pairs of elements in V. An element v ∈ V is called a vertex and an element e ∈ E is called an edge. If e = (v, w) we say that the edge e connects the vertices v and w. If e = (v, v), it is called a loop [17].

A graph can be directed or undirected. In a directed graph the order of the pair of vertices defining an edge is important: the edge e = (v, w) is directed from v to w. In an undirected graph the edges are defined by unordered pairs of vertices. An undirected graph where every pair of vertices


is connected by an edge is called a complete graph. In figure 3.1 an example of an undirected graph and a complete undirected graph are shown.

Figure 3.1: Example of a graph to the left (blue) and a complete graph to the right (red).

Convex set

A convex set, C, is a subset of an affine space over the reals with the property that any line segment joining two of its points lies entirely within C. The intersection of all the convex sets that contain a given subset A of an affine space is called the convex hull of A. It is the smallest convex set containing A [18].

3.1.2 Distances

A distance is a function that satisfies both reflexivity and symmetry. If a distance, in addition to reflexivity and symmetry, also fulfills the triangle inequality, it is called a pseudometric.

Definition 3.1.2. A distance, d, is defined as the function

d : X × X → [0, ∞], such that for any x, y ∈ X,

d(x, y) = d(y, x)
d(x, x) = 0

Definition 3.1.3. The Triangle inequality is defined as

d(x, z) ≤ d(x, y) + d(y, z), for all x, y, z ∈ X.


3.1.3 Simplices

Definition 3.1.4. Consider an affine space A^k and n + 1 affinely independent vertices, {v_0, ..., v_n}, in this space. The simplex, σ, generated by these vertices is the convex subset of A^k given by the set of points

σ = { w_0·v_0 + ... + w_n·v_n | ∑_{i=0}^{n} w_i = 1, w_i ≥ 0 ∀i }

The edges of an n-simplex are represented by a complete graph with n + 1 vertices. For example, a 0-simplex is a point or vertex, a 1-simplex is a line or edge, and a 2-simplex is a triangle, as shown in figure 3.2. The convex hull of any nonempty subset of the n + 1 points that define an n-simplex is also a simplex.

Figure 3.2: Example of a 0-simplex, 1-simplex and 2-simplex.

3.1.4 Face

Definition 3.1.5. Any non-empty subset A of a point set {p_0, p_1, ..., p_d} spans a simplex σ_A ⊆ σ, where σ_A is called a face of σ [19].

This implies that faces are simplices themselves.

3.1.5 Simplicial Complexes

Simplicial complexes are sets composed of simplices. Simplicial complexes are often used to capture the shape of the data as defined by proximity relations between data points. We can think about a simplicial complex as a set of subsets of data points with a visual representation.

Definition 3.1.6. A simplicial complex, K, is defined as a set of simplices such

that,

1. If σ ∈ K, then for any face σ′ of σ, σ′ ∈ K.

2. If σ_1, σ_2 ∈ K, then σ_1 ∩ σ_2 is either empty or a face of both σ_1 and σ_2.


Abstract simplicial complexes

An abstract simplicial complex, K, on a finite set of vertices, V = {v_1, ..., v_n}, is a non-empty subset of the power set of V such that the simplicial complex is closed under the formation of subsets [20]. In this way we do not need to think about the embedding of the complex in space; an abstract simplicial complex is, in other words, a simplicial complex without the associated geometric information.

Definition 3.1.7. An abstract simplicial complex is a finite collection of sets K such that if a set σ ∈ K, then for any subset σ′ ⊆ σ, σ′ ∈ K [19].

Clique complexes

There is a straightforward construction that can be used to associate a simplicial complex to a graph, through a so-called clique complex. The clique complex of a graph G = (V, E) is an abstract simplicial complex. It has the complete subgraphs as simplices and the vertices of the graph G as its vertices, so that it is essentially the complete subgraph complex [20].

Definition 3.1.8. The clique complex of a graph, G, is defined as the simplicial

complex with all complete subgraphs of G as its faces.
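Definition 3.1.8 can be illustrated with a short sketch that enumerates the complete subgraphs of a small graph. The function name and the `max_dim` cut-off are our own choices for the example, not part of the thesis.

```python
from itertools import combinations

def clique_complex(vertices, edges, max_dim=3):
    """Build the clique complex of an undirected graph: every complete
    subgraph (clique) of G becomes a simplex (cf. Definition 3.1.8)."""
    edge_set = {frozenset(e) for e in edges}
    complex_ = [frozenset([v]) for v in vertices]  # the 0-simplices
    for k in range(2, max_dim + 2):                # k vertices -> (k-1)-simplex
        for subset in combinations(vertices, k):
            # A subset is a clique iff every pair inside it is an edge.
            if all(frozenset(p) in edge_set for p in combinations(subset, 2)):
                complex_.append(frozenset(subset))
    return complex_

# A triangle {a, b, c} plus a pendant edge {c, d}:
K = clique_complex("abcd", [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
# K contains 4 vertices, 4 edges and the 2-simplex {a, b, c}: 9 simplices.
```

The triangle becomes a filled 2-simplex because all three of its edges are present, while the pendant edge contributes only a 1-simplex, which is exactly the "complete subgraphs as faces" rule of the definition.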

3.1.6 Mapper

Mapper is a mathematical tool that is used for simplification and visualisation of high-dimensional data. Mapper uses topological data analysis to build a geometric representation, a simplicial complex, of the data, which can be used both for a better understanding of the data and as an explanatory model for machine learning algorithms. The main idea is to apply one or several filter functions to the data and then apply a clustering algorithm on subsets of the data.

Given a data set X with N points x ∈ X, define one or several filter functions fi such that,

f_i : X → Z


The range Z is covered by a set S of overlapping intervals, determined by the number of intervals and the percentage of the overlap, p.

For each interval Ij ∈ S we define the set of points Xj such that, Xj = {x | f (x) ∈ Ij}.

For each of these sets, X_j, we find clusters {X_jk} by applying any clustering algorithm, for example k-means clustering or hierarchical clustering. Based on these clusters, we build a simplicial complex where every cluster becomes a vertex and every non-empty intersection between clusters becomes an edge, i.e.

{Xjk} → vertex Xjk ∩ Xlm 6= ∅ → edge

This obtained graph, which we can call the mapper graph, is the 1-skeleton of the mapper simplicial complex, which is then defined as its clique complex. It is worth noting that the simplicial complex structure of the Mapper output is rarely exploited in applications; instead it is the Mapper graph which is commonly used for visualisation and data analysis purposes [21]. In this thesis the mapper algorithm will be used as a tool for analysis of the data set and as a comparison and alternative to the other standard methods.
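The construction above can be sketched in a minimal form. This is not the implementation used in the thesis: the filter (first coordinate), the interval cover, and the trivial "one cluster per interval preimage" stand-in for a real clustering step are all simplifying assumptions made for illustration.

```python
import numpy as np

def mapper_graph(X, n_intervals=4, overlap=0.25):
    """Toy Mapper sketch: cover the filter range with overlapping intervals,
    take one 'cluster' per interval preimage, connect clusters that share
    data points. A real Mapper would cluster within each preimage."""
    f = X[:, 0]                                   # filter function f: X -> R
    lo, hi = f.min(), f.max()
    length = (hi - lo) / n_intervals
    nodes, edges = [], []
    for j in range(n_intervals):
        a = lo + j * length - overlap * length    # interval widened by overlap p
        b = lo + (j + 1) * length + overlap * length
        members = set(np.where((f >= a) & (f <= b))[0])
        if members:
            nodes.append(members)                 # each cluster becomes a vertex
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if nodes[i] & nodes[j]:               # shared points -> edge
                edges.append((i, j))
    return nodes, edges

X = np.arange(5, dtype=float).reshape(-1, 1)
nodes, edges = mapper_graph(X, n_intervals=2, overlap=0.25)
```

With two overlapping intervals over the points 0..4, the middle point falls in both preimages, so the two vertices are joined by an edge, exactly the non-empty-intersection rule stated above.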

3.2 Clustering analysis

Cluster analysis is a data analysis tool for grouping similar data points into clusters based on the observed values of variables for each data point. It maximises the similarity of data points within each cluster while also maximising the dissimilarity between clusters. It is an unsupervised learning method, which means that no prior information about the partitioning of the data or the cluster memberships is needed [22]. In this thesis two different clustering algorithms will be used: k-means clustering and hierarchical clustering.

3.2.1 K-means clustering

K-means clustering partitions the N data points into K clusters by minimising the following objective function:

J = ∑_{i=1}^{N} ∑_{k=1}^{K} w_{i,k} ||x_i − µ_k||²    (3.1)

where

w_{i,k} = { 1, if x_i belongs to cluster k
          { 0, otherwise    (3.2)

The k-means algorithm is described in detail in algorithm 1. K-means clustering can be rather slow, since it computes the Euclidean distance for all points in every iteration, and it is also important to note that, due to the random initialisation of centroids, it may not converge to a global optimum [23].

Algorithm 1 K-means clustering

1. Specify the number of clusters K.

2. Initialise the centroids µ_k at random points.

3. Minimise J w.r.t. w_{i,k}, keeping µ_k fixed (assign each point to its nearest centroid):

   w_{i,k} = { 1, if k = argmin_j ||x_i − µ_j||²
             { 0, otherwise

4. Minimise J w.r.t. µ_k, keeping w_{i,k} fixed (recompute each centroid as the mean of its assigned points):

   µ_k = ∑_{i=1}^{N} w_{i,k} x_i / ∑_{i=1}^{N} w_{i,k}

5. Repeat steps 3 and 4 until the assignments no longer change.
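Algorithm 1 translates almost line for line into numpy. This is a generic sketch of the standard algorithm, not the thesis's code; the data, K, and the choice to initialise from distinct data points are illustrative.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal k-means following Algorithm 1: alternate the assignment
    step (minimise J over w, mu fixed) and the update step (minimise J
    over mu, w fixed) until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    # Initialise centroids at K distinct data points chosen at random,
    # so no two initial centroids coincide.
    unique = np.unique(X, axis=0)
    mu = unique[rng.choice(len(unique), K, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label i = argmin_k ||x_i - mu_k||^2.
        labels = np.argmin(((X[:, None, :] - mu) ** 2).sum(-1), axis=1)
        # Update step: mu_k = mean of the points assigned to cluster k.
        new_mu = np.array([X[labels == k].mean(0) if np.any(labels == k)
                           else mu[k] for k in range(K)])
        if np.allclose(new_mu, mu):   # converged
            break
        mu = new_mu
    return labels, mu

# Two well-separated blobs should be recovered exactly:
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
labels, mu = kmeans(X, 2)
```

Note the two alternating minimisations match steps 3 and 4 above, and the early exit on unchanged centroids is the convergence check; the result still depends on the random initialisation, as the text warns.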


3.2.2 Hierarchical clustering

Hierarchical clustering is, as the name states, a clustering method that builds a hierarchy of clusters. It can be either top-down or bottom-up. A top-down approach starts with all data points in the same cluster and then proceeds by splitting clusters recursively until all clusters are singletons. A bottom-up method, also called agglomerative, works the other way around, starting with each data point in its own cluster. It proceeds by merging clusters based on some predefined metric or similarity. The agglomerative algorithm is described in algorithm 2.

Algorithm 2 Agglomerative Hierarchical Clustering

1. Initialise with every data point as its own cluster, C = {c_1, c_2, ..., c_N}.

2. Calculate the distance d(c_i, c_j) between all clusters.

3. Merge the two clusters c_m, c_n with the smallest distance:

   m, n = argmin_{i,j} d(c_i, c_j)

4. Repeat steps 2–3 until all points are in the same cluster (or until a predefined number K of clusters is obtained).

Linkage

Linkage in hierarchical clustering refers to the dissimilarity between sets as a function of the distance between pairs of points. In other words, the linkage defines the rules for which clusters are merged at each step of the clustering algorithm. One famous linkage method was introduced by Joe Ward; it minimises the variance within clusters and is named ward linkage after him [24]. The distance used in ward linkage is the squared Euclidean distance between points, d(x_i, x_j) = ||x_i − x_j||².
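Algorithm 2 can be sketched directly. For brevity this toy uses single linkage (the minimum pairwise distance between clusters) as the merge criterion; ward linkage, as described above, would replace `dist` with its variance-based criterion. The function name and stopping parameter are our own.

```python
import numpy as np

def agglomerative(X, n_clusters=1):
    """Bottom-up clustering per Algorithm 2: start with singletons and
    repeatedly merge the two closest clusters (single linkage here)."""
    clusters = [[i] for i in range(len(X))]       # step 1: all singletons

    def dist(ci, cj):
        # Single-linkage cluster distance: closest pair of points.
        return min(np.linalg.norm(X[a] - X[b]) for a in ci for b in cj)

    while len(clusters) > n_clusters:             # step 4: repeat until done
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # steps 2-3: find and merge the closest pair of clusters
        i, j = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Two tight pairs of 1-D points should end up as two clusters:
X = np.array([[0.0], [0.1], [5.0], [5.1]])
clusters = agglomerative(X, n_clusters=2)
```

Stopping at `n_clusters=1` instead would record the full merge history, which is what a dendrogram visualises.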

Dendrogram

Dendrograms are often used as visualisation tools for hierarchical clustering. A dendrogram is a graph with three kinds of vertices:

1. Vertices called objects or data points.

2. Vertices called nodes, which correspond to clusters.

3. One vertex called the root node.


The vertices called objects are singletons, i.e. clusters that contain only one data point. The nodes are clusters with more than one data point but not all data points.

3.3 Dimensionality reduction

Dimensionality reduction is the process of reducing a set of random variables to a set of principal variables [25]. In high dimensions, data can often be very sparse and dissimilar in many dimensions, which can result in organisation strategies becoming inefficient and giving poor results. In machine learning such problems are often referred to as the curse of dimensionality [26]. But it is important to note that losing dimensions also means losing information. In principle, with dimensionality reduction techniques one would like to lower the dimension while preserving essential information.

3.3.1 Principal Component Analysis

Principal component analysis, PCA, can be used as a dimensionality reduction method. It aims to find a low-dimensional representation of the data set that contains as much of the variation in the data as possible. It works by projecting the data onto a chosen number of principal components. The principal components lie in the directions of maximum variance. Moreover, maximising the variance is the same as finding the projection with the smallest mean squared distance between the original data points and their projections onto the principal components [27].

Proof. Given a data set X with n data points x_i and d features, we assume that the data has been centred, i.e. all variables have zero mean. The covariance matrix, V, is then given by X^T X = nV. We start by looking at the projection onto one dimension, along the line represented by the unit vector p. The projection of a data point onto that line is (x_i · p)p. We then want to minimise the mean squared error, MSE, with respect to p:

min_p MSE(p) = (1/n) ( ∑_{i=1}^{n} ||x_i||² − ∑_{i=1}^{n} (p · x_i)² )


The second term is the sample mean of (p · x_i)². The mean of a square is always equal to the square of the mean plus the variance, giving

(1/n) ∑_{i=1}^{n} (p · x_i)² = ( (1/n) ∑_{i=1}^{n} p · x_i )² + Var[x_i · p].

Since the data is centred, the mean of the projection is zero, implying that minimising the MSE is the same as maximising the variance.

The variance is given by p^T V p and we want to maximise it with the constraint that p^T p = 1. This results in the following optimisation problem:

max_p  p^T V p
s.t.   p^T p = 1    (3.3)

This can be solved with Lagrangian relaxation, by introducing the Lagrange multiplier λ:

max_p  L(p, λ) = p^T V p − λ(p^T p − 1)    (3.4)

Solving this problem results in the two following equations:

p^T p = 1    (3.5)

V p = λp    (3.6)
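Equation (3.6) says the first principal component is the top eigenvector of the covariance matrix, which a few lines of numpy can confirm numerically. The synthetic data set below is invented for the example: its first coordinate is scaled to dominate the variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data whose first axis carries most of the variance.
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])
X = X - X.mean(axis=0)              # centre the data, as the proof assumes

V = X.T @ X / len(X)                # covariance matrix, from X^T X = nV
eigvals, eigvecs = np.linalg.eigh(V)  # eigh sorts eigenvalues ascending
p = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
# p is a unit vector (p^T p = 1) satisfying V p = lambda p, i.e. the
# direction of maximum variance: here essentially the first axis.
```

`np.linalg.eigh` is appropriate because V is symmetric; the returned eigenvector is already normalised, matching the constraint (3.5).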


Chapter 4

Methods

This chapter covers the methods used in this thesis. First, an explanation of the data set studied is given. This is followed by the representation of the data in the different dimensions, together with the dimensionality reduction. A step-by-step presentation of the data analysis implementation is then given. Lastly, we present a short description of how the final proposed candidates are chosen.

4.1 Data

The data set studied in this thesis contains 35 data points, where 30 of the data points represent candidates, C = {c_1, c_2, ..., c_30}, and five of them represent job advertisements, J = {j_1, j_2, ..., j_5}. The small size of the data set allows us to have a better understanding of the data and of the significance of the results. Since there is no prior knowledge of correct matches between jobs and candidates that can be used to evaluate the results in this study, it is important to have a deeper knowledge of the observations in the data set. A deeper understanding of the data is easier to obtain for a smaller data set, for which it is possible to check the variables manually for every data point.

The data points are represented by vectors in four different categories: skills, experiences, occupations and personality traits. The four categories represent different aspects to consider when recruiting people. The occupation category in this study represents occupations the candidates are interested in working in. This category is not important when selecting among candidates who have applied for a job, since they have then already shown interest in that particular job. The data set studied in this thesis is, however, a set of candidates looking for employment in general, and in this case their job interests, represented in the occupation vector, can increase the accuracy of the match.


4.1.1 Skills

Both candidates and jobs are represented by a set of skills together with a level of knowledge for each skill. In the JobTech taxonomy there are more than 7000 different skills, but in this thesis a subset containing 1041 skills is considered, because the candidates in our data set could only choose between skills in this smaller set. Examples of skills are project management, Java, programming language, customer support or MS Excel, just to name a few. There are five different knowledge levels for a skill, which are normalised to a number between 0 and 1 to form a level vector. An example of skill data for a candidate/job is shown in table 4.1 and the translation of levels is represented in table 4.2.

Skill                  Level
MS Excel               1
Law of contract        1
Staff responsibility   2
Telesales              3
Customer service       2

Table 4.1: Example of skill data for a candidate or job.

Level                              Level in vector
0  not having the skill            0
1  basic knowledge of skill        0.25
2  medium knowledge of skill       0.5
3  good knowledge of skill         0.75
4  excellent knowledge of skill    1

Table 4.2: Levels of skills

This results in a vector representation of the skills where a candidate/job is represented by two vectors, one skill vector and one skill level vector. The skill vector for a candidate is denoted c^skill and for a job j^skill, with corresponding skill level vectors c^skill,level and j^skill,level.

∀i ∈ Skills:

c_i^skill, j_i^skill = { 1, if candidate/job has/requires skill i
                       { 0, otherwise

c_i^skill,level, j_i^skill,level = { 0.25, if candidate/job has/requires basic knowledge of skill i
                                   { 0.5,  if candidate/job has/requires medium knowledge of skill i
                                   { 0.75, if candidate/job has/requires good knowledge of skill i
                                   { 1,    if candidate/job has/requires excellent knowledge of skill i
                                   { 0,    otherwise
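The skill and skill-level encoding above can be sketched as follows. The small `SKILLS` list stands in for the 1041-skill subset, and the function name is our own; only the 0–4 level scale and its 0–1 normalisation come from tables 4.1 and 4.2.

```python
import numpy as np

# Stand-in for the 1041-skill subset (invented ordering for illustration).
SKILLS = ["MS Excel", "Law of contract", "Staff responsibility",
          "Telesales", "Customer service", "Java"]

def encode_skills(skill_levels, all_skills=SKILLS):
    """Turn {skill name: level 0-4} into the binary skill vector and the
    normalised skill-level vector described in Section 4.1.1."""
    skill = np.zeros(len(all_skills))
    level = np.zeros(len(all_skills))
    for name, lvl in skill_levels.items():
        i = all_skills.index(name)
        skill[i] = 1.0          # candidate/job has/requires skill i
        level[i] = lvl / 4.0    # levels 0..4 map to 0, 0.25, 0.5, 0.75, 1
    return skill, level

# A candidate with basic MS Excel (1) and good Telesales (3):
c_skill, c_level = encode_skills({"MS Excel": 1, "Telesales": 3})
```

Both vectors share the same index space, so a job encoded the same way can be compared coordinate-wise against a candidate.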

4.1.2 Experiences

Experiences refers to earlier employments for a candidate and relevant employments for a job advertisement. The experiences form a set of 3252 different occupations in the taxonomy. This results in the candidate vectors, c^exp, and job vectors, j^exp, having a dimension of 1 × 3252. The levels of experience differ between candidates and jobs. For a candidate, the level of experience is given in years of employment in that occupation. For a job it is given by a number between 0 and 5, intended to reflect the required level of experience. Examples of experience data for a candidate and a job can be found in table 4.3 and table 4.4.

Experience          Level (years)
Waiter / waitress   2
Graphic Designer    5
Project Manager     9

Table 4.3: Example of a candidate

Experience                        Level
Software Developer / Programmer   4
UX-designer                       2
Test conductor / QA lead          1

Table 4.4: Example of a job

To obtain the resulting level vectors for our experience data, the translation scheme in table 4.5 was used. This results in two vector representations for each candidate and job: the experience vectors, c^exp and j^exp, and the level vectors, c^exp,level and j^exp,level.


Candidate level   Job level   Level
0 years           0           0
>0 years          1           0.2
>1 years          2           0.4
>3 years          3           0.6
>5 years          4           0.8
>10 years         5           1

Table 4.5: Experience level translation scheme

c_i^exp, j_i^exp = { 1, if candidate/job has/requires experience i
                   { 0, otherwise

c_i^exp,level, j_i^exp,level = { 0.2, if candidate/job has/requires more than 0 years / level 1
                               { 0.4, if candidate/job has/requires more than 1 year / level 2
                               { 0.6, if candidate/job has/requires more than 3 years / level 3
                               { 0.8, if candidate/job has/requires more than 5 years / level 4
                               { 1,   if candidate/job has/requires more than 10 years / level 5
                               { 0,   otherwise

4.1.3 Occupations

Occupations use the same set of 3252 variables as experiences. For a job, the occupation is the title of the job; hence a job is represented by only one occupation variable. For the candidates, on the other hand, the occupation data represents occupations that the candidates are interested in. This results in the two following vectors, c^occ and j^occ, where for every occupation i:

c_i^occ = { 1, if candidate is interested in occupation i
          { 0, otherwise

j_i^occ = { 1, if job title equals occupation i
          { 0, otherwise

4.1.4 Personality traits


It is not established that personality traits are linked to a specific occupation, but they are, on the other hand, something that is often considered important in recruitment [28]. In this study a set of 195 different personality traits is used, where each trait belongs to one of seven main categories. The personality vectors for candidates and jobs are c^personality and j^personality, and for every trait i:

c_i^personality, j_i^personality = { 1, if candidate/job is considered having/requiring trait i
                                   { 0, otherwise

4.1.5 Job descriptions

The set of jobs contains five different job announcements. These are, as described above, represented in the same way as the candidates. To give a better understanding of the analysis, and as a tool to use when reviewing the results, a short description of the jobs is provided below.

Job 1

The first job, referred to as job 1, is a job advertisement for a web designer. There is no requirement of many years of experience in the area, but some experience of system development or UX design is preferred. Some desired skills for the job are Wordpress, JavaScript and CSS.

Job 2

Job 2 is a developer position looking for someone with more years of experience as a developer. Experience of working as a project manager within IT and as a test manager / QA lead is also desired. This makes skills such as Scrum, Java, programming and software testing desirable. Compared to job 1, this job is looking for a more experienced, senior developer.

Job 3


Job 4

Job 4 is a position as a receptionist, and only a basic level of experience and skills is required. A basic level of experience of reception, service and sales support is preferred.

Job 5

The fifth job also requires only a basic level, and not several years, of experience. The job is a position as a recruiter, and therefore skills such as recruiting and sales are desired. Basic level jobs like this one will also be referred to as low level jobs in this thesis.

4.2

Dimensions

To study how the results of the analysis depend on the dimensions of the data, the analysis was performed in the following spaces: the All concepts space and the Main category space. The All concepts space is the space where the data is expressed in all original concepts, while the Main category space refers to the space where the taxonomy has been used to map the original concepts to the existing 401 different job categories. Furthermore, we also studied how the data expressed in these two formats behaves in even lower dimensions by reducing the data to the dimensions of the job, in other words, to the dimensions where the vector representing the job is non-zero. Also, principal component analysis was applied to reduce the data to two dimensions. In this section a further explanation of the different spaces in which the data analysis was performed is presented.

4.2.1

All concepts


4.2.2

Main categories

The main category vector space was obtained through the JobTech taxonomy described in section 2. It is based on the SSYK classification code, which relates concepts to job categories, and in this thesis all 401 categories provided by the taxonomy were used. All basic concepts are related to one or several main job categories. From this a transition matrix, T^main, was constructed. The transition matrix has the basic concepts as columns and the main job categories as rows. For every concept i and main category j we have:

T_{i,j}^main = 1 if concept i is related to main category j, and 0 otherwise.

To map a candidate to the main category vector space we simply multiply it by the transition matrix:

c^main = T^main · c    (4.1)

This setting is referred to as the Main category space, M. An example of how the concepts can be mapped to one or several main categories is shown in figure 4.1. A candidate or job that has the skill/experience Java is considered experienced in Software developing in the main category setting, while a person that has the skill MS Excel is considered experienced in both Accountant and Administration.

4.2.3

Mapped to job

When matching a candidate to a job it can be of interest to only study the skills and experiences required for that specific job. The argument for mapping candidates to the job is that skills or experiences outside the scope of the job should not contribute to a greater dissimilarity between a candidate and a job. On the other hand, overqualified candidates, and candidates that match other jobs better, might have a greater chance of being selected. To map a candidate to a job we multiply the candidate by the transpose of the job vector. Both the candidates represented in the original space and those represented in the main category space can be mapped to the job.

[c]^job = j^T · c    (4.2)
[c^main]^job = (j^main)^T · c^main
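To make the two transformations concrete, here is a small numpy sketch with toy dimensions (5 concepts, 3 main categories; the thesis uses roughly 4000 concepts and 401 categories). Equation (4.2) is written with a transposed job vector in the text; the element-wise mask below is one reading of "keeping only the dimensions where the job vector is non-zero" and should be treated as an assumption:

```python
import numpy as np

# Transition matrix T^main: rows are main categories, columns are concepts.
T_main = np.array([[1, 0, 0, 1, 0],   # category 0 relates to concepts 0 and 3
                   [0, 1, 0, 0, 0],   # category 1 relates to concept 1
                   [0, 0, 1, 0, 1]])  # category 2 relates to concepts 2 and 4

c = np.array([1, 0, 1, 0, 0])         # candidate's binary concept vector
j = np.array([1, 1, 0, 0, 0])         # job's binary concept vector

c_main = T_main @ c                   # eq. (4.1): candidate in main category space
c_job = j * c                         # eq. (4.2) read as a mask on the job's dimensions
```

Note how the candidate's two concepts collapse into two main categories, and how the job mask zeroes out every skill the job does not ask for.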


Figure 4.1: Example of concepts mapped to main categories.

4.2.4

PCA

Principal component analysis, PCA, was used as a dimensionality reduction tool, where we studied the data expressed in its first and second principal components. By reducing the data to only two dimensions, visualisation of the analysis becomes much easier, and it can contribute to a better understanding of the results. However, reducing the data to only two components may also cause a lot of information loss. PCA was used on data in the space of All concepts, in the space of Main categories, and in both of those spaces mapped to the job. PCA was also used as the filter function in the Mapper algorithm.
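The thesis does not name a library; a minimal PCA via numpy's SVD looks like this (reduction to the first two principal components):

```python
import numpy as np

def pca_2d(X):
    """Return the data projected onto its first two principal components."""
    Xc = X - X.mean(axis=0)                   # center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                      # 2-D scores, ready for plotting

rng = np.random.default_rng(0)
Z = pca_2d(rng.random((30, 400)))             # e.g. 30 candidates in ~400 dims
```

The two columns of `Z` are the axes of the PCA plots; by construction the first column carries at least as much variance as the second.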

4.3

Analysis


The procedure was carried out for the five different jobs, considering all candidates and one job at a time. For every job, four different representations of the data were considered: All concepts, Main categories, All concepts mapped to the job and Main categories mapped to the job. These different settings are further explained in section 4.2. For each of these representations of the data, the following analysis was made:

Step 1 - Preparation and Dendrogram

When doing the analysis in the space of All concepts, no further preparation of the data was needed. Before the analysis was performed in the space of Main categories, however, the data was transformed using the transition matrix presented in section 4.2.2. In addition, the data was mapped to the job, using the job vector, when the analysis was performed in the space of the job. With the data in the right space, a dendrogram was constructed using hierarchical clustering with ward linkage, explained in section 3.2.2.
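Step 1 can be sketched with scipy, assuming the candidate and job vectors are stacked as rows of a matrix (the shapes and variable names below are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = rng.random((31, 401))            # 30 candidates + 1 job, e.g. in main category space

Z = linkage(X, method='ward')        # hierarchical clustering with ward linkage
tree = dendrogram(Z, no_plot=True)   # tree structure; plot it to choose a cluster count
```

The linkage matrix `Z` encodes the full merge history; rendering it as a dendrogram is what guides the choice of cluster count in step 2.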

Step 2 - Clustering

The next step was to perform clustering on the data. Two different clustering methods were used, hierarchical clustering with ward linkage and k-means clustering. Both clustering algorithms are explained in section 3.2. The number of clusters was decided based on the dendrogram constructed in step 1.
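Step 2 can be sketched as follows, cutting the ward tree at the chosen number of clusters and running k-means; `kmeans2` from scipy is used here as a stand-in for whatever k-means implementation the thesis actually used:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(1)
X = rng.random((31, 401))                      # candidates + job, as in step 1
k = 6                                          # chosen by inspecting the dendrogram

# Hierarchical labels: cut the ward tree into at most k clusters.
hier_labels = fcluster(linkage(X, method='ward'), t=k, criterion='maxclust')

# K-means labels with k-means++ initialisation.
np.random.seed(1)                              # kmeans2 draws from numpy's global RNG
centroids, km_labels = kmeans2(X, k, minit='++')
```

Both label vectors assign every candidate (and the job) to one of the k clusters; the job's cluster is the one inspected in the results chapter.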

Step 3 - PCA

After performing the cluster analysis, principal component analysis was carried out. The data was reduced to the two first principal components, according to the procedure explained in section 3.3.1. The data was then plotted in a two-dimensional plot with the principal components on the axes. In the PCA plot the points were colored according to the clustering results from step 2. This results in two plots, one with coloring based on the hierarchical clustering and one with coloring based on the k-means clustering.

Step 4 - Mapper


4.3.1

Analysis of personalities and occupation of interest

To investigate the impact of personalities and occupation of interest for the candidates, an analysis using mapper was implemented. The mapper graphs are constructed as described in the previous section, but the coloring of the nodes is instead based on either personality or occupation of interest. For both cases, two mapper graphs were constructed, one with hierarchical clustering and one with k-means clustering. The color of the nodes is calculated using principal component analysis on the personality data or the occupation of interest data. The idea of this kind of analysis is to see whether some parts of the mapper graph get a specific color, which would imply that data points that are similar in experience and skills also share the same personality or occupation of interest.
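The Mapper construction itself can be sketched in a few lines. This is a simplified one-dimensional version under stated assumptions: a PCA lens, an overlapping interval cover, single-linkage clustering inside each interval, and an edge whenever two nodes share a data point. All parameters and helper names are illustrative; the thesis varies the overlap and cluster counts per setting:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def mapper_1d(X, n_intervals=6, overlap=0.5, dist_thresh=1.0):
    """Simplified Mapper: 1-D PCA lens, overlapping cover, nerve graph."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    lens = Xc @ Vt[0]                         # filter function: first principal component

    lo, hi = lens.min(), lens.max()
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)

    nodes = {}                                # (interval, cluster) -> member indices
    for k in range(n_intervals):
        a = lo + k * step
        idx = np.where((lens >= a - 1e-9) & (lens <= a + length + 1e-9))[0]
        if idx.size == 0:
            continue
        if idx.size == 1:
            labels = np.array([1])
        else:
            labels = fcluster(linkage(X[idx], method='single'),
                              t=dist_thresh, criterion='distance')
        for lab in np.unique(labels):
            nodes[(k, int(lab))] = idx[labels == lab]

    # Nerve edges: two nodes are connected if they share at least one point.
    keys = list(nodes)
    edges = [(u, v) for i, u in enumerate(keys) for v in keys[i + 1:]
             if np.intersect1d(nodes[u], nodes[v]).size > 0]
    return nodes, edges

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)),
               rng.normal(0, 0.1, (20, 2)) + [10, 0]])   # two well-separated groups
nodes, edges = mapper_1d(X, n_intervals=4, overlap=0.5)
```

Coloring each node by, for example, the mean euclidean distance of its members to the job, or by the principal components of the personality data, then reproduces the kinds of plots discussed above.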

4.4

Proposing candidates

To decide on a set of proposed candidates for each job, some scoring rules needed to be introduced. To be able to compare and see how the set of proposed candidates changes, three different approaches were used: proposed candidates based on the mapper graph results, proposed candidates based on the clustering results, and proposed candidates based on the intersection between the two. After the scores were calculated they were normalised using min-max scaling, and a candidate was proposed as relevant for the job if the normalised score was larger than 0.5.

Mapper

For the proposed candidates based on the mapper graph, the number of nodes shared with the job was calculated. In other words, for each candidate we counted the number of nodes in the Mapper graph that the candidate shared with the job. This was calculated in all dimensions, and the final result is the sum over all the dimensions. For dimensions d, clustering methods m, nodes n_i containing the job, and candidate c, we have

score_c = Σ_{d=1}^{4} Σ_{m=1}^{2} Σ_{i=1}^{N} 1_{n_i}(c)    (4.4)


Clustering

To calculate the score for the candidates based on the clustering analysis, a similar procedure was used. Instead of summing over the nodes shared with the job, we sum over the clusters that have the job as a member. So for every dimension d, clustering method m, cluster k_i containing the job, and candidate c, we have

score_c = Σ_{d=1}^{4} Σ_{m=1}^{2} Σ_{i=1}^{N} 1_{k_i}(c).    (4.5)

Intersection

The intersection score is the score where both the mapper graph results and the clustering results are taken into account. For a candidate to get a score, it needs to both be in the same node or nodes as the job in the mapper graph and be in the same cluster as the job. This results in a score that combines the two previous ones.
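The three scores can be sketched as follows. The exact intersection formula did not survive extraction, so requiring a shared cluster before counting shared nodes is an assumption based on the verbal description; all names and the toy data are illustrative:

```python
import numpy as np

def candidate_scores(settings, candidates, job):
    """settings: one (nodes, clusters) pair per combination of data
    representation (4) and clustering method (2); nodes and clusters
    are lists of member sets that may contain the job."""
    mapper = dict.fromkeys(candidates, 0)
    clust = dict.fromkeys(candidates, 0)
    inter = dict.fromkeys(candidates, 0)
    for nodes, clusters in settings:
        job_nodes = [n for n in nodes if job in n]
        job_clusters = [c for c in clusters if job in c]
        for cand in candidates:
            shared_nodes = sum(cand in n for n in job_nodes)        # eq. (4.4)
            shared_clusters = sum(cand in c for c in job_clusters)  # eq. (4.5)
            mapper[cand] += shared_nodes
            clust[cand] += shared_clusters
            if shared_clusters > 0:        # assumed intersection rule
                inter[cand] += shared_nodes
    return mapper, clust, inter

def propose(score, threshold=0.5):
    """Min-max normalise and keep candidates scoring above the threshold."""
    vals = np.array(list(score.values()), dtype=float)
    span = vals.max() - vals.min()
    norm = (vals - vals.min()) / span if span > 0 else np.zeros_like(vals)
    return [c for c, s in zip(score, norm) if s > threshold]

# Toy example: one setting, two candidates, one job.
settings = [([{"job", "c1"}, {"c2"}],        # mapper nodes
             [{"job", "c1", "c2"}])]         # clusters
m_score, c_score, i_score = candidate_scores(settings, ["c1", "c2"], "job")
```

In the toy example, c1 shares both a node and the cluster with the job, so it scores on all three measures, while c2 only shares the cluster.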


Chapter 5

Results

In this section the results from the analysis are presented. First the dendrograms constructed with all jobs and candidates in the original space and in Main categories are shown. Then the analysis for each job is presented in the following order: analysis made on data in All concepts, Main categories, All concepts mapped to the job and Main categories mapped to the job. In all PCA plots the job is marked with an 'x'. For each job a table of the clustering results is presented. How to interpret those tables is explained below.

Interpretation of the clustering results table

A summary of the clustering results of the different methods and spaces is presented in a table for each job. The column names are written in the format "X-Y" or "X-x-Y", where "X" refers to the space of the data analysed and "Y" to the clustering algorithm used. "A" stands for "All concepts", meaning the original data, and "M" for "Main categories", which refers to the data in the main category space. "A-m" is the original data mapped to the job, and "M-m" is the data expressed in Main categories mapped to the job. "-Km" refers to k-means used as clustering method and "-H" to hierarchical clustering. The grey highlighted cells are all the candidates that are in the same cluster as the job. A cell containing a star '*' represents a candidate that is in the same node as the job in the mapper graph. If it has two stars '**', that candidate is found in two of the nodes together with the job, and so on.


5.1

Dendrograms

In figure 5.1 the dendrograms obtained using average linkage in the different mappings are presented: the original space containing all skills and experiences, and the Main categories, which are the SSYK-level occupation categories. As seen in the figures, there is a clearer cluster structure in the main category space.

(a) Original vector space (b) Main categories vector space

Figure 5.1: Dendrogram of all jobs and candidates where index 0 − 29 are the candidates and 30 − 34 are the jobs.

5.2

Matching candidates to a job

In this section the results of the analysis on all candidates and one job at a time are presented. For each job, the results in the different dimensions are given, i.e. the results of the analysis of the data in the space of All concepts, Main categories, All concepts mapped to the job and Main categories mapped to the job.

5.2.1

Job 1

All concepts


The mapper plots were constructed with 50% overlap. They look quite similar and contain rather similar node information, which can be seen in the cluster results table in figure 5.13.

Figure 5.2: Dendrogram job 1 (’j1’) - data in all concepts.

(a) Hierarchical clustering (b) K-means clustering


(a) Hierarchical clustering (b) K-means clustering


Main categories

In the Main categories space more distinct clusters are present, as seen in figure 5.10. Even though the clusters in the PCA plots are not totally distinct, the result is better compared to the plots in the All concepts space. The mapper plots were constructed using six clusters and 60% overlap. The mapper graphs reflect the dendrogram pretty well, with one bigger cluster where the job can be found.

Figure 5.5: Dendrogram job 1 (’j1’) - data in main categories.

(a) Hierarchical clustering (b) K-means clustering


(a) Hierarchical clustering (b) K-means clustering


All concepts mapped to job 1

The data in the space of All concepts mapped to the job results in a lot of candidates being represented by zero vectors. This results in many candidates being represented by the same point in the PCA plots. Another consequence is that the mapper graph becomes uninformative, with all points in the same nodes. Therefore no mapper graphs are presented for this setting.

Figure 5.8: Dendrogram job 1 (’j1’) - data in all concepts mapped to job.

(a) Hierarchical clustering (b) K-means clustering


Main categories mapped to job 1

When mapping the Main categories to the job we get a clearer cluster structure. The PCA plots differ a bit between the clustering methods, and this difference is also visible in the mapper graphs. The mapper plots were constructed using six clusters and 70% overlap. The mapper graphs reflect both the dendrogram and the PCA plots in this setting, and we can note that the mapper graph with k-means clustering is a bit more connected.


(a) Hierarchical clustering (b) K-means clustering

Figure 5.11: PCA plot of data in main categories mapped to job 1 with color according to cluster.

(a) Hierarchical clustering (b) K-means clustering


Cluster results job 1


5.2.2

Job 2

All concepts

Just as for job 1, there are no distinct clusters when the data is represented in All concepts. In the dendrogram the job, j2, is found in the biggest cluster, which is also the case in the PCA plots. The mapper plots are quite similar and the job is found in the largest connected component of the graph. The mapper plots were constructed using six clusters and 55% overlap. In the mapper graphs there are two different nodes containing the job, one larger and one smaller.


(a) Hierarchical clustering (b) K-means clustering

Figure 5.15: PCA plot of data in all concepts with color according to cluster -job 2.

(a) Hierarchical clustering

(b) K-means clustering

Figure 5.16: Mapper plot of data in all concepts with color according to euclidean distance to job 2.

Main categories


not result in more connections between the disconnected parts in the graphs.

Figure 5.17: Dendrogram job 2 (’j2’) - data in main categories.

(a) Hierarchical clustering (b) K-means clustering


(a) Hierarchical clustering (b) K-means clustering


All concepts mapped to job 2

In this setting the job is very distant from all the other data points, which can be seen in both the dendrogram and the PCA plots. In the dendrogram the candidates are clustered together before being clustered with the job. In the PCA plots the point representing job 2, marked with an 'x', is far away from the other points. This also results in the mapper graphs not showing any results, and they are not presented in this section.

Figure 5.20: Dendrogram job 2 (’j2’) - data in all concepts mapped to job.

(a) Hierarchical clustering (b) K-means clustering


Main categories mapped to job 2

The dendrogram in this setting is more informative than the previous one and shows six distinct clusters. In the mapper graphs the nodes containing the job contain only the job and no other data points. They are also not connected to any other node in the graph; this is the small ball-shaped complex with six yellow nodes in the mapper graphs.

Figure 5.22: Dendrogram job 2 (’j2’) - data in main categories mapped to job.

(a) Hierarchical clustering (b) K-means clustering


(a) Hierarchical clustering (b) K-means clustering

Figure 5.24: Mapper plot of data in main categories mapped to job with color according to euclidean distance to job 2.

Cluster results job 2


5.2.3

Job 3

All concepts

The results from the analysis of job 3 and the candidates in the space of All concepts are presented below. The PCA plots do not show any obvious clusters. From the clustering methods we see that the job ends up in the largest cluster. This is also seen in the mapper graphs, where the job is found in one of the tails with the yellow nodes in the largest complex.


(a) Hierarchical clustering (b) K-means clustering

Figure 5.27: PCA plot of data in all concepts with color according to cluster -job 3.

(a) Hierarchical clustering (b) K-means clustering

Figure 5.28: Mapper plot of data in all concepts with color according to euclidean distance to job 3.

Main categories


The smaller clusters of candidates are represented by the smaller complexes in the mapper graphs, and the bigger cluster by the bigger complex, which is also the one containing the job.

Figure 5.29: Dendrogram job 3 (’j3’) - data in main categories.

(a) Hierarchical clustering (b) K-means clustering


(a) Hierarchical clustering (b) K-means clustering

Figure 5.31: Mapper plot of data in main categories with color according to euclidean distance to job 3.

All concepts mapped to job 3

The data in the space of All concepts mapped to the job results in only two distinct clusters, as seen in figure 5.32. This leaves the mapper graph with just a few nodes containing all data points, and therefore the mapper graph is not presented here.


(a) Hierarchical clustering (b) K-means clustering

Figure 5.33: PCA plot of data in all concepts mapped to job 3 with color according to cluster.

Main categories mapped to job 3

When mapping the candidates in the space of Main categories to job 3, we get a more interesting result than in the All concepts space. The dendrogram in figure 5.34 shows several clear clusters. The clustering in the PCA plots differs a bit between the two clustering methods, and the difference can be seen in the mapper graphs as well. The mapper graphs were constructed with 70% overlap and 6 clusters.


(a) Hierarchical clustering (b) K-means clustering

Figure 5.35: PCA plot of data in main categories mapped to job 3 with color according to cluster.

(a) Hierarchical clustering (b) K-means clustering

Figure 5.36: Mapper plot of data in main categories mapped to job with color according to euclidean distance to job 3.

Cluster results job 3


The results are further commented on in section 5.4.


5.2.4

Job 4

All concepts

As seen earlier, analysing the data in the space of All concepts results in one larger cluster together with some outliers and smaller clusters. The same result can be seen in the figures below. The outliers in the dendrogram, candidates 16, 18 and 24, can also be found in the mapper graphs, represented by nodes or small complexes. The mapper graphs were constructed with 50% overlap and 6 clusters. In the mapper graphs the job is a member of the larger nodes, together with a lot of candidates. This might be an indication that a lot of candidates are suitable for the job, or that the result is insignificant.


(a) Hierarchical clustering (b) K-means clustering

Figure 5.39: PCA plot of data in all concepts with color according to cluster -job 4.

(a) Hierarchical clustering

(b) K-means clustering

Figure 5.40: Mapper plot of data in all concepts with color according to euclidean distance to job 4.

Main categories

With k-means clustering, on the other hand, there is a larger and more connected complex containing the job.

Figure 5.41: Dendrogram job 4 (’j4’) - data in main categories.

(a) Hierarchical clustering (b) K-means clustering


(a) Hierarchical clustering (b) K-means clustering

Figure 5.43: Mapper plot of data in main categories with color according to euclidean distance to job 4.

All concepts mapped to job 4


Figure 5.44: Dendrogram job 4 (’j4’) - data in all concepts mapped to job.

(a) Hierarchical clustering (b) K-means clustering


Main categories mapped to job 4

Yet again, when mapping the data to the job, the space of Main categories shows more interesting results than the space of All concepts. More distinct clusters appear, and in both the dendrogram and the PCA plots there are four candidates in the same cluster as the job. The mapper graphs are constructed with 4 clusters and 50% overlap. The mapper graphs are very similar, except that the graph constructed with k-means clustering is a bit more connected. K-means clustering also results in the nodes containing the job being larger, i.e. containing more candidates.


(a) Hierarchical clustering (b) K-means clustering

Figure 5.47: PCA plot of data in main categories mapped to job 4 with color according to cluster.

(a) Hierarchical clustering (b) K-means clustering

Figure 5.48: Mapper plot of data in main categories mapped to job with color according to euclidean distance to job 4.

Cluster results job 4


5.2.5

Job 5

All concepts

The result for job 5 is very similar to the result for job 4 in the space of All concepts. Both jobs are of lower level and require pretty general skills, which might explain the similarity in the results. The mapper graphs were constructed with 6 clusters and 50% overlap. They have a very similar structure to the graphs for job 4, which can be seen by comparing figure 5.40 and figure 5.52.


(a) Hierarchical clustering (b) K-means clustering

Figure 5.51: PCA plot of data in all concepts with color according to cluster -job 5.

(a) Hierarchical clustering

(b) K-means clustering


Main categories

In the space of Main categories we see more distinct clusters in the dendrogram. With 6 clusters, the job ends up in the biggest cluster. The mapper graphs are constructed with 6 clusters and 70% overlap. One interesting result is that in the PCA plots k-means clustering generates a larger cluster, while the opposite holds for the mapper graphs, where hierarchical clustering generates a bigger complex.

Figure 5.53: Dendrogram job 5 (’j5’) - data in main categories.

(a) Hierarchical clustering (b) K-means clustering


(a) Hierarchical clustering (b) K-means clustering

Figure 5.55: Mapper plot of data in main categories with color according to euclidean distance to job 5.

All concepts mapped to job 5

As expected, the results for job 5 in the space of All concepts mapped to the job are also poor. This is explained by a lot of candidates again being represented by zero vectors in this space. No mapper graphs are presented for this setting.


(a) Hierarchical clustering (b) K-means clustering

Figure 5.57: PCA plot of data in All concepts mapped to job 5 with color according to cluster.

Main categories mapped to job 5

The job can be found in the largest cluster, but there are still clear clusters in the dendrogram. One thing to note is that the closest candidates in the dendrogram have changed compared to the dendrograms obtained with the other spaces. The mapper graphs are constructed with 4 clusters and 70% overlap. The job is in the most yellow part of the biggest complex but still only shares nodes with a few candidates, as can be seen in figure 5.60 and in figure 5.61, where the clustering results are presented.


(a) Hierarchical clustering (b) K-means clustering

Figure 5.59: PCA plot of data in main categories mapped to job 5 with color according to cluster.

(a) Hierarchical clustering

(b) K-means clustering


Cluster results job 5


5.3

Occupation of interest and personality traits

In figure 5.62 a mapper graph of all the candidates and jobs is presented. The nodes are colored according to the principal component analysis of the personality data. The figures do not show a specific color in any part of the graphs, and the colors of the members of the different nodes are very scattered. This implies that a personality cannot be linked to a specific type of job. In figure 5.63 the mapper graphs have a clearer partition of the colors of the members between the nodes, but it is still not distinct enough to draw any further conclusions. For example, we can see nodes of brighter colors in one of the tails. This might imply that there is a relation between a candidate's occupation of interest and their previous experiences. However, in the present study we did not test for the statistical significance of this relation.

(a) Hierarchical clustering (b) K-means clustering


(a) Hierarchical clustering (b) K-means clustering

Figure 5.63: Mapper plot with color according to principal component analy-sis of occupation of interest.

5.4

Proposed candidates

In this section the proposed candidates for the different jobs, based on their mapper score, cluster score and intersection score, are presented. For each job, a short review of the relevance of the candidates proposed by the intersection score, based on their experiences and skills, is also given. The candidate skill data can be found in the appendix and can be used to compare the candidates with each other and with the jobs.

5.4.1

Job 1

In table 5.1 we see the set of proposed candidates. As can be noticed, more candidates are proposed as relevant based on the clustering results. Looking at the raw data of the candidates proposed with the intersection score, only candidate 30 seems very relevant. Candidates 8, 20 and 28 have some relevant skills, such as a programming language, and could be classified as slightly relevant. But both candidates 11 and 17 are irrelevant and have no skills applicable to the job. Some relevant candidates that are not part of the proposed set are candidates 4, 13, 14, 18 and 21.

Score type Proposed candidates

Mapper 11, 17, 20, 30

Clustering 4, 6, 7, 8, 9, 10, 11, 15, 17, 20, 21, 22, 23, 28, 29, 30
Intersect 8, 11, 17, 20, 28, 30


5.4.2

Job 2

For job 2 the proposed candidates are presented in table 5.2. Candidate 14 is the most relevant one based on the review of its skills, but candidates 29 and 30 also have some relevant skills. In this set, candidates 9, 12 and 23 do not seem to have any relevant skills but are still proposed.

Score type Proposed candidates

Mapper 7, 14

Clustering 4, 7, 8, 11, 14, 17, 20, 28, 29, 30
Intersect 7, 9, 12, 14, 23, 29, 30

Table 5.2: Proposed candidates for job 2

5.4.3

Job 3

This result differs a bit from the previous jobs in the number of candidates proposed. Judging by the skill data of the candidates, which can be found in the appendix, all of them (4, 8, 21) are relevant for this job. Candidates 6, 15 and 27 also have relevant skills for this job but are not proposed by our methods.

Score type Proposed candidates

Mapper 21
Clustering 4, 8, 21
Intersect 4, 8, 21

Table 5.3: Proposed candidates for job 3

5.4.4

Job 4


Score type Proposed candidates

Mapper 11, 17, 20

Clustering 1, 4, 8, 11, 17, 20, 21, 28, 30
Intersect 1, 8, 11, 17, 20

Table 5.4: Proposed candidates for job 4

5.4.5

Job 5

The proposed candidates for job 5 are presented in table 5.5. This set shares several of the candidates proposed for both job 1 and job 4. Candidate 11 is the only one having some relevance for this job, while the rest of the candidates, judging by their skills, seem irrelevant. A candidate not showing up in these results is candidate 10, which has relevant skills for this job.

Score type Proposed candidates

Mapper 9, 11, 17, 20

Clustering 4, 7, 8, 9, 11, 14, 17, 20, 21, 23, 25, 28, 29, 30
Intersect 7, 9, 11, 17, 20, 28, 30

Table 5.5: Proposed candidates for job 5

5.4.6

Comment on the proposed candidates


Chapter 6

Discussion

What can be noted in the results is that there is no clear difference between the two clustering algorithms. In most cases the results of the clusterings are very similar, so the choice of clustering method does not seem to have a great impact on the results. The same holds for the mapper graphs, which also do not seem to differ a lot between the two clustering methods. The mapper algorithm, on the other hand, seems to give a finer picture in the results. For all of the jobs, the set of candidates recommended by the mapper is smaller than the one retrieved from clustering. The mapper graphs also appear to reflect information about the data well, except for the data in the All concepts space when mapped to the job. In the other settings, the job ends up in one part of the graph and does not appear in many different scattered nodes.

The PCA plots do not show any distinct clusters in most of the cases. This implies that reducing the data to two principal components suffers from information loss. Since the dendrograms show clearer clusters, which are not seen in the points plotted in the PCA plots, the conclusion can be drawn that two dimensions alone cannot reflect the data very well. This is not surprising, since the data is very sparse and high dimensional. Persons and jobs are very complex to describe, and two dimensions are probably not enough to capture the characteristics of the data. Another phenomenon frequently seen for all of the jobs is that the analysis of data in the space of All concepts mapped to the job gives poor results. This is probably because the candidate vectors in the space of All concepts are very sparse and often have zero elements in the dimensions of the job. Representing the data in the space of All concepts, which consists of over 4000 dimensions, turns out to be too vague and does not result in any distinct clusters. Better results were obtained after transforming the data


to the space of Main categories, which also decreases the space to about 400 dimensions. Having the candidates represented in the space of Main categories gives candidates with skills similar to those required for the job the opportunity to be proposed as relevant. For example, a person who knows two programming languages, but not the one required for the job, can probably learn the required language easily and should therefore be considered a relevant candidate. This is also an argument for the main category space giving better results than the All concepts space. Looking at the clustering tables, it is also seen that the main category space gives a more informative result than the All concepts space. The All concepts space tends to end up with a large cluster with the job as a member. This is not very informative, since we know that a lot of the candidates in the data set are not suitable for all of the jobs. Given that the Main categories give better results, the scoring technique for the candidates could also be modified accordingly. In addition, our analysis shows that exploring how the data behaves at even higher levels of abstraction can be an interesting future direction to pursue.

The analysis of personality and occupation of interest does not yield any compelling results. The findings suggest that matching candidates on their skills does not imply a match on personality. This highlights the importance of specifying personality traits in job advertisements if certain personalities are desired for the position. However, a disadvantage of this framework is that we cannot draw any general conclusions about the labour market. To examine the possibility of personalities being linked to specific occupations and skills, more data is needed. The data set in this study only contains 35 candidates and jobs, and it is therefore difficult to draw any general conclusions from such a small set. Consequently it is relevant to further investigate this topic and perform a similar analysis on a larger data set.

