Matching Performance Metrics with Potential Candidates

A computer automated solution to recruiting

OSCAR MELIN

KTH ROYAL INSTITUTE OF TECHNOLOGY
INFORMATION AND COMMUNICATION TECHNOLOGY

DEGREE PROJECT IN INFORMATION AND COMMUNICATION TECHNOLOGY, FIRST CYCLE


2017-06-02

Bachelor’s Thesis

Examiner: Gerald Q. Maguire Jr.

Academic adviser: Anders Västberg

KTH Royal Institute of Technology
School of Information and Communication Technology (ICT)
Department of Communication Systems


Abstract

Selecting the right candidate for a job can be a challenge. Moreover, there are significant costs associated with recruiting new talent. Thus there is a requirement for precision, accuracy, and neutrality from an organization when hiring a new employee. This thesis project focuses on the restaurant and hotel industry, an industrial sector that has traditionally used a haphazard set of recruiting methods. Unlike large corporations, restaurants cannot afford to hire dedicated recruiters. In addition, the primary medium used to find jobs and job seekers in this industry often obscures comparisons between relevant positions. The complex infrastructure of this industry requires a place where both recruiter and job seeker can access a standardized overview of the entire labor market.

Introducing automation in hiring aims to better address these complex demands and is becoming a common practice throughout other industries, especially with the help of internet-based recruitment and pre-selection of candidates. These solutions also have the potential to minimize the risk of human bias when screening candidates.

This thesis aims to minimize inefficiencies and errors associated with the existing manual recruitment screening process by addressing two main issues: the rate at which applicants can be screened and the quality of the resulting matches.

This thesis first discusses and analyzes related work in automated recruitment in order to propose a refined solution suitable for the target area. This solution, semantic matching of jobs and candidates, is subsequently evaluated and tested in partnership with Cheffle, a service industry networking company. The thesis concludes with suggestions for potential improvements to Cheffle's current system and details the viability of recruiting with the assistance of an automated semantic matching application.

Keywords:

Automation, Recruitment, Semantic Matching, Service industry, Hotel, Restaurant


Sammanfattning

Selecting the right candidate for a job can be a challenge. There are also considerable costs in recruiting new staff. There is therefore a need for accuracy and neutrality from an organization when recruiting new personnel. This degree project focuses on the restaurant and hotel industry, a sector that has traditionally used substandard recruiting methods. Unlike large companies, restaurants cannot spare resources for recruiters of their own. In addition, the primary media used for recruiting in the sector make it difficult to compare related vacancies. This complex infrastructure creates a need for a place where both companies and job seekers have access to a standardized overview of the entire labor market.

The introduction of automation aims to meet these complex demands and is becoming increasingly common in other industries, especially with the help of internet-based recruitment and pre-selection of job candidates. These solutions also have the potential to minimize the risk of human subjectivity and partiality in the pre-selection of job candidates.

This degree project aims to minimize the inefficiencies and errors associated with the current manual recruitment method by tackling two main problems: the rate at which job seekers can be pre-selected and the quality of that pre-selection.

This degree project begins with a discussion and analysis of related work in automated recruitment and then presents a possible solution for the target area. This solution, semantic matching of jobs and job seekers, is subsequently evaluated and tested in cooperation with Cheffle, a networking company in the service industry. The degree project concludes with proposals for potential improvements to Cheffle's current system and a conclusion on the feasibility of automation in recruitment.

Nyckelord:

Automation, Recruitment, Semantic matching, Service industry, Hotel, Restaurant


Acknowledgments

I would like to thank:

Professor Gerald Q. Maguire Jr. for providing valuable input and advice.

Emil Karlsson at Cheffle Handelsbolag for offering and supervising this Bachelor’s thesis project.

A special thanks to Megan Henry for useful discussion and suggestions.

Stockholm, June 2017
Oscar Melin

Table of contents

Abstract
Keywords
Sammanfattning
Nyckelord
Acknowledgments
Table of contents
List of Figures
List of Tables
List of acronyms and abbreviations
1 Introduction
1.1 Background
1.2 Problem
1.2.1 Preprocessing data from proposed candidates and positions
1.2.2 Weighting the data with individual preferences
1.2.3 Matching the weighted and preprocessed data
1.2.4 Summary of problem
1.3 Purpose
1.4 Goals
1.5 Research Methodology
1.6 Delimitations
1.7 Structure of the thesis
2 Background
2.1 Recruitment
2.1.1 What does a general recruitment process look like?
2.2 Matching two sets
2.2.1 Matching skills and properties of people
2.3 Semantic Matching
2.3.1 Structural Overview of this project
2.3.2 Semantic Matching of Conceptual Graphs
2.4 Implementation
2.4.1 Finding shortest path in a graph
3 Methodology
3.1 Research Process
3.1.1 Phase 1: Information gathering and Literature study phase
3.1.2 Phase 2: Developing the application
3.2 Data Collection
3.2.1 Sampling
3.2.2 Sample Size
3.2.3 Target population
3.3 Experimental design/Planned Measurements
3.3.1 Test Environment
3.3.2 Software and data structures to be used
3.4 Assessing reliability and validity of the data collected
3.4.1 Reliability
3.4.2 Validity
4 The Application
4.1 Design
4.1.1 Python
4.1.2 Description of the application
4.2 Building the ontology
4.2.1 Fetching ontological elements
4.2.2 Structuring the retrieved data
4.3 Testing
4.4 Functionality and Implementation
4.4.1 Functionality
4.4.2 Implementation
5 Results and Analysis
5.1 Major results
5.1.1 Ability to access additional knowledge
5.1.2 User tests
5.2 Reliability Analysis
5.3 Validity Analysis
5.4 Discussion
6 Conclusions and Future work
6.1 Conclusions
6.2 Limitations
6.3 Future work
6.4 Reflections


List of Figures

Figure 1-1: Overview of problem
Figure 2-1: The vacancy identification and publishing step.
Figure 2-2: The screening of applications step.
Figure 2-3: Describes the interviews and background checks step.
Figure 2-4: Euclidean distance between polyhedra A and B
Figure 2-5: The connection between “French cuisine” and “grilling” in DISCO
Figure 2-6: Overview of the project structure
Figure 2-7: Segment of a possible CG with corresponding milestone values
Figure 3-1: Rough timeline of the thesis project. (Figure appears here courtesy of G. Q. Maguire Jr.)
Figure 3-2: Main parts of the application
Figure 4-1: Node class and its attributes
Figure 4-2: Graph class and its attributes
Figure 4-3: Main class and its attributes
Figure 4-4: Adding skills to a job profile or job ad on Cheffle.
Figure 4-5: Overview of the ontology used in the application, showing 10 out of 109 total elements


List of Tables

Table 2-1: Examples of terminology match versus semantic match
Table 4-1: Most frequent skills
Table 4-2: Example of an Education or Experience entry in a job profile
Table 4-3: Comparison of the different resulting values between matching methods. Simple matching refers to the example given in Section 2.2
Table 4-4: Comparison of the different resulting values between matching methods. Simple matching refers to the example given in Section 2.2
Table 4-5: Comparison between unweighted and weighted for 2x “Japanese Food” in the Job set of performance metrics from the previous example.
Table 4-6: Shows an example of the non-commutativity in matching
Table 4-7: Result of requesting the top 5 matches to {"restaurangchef", "hovmästare", "bordsservering", "vinkunskap"} of all users in the Cheffle database.
Table 5-1: Shows the relationship of similarity span (similarity of correct answers with posed performance metrics) to frequency of agreement (between test participant and


List of acronyms and abbreviations

API Application Programming Interface

BFS Breadth First Search

CG Conceptual Graph

DISCO European Dictionary of Skills and Competences

GUI Graphical User Interface

ICT Information and Communication Technology


1 Introduction

This Bachelor’s thesis project was conducted during the spring of 2017 at Cheffle Handelsbolag.

This chapter introduces what a company's recruiting process typically looks like today and the problems within it, which this thesis addresses. The chapter also includes a problem statement, the purpose of this thesis, and the methods that have been used. Chapter 2 gives more details concerning the specific subarea of matching jobs with candidates and vice versa.

1.1 Background

Selecting the right candidate for a job is a challenge. Selecting the right candidate from a massive pool of applicants can be an overwhelming task. Large corporations or recruiting agencies typically receive a very large number of applications [1] and even small businesses can receive hundreds of applications per job listing. In some countries such as Sweden, an unemployed person is required to apply for a certain number of jobs per month in order to be eligible for unemployment benefits [2]. For small businesses recruiting for low-skill positions, this may cause undue and heavy strain on their limited human resources, as sifting through so many applications is costly, error prone, tedious, and often simply overwhelming.

The screening of applications and resumes is followed by interviews, background checks, and offers of employment. However, a large fraction of applicants who pass all the aforementioned steps may nonetheless decline offers, further increasing recruitment costs [3]. Equally pressing, and significantly more burdensome financially, an employee may leave the company shortly after recruitment, doubling the investment in the hiring process. These potential issues create a demand for an effective screening process: one that selects only the most promising and suitable individuals for interviews and thus helps minimize employee churn rates.

Even organizations with sufficient human resources to do high capacity processing of candidates face setbacks due to psychological bias and general human error. Furthermore, discrimination stemming from inferred ethnicity and tendencies towards homogeneous hiring can decrease the likelihood of an interview for qualified applicants [3, 4]. Narrow bracketing, the assessment of individual subsets in isolation, can lead to results that differ from objective consideration of the whole set [5]. For instance, after giving five consecutive applicants high ratings, an interviewer might be reluctant to do the same for the sixth.

The significant costs associated with recruiting new talent require precision, accuracy, and neutrality from an organization's hiring sector. Automation in hiring intends to better address these demands and is becoming a common practice throughout larger companies with the help of internet-based recruitment and pre-selection of candidates [6]. In practice, this process entails matching available candidate information with an employer's requirements while weighing individual preferences from both sides in order to calculate a compatibility rating between both parties. Automated solutions certainly have the potential to minimize the risks of human biases when screening candidates.

The restaurant and hotel industry addressed in this thesis project, specifically this industry sector in Sweden, has a haphazard set of recruiting methods. Unlike large corporations, restaurants cannot afford to hire dedicated recruiters. At some of the largest hotel chains in the country, recruiting is done by receptionists or department managers, employees whose primary focus has little to do with headhunting*. These employees lack fundamental knowledge about recruitment that is necessary for bringing on the best staff. Moreover, the primary medium used to find jobs and job seekers in the service industry is Facebook, with some additional recruiting done on popular job portals such as Monster†, Blocketjobb‡, and Platsbanken§. As discussed in [7, 8], the traditional recruitment process (discussed further in Section 2.1) has several shortcomings with regard to discovering and assessing candidates through social media, especially when it comes to candidates who lack formal education or experience. Furthermore, these information islands are isolated from one another, which obscures comparisons between relevant positions. The complex infrastructure of this industry requires a place where both recruiter and job seeker can access a standardized overview of the entire labor market.

Cheffle seeks to provide this meeting point between job seekers and employers in the hotel industry. It functions as both a job portal and a hands-on recruiting agency. The combination of these two functions means that by manually screening the applications and job ads via the job portal, suitable applicants can be recommended to the advertising employers. Access to an industry-wide meeting point provides a broad spectrum of job seekers, and the open positions represented on Cheffle offer ample opportunity for improved hiring practices through automation.

This thesis seeks to minimize inefficiencies and avoid the errors associated with the existing manual recruitment screening process by addressing two main issues: the rate at which applicants can be screened and the quality of the resulting matches.

* Claims about hiring practices within the restaurant industry were asserted in an interview with Cheffle in March of 2017.

† https://www.monster.se/

‡ https://jobb.blocket.se/


1.2 Problem

In order to create an effective system that pairs eligible and interested employee candidates with open positions, a recruitment system must address the following three priorities: preprocessing, weighting, and candidate/position matching. Figure 1-1 shows the overall process of matching candidates with open vacancies (i.e., jobs).

Figure 1-1: Overview of problem

1.2.1 Preprocessing data from proposed candidates and positions

The input data must be structured and represented according to a general set of rules. Dubious values, impossible data combinations and general issues of data quality must each be addressed before data can be subject to analysis [8]. As discussed in [9], users often leave some fields empty in an online form. However, missing values or small data sets impose significant constraints on the functionality of an automated system. Insufficient information on applicants or employment positions could potentially generate lower quality and/or unreliable matches.

1.2.2 Weighting the data with individual preferences

Employers will be able to clearly define their desirable candidate by providing the performance metrics they most value. These metrics could include differing values for preferences in education, skills, and experience. The algorithm applies the user's individual preferences to the available data and weights the data accordingly. Depending on the performance metric a recruiter is looking at, different variables may be more relevant; e.g. a recruiter might value years of experience above education or vice versa for a given position.
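The weighting step can be illustrated with a minimal sketch. It assumes that each performance metric of a job carries a numeric weight chosen by the recruiter and that a candidate is scored by the weighted share of metrics they cover. The function name `weighted_similarity`, the example metrics, the weight values, and the scoring rule itself are hypothetical and serve only as an illustration; they are not taken from the application described later in this thesis.

```python
def weighted_similarity(job_weights, candidate_metrics):
    """Weighted share of the job's metrics that the candidate covers.

    job_weights: dict mapping each requested metric to a positive weight
                 (hypothetical recruiter preferences).
    candidate_metrics: set of metrics listed by the candidate.
    """
    total = sum(job_weights.values())
    if total == 0:
        return 0.0  # a job without weighted metrics cannot be matched
    matched = sum(weight for metric, weight in job_weights.items()
                  if metric in candidate_metrics)
    return matched / total


# Hypothetical job: the recruiter values "wine knowledge" twice as much
# as the other two metrics.
job = {"bartending": 1.0, "wine knowledge": 2.0, "table service": 1.0}
candidate = {"wine knowledge", "table service", "barista"}

score = weighted_similarity(job, candidate)  # (2.0 + 1.0) / 4.0
```

With equal weights the same candidate would cover two of three metrics; doubling the weight of one matched metric raises the score, which is the effect a recruiter's preference is meant to have.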


1.2.3 Matching the weighted and preprocessed data

With resources made available through the previous steps, it is possible to pair the best potential combinations of job seekers and open positions. Matching can be done either through manual analysis and reasoning or through a predefined automated algorithm.

1.2.4 Summary of problem

Letting the three aforementioned steps define the candidate/job matching process, this project attempts to address the following question: how can automation improve recruiting in terms of efficiency and quality?

1.3 Purpose

The purpose of this thesis and degree project is to research and develop a practical solution for job and candidate matching. This thesis seeks to provide job seekers with better opportunities for finding their most desired job, and to provide recruiters with a tool to assist them in finding better candidates for the positions they are aiming to fill.

This task cannot be accomplished without handling sensitive information, both personal and corporate. Dealing with confidential data always comes with a moral responsibility to neither purposefully nor mistakenly mishandle information.

Additionally, there is the issue of possible bad matches. A user will place a certain level of trust in the system to deliver on what it promises; hence, if the proposed solution produces incorrect job suggestions, it could lead to users missing out on a job they really want.

1.4 Goals

The goal of this project is to produce and analyze an automated job and candidate matching tool for Cheffle, in order to improve on their current manual job-candidate screening process. The following three sub-goals will define the direction, pace, and strategy for this project:

1. Understand the general practical needs of a job and candidate matching tool and the specific needs of Cheffle and its customers,

2. Translate these requirements into a working prototype using the available resources provided by Cheffle, and

3. Produce a result that satisfies the degree project requirements at KTH [10] as well as the expectations of Cheffle and myself.


1.5 Research Methodology

The research methodologies used in this thesis project include:

• A literature study to gain the necessary fundamental understanding of the relevant topic areas and

• Iterative and continuous development, testing, and evaluation of the application.

1.6 Delimitations

This thesis is limited to matching sets of performance metrics in order to rank matches between jobs and candidates and does not include an implementation of data mining or parsing of resumes.

1.7 Structure of the thesis

Chapter 2 presents relevant background information about the context in which the solution is to be implemented as well as different theoretical matching concepts. Chapter 3 presents the methodology and method used to solve the problem. Chapter 4 presents the development, functionality, and implementation of the application. Chapter 5 presents results and analysis. Finally, Chapter 6 presents conclusions and future work.


2 Background

This chapter provides basic background information about recruiting and methods that could be used for job-candidate matching. The main method described is semantic matching.

2.1 Recruitment

This section provides a general description of what a typical recruitment process might look like. Subsection 2.1.1 gives an overview of the process as a whole, Subsection 2.1.1.1 describes the pre-recruiting phase and planning, Subsection 2.1.1.2 describes the screening and selection process, and finally Subsection 2.1.1.3 outlines how interviews and background checks of candidates are done.

2.1.1 What does a general recruitment process look like?

Recruiting a new employee is a multi-step process that can be summarized in three general phases. First, demand for a position is recognized and the job is published. Next, screening narrows the applicant pool to candidates that suit the position. Finally, interviews and background checks are conducted with suitable applicants [11]. Although the scope of this project is centered on the applicant screening process (step two), the first and third steps provide valuable context, hence they are briefly described.

A basic outline of the recruitment and selection hiring process consists of three initial steps. Each of these is described in a subsection below.

2.1.1.1 Vacancy identification and publishing

This first step lays the foundation for the entire procedure. In this step (shown in Figure 2-1) the goals of hiring an employee and the competencies required of a candidate for this position are identified. A recruiting body develops a recruiting plan to hire the necessary employee(s).

Figure 2-1: The vacancy identification and publishing step.

A well-constructed plan paired with accurately evaluated needs should provide the recruiting entity with the tools necessary to pinpoint the best possible candidate.


Additionally, in order to attract the most talented employee for the position and the most suitable employee for their team, it is crucial that the employer positively brands and effectively markets themselves to their potential candidates [12]. There will be nobody to hire if the company is not able to reach and attract qualified applicants.

2.1.1.2 Screening of applications

After the job listing has been published and possible candidates have submitted their applications, a screening entity must review them to produce a list of potential interviewees. This process is shown in Figure 2-2.

Figure 2-2: The screening of applications step.

In manual application reviews, the screening entity preferably consists of an assemblage of people that together possess the proficiency of a manager, a job specialist, and a potential future team member for the applicant [11]. The job of this entity is to evaluate candidates against a set of required skills and properties that were determined in the previous step. As these skills often overflow into related subjects, it is important to have a screening entity that is well informed about all relevant subjects. In addition to evaluating the applicant’s technical skills, other crucial requirements such as team/company chemistry need to be factored in.

As applications tend to be very similar in their structure, the process of comparing one set of known structure (applications) with another (job positions) can productively lend itself to automation.

2.1.1.3 Interviews and background checks

The interview is arguably the most important part of the application process [11]. Meeting an applicant face to face is an indispensable component of the final recruitment decision [1]. This process is shown in Figure 2-3. The interview is at its heart a test of an applicant's technical skills. It requires an extensive investment of time and energy from the company to secure a close look at possible hires. To minimize excessive or wasted investment in unqualified applicants, pre-interview steps must be accurately and carefully conducted.


Figure 2-3: Describes the interviews and background checks step.

2.2 Matching two sets

Matching two sets or determining an arbitrary distance between them can be done in several ways. Most simply, the elements in each set may be compared and then scored for similarity by calculating the fraction of the skills required for the job that the applicant possesses [13]. Formally, this is expressed as:

$$\mathrm{similarity}(\mathrm{Job}, \mathrm{Applicant}) = \frac{|\mathrm{Job} \cap \mathrm{Applicant}|}{|\mathrm{Job}|}$$

The following example computes the similarities between a job and applicants in order to find the best match. Consider that we have one job with four applicants:

Job = {Woking, Japanese Food, Desserts}
Applicant1 = {Pizza, Italian Food, Desserts}
Applicant2 = {Woking, Chinese Food, Desserts}
Applicant3 = {Wine, Japanese Food, Chinese Food}
Applicant4 = {Sushi, Japanese Food, Woking}

The similarity values between the job and the applicants are then:

similarity(Job, Applicant1) = |{Desserts}| / |{Woking, Japanese Food, Desserts}| = 1/3
similarity(Job, Applicant2) = |{Woking, Desserts}| / |{Woking, Japanese Food, Desserts}| = 2/3
similarity(Job, Applicant3) = |{Japanese Food}| / |{Woking, Japanese Food, Desserts}| = 1/3
similarity(Job, Applicant4) = |{Japanese Food, Woking}| / |{Woking, Japanese Food, Desserts}| = 2/3

The result is that several applicants have the same similarity value, meaning that this method of matching is insufficient to distinguish between the applicants. As the goal is to find the best possible match, more information is needed in order to avoid the problem of two or more applicants scoring the same value.

Other possible methods include representing sets in some hyperdimensional space where a Euclidean distance [14] could be derived (as shown in Figure 2-4). In ecological applications, statistical coefficients used to compare similarities and diversities between two sets, namely Sørensen-Dice or Jaccard indices, have proven useful [15]. However, people's skills and abilities can be more complex than statistical data sets and the gradation of these skills can make comparisons more difficult. The set elements that define skills and abilities might imply information crucial to the success of a matching system, but could be lost if not predefined by the data configuration (i.e., left out of the data). Therefore processes must be carefully calibrated to address all possible semantic nuances within a given data set. For instance, if a reviewing body were to analyze the resume of someone experienced in only theater, they could rather safely assume that this applicant is probably better suited for the role of “public speaker” than someone with experience in only C++. This is not to say that someone well versed in C++ cannot also be a skilled public speaker; it is an assumption based on predefined relationships between skills, stemming from the fact that public speaking is a component of theater. These grey area relations between domains of properties must be carefully considered when matching skills and people to avoid ignoring a potentially great match.

Figure 2-4: Euclidean distance between polyhedra A and B

2.2.1 Matching skills and properties of people

The extrapolation and fuzzy meaning of terms discussed above indicate that there is a need for clear and defined relationships between comparable elements. Resources to help define schemes of classifications already exist; among them: The European Dictionary of Skills and Competences (DISCO) [16], Standard Occupational Classification System [17], and International Standard Classification of Occupations [18]. These taxonomies classify occupational categories and identify their relations to one another. Relationships between these terms are then used to determine how precisely they correlate. For the purposes of this thesis, Figure 2-5 illustrates an example of the application of the Swedish version of DISCO* to the terms “fransk matlagning” (English: French cuisine) and “grillning” (English: grilling).

* http://disco-tools.eu/disco2_portal/terms.php


Conclusions are drawn from the relevancy between terms to determine an applicant's probability of successfully matching a certain job.

Figure 2-5: The connection between “French cuisine” and “grilling” in DISCO*

* http://disco-tools.eu/disco2_portal/terms.php

2.3 Semantic Matching

Semantic matching is a technique used to identify relationships between data sets using predefined and controlled vocabularies, assuming all data can generally be represented as a graph [19]. To better derive information not explicitly stated in the elements, but rather in related and/or implied data, information is semantically matched to yield a more accurate depiction of an applicant's qualifications. Suppose two partially ordered sets or graph-like data structures are given; semantic matching compares and identifies nodes in both graphs that are semantically similar to each other [20]. When applied to matching candidates and jobs, two semantically equivalent terms might be “maître d” and “waiter”, if they are synonymous in the context of the application. In accordance with applied taxonomies and ontologies, semantically similar terms will be evaluated as good matches, whereas queries such as “Why is flying expensive” and “Why is gold expensive” from Table 2-1 will not: although three out of four words are identical, the major semantic difference between “flying” and “gold” significantly negates any match between the two queries.


Table 2-1: Examples of terminology match versus semantic match

Term A                    Term B                   Term match   Semantic match
Chef Stockholm            Chefs Stockholm          partial      yes
pool                      billiards                no           yes
Why is flying expensive   Why is gold expensive    partial      no
Java                      C#                       no           partial

2.3.1 Structural Overview of this project

The basic outline for this project is illustrated in Figure 2-6. The blue shaded boxes describe the semantic matching process, the white boxes describe pre-existing data, and the green boxes represent the anticipated product.

As depicted in Figure 2-6, the semantic matching process utilized for this project will incorporate both a concept taxonomy and a relational hierarchy in order to better analyze and address the variability in users' vocabulary. The concept taxonomy acts as a thesaurus and provides the possibility to match words with similar meanings. The purpose of the relational hierarchy is to provide a structure describing the similarity between terms. When both are combined in a conceptual graph (CG), the distance between two concepts in the CG is an inverse measure of their similarity: the shorter the distance, the more similar the concepts.


Figure 2-6: Overview of the project structure

2.3.2 Semantic Matching of Conceptual Graphs

The approach used in this project is based on ideas from [1, 21, 22]. The similarity between two concepts, $c_1$ and $c_2$, in a conceptual graph [23] is derived from the distance between them (denoted as $d_c(c_1, c_2)$). Using this distance, the similarity between $c_1$ and $c_2$ is defined as [1]:

$$\mathrm{sim}_c(c_1, c_2) = 1 - d_c(c_1, c_2)$$

Additionally, every node in a CG is assigned a milestone value:

$$\mathrm{milestone}(n) = \frac{1/2}{k^{l(n)}}$$

Where $k$ is a predefined factor (such as $k > 1$) that indicates the rate at which the value decreases along the hierarchy and $l(n)$ is the depth of node $n$ in the graph where $l(\mathrm{root}) = 0$. The numerator is set to ½ so that $d_c(c_1, c_2) = 1$, if $c_1$ and $c_2$ are nodes at the deepest level with the root as their closest common parent ($c_{cp}$). The milestone value indicates degree of differentiation as one proceeds down the graph.

Due to the fact that the shortest path between two nodes in a hierarchical graph will go through their common parent, $c_{cp}$, the distance between the two nodes will be calculated by their milestones and their $c_{cp}$ as follows:

$$d_c(c_1, c_2) = d_c(c_1, c_{cp}) + d_c(c_2, c_{cp})$$

$$d_c(c, c_{cp}) = \mathrm{milestone}(c_{cp}) - \mathrm{milestone}(c)$$
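The formulas above can be sketched in a few lines of Python. This is a minimal illustration and not the code of the thesis application: the function names, the default $k = 2$, and the choice to pass node depths directly (rather than nodes) are assumptions made for the example.

```python
def milestone(depth, k=2.0):
    """Milestone value of a node at the given depth (the root has depth 0)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 0.5 / (k ** depth)


def distance(depth_c1, depth_c2, depth_ccp, k=2.0):
    """d_c(c1, c2) = d_c(c1, ccp) + d_c(c2, ccp), computed from node depths."""
    m_ccp = milestone(depth_ccp, k)
    return (m_ccp - milestone(depth_c1, k)) + (m_ccp - milestone(depth_c2, k))


def similarity(depth_c1, depth_c2, depth_ccp, k=2.0):
    """sim_c(c1, c2) = 1 - d_c(c1, c2)."""
    return 1.0 - distance(depth_c1, depth_c2, depth_ccp, k)


# Two siblings at depth 4 whose closest common parent is at depth 3.
print(similarity(4, 4, 3))  # 0.9375
# Two depth-4 nodes whose closest common parent is the root.
print(distance(4, 4, 0))    # 0.9375
```

With these definitions, the distance between two nodes at depth $l$ whose closest common parent is the root evaluates to $1 - 1/k^{l}$, so it approaches 1 as the hierarchy becomes deeper.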


This model is designed under the assumption that the semantic difference between higher level concepts is greater than the difference between lower level concepts. That is, generalized concepts will differ more from each other than specialized ones. This model also implies that the semantic difference between “siblings” is larger than the difference between “parent” and “child” concepts, as the distance is calculated through their closest common parent [21].

For example, to find the distance between the two skills “Pastry Baking” and “Cold Smoking”, begin with the ontology shown in Figure 2-7.

Figure 2-7: Segment of a possible CG with corresponding milestone values

By first identifying their closest common parent, “Cooking”, and using $k = 2$, the distance between the two concepts can be calculated as follows:

$$c_1 = \text{Pastry Baking}, \quad c_2 = \text{Cold Smoking}$$

$$d_c(c_1, c_2) = d_c(c_1, \text{Cooking}) + d_c(c_2, \text{Cooking}) = (1/16 - 1/32) + (1/16 - 1/32) = 0.0625$$

The similarity between the two concepts is then:

$$\mathrm{sim}_c(c_1, c_2) = 1 - 0.0625 = 0.9375$$

This similarity can be compared to a reference similarity threshold defined by the user, so that only results with a $\mathrm{sim}_c(c_1, c_2)$ value above the user’s defined threshold will appear. If too many or too few results are presented, the user may dynamically adjust the value of their threshold. For example, the threshold can be raised to increase match precision, or lowered to admit more results.
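The worked example and the threshold filter could be reproduced as follows. This is only a sketch under stated assumptions: the tree is stored as a child-to-parent dictionary, and the ancestors of “Cooking” (`level-2`, `level-1`, `root`) are hypothetical placeholders chosen so that “Cooking” sits at depth 3, as its milestone value of 1/16 implies for $k = 2$.

```python
# Hypothetical tree, stored as child -> parent. Only "Cooking" and its two
# children come from the example; the ancestor names are placeholders.
PARENT = {
    "Pastry Baking": "Cooking",
    "Cold Smoking": "Cooking",
    "Cooking": "level-2",
    "level-2": "level-1",
    "level-1": "root",
    "root": None,
}


def ancestors(node):
    """Return [node, parent, grandparent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(node):
    return len(ancestors(node)) - 1


def closest_common_parent(a, b):
    seen = set(ancestors(a))
    for candidate in ancestors(b):  # walks upwards, so the first hit is closest
        if candidate in seen:
            return candidate
    raise ValueError("nodes are not in the same tree")


def milestone(node, k=2.0):
    return 0.5 / (k ** depth(node))


def similarity(a, b, k=2.0):
    m_ccp = milestone(closest_common_parent(a, b), k)
    return 1.0 - ((m_ccp - milestone(a, k)) + (m_ccp - milestone(b, k)))


def above_threshold(query, candidates, threshold):
    """Keep only the candidates whose similarity to query exceeds threshold."""
    return [c for c in candidates if similarity(query, c) > threshold]


print(similarity("Pastry Baking", "Cold Smoking"))  # 0.9375
```

Raising the threshold passed to `above_threshold` narrows the result list, while lowering it lets more distant concepts through.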


2.4 Implementation

This subsection describes the methods and technologies needed to implement the previously described concepts.

2.4.1 Finding the shortest path in a graph

In order to find the distance between two concepts according to the formula given in Section 2.3.2, the shortest path between them must first be established.

Dijkstra’s algorithm [24] is probably the first solution that comes to mind when trying to find the shortest path between nodes in a graph. It finds the shortest path in a weighted graph (containing only non-negative edge weights) between a given node and every other node by repeatedly extending the best path found so far. Dijkstra’s algorithm also works for unweighted graphs, such as the one used in this thesis project; however, it is not the most efficient solution. Even when all edge weights are identical, the algorithm will spend time unnecessarily looking for alternative paths throughout the entire graph.

To take advantage of the fact that the graph is unweighted, we use the Breadth-First Search (BFS) method, as it produces a more efficient solution. BFS traverses the graph breadth-wise from the source node, and when it first reaches any node v, it has done so by a guaranteed shortest path, i.e., a path with the lowest number of edges between the source node and v. Dijkstra’s algorithm has a time complexity of $O(V^2)$ in its basic array-based form, or $O(E + V \log V)$ with a Fibonacci-heap priority queue, whereas BFS runs in $O(V + E)$ [25]; hence BFS is theoretically the best solution for this application.
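A BFS shortest-path search over an unweighted graph might look like the sketch below. It is an illustration rather than the thesis implementation: the function name is illustrative, and the graph is assumed to be given as an undirected adjacency dictionary (each node maps to the set of its neighbours, parent included), so that a path between two siblings can pass through their parent.

```python
from collections import deque


def find_shortest_path(graph, start, goal):
    """Return the shortest path from start to goal as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # predecessor map; doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already reached by a path that is at least as short
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the predecessor links back to the start node.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


graph = {
    "Cooking": {"Pastry Baking", "Cold Smoking"},
    "Pastry Baking": {"Cooking"},
    "Cold Smoking": {"Cooking"},
}
print(find_shortest_path(graph, "Pastry Baking", "Cold Smoking"))
# ['Pastry Baking', 'Cooking', 'Cold Smoking']
```

Every node is enqueued at most once and every edge is examined at most twice, which is where the $O(V + E)$ bound for BFS comes from.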


3 Methodology

The purpose of this chapter is to provide an overview of the research method used in this thesis. Section 3.1 describes the research process. Section 3.2 focuses on the data collection techniques used. Section 3.3 describes the experimental design. Section 3.4 explains the techniques used to evaluate the reliability and validity of the data collected.

3.1 Research Process

The research process was divided into three phases. Phase 1 was the pre-study phase, whereas Phases 2 and 3 concerned developing and evaluating the application. Figure 3-1 shows the general timeline of this thesis project.

3.1.1 Phase 1: Information gathering and Literature study phase

The first two weeks were to be spent preparing for the practical and analytical parts of the project. This included a literature review of theoretical and practical contributions within those topics closely related to this project in order to discover where there was room for improvement and to develop a theoretical foundation for the following phases.

3.1.2 Phase 2: Developing the application

This phase is divided into three sub phases:

1. Plan the development process and research technologies to be used when developing the application.

2. Build the application piece by piece in conjunction with weekly meetings with Cheffle to ensure that development proceeds in a direction that satisfies all parties.

3. Test, analyze, and refactor each piece of code to ensure functionality and scalability.

3.1.3 Phase 3: Evaluation and Analysis

In this phase, the entire application is evaluated and analyzed to determine how well it performs and to find any missing functionality. A performance evaluation was conducted regularly together with Cheffle to determine if any additions or improvements needed to be made.


Figure 3-1: Rough timeline of the thesis project*. (Figure appears here courtesy of G. Q. Maguire Jr.)

3.2 Data Collection

Data collection for this project was done during two different periods. Initially, all of the available relevant data from registered Cheffle users was collected and analyzed for the purpose of adjusting the taxonomies mentioned in Chapter 2. The ontological elements from the user data as well as relevant elements in the taxonomies make up the structure of the main ontology. Secondly, feedback from industry insiders on the final application will provide data on which the final application’s performance will be evaluated.

3.2.1 Sampling

Two different kinds of sampling will be done for this project. Subsection 3.2.1.1 describes gathering data for the relational hierarchy and Subsection 3.2.1.2 describes gathering feedback to aid in the final evaluation.

3.2.1.1 Gathering data for the relational hierarchy of terms

The relational hierarchy/ontology is the foundation on which all matching between performance metrics is calculated. Consequently, it is crucial that the ontology is created as accurately as possible. To achieve this, data was collected and analyzed in order to create a set of elements that populate the ontology.

* https://people.kth.se/~maguire/b-exjobb-time-line-20070906a.gif


Three methods were to be used to identify which terms the ontology would require:

1. Frequency and relevancy of the term’s usage in job ads and job profiles

2. Frequency and relevancy of the term’s usage in international skills and competences classifications.

3. Analysis of user needs

Frequency and relevancy of a term’s usage in job ads and job profiles will be determined through sampling of Cheffle user profile data. The user profile database contains 4 columns of data relevant to this task. They are:

Education A list of schools the candidate has attended and a list of any specific qualifications.

Work experience

Title The candidate’s work title, e.g. “Chef” or “Waitress”

Skills Any combination of 74 predefined skills selected when creating a profile, e.g. “Sushi making” and “Pastry baking”

Due to the fact that only one of the 4 columns in each user row is formatted according to a controlled vocabulary, while the rest are free text, problems will occur when parsing and analyzing this data. Lexical differences between “Bartender/Waitress”, “bartender and waitress”, and “Hi, I am someone who bartends” need to be taken into account for a practical solution. As the scale of this project does not encompass dealing with the full range of possible nuances, within the project I have only used those entries containing clear and readable data; this is believed to be sufficient to demonstrate the relevant concepts.

3.2.1.2 Gathering feedback from industry insiders

The performance of the finished application will be evaluated based upon feedback from Cheffle employees.

3.2.2 Sample Size

The sample size for the data described in Section 3.2.1.1 is 400 users with registered job profiles. As for the evaluation of the project (Section 3.2.1.2), a minimal number of people will be asked to provide feedback, sufficient to provide some feedback for this proof of concept prototype.

3.2.3 Target population

The target population is Cheffle’s registered users. Due to ethical and legal responsibilities and restrictions, such as the Swedish Personal Data Act (PUL) [26] and the Cheffle terms of service, all of the user data is stripped of personal information and does not contain the applicant’s name, sex, or age.


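3.3 Experimental design/Planned Measurements

This thesis project is evaluated through a comparison with other matching methods and tested with real world data in conjunction with Cheffle. Section 3.3.1 describes the test environment, Section 3.3.2 describes the software and data structures to be used. Finally, Section 3.3.3 describes the way in which Python and JSON are used in the implementation.

3.3.1 Test Environment

All software and data models used for this thesis project are cross platform compatible, hence they should work on Microsoft’s Windows, Apple’s OSX, and GNU/Linux platforms. Nevertheless, using Ubuntu 16.04 Linux is recommended as it is the only platform on which this project has been extensively tested.

3.3.2 Software and data structures to be used

This thesis project will use Python 2.7 [27] for the application and JSON [28] for representing any data. The main parts of the application are shown in Figure 3-2.

Figure 3-2: Main parts of the application

The concept taxonomy is the main structure describing how different skills and performance metrics relate to each other and is represented in JSON format in a separate file. Each element in the ontology will contain a title, a link to the term in the relational hierarchy, and a list of JSON objects containing elements further down the ontology. The relational hierarchy acts as a thesaurus with the purpose of limiting the need for redundant elements with the same meaning in the ontology.

The advantages of using JSON for representing the relationships between skills are:

• Visual representation. The structure of JSON makes it easy to view and intuitive to edit, even for very large files, especially when assisted by visual aids such as jsonviewer [29].

• There is a well-documented built-in library [30] in Python for supporting encoding and decoding of JSON.

3.4 Assessing reliability and validity of the data collected

This section explains the reliability and validity of the collected data. Section 3.4.1 explains reliability and Section 3.4.2 explains validity.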

ill use Pyt a. The main nomy is the relate to e ement in th chy, and a he relation redundant f using JSO esentation. edit, even uch as json well-docum nd decoding ability and the reliab nd Section 3 parts of the anned Mea through a in conjun .3.2 descri cribes the s used for d work on heless, usin this projec es to be use thon 2.7 [2 n parts of t e main stru each other he ontolog a list of JS nal hierarch t elements ON for rep . The struc n for very nviewer [2 mented buil g of JSON. d validity o bility and v 3.4.2 expla e applicat asurement compariso ction with ibes the sof way in wh r this thes Microsoft ng Ubuntu ct has been ed 27] for the the applica ucture des r and is re gy will cont SON objec hy acts as with the sa presenting cture of JS y large file 9]. lt-in librar . of the data validity of t ains validit tion ts on with oth Cheffle. S ftware and hich python sis project ’s Window 16.04 Linu n extensive applicatio ation are sh scribing ho presented tain a title cts contain a thesauru ame meani the relatio SON make es, especia y [30] in P a collected the collecte ty. her matchin Section 3.3 d data struc n and JSO t are cros ws, Apple’s ux is recom ely tested. on and JSO hown in Fi ow differen in JSON f e, a link to ning eleme us with the ing in the o onships bet es it easy t ally when Python for d ed data. Se Methodology | 2 ng method .1 describe ctures to b ON are use ss platform s OSX, an mmended a ON [28] fo gure 3-2. nt skills an format in the term in ents furthe e purpose o ontology. tween skill to view an assisted b supportin ection 3.4. 20 ds es be d m d as or d a n er of ls d by ng .1


3.4.1 Reliability

The information gathered from Cheffle’s employees is mostly based on their professional experience and understanding of skills and competences relevant to the hotel and restaurant industry. However, only a small number of people participated in the evaluation of the final product. This means that the feedback given is dependent upon the personal experience of each individual and that the reliability of this feedback is based on qualitative sources and does not in and of itself provide quantitative results.

3.4.2 Validity

The information gathered from Cheffle’s employees is presumed valid as it is based on their personal feedback regarding the potential impact of the prototype produced in this thesis project on their work, in which they possess extensive experience and knowledge.


4 The Application

The purpose of this chapter is to describe the application’s development process and the final prototype as well as its functionality. Section 4.1 describes the design and development of the application. Section 4.2 describes the process of creating the ontology. Section 4.3 describes the testing of the application. Finally, Section 4.4 describes its functionality and discusses different implementations of the application.

4.1 Design

This section describes the design process and design decisions made during the development of the prototype.

4.1.1 Python

Python was chosen as the programming language for several reasons. First and foremost, Python is platform independent; hence it can run on a wide variety of platforms [31] including Microsoft’s Windows, Apple’s Mac OS X, and Linux operating systems, such as the Debian based Ubuntu. Secondly, this project was written to be implemented behind a web interface which runs a Python-Flask [32] backend. These reasons, especially the ease of integration with a web framework written in Python, motivated the choice of Python as the programming language for the entire project.

4.1.2 Description of the application

This subsection describes the different modules that make up the application and what they do. The application consists of three modules: graph.py, node.py, and main.py.

4.1.2.1 Node module

The Node module represents an element in the ontology and contains 4 different attributes as depicted in Figure 4-1.

The “title” attribute describes the skill or performance metric that each individual node represents and additionally works as a unique identifier for the object. The purpose of the “meta” field is to hold metadata used when pairing a node with a search term. As there will not be enough nodes to cover every possible search term, information in the metadata field describes each element in more depth than is possible using only a single “title” field.
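As a rough illustration of how the “title” and “meta” fields could be used to pair a free-text search term with a node, consider the sketch below. It is written for Python 3 and is not the thesis code: only the two attributes described above are modelled (the other attributes from Figure 4-1 are left out), and treating “meta” as a set of alias strings, the `matches` method, the `find_node` helper, and the sample aliases are all assumptions.

```python
class Node(object):
    """Ontology element; only `title` and `meta` are modelled in this sketch."""

    def __init__(self, title, meta=()):
        self.title = title                        # unique identifier of the node
        self.meta = set(m.lower() for m in meta)  # assumed: alternative terms

    def matches(self, term):
        """True if term equals the title or one of the metadata entries."""
        term = term.strip().lower()
        return term == self.title.lower() or term in self.meta


def find_node(term, nodes):
    """Return the first node that matches term, or None if there is none."""
    for node in nodes:
        if node.matches(term):
            return node
    return None


# Hypothetical sample data: the aliases are invented for the example.
nodes = [
    Node("Servitör", meta=["servitris", "waiter", "waitress"]),
    Node("Kock", meta=["chef", "cook"]),
]
print(find_node(" Waitress ", nodes).title)  # Servitör
```

Keeping the aliases in the metadata rather than as separate nodes means that several search terms resolve to one ontology element.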


Figure 4-1: Node class and its attributes

4.1.2.2 Graph module and calculating distance between terms

The purpose of the “Graph” module is to realize a custom graph data structure in the form of a Python dictionary of sets, i.e., an associative array made up of key : value pairs, where all keys must be unique. Keys represent each node in the graph and the values correspond to each node's children. Safeguards against adding nodes that lead to cycles or duplicates exist to ensure that the graph is structured correctly. In addition to realizing a graph data structure, the “Graph” module also supplies several class methods for maintaining and analyzing the graph as outlined in Figure 4-2.

Figure 4-2: Graph class and its attributes
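A dictionary-of-sets graph with such safeguards could be sketched as follows. This is an illustration under assumptions and not the actual Graph module: the method names, the `graph_from_json` helper, and the JSON field names `title` and `children` are hypothetical, and nodes are identified by their title strings.

```python
import json


class Graph(object):
    """Tree-shaped graph stored as a dictionary of sets (node -> children)."""

    def __init__(self, root):
        self.children = {root: set()}

    def add_child(self, parent, child):
        if parent not in self.children:
            raise KeyError("unknown parent: %r" % (parent,))
        if child in self.children:
            # A node that already exists would become a duplicate, and
            # linking to one of its own ancestors would close a cycle.
            raise ValueError("node already in graph: %r" % (child,))
        self.children[parent].add(child)
        self.children[child] = set()


def graph_from_json(text):
    """Build a Graph from nested {"title": ..., "children": [...]} objects."""
    root = json.loads(text)
    graph = Graph(root["title"])
    pending = [root]
    while pending:
        element = pending.pop()
        for child in element.get("children", []):
            graph.add_child(element["title"], child["title"])
            pending.append(child)
    return graph


ONTOLOGY = """
{"title": "Cooking",
 "children": [{"title": "Pastry Baking"}, {"title": "Cold Smoking"}]}
"""
graph = graph_from_json(ONTOLOGY)
print(sorted(graph.children["Cooking"]))  # ['Cold Smoking', 'Pastry Baking']
```

Because a node may only be added once and only beneath an existing parent, the structure stays a tree, which rules out both duplicates and cycles with a single check.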


The most important method of this application resides in the Graph module under the name get_distance(node1 : Node, node2 : Node). The purpose of this method is to calculate the distance(c1, c2) between two concept nodes, c1 and c2, in the graph according to the formula given in Section 2.3.2. Two “Node” objects are required as input to produce as output a number between 0 and 1. An output of 0 is returned if the two “Node” objects are identical, while 1 indicates that the inputs are nodes in the deepest level of the tree with the root node as their closest common parent.

To calculate the shortest path, get_distance works in two steps: first, a call to find_shortest_path finds the shortest path between node1 and node2 using BFS. Second, the milestone values of the nodes along the path are used to calculate the distance.

4.1.2.3 Main module and building the graph

The Main module, defined as shown in Figure 4-3, ties everything together. All job and candidate data is provided as a database dump into a text file. Direct access to this database is restricted for privacy reasons. The file was filtered to remove any personally identifiable information (as described in Section 3.2.3). When initializing the application, Main parses the filtered database into candidate objects and the ontology is loaded from JSON into a Graph-object. The Main module uses this data and provides an API with methods to match skills as well as maintain and analyze user and ontology information.

Figure 4-3: Main class and its attributes

4.2 Building the ontology

The ontology defines how different performance metrics relate to each other and is the base on which this application stands. This section describes how the ontology was built and what problems occurred along the way.


4.2.1 Fetching ontological elements

The purpose of the ontology is to provide relationships between concepts used for describing candidates and job postings. The best place to look for concepts to populate the ontology is in real world user data and comprehensive international skills repositories, such as DISCO [16].

In order to apply for a job on the Cheffle platform, a user must first register a profile. To do so, the user provides information about themselves including: title, skills, experience, and education. The elements used to populate the ontology were collected from these fields for a total of 476 registered user profiles. Sections 4.2.1.1-4.2.1.3 give a short rundown of these 4 different fields.

4.2.1.1 Title

The title element is perhaps the most crucial when defining a candidate. It is important that any potentially occurring titles are represented in the ontology. Table 4-1 shows the most frequently used titles in the job profiles. Note that the titles are given in Swedish as this is the language that most users used to enter the data.

Table 4-1: Most frequent titles

Title            Frequency
Kock             64
Servitris        36
Bartender        21
Student          18
Servitör         17
Hovmästare       7
Kökschef         7
Restaurangchef   6

4.2.1.2 Skills

The skills field is the only field in which the entries come from a predefined vocabulary consisting of 74 different terms of which zero or more can be chosen. Additionally, when submitting a job posting, these terms can be used to describe the skillset required for the position. Figure 4-4 shows how skills can be added to either a job profile or a job posting.

This thesis project initially used only predefined skills for ontology building and searching. However, later when doing user data analysis, it was discovered that few users actually entered anything in the skills field. This is far less data than necessary for usability as the sole resource when searching. Thus, a decision was made to expand the scope of the search to find performance metrics in other fields (specifically, Title, Education, and Experience).

Figure 4-4: Adding skills to a job profile or job ad on Cheffle.

4.2.1.3 Education and Experience

The education and experience sections both follow the same format. They can contain zero or more entries with each entry having 4 sections: Employer/School, Title/Qualifications, Date, and Notes. See Table 4-2 for an example. For the sake of finding performance metrics to populate the ontology, only the Title/Qualifications field was considered. This field proved to be the most useful and was easily parsed. All 4 sections are free text. Parsing information in the “Notes” section (which contained long and sometimes unclear text) proved too cumbersome and time consuming to be included within the scope of this thesis project.

Table 4-2: Example of an Education or Experience entry in a job profile

Employer/School        EBS (European Bartender School)
Title/Qualifications   Bartender
Date                   Augusti 2013
Notes                  Högklassig grundutbildning i allt som ingår i baryrket. Teorikunskaper om sprit och dess historia, drinkrecept, barträning, preppning, bar vett och etikett, flairing, free pouring etc.

4.2.2 Structuring the retrieved data

Collecting all of the terms gathered from the user database and filtering out duplicates and irrelevant or meaningless data resulted in a list of 109 unique terms. To build the ontology, these terms needed to be structured into a relational hierarchy that modeled the relationships between terms after their use within the industry. To accomplish this, two different tools were utilized: DISCO [16] and input from Cheffle employees. The purpose of this combined method was to utilize the DISCO definitions and to create an ontology based on international labor market standards which could be refined to suit Cheffle’s specific needs. The main structure of the resulting ontology is laid out as shown in Figure 4-5.

Figure 4-5: Overview of the ontology used in the application, showing 10 out of 109 total elements

4.3 Testing

A short experiment was conducted to investigate the success of the matching algorithm's performance in comparison with a human. Three items (one question item and two response items) were randomly selected from the ontology. Each human participant was asked to identify the most closely related response (the second or third item) to the addressed question (the first item). Each of the three participants completed 50 unique comparisons. Figure 4-6 shows the GUI of the program that was created to conduct the testing.


Figure 4-6: Test program graphical interface

The results of this experiment are summarized in Table 4-3. Two Cheffle employees and one independent recruiter participated in the test. “Mean of correct/incorrect similarities” refers to the mean of all answers chosen either in or not in accordance with the underlying matching algorithm.

Table 4-3: Comparison of the different resulting values between matching methods. Simple matching refers to the example given in section 2.2

Test person   # Correct   # Incorrect   Mean of correct similarities   Mean of incorrect similarities
Cheffle 1     35          15            0.72834                        0.41041
Cheffle 2     33          17            0.71993                        0.47380
Recruiter     35          15            0.69531                        0.30833
Average       34.33       15.67         0.71453                        0.39751

4.4 Functionality and Implementation

This section describes the prototype’s functionality and discusses different potential implementations of the application.

4.4.1 Functionality

The main function of the application is to rank match quality between two sets of performance metrics based on an underlying relational hierarchy, commonly referred to in this report as “ontology”. Similarity between two single metrics is calculated with the formula described in Section 2.3.2 and between two sets of metrics by averaging the similarity of each term in the first set to the term in the second set with the highest respective similarity. Table 4-4 shows a comparison between the example matching method from Section 2.2 and the new semantic method using the same parameters.

Table 4-4: Comparison of the different resulting values between matching methods. Simple matching refers to the example given in Section 2.2

             Simple matching         Semantic method
Applicant    Similarity  Position    Similarity  Position
Applicant1   0.33        3, 4        0.9375      3
Applicant2   0.66        1, 2        0.9896      1
Applicant3   0.33        3, 4        0.9323      4
Applicant4   0.66        1, 2        0.9427      2

As shown in Table 4-4, the issue of simple matching being unable to sufficiently distinguish between applications does not exist for the same example job and applicants when using the semantic method. One more feature of the matching function is the possibility of prioritizing certain metrics above others. Following the same example, we could choose to value “Japanese Food” higher by adding it a second time to the collection of requested performance metrics, i.e., Job = {Woking, Japanese Food, Desserts, Japanese Food}. Table 4-5 shows the new matching values after weighting.

Table 4-5: Comparison between unweighted and weighted for 2x “Japanese Food” in the Job set of performance metrics from the previous example.

             Unweighted              2x Japanese Food
Applicant    Similarity  Position    Similarity  Position
Applicant1   0.9375      3           0.9296      4
Applicant2   0.9896      1           0.9843      1
Applicant3   0.9323      4           0.9492      3
Applicant4   0.9427      2           0.9570      2

Additionally, the matching method is a non-commutative operation, meaning that similarity(Set 1, Set 2) is not necessarily the same as similarity(Set 2, Set 1). Table 4-6 shows an example of the difference in matching score when swapping the order of a match. This occurs because Set 1 is intended to describe the set of required metrics while Set 2 represents the set to be evaluated according to the requirements.

Table 4-6: Shows an example of the non-commutativity in matching

Set 1                  Set 2                  Matching score
{Bartender}            {Bartender, Server}    1
{Bartender, Server}    {Bartender}            0.796875
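The set-to-set matching rule, its non-commutativity, and weighting by repetition can be illustrated with a short sketch. It is not the thesis implementation: the function names are illustrative, and the pairwise similarity of 0.59375 between “Bartender” and “Server” is not stated in the text but is inferred from Table 4-6 under the assumption that the score is the plain average described in this section, since (1 + x) / 2 = 0.796875 gives x = 0.59375.

```python
def match(required, offered, similarity):
    """Average, over each required metric, of its best similarity in offered."""
    if not required or not offered:
        return 0.0
    best = [max(similarity(r, o) for o in offered) for r in required]
    return sum(best) / len(best)


# Assumed pairwise similarity, inferred from Table 4-6 (see the text above).
PAIRWISE = {frozenset(["Bartender", "Server"]): 0.59375}


def similarity(a, b):
    if a == b:
        return 1.0
    return PAIRWISE[frozenset([a, b])]


print(match(["Bartender"], ["Bartender", "Server"], similarity))  # 1.0
print(match(["Bartender", "Server"], ["Bartender"], similarity))  # 0.796875
```

The required metrics are passed as a list rather than a Python set so that a repeated entry, such as a second “Japanese Food”, is counted twice in the average and therefore weighs more heavily.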

4.4.2 Implementation

This subsection describes potential implementations of the prototype. The planned implementation at Cheffle is described first, followed by possible applications in other areas (Section 4.4.2.1).

This application is intended to serve two purposes at Cheffle. Namely:

1. The ability for companies buying job ads to instantly receive a list of candidates with high matching skillsets.

2. The ability to provide a better service for job seekers by providing a more accurate search function as well as automated suggestions/notifications of open positions that fit both the candidates’ preferences and qualifications.

For example, if a posted job ad required the skillset {"restaurangchef", "hovmästare", "bordsservering", "vinkunskap"}, the best matching candidates from the database of registered users would be as shown in Table 4-7.
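Selecting the top matches for a job ad could be sketched as below, written for Python 3. The candidate data and identifiers are hypothetical, and an exact-match similarity stands in for the ontology-based similarity, so the scores printed here will not reproduce the values obtained from the real ontology.

```python
import heapq


def exact_similarity(a, b):
    # Placeholder for the ontology-based similarity (assumption).
    return 1.0 if a == b else 0.0


def match(required, offered, similarity=exact_similarity):
    """Average, over each required metric, of its best similarity in offered."""
    if not required or not offered:
        return 0.0
    best = [max(similarity(r, o) for o in offered) for r in required]
    return sum(best) / len(best)


def top_matches(required, candidates, n=5, similarity=exact_similarity):
    """Return the n best (score, candidate id) pairs, highest score first."""
    scored = ((match(required, metrics, similarity), name)
              for name, metrics in candidates.items())
    return heapq.nlargest(n, scored)


job = ["restaurangchef", "hovmästare", "bordsservering", "vinkunskap"]
candidates = {  # hypothetical candidate profiles
    "candidate-a": ["hovmästare", "restaurangchef", "kock"],
    "candidate-b": ["bartender"],
    "candidate-c": ["restaurangchef"],
}
print(top_matches(job, candidates, n=2))
# [(0.5, 'candidate-a'), (0.25, 'candidate-c')]
```

Using `heapq.nlargest` avoids sorting the full candidate list when only a handful of top matches is requested.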


Table 4-7: Result of requesting the top 5 matches to {"restaurangchef", "hovmästare", "bordsservering", "vinkunskap"} of all users in the Cheffle database.

#  Title                                     Summary of skills, education and experience                                             Matching score
1  Restaurangchef                            {”hovmästare”, ”restaurangchef”, ”kock”, ”sommelier”, ”servitör”}                       0.9609375
2  Restaurangchef / hovmästare / barmästare  {”bar”, ”runner”, ”hovmästare”, ”restaurangchef”, ”bartender”, ”servitör”}              0.9140625
3  Restaurangchef                            {”administration”, ”restaurangchef”, ”restaurang”, ”hovmästare”, ”kock”, ”kökschef”}    0.875
4  Servitör/Hovmästare                       {”bartender”, ”restaurangchef”, ”servitör”}                                             0.859375
5  Caféansvarig, Servitris och Bartender     {”bartender”, ”marknadsföring”, ”administration”, ”receptionist”, ”servitris”}          0.828125

4.4.2.1 Other uses

As the scope of matching skills is only limited by the scope of the ontology, this application could easily serve other industries if provided with the corresponding industry specific ontology or if the current ontology were expanded. However, in the case of expansion, performance issues caused by scaling the application to a much larger ontology might need to be addressed.


5 Results and Analysis

In this chapter, the results are presented and discussed.

5.1 Major results

The major results of this thesis project are divided into two parts. Subsection 5.1.1 describes the application's ability to derive additional knowledge from provided concepts and Subsection 5.1.2 discusses results from the user test.

5.1.1 Ability to access additional knowledge

One of the main issues this thesis project sought to address was the insufficient capability of distinguishing between two sets of performance metrics, as described in Section 2.2. Because of this insufficiency, different applications could too easily receive the same similarity value when compared to a set of required metrics. As the goal was to find the best possible match, more information was needed in order to avoid such occurrences.

To solve this, Chapter 4 introduced specialized relations among performance metrics, thus forming a hierarchy between the different concepts. The application could then use the additional knowledge derived from these relations in order to find the most suitable match between two sets of performance metrics with higher precision than the simple matching method described in Section 2.2. As shown in Table 4-4, the issue of insufficient differentiation between applications due to simple matching does not exist for the same example job and applicants when using the new proposed semantic method.

5.1.2 User tests

In order to create a system able to pair jobs with potential candidates, it is of paramount importance that the matching algorithm is able to accurately match the performance metrics required for a job with those representing each candidate. In order to measure this, a test was created to compare matching done by the application against matching done by qualified recruiters.

At first glance, the test results shown in Section 4.3 seem poor in comparison with the matching ability of the application. Only 69% of the best matches chosen by the test participants aligned with those of the application. However, all performance metrics used in the tests were chosen at random, which produced some difficult questions. For example, one question might be: Which of the following is most closely related to “Indian food (cooking)”: “Receptionist” or “Cleaner (Hotel)”? Questions like this one are a roll of the dice for the human participant and would seldom be relevant in a real world scenario, as the matching algorithm would rank both options so low that they would be considered irrelevant.