A Comparative Study between Genetic Algorithm, Simulated Annealing and a Hybrid Algorithm for solving a University Course Timetabling Problem


DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2018


ALZAHRAA SALMAN

ROUWAYD HANNA


En jämförande studie mellan Genetisk algoritm, Simulerad glödgning och en hybrid algoritm för att lösa ett universitetsschemaläggningsproblem

ALZAHRAA SALMAN, ROUWAYD HANNA

Degree Project in Computer Science, DD142X

Supervisor: Alexander Kozlov

Examiner: Örjan Ekeberg

EECS, KTH. Stockholm, Sweden. June 3, 2018.


Abstract

Every year, universities face the problem of scheduling events onto resources such as lecturers, classrooms and time slots while respecting different constraints. The University Course Timetabling Problem is an NP-complete combinatorial optimization problem that, if solved manually, requires a great investment of time and money. Finding an algorithm that automates this process would therefore be beneficial for society.

The aim of this thesis is to compare the performance of a Genetic Algorithm-Simulated Annealing hybrid implementation with the performance of each of the algorithms individually for solving the University Course Timetabling Problem.

The data sets used were inspired by the Royal Institute of Technology in Stockholm. The results showed that Simulated Annealing performed better than the other two algorithms with respect to time consumption. However, as the complexity of the problem increased, the hybrid algorithm showed great promise of actually producing a feasible solution before terminating, for example on the biggest data set tested.


Sammanfattning (Summary)

Every year, the university faces the problem of scheduling events onto various resources, such as lecturers, classrooms and time slots, subject to a number of predefined constraints. The University Course Timetabling Problem is an NP-complete combinatorial optimization problem that requires a great deal of time and money if it is solved manually. Finding an algorithm that automates this process would therefore be of benefit to society.

The aim of this thesis is to compare the performance of a Genetic Algorithm-Simulated Annealing hybrid implementation with the performance of each of the algorithms individually for solving the University Course Timetabling Problem. The data sets used are inspired by the Royal Institute of Technology (Kungliga Tekniska Högskolan) in Stockholm. The results showed that Simulated Annealing performed better than the other two algorithms with respect to time consumption. The hybrid algorithm, however, showed great potential to actually produce an acceptable solution before terminating as the complexity of the data increased, for example on the largest data set tested.

Chapter 1

Introduction

Scheduling is a difficult problem that can be found almost everywhere in society. We take for granted that traffic lights schedule when we can drive, that public transport runs on time and that school schedules are fair, while in fact solving a scheduling problem requires an investment of time and money.

Every semester, universities have to solve the University Course Timetabling Problem (UCTP). The UCTP is the task of assigning the events of a university to resources such as lecturers, classrooms and time slots while respecting different constraints [1]. These constraints can be divided into two groups, hard constraints and soft constraints [2]. Hard constraints must not be violated for a solution to be considered feasible, while soft constraints improve the quality of the solution but do not have to be fulfilled.

The UCTP is an NP-hard combinatorial optimization problem, which means that no polynomial-time algorithm is known for solving it optimally. However, since scheduling problems such as the UCTP appear often in society, scientists have come up with different approaches and algorithms for solving them. These algorithms are mostly meta-heuristics, such as evolutionary algorithms and local search algorithms. In addition to being NP-hard, the UCTP is difficult to solve in a general way because every university has its own set of constraints that must be considered.

Evolutionary algorithms consist of several heuristics that solve optimization problems by imitating some aspects of natural evolution. One example of an evolutionary algorithm is the Genetic Algorithm (GA), which is inspired by natural selection and was created by John Henry Holland in 1970 [3].

Local search algorithms move between different solutions in the search space by applying changes until an optimal solution is found or a time bound has elapsed. Simulated Annealing (SA) is one particular local search algorithm that is inspired by the process of annealing in metallurgy and was first proposed in 1983 [4].

Useful and efficient features of different algorithms are often applied together to solve problems. Combining different algorithms may eliminate the weaknesses of the individual methods and lead to a more suitable algorithm [5]. Evolutionary algorithms mostly perform better in the early stages of the search, whereas local search algorithms perform better in the late stages. To combine these characteristics, many hybrid algorithms have been implemented for solving the UCTP [4].

1.1 Problem Statement

Dahl and Fredrikson concluded that GA performs relatively better than SA in the early stages of the process, whereas SA performs better in the final stages [4]. It therefore seems appropriate to implement and investigate a hybrid between GA and SA, in which an unfinished solution from GA is fed to SA as its initial solution.

This thesis compares the performance of a GA-SA hybrid implementation with the performance of each of its constituent algorithms individually for solving the UCTP. The hybrid algorithm was constructed by the authors of this report from the GA constructed by Pertoft and Yamazaki [3] and the SA constructed by Dahl and Fredrikson [4].

The research question of this report is: How well does the hybrid algorithm perform, in terms of runtime, compared to GA and SA individually when solving UCTP?

1.2 Purpose

Scheduling is a problem that has been thoroughly researched over the years. The results of this study could potentially improve approaches for finding solutions to the UCTP, which makes it an interesting topic to investigate.

The purpose of this study is to investigate the possibility of improving approaches for finding solutions to the UCTP by combining different algorithms in a practical application, i.e. to investigate the potential of a hybrid algorithm. The algorithms in the hybrid are inspired by Pertoft and Yamazaki [3] and Dahl and Fredrikson [4].


1.3 Scope

There are different classes of the academic timetabling problem: school timetabling, course timetabling and examination timetabling [6]. This report focuses on the class of course timetabling. The algorithms used to solve the UCTP in this report are compared with respect to runtime.

The GA and SA used in this study were originally constructed by Pertoft and Yamazaki [3] and Dahl and Fredrikson [4] respectively, and neither implementation took soft constraints into consideration. The hybrid algorithm therefore does not implement them either.

Furthermore, the data sets used in this study are inspired by the Royal Institute of Technology, KTH, but are scaled down, and only certain constraints are taken into account. The results are therefore not directly applicable to the real-life scheduling problem of KTH but could provide guidance for further research.

1.4 Outline

The report is divided into six chapters. The first chapter introduces the subject, the problem statement and the purpose of the study. In chapter 2, Background, the university timetabling problem and the three different algorithms are described in general and previous research is introduced. In the third chapter, Method, the methodology for how the study was carried out is explained. The fourth chapter consists of the results, which are then discussed in the fifth chapter, Discussion. Lastly, conclusions are drawn from the results in the final chapter, Conclusion.


Chapter 2

Background

2.1 The University Course Timetabling Problem

The university course timetabling problem is an NP-hard combinatorial optimization problem. Before most calculations were done by computers, this problem had to be solved manually, which took a lot of time and did not guarantee a satisfactory solution [6]. Briefly, the problem can be stated as: given a set of data and constraints, find a solution that violates as few of the constraints as possible [4]. The data sets could for example consist of rooms, lecturers and student groups, whereas a constraint could for example be that a student cannot have two different events at the same time. Due to the uniqueness and the complex nature of each instance of this problem, a general solution is infeasible to create.

A. Schaerf [6] classifies the academic timetabling problem into three main classes:

• School timetabling

• Course timetabling

• Examination timetabling

The school timetabling class consists of weekly scheduling for all classes of a school, avoiding teachers meeting two classes at the same time. The second class is the course timetabling which consists of the scheduling of all classes in a set of university courses while minimizing the overlaps for students. The last class is the examination timetabling and here the problem is to schedule the exams for a set of university courses while avoiding overlapping of exams of courses having common students and spreading the exams as much as possible [6].


2.1.1 Constraints

In general, constraints can be divided into two categories: hard constraints and soft constraints. Hard constraints can be seen as requirements and have the highest priority [7]. For a solution to be considered feasible, all hard constraints must be fulfilled. Some examples of hard constraints regarding the UCTP are as follows:

• No student group has two events at the same time.

• No lecturer has two events at the same time.

• A room cannot hold more students or instructors than its capacity.

Soft constraints are constraints that do not have to be fulfilled for a solution to be considered valid. They are a way of improving the quality of the solution. If soft constraints are met, given that all hard constraints are fulfilled, the solution would be more desirable but if not, the solution would still be feasible. Some examples of soft constraints regarding the UCTP are as follows:

• No student has a class in the last time slot of the day.

• A student should not have a long free time between classes.

• A student should not have a day with a single class.

When dealing with constraints, a fitness function must be defined to determine the quality of a solution and whether or not the solution is feasible. The fitness function enables comparison between different solutions [7]. For the UCTP, the fitness function adds negative penalty points for each violation of a constraint, with each kind of violation having a certain penalty value. The goal in solving the UCTP is therefore to bring the fitness value up to zero.

2.2 Meta-heuristic Algorithms

Meta-heuristics are a group of procedures which benefit from some sort of intelligence when searching for the solution of a problem. Meta-heuristic algorithms calculate an approximate solution rather than the optimal one. However, experience during the past two decades has shown that meta-heuristics find solutions rapidly and effectively [8]. SA and GA are two examples of meta-heuristic algorithms.

2.2.1 Genetic Algorithm

Melanie Mitchell states that there is no rigorous definition of a genetic algorithm that is accepted by everyone [9]. However, Mitchell suggests that most GAs have at least the following elements in common: populations of chromosomes, selection according to fitness, crossover to produce new offspring, and random mutation of new offspring.

The genetic algorithm consists of many steps and has its own terminology. It begins with a set of random solutions to the problem. These initial solutions are randomized and crude, so their quality is not guaranteed at this stage. In GA, each candidate solution is called a chromosome and a group of chromosomes forms a population. Each chromosome consists of a set of values, called genes, that represent certain properties of the solution. A fitness value can be calculated for a chromosome from its genes, thus determining the quality of the chromosome. Knowing the quality of each chromosome in a population is useful for selecting parent chromosomes to cross and create offspring. The offspring's genes will mostly consist of the genes of their respective parents, and the rest of the genes are created by a process called mutation.

Mutation is a random procedure in which randomly selected genes are changed or swapped. If a mutated chromosome becomes infeasible, an optional repair algorithm may be applied to turn the chromosome into a feasible solution. This process is repeated until a satisfactory chromosome is found [9].

Selection

Selection is one of the main stages of the GA and involves selecting parent chromosomes from the current population to create new offspring. There are many selection methods, among them elitism selection and roulette-wheel selection. The main idea of elitism selection is to select the parent chromosomes with the best fitness values to create offspring. In roulette-wheel selection, the parent chromosomes are selected randomly, with chromosomes of higher fitness being more likely to be selected [3].
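The roulette-wheel idea can be illustrated with a minimal Python sketch. It is not the implementation from [3]: the function name `roulette_select`, the `epsilon` parameter and the way fitness values are turned into weights are assumptions made for illustration. Because the fitness values in this report are penalties that are at most 0, the sketch shifts them so that every chromosome gets a positive weight.

```python
import random

def roulette_select(population, fitnesses, rng=random, epsilon=1.0):
    """Pick one chromosome with probability proportional to its shifted fitness."""
    worst = min(fitnesses)
    # Assumed weighting: shift so all weights are positive; epsilon keeps
    # the worst chromosome selectable with a small probability.
    weights = [f - worst + epsilon for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]

# Example: three chromosomes, where a fitness of 0 is the best possible value.
rng = random.Random(0)
population = ["A", "B", "C"]
fitnesses = [-40, -10, 0]
picks = [roulette_select(population, fitnesses, rng) for _ in range(5000)]
counts = {name: picks.count(name) for name in population}
```

In this example "C" is drawn most often and "A" least often, but "A" can still be selected, which is what distinguishes roulette-wheel selection from pure elitism.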

Crossover

After selecting the chromosomes, the parents are crossed to create offspring. A crossover method describes how to combine the parent genes to create the offspring. There are different crossover methods such as single point crossover, two point crossover and uniform crossover. Single point crossover works by randomly choosing a single gene index and creating offspring by swapping the parents' tails [3].


Figure 2.1: Single point crossover [10]
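Single point crossover can be sketched in a few lines of Python. The chromosomes are represented here as plain lists of genes, which is a simplification chosen for illustration; the name `single_point_crossover` is hypothetical and the sketch is not taken from [3].

```python
import random

def single_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length gene lists at a random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have the same length of at least 2")
    cut = rng.randint(1, len(parent_a) - 1)   # cut strictly inside the list
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b

# Example: the heads come from one parent and the tails from the other.
child_a, child_b = single_point_crossover([1, 1, 1, 1], [2, 2, 2, 2],
                                          random.Random(3))
```

In a timetable encoding, a cut like this can duplicate some events and drop others, which is one situation where a repair step such as the one mentioned in section 2.2.1 is useful.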

Mutation

Mutation is a random procedure that can occur for each newly created offspring chromosome with a certain probability, given by the mutation rate. Mutation helps to expand the search space and to avoid getting stuck at local optima. It involves swapping the properties of the genes that are being mutated [3].
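A swap-based mutation of this kind can be sketched as follows. The list-of-genes representation and the name `swap_mutation` are assumptions for illustration, and the sketch swaps exactly one pair of genes per mutated chromosome, which may differ from the implementation in [3].

```python
import random

def swap_mutation(genes, mutation_rate, rng=random):
    """With probability mutation_rate, swap two randomly chosen genes."""
    mutated = list(genes)                      # leave the input untouched
    if len(mutated) >= 2 and rng.random() < mutation_rate:
        i, j = rng.sample(range(len(mutated)), 2)   # two distinct positions
        mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

# Example: a mutation rate of 1.0 always swaps, a rate of 0.0 never does.
original = [0, 1, 2, 3, 4]
always = swap_mutation(original, 1.0, random.Random(5))
never = swap_mutation(original, 0.0)
```

A swap keeps the set of genes intact, so it changes where events are placed without creating or losing any.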

2.2.2 Simulated Annealing

SA is a local search algorithm with the special property of accepting solutions that are worse during the search for the sake of expanding the search space. SA is inspired by the annealing process in metallurgy to shape and form a solution until it is fit enough [4]. Annealing in metal work involves heating and cooling a metal to alter its physical properties and change its moldability.

SA works in a similar way, where a solution represents the metal that is being shaped and a temperature variable is kept to simulate the heating process. The temperature variable determines how susceptible the solution is to change, and cooling it down closes in on a fixed, final solution. Initially, the temperature is set to a high value to make the algorithm more likely to accept solutions that are worse. This allows the algorithm to avoid getting stuck in local optima early on during execution. As the algorithm runs, the temperature value is slowly reduced and so is the chance of accepting worse solutions. This in turn narrows down the search space and will hopefully lead to a close-to-optimal solution [11].

Just as with GA, the quality of a solution in SA is based on a fitness value calculated with a fitness function. In SA, the fitness value of a solution is also used to calculate an acceptance value. The acceptance function determines the probability of accepting a worse solution. The old fitness, the new fitness and the temperature all affect the outcome of the function. A higher temperature means a higher acceptance value [4].
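The exact acceptance function used in [4] is not reproduced in this section. A common choice with the properties described above is the Metropolis criterion, sketched below under the assumption that a higher fitness value is better; the name `acceptance_probability` is hypothetical.

```python
import math

def acceptance_probability(old_fitness, new_fitness, temperature):
    """Metropolis-style acceptance for a fitness where higher is better.

    Assumes temperature > 0.
    """
    if new_fitness >= old_fitness:
        return 1.0                      # never reject an improvement
    # A worse solution is accepted with a probability that shrinks as the
    # fitness loss grows and as the temperature falls.
    return math.exp((new_fitness - old_fitness) / temperature)

# Example: the same worsening step is far more acceptable when hot.
hot = acceptance_probability(-10, -14, temperature=100.0)
cold = acceptance_probability(-10, -14, temperature=0.7)
```

With these inputs `hot` is close to 1 while `cold` is close to 0, which matches the statement that a higher temperature means a higher acceptance value.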


2.2.3 Hybrid Algorithms

State-of-the-art methods for the UCTP often hybridize several different algorithms. The main motivation behind hybridization is to take advantage of the complementary character of different optimization strategies. In other words, hybrids are believed to benefit from synergy, where only the strongest parts of each algorithm are used in the hybrid [12].

Both GA and SA have their advantages and disadvantages. Dahl and Fredrikson concluded that GA performs relatively better than SA in the early stages of the process, whereas SA performs better in the final stages [4]. Intuitively, a hybrid of the two would therefore be appropriate to get the best of both worlds.

2.3 Previous Research

Research on the UCTP has been conducted since the early 1960s and various solution techniques have been applied to the problem ever since [13]. Due to the complex nature of this problem, many approaches for solving it have been tested, including hybridization of different algorithms. Meta-heuristic techniques have been found to be the most suitable for approximating the UCTP [14]. These meta-heuristics include local search algorithms such as tabu search and simulated annealing, and evolutionary algorithms such as particle swarm and genetic algorithms.

Previous research has shown that evolutionary algorithms are good for exploring the whole search space. Evolutionary algorithms have the capability of quickly exploring and finding promising regions in the search space [15].

Pertoft and Yamazaki implemented a Genetic Algorithm and applied it to solve UCTP for multiple data samples inspired by KTH [3]. The sizes of the input data were carefully designed to study the scalability of the GA. The GA in their study successfully solved UCTP and the results showed that the GA rapidly increases the quality of a solution during the early stage of the process for all sizes of input data.

Dahl and Fredrikson implemented a Simulated Annealing algorithm and compared it to the GA constructed by Pertoft and Yamazaki to investigate which of them was faster at solving the UCTP [4]. The results of their study showed that the implemented SA performs much better than the GA with respect to runtime. An interesting result was that the GA performs relatively better than SA in the early stages, whereas SA performs better in the final stages.

Current state-of-the-art approaches to the UCTP include hybridization of different algorithms. Different local and population based techniques are merged with each other to eliminate each other's disadvantages [16]. Jat and Yang present a hybrid approach, which combines a genetic algorithm and a tabu search heuristic, to solve the post enrolment course timetabling problem, which is one type of UCTP [17]. The experimental results from their study showed that the proposed hybrid algorithm is competitive and is able to efficiently find optimal or near-optimal solutions for the problem. Al-Betar and Khader have implemented a hybrid between harmony search algorithms and hill climbing optimizers for approximating the UCTP [18]. The results showed that their hybrid can find a high quality solution within a reasonable time.


Chapter 3

Method

3.1 Test Approach

Each algorithm was tested by feeding the data set as an input file to the Java test suite program, which can be found in Appendix A. In the test suite program the variables are initialized, an algorithm is chosen and a timestamp variable is saved to compare it with an end timestamp after the algorithm is finished. The time is calculated in seconds (s). A test is considered done when the algorithm finds a solution with a fitness value of 0. Because of the randomness of all algorithms used, each algorithm was tested with 20 tests and the average time was calculated.
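The measurement procedure can be summarized in a short sketch. The actual test suite is the Java program in Appendix A; the Python function below only illustrates the timing logic, and `average_runtime` and the `solve` callable are hypothetical names. The sketch records whether a run finished within an optional time limit but does not interrupt a run that exceeds it.

```python
import time
from statistics import mean

def average_runtime(solve, runs=20, time_limit=None):
    """Time `solve()` over several runs; return (mean seconds, number solved).

    `solve` is a hypothetical callable that returns the final fitness value.
    A run counts as solved when the fitness is 0 and, if a limit is given,
    the run finished within it.
    """
    solved_times = []
    for _ in range(runs):
        start = time.perf_counter()             # start timestamp
        fitness = solve()
        elapsed = time.perf_counter() - start   # seconds for this run
        if fitness == 0 and (time_limit is None or elapsed <= time_limit):
            solved_times.append(elapsed)
    avg = mean(solved_times) if solved_times else None
    return avg, len(solved_times)

# Example with a stand-in solver that "finds" a feasible solution instantly.
avg_seconds, solved_runs = average_runtime(lambda: 0, runs=20)
```

Averaging only over solved runs is a choice made for this sketch; runs that never reach fitness 0 have no meaningful completion time.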

Because of the size and complexity of the largest data sets (XL and XXL) used in the tests, a time limit of 500 seconds was introduced for GA.

To analyze the change in the fitness function level, a test was made that kept track of time and fitness value pairs to retrieve data about the rate at which each fitness level converged to 0 for each algorithm.

3.1.1 Environment

The algorithms were run on a MacBook Pro with a 2.7 GHz Intel Core i5 processor and 8 GB of RAM. The algorithms were run one at a time. During the runs, no other programs were running.


3.2 Algorithm Implementations

The GA that is used in this report was constructed by Pertoft and Yamazaki [3] and the SA by Dahl and Fredrikson [4]. The data structures used in those implementations are also used for the hybrid algorithm. A solution consists of a 5x4 matrix for each room in the data set, where each column represents a weekday and each row represents a time slot of that day. Each element of the matrix is an integer: the ID of the event scheduled in that room and time slot, or 0 if no event is scheduled there. For the problem to be considered solved, all events must be assigned a time slot without violating any of the hard constraints. All of the source code used in this report can be found in Appendix A.
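The data structure described above can be illustrated with a minimal Python sketch. The original implementation is in Java (Appendix A); the function names and the room identifiers below are hypothetical and only show the layout of one grid per room with 0 marking a free cell.

```python
DAYS, SLOTS_PER_DAY = 5, 4   # weekday columns, time-slot rows
EMPTY = 0                    # 0 means "no event scheduled"

def empty_timetable(room_ids):
    """One SLOTS_PER_DAY x DAYS grid per room, all cells empty."""
    return {room: [[EMPTY] * DAYS for _ in range(SLOTS_PER_DAY)]
            for room in room_ids}

def place(timetable, room, day, slot, event_id):
    """Put an event in a free cell; a cell can never hold two events."""
    if timetable[room][slot][day] != EMPTY:
        raise ValueError("cell already occupied")
    timetable[room][slot][day] = event_id

def scheduled_events(timetable):
    """Return the set of event IDs that currently have a time slot."""
    return {cell for grid in timetable.values() for row in grid
            for cell in row if cell != EMPTY}

# Example: two rooms give 2 * 20 = 40 cells; event 7 goes to Monday's last slot.
timetable = empty_timetable(["R1", "R2"])
place(timetable, "R1", day=0, slot=3, event_id=7)
```

Because each cell stores a single integer, two events can never share a room and time slot in this representation.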

3.2.1 Genetic Algorithm

Pertoft and Yamazaki extensively tested different methods and start conditions for GA, so the same choices are used for the GA in this report. The algorithm uses roulette-wheel selection and single point crossover. The initial population size is 100 timetables and the mutation rate is set to 6%.

The GA can be described with the pseudo code in Algorithm 1, quoted from the study made by Pertoft and Yamazaki [3].

Algorithm 1 Genetic Algorithm Implementation

function GA
    create random population and evaluate fitness of its chromosomes
    while most fit individual is not fit enough do
        while offspring population is not full do
            select two parent chromosomes with roulette-wheel selection
            perform single point crossover with the two parent chromosomes
            mutate offspring chromosome
            repair offspring chromosome
            evaluate fitness of offspring chromosome
            add offspring chromosome to offspring population
        end while
        merge the parent and offspring populations
        delete the rest of the chromosomes
    end while
    return most fit chromosome from population
end function


3.2.2 Simulated Annealing

The SA in this report was implemented by Dahl and Fredrikson, who followed the concept of simulated annealing strictly. The same acceptance function and parameters are used in this study. The start temperature (T_start) is set to 100 and the final temperature (T_final) is set to 0.7. The cooling rate (k) is set to 0.9995. Dahl and Fredrikson argue that these values gave an even spread and a suitable number of iterations while still being time efficient, so the same values are reused.

Initially, a random solution is generated and passed to SA as a parameter. The algorithm iterates over temperatures from T_start down to T_final, multiplying the temperature by the factor k in each iteration. In each iteration a randomly modified solution is produced and compared to the current one.
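The number of iterations implied by these parameters follows from the geometric cooling. After n passes the temperature is T_start · k^n, so the loop runs for the smallest n with T_start · k^n ≤ T_final, that is n = ceil(ln(T_final / T_start) / ln(k)). The helper below is a small sketch of that arithmetic; the name `cooling_iterations` is hypothetical.

```python
import math

def cooling_iterations(t_start, t_final, k):
    """Number of passes of `while T > t_final: T *= k` starting at t_start.

    Assumes t_start > t_final > 0 and 0 < k < 1.
    """
    return math.ceil(math.log(t_final / t_start) / math.log(k))

# The SA parameters stated above: T_start = 100, T_final = 0.7, k = 0.9995.
sa_iterations = cooling_iterations(100, 0.7, 0.9995)
```

For the SA parameters above this gives 9922 passes, so the number of candidate solutions examined per run is fixed by the cooling schedule rather than by the size of the data set.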

The SA can be described with the pseudo code in Algorithm 2, quoted from the study made by Dahl and Fredrikson [4].

Algorithm 2 Simulated Annealing Implementation

function SA(sol_bad, T_start, T_final, k)
    sol_current ← sol_bad
    sol_best ← sol_bad
    T ← T_start
    while T > T_final do
        sol_new ← MODIFY(sol_current)
        if accept(fitness(sol_current), fitness(sol_new), T) > rand(0, 1) then
            sol_current ← sol_new
        end if
        if fitness(sol_new) > fitness(sol_best) then
            sol_best ← sol_new
        end if
        T ← T * k
    end while
    return sol_best
end function
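The loop of Algorithm 2 can be written as a runnable Python sketch. It is a generic illustration, not the code of [4]: the `fitness` and `modify` functions are supplied by the caller, the Metropolis-style `accept` rule is an assumption, and the toy problem in the example (walking an integer towards 0) merely stands in for a timetable.

```python
import math
import random

def simulated_annealing(sol_bad, fitness, modify, t_start, t_final, k, rng=random):
    """Generic SA loop mirroring Algorithm 2; higher fitness is better."""
    def accept(old, new, t):
        # Assumed Metropolis-style rule, not taken from the source.
        return 1.0 if new >= old else math.exp((new - old) / t)

    current = best = sol_bad
    t = t_start
    while t > t_final:
        candidate = modify(current, rng)
        if accept(fitness(current), fitness(candidate), t) > rng.random():
            current = candidate
        if fitness(candidate) > fitness(best):
            best = candidate            # remember the best solution seen
        t *= k                          # geometric cooling
    return best

# Toy example: the "solution" is an integer and the penalty is its distance to 0.
toy_fitness = lambda x: -abs(x)
toy_modify = lambda x, rng: x + rng.choice((-1, 1))
result = simulated_annealing(50, toy_fitness, toy_modify,
                             100, 0.7, 0.9995, random.Random(0))
```

Since the best solution is only replaced by a strictly better one, the returned solution is never worse than the initial one, even though the current solution may temporarily get worse while the temperature is high.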

3.2.3 GA-SA Hybrid

The hybrid algorithm implemented in this report starts by running GA with a random population. The main loop of GA is run, with each generation creating better chromosomes, until a stopping condition is met, and the top chromosome of the current population is returned. The incomplete solution retrieved from GA is fed as the initial solution to SA, which continues the search for a near-optimal solution. When switching from GA to SA, a lower start temperature of 80 is used to avoid redundant work: GA will already have approached the better part of the search space, so SA does not have to be as willing to accept worse solutions and can instead converge on the best solution in that area. The cooling rate is set to 0.9998 to ensure that more time is spent on finding a better solution.

The stopping condition of the hybrid algorithm determines when GA stops running and hands over to SA. There are many ways of choosing such a condition, for example stopping once the top individual of a population has reached a certain fitness value, or after a certain amount of time has passed. Pertoft and Yamazaki observed that GA performs better in the early stages and stagnates after some time [3]. It therefore seemed appropriate to switch algorithms when the fitness value stopped changing. The final implementation of the hybrid had the following stopping condition: stop GA either when the fitness value of the top chromosome of the current generation has not improved in the previous twenty generations or when the fitness value has risen above -20. After extensive testing, this stopping condition was found to be the most suitable.
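The stopping condition can be expressed as a small predicate over the history of best fitness values. This is a sketch under assumptions: the name `should_switch_to_sa` is hypothetical, and "not improved in the previous twenty generations" is interpreted as the best value of the last twenty generations not exceeding the best value seen before them.

```python
def should_switch_to_sa(best_fitness_history, patience=20, threshold=-20):
    """Decide whether the GA phase should hand over to SA.

    best_fitness_history holds the top chromosome's fitness for each
    generation so far (fitness is at most 0, and 0 means feasible).
    """
    if not best_fitness_history:
        return False
    latest = best_fitness_history[-1]
    if latest > threshold:                      # fitness has risen above -20
        return True
    if len(best_fitness_history) <= patience:   # not enough generations yet
        return False
    # Stagnation: nothing in the last `patience` generations beats what
    # had already been reached before them.
    earlier_best = max(best_fitness_history[:-patience])
    return max(best_fitness_history[-patience:]) <= earlier_best

# Example: 20 generations stuck at -90 trigger the switch.
stagnant = [-100, -90] + [-90] * 20
improving = [-200 + i for i in range(30)]
```

Here `should_switch_to_sa(stagnant)` is true while `should_switch_to_sa(improving)` is false, because the second history is still getting better every generation.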

The GA-SA hybrid can be described with the pseudo code in Algorithm 3.

Algorithm 3 GA-SA Hybrid Implementation

function Hybrid
    incomplete_timetable ← GA()    ▷ using the stopping condition
    final_timetable ← SA(incomplete_timetable, 80, 0.7, 0.9998)
    return final_timetable
end function

3.3 Data sets

The three algorithms are run on five different data sets where four of them are the same as the ones used by Dahl and Fredrikson [4]. The XXL data set is created by the authors of this report to help simulate a more lifelike UCTP.

The input data is inspired by the Royal Institute of Technology, KTH, and is formatted according to figure 3.1. The beginning of each section is marked with an octothorpe (#), followed by the name of the section. Each section then contains a number of entries for all the different properties.


Figure 3.1: Input Data File Format

The five input data files vary in the number of students and hence in the courses and classes that need to be allocated. The different data sets are summarized in Table 3.1.

Input Data File     S      M      L      XL     XXL
Lecture Rooms       2      2      3      6      7
Lesson Rooms        3      5      6      10     15
Lab Rooms           3      5      7      11     15
Courses             12     15     21     29     39
Lecturers           9      12     15     21     28
Student Groups      6      8      12     21     30
Total Events        70     115    159    293    431
Total Time Slots    160    240    320    540    740
Event Density       0.44   0.48   0.50   0.54   0.58

Table 3.1: Summary of the different test data sets.
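The last two rows of Table 3.1 can be reproduced from the other rows. The sketch below assumes that every room contributes 5 x 4 = 20 time slots, as in the matrix representation of section 3.2, and that the event density is the number of events divided by the number of time slots; both assumptions are consistent with the figures in the table. The dictionary and function names are hypothetical.

```python
# Lecture + lesson + lab rooms per data set, copied from Table 3.1.
ROOMS = {"S": 2 + 3 + 3, "M": 2 + 5 + 5, "L": 3 + 6 + 7,
         "XL": 6 + 10 + 11, "XXL": 7 + 15 + 15}
EVENTS = {"S": 70, "M": 115, "L": 159, "XL": 293, "XXL": 431}
SLOTS_PER_ROOM = 5 * 4   # five weekdays times four time slots per day

def total_time_slots(name):
    """Total number of room/time-slot cells available in a data set."""
    return ROOMS[name] * SLOTS_PER_ROOM

def event_density(name):
    """Fraction of all cells that must be filled with an event."""
    return EVENTS[name] / total_time_slots(name)

densities = {name: round(event_density(name), 2) for name in ROOMS}
```

The density rises from 0.44 for S to 0.58 for XXL, so the larger data sets are not only bigger but also leave proportionally fewer free cells to move events into.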

3.4 Constraints

The problem is considered solved when all events are assigned a time slot without breaching any of the hard constraints. A solution (timetable) is considered acceptable when all of the following hard constraints are fulfilled:

• Every event in every course is assigned a time slot.

• No student group has two events at the same time.

• No lecturer has two events at the same time.

• All events are in the right kind of room.

• No two events are scheduled in the same room at the same time.


• No event is in a room with less capacity than the number of students at the event.

One may also consider soft constraints but they are not taken into consideration due to the scope of this study.

3.4.1 Fitness Function

To grade the solution, a fitness level function is used. GA was optimized by Pertoft and Yamazaki [3] with a weighted fitness function, since some of the hard constraints were found to be more often violated in the beginning than others. Dahl and Fredrikson [4] reused the same fitness function in their implementation of SA and so will this report for the implementation of the hybrid algorithm.

fitness(timetable) = -(2x₁ + x₂ + 4x₃ + 4x₄)    (3.1)

In equation 3.1, x₁...x₄ are as follows:

x₁ returns the number of double booked student groups.

x₂ returns the number of double booked lecturers.

x₃ returns the number of room capacity breaches.

x₄ returns the number of room type breaches.

The weighted sum is negated, so the fitness value is at most 0 and equals 0 only when none of these constraints is violated.

The hard constraint "No two events are scheduled in the same room at the same time" is not considered in the fitness function since the data structure, a matrix, makes it impossible to violate. No cell of the matrices can hold more than one event ID.
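The weighted fitness function can be sketched in Python once the four violation counts are known. The weights are those of equation 3.1; the sign convention (penalties counted negatively, so that 0 is the best value) is taken from the negative fitness values reported elsewhere in this report. How the counts themselves are obtained from a timetable is not shown, and the parameter names are hypothetical.

```python
WEIGHTS = (2, 1, 4, 4)   # weights of x1..x4 in equation 3.1

def fitness(student_clashes, lecturer_clashes, capacity_breaches,
            room_type_breaches):
    """Weighted penalty: 0 for a feasible timetable, negative otherwise."""
    counts = (student_clashes, lecturer_clashes, capacity_breaches,
              room_type_breaches)
    return -sum(w * x for w, x in zip(WEIGHTS, counts))

# Example: 3 student-group clashes, 2 lecturer clashes and 1 capacity breach
# give -(2*3 + 1*2 + 4*1 + 4*0) = -12.
example_fitness = fitness(3, 2, 1, 0)
```

Room-related breaches carry the highest weight, so a single event in the wrong kind of room costs as much as four double booked lecturers.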


Chapter 4

Results

In diagrams shown in figures 4.1, 4.2 and 4.3, the results from GA, SA and the hybrid are presented. In these diagrams, each bar represents a test run, with run time on the logarithmically scaled y-axis, and each cluster represents a data set.

Figure 4.1: The results from the genetic algorithm.

For GA, the M and L data sets require considerably more time than the S data set to produce a solution, as can be seen in figure 4.1. No feasible solution was found (the fitness value never reached 0) in any of the 20 runs when running GA on the XL and XXL data sets for 500 seconds, hence these runs are not included in the diagram.


Figure 4.2: The results from the simulated annealing.

Figure 4.2 shows that the time consumption for SA was, for all test runs, smaller than GA’s respective test runs. Some of the runs in the XL data set and all of the test runs in the XXL data set did not produce a feasible solution before terminating.

Figure 4.3: The results from the GA-SA hybrid.


Figure 4.3 shows that the GA-SA hybrid performs better than GA but a bit worse than SA, in terms of runtime. All test runs for the hybrid managed to reach fitness level zero and produce a feasible solution before terminating.

In figure 4.4, the average run time for each of the three algorithms can be found for the five different data sets. The bars for the algorithms that terminated before reaching a feasible solution are not included. Table 4.1 contains the exact values for the average time for each algorithm.

Figure 4.4: The average runtime of all three algorithms with 20 runs for each data set.

Data set    GA (s)    SA (s)    Hybrid (s)
S           0.964     0.481     0.709
M           8.669     1.214     1.979
L           73.461    2.221     3.871
XL          -         10.102    16.224
XXL         -         -         44.042

Table 4.1: The average runtime of all three algorithms for each data set.
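The relative differences in Table 4.1 are easier to see as ratios. The sketch below copies the averages from the table and computes how many times slower an algorithm is than SA on a given data set; the names `AVG_RUNTIME_S` and `slowdown` are hypothetical, and entries without a feasible solution are stored as `None`.

```python
# Average runtimes in seconds, copied from Table 4.1 (None = no feasible solution).
AVG_RUNTIME_S = {
    "S":   {"GA": 0.964,  "SA": 0.481,  "Hybrid": 0.709},
    "M":   {"GA": 8.669,  "SA": 1.214,  "Hybrid": 1.979},
    "L":   {"GA": 73.461, "SA": 2.221,  "Hybrid": 3.871},
    "XL":  {"GA": None,   "SA": 10.102, "Hybrid": 16.224},
    "XXL": {"GA": None,   "SA": None,   "Hybrid": 44.042},
}

def slowdown(data_set, algorithm, baseline="SA"):
    """Runtime of `algorithm` relative to `baseline`, or None if either is missing."""
    value = AVG_RUNTIME_S[data_set][algorithm]
    reference = AVG_RUNTIME_S[data_set][baseline]
    if value is None or reference is None:
        return None
    return value / reference

hybrid_vs_sa = {name: slowdown(name, "Hybrid") for name in AVG_RUNTIME_S}
```

On the four data sets that SA solved, the hybrid's average runtime is between roughly 1.5 and 1.75 times that of SA, whereas GA is about 33 times slower than SA on the L data set. No ratio can be formed for XXL, where only the hybrid produced feasible solutions.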

The fitness improvement of one test run on the XXL data set is shown for each of the algorithms in figures 4.5, 4.6 and 4.7. Note the different scales on the x-axes. Figure 4.5 shows that GA improved the solution quickly in the early stages of the run but much more slowly in the later stages. For GA, the fitness level never reached 0 for the XXL data set. The fitness improvement for SA can be found in figure 4.6. SA improved the fitness more evenly over the run but, like GA, never reached a fitness value of 0; the fitness level improved from approximately -1600 to -15. Figure 4.7 shows that the switch between GA and SA in the hybrid happens after approximately 18 seconds, when the fitness improvement for GA starts to slow down. The hybrid is the only algorithm for which the fitness level reaches 0 on the test run on the XXL data set.

Figure 4.5: Fitness improvement of the genetic algorithm on a test run on the XXL data set.


Figure 4.6: Fitness improvement of the simulated annealing on a test run on the XXL data set.

Figure 4.7: Fitness improvement of the GA-SA hybrid on a test run on the XXL data set.


Chapter 5

Discussion

This chapter presents a comparison between the algorithms, discussing the runtime and the improvements of the fitness level. Possible improvements are suggested for future research.

5.1 Algorithm comparison

The results clearly show that GA performs worse than the other two algorithms, with respect to time consumption, for all test runs. This is because GA is computationally heavy: it creates a large number of poor solutions and crosses them until a better one is found. However, GA does manage to improve the fitness value quickly during the early stages on large data sets, as can be seen in figure 4.5. The results for GA are consistent with those of Pertoft and Yamazaki [3], which was expected since this study used the same implementation and parameters. As the hybrid was the main focus of this study, little time was spent on improving the other two algorithms. Pertoft and Yamazaki discussed a wide array of possible improvements for GA, including adaptable parameters, alterations to various methods and parallelisation.
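To illustrate what crossing two solutions involves, the following minimal Python sketch performs a one-point crossover on two timetables encoded as equal-length lists. The list encoding, the function name `one_point_crossover` and the choice of Python are assumptions made for illustration; the Java implementation used in this study may cross timetables differently.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length parents at a random cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2:
        # Nothing to cut: return unchanged copies.
        return list(parent_a), list(parent_b)
    cut = rng.randrange(1, len(parent_a))  # 1 <= cut <= len - 1
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


rng = random.Random(42)
print(one_point_crossover(["a"] * 6, ["b"] * 6, rng))
```

Every child produced this way still has to be evaluated by the fitness function, which is one reason a population-based search spends more time per improvement than a single-solution search.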

SA had the lowest average runtime on every data set for which it produced a feasible solution (see table 4.1). The implementation of SA followed the standard Simulated Annealing scheme strictly and is not as computationally demanding as the other two algorithms. Just as with GA, little time was spent on improving SA. A possible improvement for SA would be to replace the purely random modification process with a more directed one. Different modification strategies could be investigated to find a more suitable one.
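The role of the modification process can be made concrete with a generic Simulated Annealing loop in which the modification strategy is a parameter. This is a minimal Python sketch under stated assumptions: geometric cooling, the parameter values, the toy fitness function and the names `simulated_annealing`, `random_modification` and `toy_fitness` are all hypothetical and do not reproduce the Java implementation used in this study.

```python
import math
import random


def simulated_annealing(initial, fitness, modify, t_start=10.0, t_end=0.01,
                        cooling=0.99, rng=None):
    """Maximise `fitness` (0 means feasible) starting from `initial`."""
    if rng is None:
        rng = random.Random()
    current, current_fit = initial, fitness(initial)
    temperature = t_start
    while temperature > t_end and current_fit < 0:
        candidate = modify(current, rng)
        candidate_fit = fitness(candidate)
        delta = candidate_fit - current_fit
        # Always accept improvements; accept a worse candidate with
        # probability exp(delta / temperature), which shrinks as it cools.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = candidate, candidate_fit
        temperature *= cooling
    return current, current_fit


def random_modification(solution, rng):
    """Hypothetical neighbour move: reassign one randomly chosen position."""
    changed = list(solution)
    changed[rng.randrange(len(changed))] = rng.randrange(3)
    return changed


def toy_fitness(solution):
    """Toy stand-in for the timetable fitness: minus the number of non-zeros."""
    return -sum(1 for value in solution if value != 0)


print(simulated_annealing([1, 2, 1, 2], toy_fitness, random_modification,
                          rng=random.Random(0)))
```

Because `modify` is passed in as a function, a more directed strategy (for example, one that only moves events currently violating a constraint) could be swapped in without touching the acceptance rule.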

Interestingly, the hybrid algorithm performs worse than SA in terms of runtime. This is mostly because of the GA part of the hybrid, as figure 4.7 illustrates: the curve of fitness over time becomes markedly steeper at around 18 s, which marks the transition from GA to SA in the hybrid algorithm. Transitioning from GA to SA required a stopping condition in GA, and extensive testing was done to find a near-optimal one. During this testing, it was found that the runtime of the hybrid algorithm grew with the amount of time GA was run before transitioning to SA: the longer the stopping condition allowed GA to run, the worse the hybrid performed. Finding an appropriate stopping condition was hard since SA was better than GA in all test cases.
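One common way to express such a stopping condition is a stagnation test on the best fitness seen so far. The Python sketch below is an illustration only: the function name `should_switch`, the window length and the gain threshold are hypothetical and are not the stopping condition or the values used in this study.

```python
def should_switch(best_fitness_history, window=50, min_gain=1.0):
    """Return True when GA has gained less than `min_gain` fitness
    over the last `window` generations."""
    if len(best_fitness_history) <= window:
        return False  # not enough generations observed yet
    gain = best_fitness_history[-1] - best_fitness_history[-1 - window]
    return gain < min_gain


# Best fitness per generation for a run that improves fast, then stalls.
history = [-100, -50, -30, -29.5, -29.4, -29.4]
print(should_switch(history[:4], window=3))  # early: still improving
print(should_switch(history, window=3))      # later: improvement has stalled
```

A shorter window or a larger threshold hands over to SA earlier, which matches the observation above that less time spent in GA gave a faster hybrid.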

Perhaps the cooperation between GA and SA does not benefit the run time for solving UCTP as much as we thought it would.

Although the hybrid performed worse than SA with respect to time consumption, it reached the desired fitness level of zero in all test runs on the XL and XXL data sets, which neither GA nor SA managed. SA terminated before reaching zero in some of the test runs on the XL data set and in all of the test runs on the XXL data set. We think that SA terminated early because it got stuck in a local optimum in the late stages, so that the temperature reached its end point before the fitness value reached zero. That the hybrid performs better on larger data sets agrees with previous research, which found that genetic algorithms are better at exploring the whole search space and narrowing it down, while simulated annealing is good at finding the best solution within that narrowed space. Since the XXL data set comes closest to simulating a real-life UCTP, the hybrid algorithm could be a more reliable choice when dealing with more complex scheduling problems. However, how reliably these algorithms produce a feasible solution for large data sets was not researched extensively in this report.
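The effect of the temperature reaching its end point can be quantified if a geometric cooling schedule is assumed, where the temperature after k steps is t_start * alpha^k. The cooling schedule and all numbers in the Python sketch below are assumptions for illustration; the schedule and parameters actually used for SA in this study are not restated here.

```python
import math


def cooling_steps(t_start, t_end, alpha):
    """Number of steps until t_start * alpha**k <= t_end, for 0 < alpha < 1
    (up to floating-point rounding when the ratio is an exact power of alpha)."""
    return max(0, math.ceil(math.log(t_end / t_start) / math.log(alpha)))


# Hypothetical schedule: start at 100, stop at 1, multiply by 0.9 each step.
print(cooling_steps(100.0, 1.0, 0.9))
```

Under such a schedule the number of modification attempts is fixed in advance and does not depend on the size of the data set, so a larger problem has to be solved within the same budget unless the parameters are scaled with it.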

5.2 Future Research

There are several possible improvements and directions for future research to consider. The implementations used in this report were static, shared many parameters and did not consider soft constraints. Possible improvements for these issues are presented in this section.

This study took inspiration from, and used the same static parameters as, Pertoft and Yamazaki for GA and Dahl and Fredriksson for SA. This means that for all data sets and during the whole runtime of a test, the parameters used were not changed or adapted in any way. Many of the improvements for GA suggested by Pertoft and Yamazaki involve making it more dynamic by changing parameters in accordance with performance during the execution. Also, the difference in size between the smallest data set (S) and the largest data set (XXL) was significant. Despite that, the same parameters were used in all algorithms and for all data set sizes. Future research could therefore investigate the effect of different parameters for different data set sizes.
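As a simple illustration of what a dynamic parameter could look like, the Python sketch below adapts a mutation rate depending on whether the last generation improved the best fitness. The rule, the function name `adapt_mutation_rate` and all constants are hypothetical; they are neither taken from Pertoft and Yamazaki nor used in this study.

```python
def adapt_mutation_rate(rate, improved, step=1.5, low=0.01, high=0.5):
    """Lower the rate after an improvement, raise it after stagnation,
    and keep the result within [low, high]."""
    new_rate = rate / step if improved else rate * step
    return min(high, max(low, new_rate))


rate = 0.1
for improved in [True, False, False, False]:
    rate = adapt_mutation_rate(rate, improved)
    print(round(rate, 4))
```

The idea is that more mutation is applied when the search stalls and less when it is making progress; whether this helps on the UCTP would have to be tested.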


The fitness function that Pertoft and Yamazaki used for GA was reused by Dahl and Fredriksson for SA. The fitness function was specifically tuned to make GA perform better by penalizing violations of some constraints more heavily than others. Since GA and SA are different algorithms that work in different ways, it would be appropriate to develop a separate fitness function for each of them. This in turn could improve the performance of the hybrid since it relies heavily on both algorithms.
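A fitness function of this kind can be sketched as a weighted sum of constraint violations, negated so that zero means a feasible timetable. In the Python sketch below, the constraint names, the violation counts and both weight tables are hypothetical and only show how separate weights for GA and SA would change the value; they are not the weights used in this study.

```python
def weighted_fitness(violations, weights):
    """Return 0 for a feasible timetable and a negative penalty otherwise."""
    return -sum(weights[name] * count for name, count in violations.items())


# Hypothetical constraint names, counts and weights, for illustration only.
violations = {"room_double_booked": 2, "room_too_small": 1, "lecturer_clash": 0}
ga_weights = {"room_double_booked": 2, "room_too_small": 4, "lecturer_clash": 2}
sa_weights = {"room_double_booked": 1, "room_too_small": 1, "lecturer_clash": 1}

print(weighted_fitness(violations, ga_weights))
print(weighted_fitness(violations, sa_weights))
```

Both weight tables agree on which timetables are feasible, since any positive weights give a fitness of zero exactly when there are no violations; they differ only in how the search is steered towards that point.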

In the real world, soft constraints are valuable for improving the quality of a schedule. This study did not consider soft constraints when finding a solution for the UCTP. Had they been considered, the outcome could have been different. For example, all algorithms managed to find a feasible solution for the data sets smaller than XL. With soft constraints considered, one algorithm could outperform the others in terms of how well it satisfies the soft constraints.

Future research could clarify or confirm this.
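One way such a comparison could be set up is to rank results first by hard-constraint violations and then by a soft-constraint penalty. The Python sketch below uses made-up placeholder results labelled A, B and C; they are not measurements from this study, and the function name `rank_results` is hypothetical.

```python
def rank_results(results):
    """Order result names by (hard violations, soft penalty), best first.
    Tuples compare element by element, so feasibility always dominates."""
    return sorted(results, key=lambda name: results[name])


# Placeholder values: (hard constraint violations, soft constraint penalty).
results = {"A": (0, 12), "B": (0, 7), "C": (1, 0)}
print(rank_results(results))
```

Here A and B are both feasible, so the soft penalty separates them, while C is ranked last despite its perfect soft score because it violates a hard constraint.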


Chapter 6

Conclusions

The implemented hybrid algorithm performs better than GA, for all sizes of data, but a bit worse than SA, for small to moderate sizes of data, with respect to time consumption when solving UCTP. On the other hand, the hybrid algorithm is the only algorithm that manages to produce a feasible solution (fitness level reaches zero) before terminating when the complexity of the problem increases.

This means that the hybrid algorithm could be a more reliable choice when dealing with more realistic UCTP instances, such as the biggest data set tested (XXL). The hybrid algorithm can be improved mainly by improving GA along the lines suggested by Pertoft and Yamazaki.


Bibliography

[1] Esra Aycan and Tolga Ayav. Solving the course scheduling problem using simulated annealing. In Advance Computing Conference, 2009. IACC 2009. IEEE International, pages 462–466. IEEE, 2009.

[2] Edmund Burke, Kirk Jackson, Jeffrey H. Kingston, and Rupert Weare. Automated university timetabling: The state of the art. The computer journal, 40(9):565–571, 1997.

[3] Hiroyuki Vincent Yamazaki and John Pertoft. Scalability of a genetic algorithm that solves a university course scheduling problem inspired by KTH, 2014. Bachelor’s thesis. Stockholm: Royal Institute of Technology, Diva id = diva2:771121.

[4] Rasmus Fredrikson and Jonas Dahl. A comparative study between a simulated annealing and a genetic algorithm for solving a university timetabling problem, 2016. Bachelor’s thesis. Stockholm: Royal Institute of Technology, Diva id = diva2:929059.

[5] Hamed Babaei, Jaber Karimpour, and Amin Hadidi. A survey of approaches for university course timetabling problem. Computers & Industrial Engineering, 86:43–59, 2015.

[6] Andrea Schaerf. A survey of automated timetabling. Artificial intelligence review, 13(2):87–127, 1999.

[7] Håkan Andersson. School timetabling in theory and practice: a comparative study of simulated annealing and tabu search, 2015. Bachelor’s thesis. Umeå University, Diva id = diva2:852117.

[8] Hossain Poorzahedy and Omid M Rouhani. Hybrid meta-heuristic algorithms for solving network design problem. European Journal of Operational Research, 182(2):578–596, 2007.

[9] Melanie Mitchell. An introduction to genetic algorithms. MIT press, 1998. Massachusetts.

[10] Genetic algorithms - crossover, 2018. https://www.tutorialspoint.com/genetic_algorithms/genetic_algorithms_crossover.htm, Visited 2018-05-28.

[11] L. Jacobson. Simulated annealing for beginners, 2013. http://www.theprojectspot.com/tutorial-post/simulated-annealing-algorithm-for-beginners/6, Visited 2018-03-24.

[12] Christian Blum, Jakob Puchinger, Günther R Raidl, and Andrea Roli. Hybrid metaheuristics in combinatorial optimization: A survey. Applied Soft Computing, 11(6):4135–4151, 2011.

[13] Y Awad, A Dawood, and A Badr. An evolutionary immune approach for university course timetabling. IJCSNS International Journal of Computer Science and Network Security, 11:127–135, 2011.

[14] Rhydian Lewis. A survey of metaheuristic-based techniques for university timetabling problems. OR spectrum, 30(1):167–190, 2008.

[15] M Fesanghary, M Mahdavi, M Minary-Jolandan, and Y Alizadeh. Hybridizing harmony search algorithm with sequential quadratic programming for engineering optimization problems. Computer methods in applied mechanics and engineering, 197(33-40):3080–3091, 2008.

[16] Rafia Ilyas and Zahid Iqbal. Study of hybrid approaches used for university course timetable problem (uctp). In Industrial Electronics and Applications (ICIEA), 2015 IEEE 10th Conference on, pages 696–701. IEEE, 2015.

[17] Sadaf Naseem Jat and Shengxiang Yang. A hybrid genetic algorithm and tabu search approach for post enrolment course timetabling. Journal of Scheduling, 14(6):617–637, 2011.

[18] Mohammed Azmi Al-Betar and Ahamad Tajudin Khader. A hybrid harmony search for university course timetabling. In Proceedings of the 4th multidisciplinary conference on scheduling: theory and applications (MISTA 2009), Dublin, Ireland, pages 157–179, 2009.


Appendix A

Source code

The Java source code for the algorithms can be found in the public GitHub repository at: https://github.com/alzahraasalman/GA_SA_Hybrid_Comparison


Appendix B

Data sets

B.1 S - small

# ROOMS
D1 200 0
D3 50 1
D45 40 1
E1 350 0
E35 40 1
SPEL 40 2
SPOR 30 2
MUSI 40 2

# COURSES
CALC 2 1 0
JAVA 1 0 1
MULT 2 0 1
CTEC 1 2 0
CSEC 0 1 1
SCON 1 1 1
DIGI 1 0 1
ENGM 1 0 1
ALGD 1 1 0
ELEC 1 0 0
PROB 1 0 1
OPER 1 1 0
TERM 2 0 1
DIFF 2 1 0
MECH 0 1 2
QUAN 1 1 0
OOPC 1 1 1
TCHE 2 1 0
PERS 1 0 0
REAC 1 0 2
POLY 1 1 0

# LECTURERS
SVEN CALC MULT
BERT JAVA SCON OOPC
KARL CSEC
GUNN CTEC
BERI DIGI
ERIK DIFF
SARA OPER
OLLE ENGM ELEC
BENG ALGD
JUDI TERM REAC
MANS MECH
MICH QUAN
PELL PROB
DARI TCHE POLY
MORT PERS

# STUDENTGROUPS
COMP_1 200 CALC JAVA
COMP_2 120 MULT CTEC
COMP_3 70 CSEC SCON
INFO_1 200 DIGI ENGM
INFO_2 100 ALGD ELEC
INFO_3 50 PROB OPER
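The data sets in this appendix can be read with a few lines of code. The Python sketch below assumes that each file stores one record per line under its `# SECTION` header, and it interprets a room record as name, capacity and room type and a lecturer record as a name followed by course codes; these interpretations and the function name `parse_data_set` are assumptions for illustration and are not taken from the Java source code.

```python
def parse_data_set(text):
    """Split a data set into {section name: list of token lists}."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line.lstrip("#").strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line.split())
    return sections


sample = """
# ROOMS
D1 200 0
D3 50 1
# COURSES
CALC 2 1 0
JAVA 1 0 1
# LECTURERS
SVEN CALC MULT
BERT JAVA SCON OOPC
# STUDENTGROUPS
COMP_1 200 CALC JAVA
"""

data = parse_data_set(sample)
# Assumed meaning of a room record: name, capacity, room type.
rooms = [(name, int(capacity), int(kind)) for name, capacity, kind in data["ROOMS"]]
# Assumed meaning of a lecturer record: name followed by the courses taught.
lecturers = {tokens[0]: tokens[1:] for tokens in data["LECTURERS"]}
print(rooms)
print(lecturers)
```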

B.2 M - Medium

# ROOMS
D1 200 0
D2 50 1
D3 50 1
D45 40 1
D46 40 1
E1 350 0
E35 40 1
SPEL 40 2
SPOR 30 2
MUSI 40 2
ROD 30 2
ORA 30 2

# COURSES
CALC 2 1 0
JAVA 1 0 1
MULT 2 0 1
CTEC 1 2 0
CSEC 0 1 1
SCON 1 1 1
DIGI 1 0 1
ENGM 1 0 1
ALGD 1 1 0
ELEC 1 0 0
PROB 1 0 1
OPER 1 1 0
TERM 2 0 1
DIFF 2 1 0
MECH 0 1 2
QUAN 1 1 0
OOPC 1 1 1
TCHE 2 1 0
PERS 1 0 0
REAC 1 0 2
POLY 1 1 0

# STUDENTGROUPS
COMP_1 200 CALC JAVA
COMP_2 120 MULT CTEC
COMP_3 70 CSEC SCON
INFO_1 200 DIGI ENGM
INFO_2 100 ALGD ELEC
INFO_3 50 PROB OPER
PHYS_1 200 CALC TERM
PHYS_2 180 DIFF MECH

B.3 L - Large

# ROOMS
D1 200 0
D2 50 1
D3 50 1
D45 40 1
D46 40 1
E1 350 0
E35 40 1
E36 40 1
F1 300 0
SPEL 40 2
SPOR 30 2
MUSI 40 2
ROD 30 2
ORA 30 2
VIO 40 2
GRA 30 2

# COURSES
CALC 2 1 0
JAVA 1 0 1
MULT 2 0 1
CTEC 1 2 0
CSEC 0 1 1
SCON 1 1 1
DIGI 1 0 1
ENGM 1 0 1
ALGD 1 1 0
ELEC 1 0 0
PROB 1 0 1
OPER 1 1 0
TERM 2 0 1
DIFF 2 1 0
MECH 0 1 2
QUAN 1 1 0
OOPC 1 1 1
TCHE 2 1 0
PERS 1 0 0
REAC 1 0 2
POLY 1 1 0

# LECTURERS
SVEN CALC MULT
BERT JAVA SCON OOPC
KARL CSEC

# STUDENTGROUPS
COMP_1 200 CALC JAVA
COMP_2 120 MULT CTEC
COMP_3 70 CSEC SCON
INFO_1 200 DIGI ENGM
INFO_2 100 ALGD ELEC
INFO_3 50 PROB OPER
PHYS_1 200 CALC TERM
PHYS_2 180 DIFF MECH
PHYS_3 100 QUAN OOPC
CHEM_1 150 CALC TCHE
CHEM_2 130 PERS DIFF
CHEM_3 100 REAC POLY

B.4 XL - Extra large

# ROOMS
Q1 250 0
D1 200 0
D2 50 1
D3 50 1
D45 40 1
D46 40 1
D31 40 1
D32 40 1
E1 350 0
E35 40 1
E36 40 1
E51 40 1
E52 40 1
F1 300 0
Q1 300 0
ALBA 400 0
TEXC 60 2
SPEL 40 2
SPOR 30 2
MUSI 40 2
ROD 30 2
ORA 30 2
VIO 40 2
GRA 30 2
KAR 30 2
MAG 30 2
BRU 30 2

# COURSES
CALC 2 1 0
JAVA 1 0 1
MULT 2 0 1
CTEC 1 2 0
CSEC 0 1 1
SCON 1 1 1
DIGI 1 0 1
ENGM 1 0 1
ALGD 1 1 0
ELEC 1 0 0
PROB 1 0 1
OPER 1 1 0
TERM 2 0 1
DIFF 2 1 0
MECH 0 1 2
QUAN 1 1 0
OOPC 1 1 1
TCHE 2 1 0
PERS 1 0 0
REAC 1 0 2
POLY 1 1 0
MAGN 2 2 0
POLT 3 2 1
NUMD 2 2 3
TERT 2 0 0
DDED 3 2 0
MAGA 3 1 0
NUMA 3 1 3
TERA 3 1 0

# LECTURERS
SVEN CALC MULT
BERT JAVA SCON OOPC
KARL CSEC
GUNN CTEC
BERI DIGI
ERIK DIFF POLT
SARA OPER
OLLE ENGM ELEC
BENG ALGD
JUDI TERM REAC
MANS MECH MAGN
MICH QUAN
PELL PROB
DARI TCHE POLY
MORT PERS
LEFT TERA
PATR TERT
MIHA DDED
DILI NUMA
CGRI NUMD
STEF MAGA

# STUDENTGROUPS
COMP_1 200 CALC JAVA
COMP_2 120 MULT CTEC
COMP_3 70 CSEC SCON
INFO_1 200 DIGI ENGM
INFO_2 100 ALGD ELEC
INFO_3 50 PROB OPER
PHYS_1 200 CALC TERM
PHYS_2 180 DIFF MECH
PHYS_3 100 QUAN OOPC
CHEM_1 150 CALC TCHE
CHEM_2 130 PERS DIFF
CHEM_3 100 REAC MAGN
DDOS_1 150 POLY POLT
DDOS_2 140 NUMD TERT
DDOS_3 120 MAGA DDED
BIZZ_1 150 POLY POLT
BIZZ_2 140 TERA
BIZZ_3 120 NUMA
MIZZ_1 50 POLY MECH
MIZZ_2 40 CALC
MIZZ_3 20 ELEC


B.5 XXL - Extra extra large

# ROOMS
Q1 250 0
D1 200 0
D2 50 1
D3 50 1
D45 40 1
D46 40 1
D31 40 1
D32 40 1
E1 350 0
E35 40 1
E36 40 1
E51 40 1
E52 40 1
F1 300 0
Q1 300 0
ALBA 400 0
TEXC 60 2
SPEL 40 2
SPOR 30 2
MUSI 40 2
ROD 30 2
ORA 30 2
VIO 40 2
GRA 30 2
KAR 30 2
MAG 30 2
BRU 30 2
B1 300 0
B11 50 1
B12 50 1
B21 40 1
B22 40 1
B23 40 1
RODSPEL 40 2
RODSPOR 40 2
BLASPEL 30 2
BLASPOR 50 2

# COURSES
CALC 2 1 0
JAVA 1 0 1
MULT 2 0 1
CTEC 1 2 0
CSEC 0 1 1
SCON 1 1 1
DIGI 1 0 1
ENGM 1 0 1
ALGD 1 1 0
ELEC 1 0 0
PROB 1 0 1
OPER 1 1 0
TERM 2 0 1
DIFF 2 1 0
MECH 0 1 2
QUAN 1 1 0
OOPC 1 1 1
TCHE 2 1 0
PERS 1 0 0
REAC 1 0 2
POLY 1 1 0
MAGN 2 2 0
POLT 3 2 1
NUMD 2 2 3
TERT 2 0 0
DDED 3 2 0
MAGA 3 1 0
NUMA 3 1 3
TERA 3 1 0
CPRO 2 1 3
PROS 1 2 1
ORKA 2 1 3
DEPF 1 1 1
MEKA 0 2 1
MVVK 2 2 0
FLVA 3 1 0
ENVA 1 1 0
HOLF 0 2 2
INET 3 1 1

# LECTURERS
GUNN CTEC
BERI DIGI
ERIK DIFF POLT
SARA OPER
OLLE ENGM ELEC
BENG ALGD
JUDI TERM REAC
MANS MECH MAGN
MICH QUAN
PELL PROB
DARI TCHE POLY
MORT PERS
LEFT TERA
PATR TERT
MIHA DDED
DILI NUMA
CGRI NUMD
STEF MAGA
ANNA CPRO PROS
FRIA ORKA DEPF
ALEX MEKA
MARK MVVK FLVA
LEXA ENVA
LUKE HOLF
LILY INET

# STUDENTGROUPS
COMP_1 200 CALC JAVA
COMP_2 120 MULT CTEC
COMP_3 70 CSEC SCON
INFO_1 200 DIGI ENGM
INFO_2 100 ALGD ELEC
INFO_3 50 PROB OPER
PHYS_1 200 CALC TERM
PHYS_2 180 DIFF MECH
PHYS_3 100 QUAN OOPC
CHEM_1 150 CALC TCHE
CHEM_2 130 PERS DIFF
CHEM_3 100 REAC MAGN
DDOS_1 150 POLY POLT
DDOS_2 140 NUMD TERT
DDOS_3 120 MAGA DDED
BIZZ_1 150 POLY POLT
BIZZ_2 140 TERA
BIZZ_3 120 NUMA
MIZZ_1 50 POLY MECH
MIZZ_2 40 CALC
MIZZ_3 20 ELEC
FIZZ_1 100 CPRO
FIZZ_2 50 PROS CPRO
FIZZ_3 70 ORKA DEPF
RIZZ_1 150 DEPF MEKA
RIZZ_2 70 MVVK
RIZZ_3 100 FLVA
GIZZ_1 50 DIGI ENVA
GIZZ_2 150 JAVA MEKA
GIZZ_3 80 HOLF INET
