
A Comparative Evaluation of Metaheuristic Approaches to the Problem of Curriculum-Based Course Timetabling

Daniil Bogdanov

Bachelor’s Thesis at KTH, Royal Institute of Technology, School of Computer Science and Communication

Supervisor: Pawel Herman
Examiner: Mårten Olsson

May 21st, 2015


Abstract

Timetabling is an active area of research and is used in a wide range of applications. As the development of most of these applications moves towards automation, the need for automated timetabling increases. Despite many years of research and development of automated approaches, solving NP-hard problems such as timetabling problems remains a challenge.

Metaheuristic-based approaches to these problems are constantly being refined and further developed as the complexity of these applications increases. But despite the increase in complexity, the time it takes for these algorithms to solve these problems is constantly being challenged.

While this thesis covers the fundamentals of metaheuristic approaches to the problem of timetabling, its main focus is to compare how two well-known metaheuristic algorithms, Tabu Search and Simulated Annealing, perform across different scales of the resources that are to be scheduled.

To make the comparison fair, similar implementations of these two algorithms were made in order to eliminate systematic biases. For each set of resources the algorithms solve a timetabling problem under a limited amount of time and computational capacity. The collective quality of all the produced timetables was compared. The results show that Simulated Annealing performs slightly better in the majority of the instances, but with little margin in the collective quality of all tables. Despite the attempt to set a common ground for these similar metaheuristic approaches, the underlying difficulties in comparing algorithms are discussed.


Sammanfattning

Timetabling is an active area of research and has a wide range of applications. As the development of most of these applications moves towards automation, the need for automated timetabling increases. Despite many years of research and development of automated approaches, solving NP-hard problems such as timetabling problems remains a challenge. Metaheuristic methods that solve these problems are constantly being refined and further developed as the complexity of the applications they solve increases. But despite the increased complexity, the time it takes for these algorithms to solve these problems is constantly being challenged.

While this thesis covers the fundamentals of metaheuristic approaches to timetabling problems, its main focus is to compare how two well-known metaheuristic algorithms, Tabu Search and Simulated Annealing, perform at different scales of the resources to be scheduled. To make the comparison fair, the two algorithms were implemented in a similar manner, which aims to eliminate systematic errors. For each set of resources the algorithms solve a timetabling problem under limited time and computational capacity. The collective quality of the produced timetables is compared. The results show that Simulated Annealing performs slightly better in most of the cases, but with a small margin with respect to the collective quality for each algorithm. Despite the attempt to establish a common ground for these similar methods, the underlying difficulties in comparing algorithms are discussed.


Acknowledgements

I would like to thank Helena Sandgren at the Royal Institute of Technology for her cooperation and expertise on how scheduling is done at her university. It gave me insight into the complexity of these problems and into how real-world timetabling problems are solved today.

I especially want to thank Pawel Herman at the Royal Institute of Technology, Dept. of Computational Biology, for supervising me throughout the work on this thesis. I have learned a lot about academic work thanks to his guidance. As he has been the only one familiar with my work, his feedback has been vital in writing this report.


Contents

1 Introduction

2 Background
    2.1 Curriculum-based course timetabling
    2.2 Timetabling at the Royal Institute of Technology
    2.3 Constraints
    2.4 Metaheuristic algorithms
    2.5 State of the art
    2.6 Complexity theory

3 Problem description

4 Method
    4.1 Definitions and notations
    4.2 Quality of timetables
        4.2.1 Hard constraints
        4.2.2 Soft constraints
    4.3 Tabu Search
    4.4 Simulated Annealing
        4.4.1 Acceptance function
        4.4.2 Temperature
    4.5 The standard setting
    4.6 Variation of parameters
    4.7 Initialization
    4.8 Finding neighboring timetables
    4.9 Implementation

5 Evaluation

6 Discussion
    6.1 Control parameters
        6.1.1 Temperature
        6.1.2 Tabu list
    6.2 Finding neighboring timetables
    6.3 Cost parameters

7 Conclusion


1 Introduction

Automated resource planning has a wide range of applications in areas such as universities, high schools, sports, employment and hospitals. Wherever there is a need for order and structure, in terms of both time and space, well-thought-out planning is highly relevant.

If the planning fails to place things at the right time and place, more resources have to be spent than the cost of the planning itself. This is why reliable and accurate planning is desired. While the essence of planning has remained fairly static over time, the areas in which this planning is needed have become more complex. But with the increasing computational power of today's computers and the development of sophisticated algorithms, one can hope that this complexity can be handled in a reasonable way.

While resource planning is used in a wide range of applications, the purposes of planning can be many. For instance, hospitals are in need of resource planning when scheduling human workflows, both for employees and for the flow of patients. Resource planning is also used within companies when constructing timetables for employees and their working schedules. Having an automated planning system could also help handle the salary of each employee, since the system keeps track of how many hours each employee works. For universities, a timetable for all students has to be created at least once every year. While this is still done manually at many universities, ways of trying to automate this work have long been an active subject [1]. Over the past decades there has been large interest in applying metaheuristic algorithms to university timetabling [2]. Metaheuristic approaches are by nature general-purpose algorithms which can be applied to problems with a great deal of variation, such as timetabling.

Timetabling problems can be regarded as high-dimensional, non-Euclidean, multi-constraint combinatorial optimization problems [3]. But despite knowing which categories these problems fall into, they still lack a formal definition. This is because it is difficult to find a general formulation that suits all cases. Institutions have their own definitions of the problem depending on the area of investigation and the nature of the problem. This makes the field of resource planning hard to develop further in a systematic way [4]. But this should not prevent us from evaluating and comparing metaheuristics to gain a better understanding of how the algorithms that solve these problems perform under various conditions and restrictions.

Efforts to standardize timetabling problems through the International Timetabling Competitions (ITC) confirm that there is an awareness in the community of such needs [4]. The aim of these competitions has been to create a common ground for comparing algorithms on standardized benchmarking tests. Formulating problems which cover the fundamentals of timetabling has been one way of setting a common ground in this area. Benchmarking results which compare metaheuristic approaches show which algorithms perform well in specific tests, but not always under varying circumstances. Projects aiming to compare and analyze performance under varying conditions, such as the Metaheuristic Network (MN), have put effort into creating a common ground for comparisons [5]. Analyzing metaheuristics under such conditions, and exposing them to different environments rather than only to static standardized tests, might help us understand the key factors of when and why some algorithms perform better than others.


2 Background

In essence, timetabling problems consist of assigning a number of events, each having its own features, to a limited number of resources subject to certain constraints [6]. The solution strongly depends on these constraints since they define the outlines of the problem. Verifying that a timetable indeed solves the problem requires one to understand how the constraints reflect upon the involved resources. Different constraints give rise to different conclusions regarding which approach to the problem is the best one. This is why comparisons of efficiency and accuracy between solutions to problems with different constraints may give misleading results unless the problems are defined on equal terms.

2.1 Curriculum-based course timetabling

Timetabling within universities can be divided into two categories depending on the course enrollment system. In some universities students are obligated to pick their own courses, and in other universities there are curriculums with predetermined courses in which the students enroll. The two systems are characterized as Course-Based Timetabling (CTT) and Curriculum-Based Course Timetabling (CB-CTT) respectively. Regardless of the enrollment system, there is a great need for automated timetabling systems in both cases [1]. While some universities already have automation in their work, there is still a need for manual aid in order to construct a high-quality table. The main problem with automated systems is that they must be able to generate high-quality timetables despite the huge variation in constraints and resources that schedulers use to construct a table [7]. They must be easy to use and include all functions needed to generate high-quality timetables. Since each university has its own constraints to satisfy, which may vary over time, and different resources to schedule, both commercial packages and self-developed automated systems often fail to provide all the required functions for a fully automated timetabling system.

2.2 Timetabling at the Royal Institute of Technology

The Royal Institute of Technology (KTH) falls under the category of CB-CTT. A timetable is created for each semester and is based on the courses each curriculum offers its students.

Typically there are twenty possible events each week for a curriculum to schedule its classes in: four per day, two before noon and two after, each being 2 hours long and having fixed starting times. The classes start at 8 am, 10 am, 1 pm and 3 pm. Some classes, such as laboratories, require more than 2 hours but still start at the same fixed starting times. Courses that are scheduled after 5 pm are considered evening courses and are often outside the portfolio of mandatory courses offered by the curriculums.

Each course mainly consists of theoretical lectures and practical classes. Occasionally some courses have laboratories, tests or seminars. These special events usually bring additional constraints since they require that the students have been taught the material beforehand. Therefore they cannot be scheduled like any other event and have to be handled with caution.

The rooms in which the classes are held vary depending on the type of class. Bigger halls with the capacity to fit all students of a given course are used for theoretical classes. Professors give the theoretical lectures, and because their time is valuable and they are a limited resource for universities, theoretical classes are never divided into smaller groups. Practical classes, on the other hand, are usually divided into smaller groups and require more classrooms for each class. Here, students from a higher academic year often act as teachers and give the practical classes.

Two timetables are constructed each year, one for each semester. A large set of courses, acquired from a database, is used as a foundation for the upcoming timetable. Additional information for each course may be found in this database and often has to be handled manually: information such as the different types of examination modules, requests and preferences from the lecturer, and in general how the course is intended to be given. These requests vary a lot from course to course and are one of the reasons why a fully automated timetabling system is still not in use for scheduling at KTH [7].

Bigger universities often have more complex scheduling since more entities are involved, which makes the task more challenging. But regardless of the size of the university, some features and constraints are almost always present in timetabling problems.

2.3 Constraints

Constraints can be divided into two categories: hard constraints and soft constraints. If a hard constraint is violated, the timetable cannot be accepted as a feasible solution. This is because such constraints are usually physically impossible to ignore and must therefore be highly prioritized.

The soft constraints are less forbidding and are therefore given lower priority, but ideally one wants to violate as few constraints as possible in order to find the best possible solution. Soft constraints usually arise as preferences from the different resources that are to be scheduled. Soft constraints may be defined in such a way that they work against each other. One then has to specify which constraint impacts the quality of the timetable more and optimize the timetable in that regard.

2.4 Metaheuristic algorithms

Metaheuristic algorithms may be applied to a wide variety of problems since they are not problem specific. Their success comes from the fact that they cleverly generate solutions to problems that are often hard to solve exactly. They combine heuristic methods aiming to efficiently and effectively explore the solution space of a problem. The solution space for timetabling problems consists of the set of all possible combinations one can make in order to construct a table. The shape of the solution space depends on an objective function which has to be defined for each problem. This function determines the quality of the tables and affects how metaheuristics explore the solution space.

New solutions are generated either by exploring the solution space locally or by constructing a solution from scratch, adding components until a complete table is generated. These ways of finding solutions are referred to as local search and constructive methods respectively. Some local search methods start to explore the solution space from an initial solution, which is often generated randomly or in a greedy way. Algorithms which use this approach are classified as trajectory based. While these approaches may quickly generate approximate solutions, accuracy is traded for speed. But since methods for finding exact solutions are hard to construct due to the complexity of the problem, metaheuristics are often a good first approach to these problems.

Intensification and diversification are two terms often used in the field of metaheuristics.

Diversification refers to the ability to avoid getting trapped in confined areas of the solution space and to explore new areas, while intensification refers to the ability to quickly identify and thoroughly search regions of the solution space which bring about high-quality solutions. While both aspects are vital to these algorithms, they may sometimes work against each other [8]. It is therefore important to find a balance between them to achieve optimal performance.

2.5 State of the art

There are many approaches to the problem of timetabling, and methods have been hybridized to optimize the solution of specific timetabling problems. New metaheuristic approaches have recently been developed for specific timetabling problems and have shown promising results.

Genetic algorithms, which draw their inspiration from Darwinian evolution, have been a hot topic among researchers in the field of timetabling. These algorithms represent the timetable as a long encoded string, much like DNA encoding. Mutations are performed on these strings in the form of operators to find better timetables. These operators come in a variety of forms and have long been studied to optimize specific problems. They bring diversity into the generation of new solutions and have increased adaptiveness in many applications. Today they successfully timetable courses at the University of Edinburgh, the Harvard Business School, Kingston University and several other institutions [1].

Although metaheuristic algorithms such as genetic algorithms have been shown to perform well in standardized benchmarking tests such as those of the ITC, other approaches have also made it to the spotlight. For problems with many constraints that have to be handled, boolean satisfiability (SAT) is a preferred way of solving them. For problems where the resources can be divided into smaller groups and assigned independently of each other, a two-stage Integer Linear Programming (ILP) approach has been shown to perform well [9].

Other popular algorithms being used in today’s benchmarking tests are Memetic Algorithms, Constraint Logic Programming (CLP), Tabu Search (TS) and Simulated Annealing (SA) to name a few.

2.6 Complexity theory

Timetabling has long been known to belong to the complexity class of problems called NP-hard (Non-deterministic Polynomial-time hard) [9]. Problems in P (deterministic polynomial time) are characterized as follows: if a solution is given, one can check in polynomial time whether it is a valid one, and algorithms that find these solutions in polynomial time are also well understood.

NP problems are characterized by the fact that the validity of a given solution is easy to check, just as for P problems, but polynomial-time algorithms that find these solutions are hard to formulate. Finally, NP-hard problems are at least as hard as the hardest problems in NP. No algorithm has yet been formulated that solves an NP-hard problem in a reasonable (polynomial) amount of time. Were such an algorithm found, it could be used to solve all NP problems, since in a mathematical sense these problems can be translated into one another. The question of whether P problems really are the same as NP problems has been asked for decades and is known as the P vs. NP problem.

This problem is still one of the unsolved Millennium Prize Problems.

3 Problem description

Curriculum-based course timetabling problems consist of constructing a timetable by assigning classes from each course given by the curriculums to specific events. While the problem at first sounds simple, many aspects have to be considered, which quickly complicates the problem. The solution to a timetabling problem is a timetable where every class has been assigned an event with no violations of the hard constraints.

A synthetic timetabling problem was approached using two different metaheuristic algorithms. The algorithms that were studied were Tabu Search and Simulated Annealing. The aim of this investigation was to compare how well these algorithms solved for timetables under certain conditions. A standard setting of the resources was defined as an anchor point for the problem, and parameters that both scaled up and increased the difficulty of the problem were varied. The algorithms solve for timetables under these varying conditions within a fixed amount of time and computational capacity. The purpose of this problem was to see how these algorithms performed at various scales and difficulties under such restrictions. The restrictions were set to 10 minutes or 10000 iterations, in order to see how well these algorithms solved harder and harder problems.

Many metaheuristic algorithms solve optimization problems in very different ways, which can make comparisons between them biased. TS and SA are two metaheuristics that share many similarities. For instance, they both search locally in the solution space and are both trajectory based. For this reason, many of the same processes surrounding the problem could be used for both algorithms, making the comparison less biased.

The structure of the timetabling problem was inspired by the timetables used at KTH, but the resources used were ultimately synthetically crafted to make it easier to manage the variations that were to be made.

4 Method

4.1 Definitions and notations

Since there is no general formulation of timetabling problems, definitions and notations often vary. To make this report self-contained, the definitions and notations of the problem had to be specified. Throughout this problem the following notations and definitions were used:

Timeslot - T denotes the set of all timeslots. A timeslot is a time period of two hours and each weekday contains four of them. The timeslots start at fixed times throughout the day: the first at 8 am, another at 10 am, one at 1 pm and the last at 3 pm. A whole week of school therefore consists of 20 timeslots on which any class can be scheduled. The size of this set is denoted T.

Room - R denotes the set of all rooms. Each class has to take place in a room. All rooms are regarded as equally fit for any class to be scheduled in. Logistical features such as distances between rooms and maximum capacities were not considered in this problem. The size of this set is denoted R.

Event - E denotes the set of all events. An event is the composition of a timeslot and a room. The size of this set is therefore T · R. Every class is to be scheduled to one element of this set.

Curriculum - C1 denotes the set of all curriculums. Each curriculum has a total of 4 courses and 2 teachers, responsible for 2 courses each. The size of this set is denoted C1.

Course - C2 denotes the set of all courses. Each course consists of classes which are to be scheduled at different timeslots. Each course has one assigned teacher. Since parameters that scale the problem were to be varied, the number of classes for each course was set to vary with them to keep the workload for each curriculum constant throughout each week. The size of this set is denoted C2.

Class - C3 denotes the set of all classes. Each class has a duration of two hours, matching a timeslot exactly. Classes come in two different types: theoretical classes and practical classes. There is no difference in the way these classes are taught, but the type is used as a feature of this set. The size of this set is denoted C3.

Teacher - P denotes the set of all teachers. Each teacher is responsible for 2 courses which belong to the same curriculum. Teachers may have other matters to attend to and therefore have timeslots on which they cannot give classes. The size of this set is denoted P.

Quantities measuring ratios between different parameters of this problem were defined to get a better overview of the structure of the timetable being scheduled. The following ratios were used:

Attendance level ρ - The attendance level is a measure of how many classes any given curriculum is supposed to give each week. To balance the workload for all students each week, ρ is simply defined as the number of classes each week per curriculum.

Unavailability level σ - The unavailability level is a measure of how often each teacher is unavailable. This ratio is simply defined as the number of unavailable timeslots per week per teacher.

Event compactness Ω - The event compactness of a timetable was defined as the ratio between scheduled events and the total number of events. Since every class was to be scheduled, the ratio could be defined as Ω = C3/E.

Unavailability compactness ζ - The unavailability compactness is similar to σ but normalized by the number of timeslots each week. The ratio was defined as ζ = σ/20. ζ = 0 means teachers are always available and ζ = 1 means they can never teach a class.

Class compactness η - The class compactness is similar to ρ but normalized by the number of timeslots each week. The ratio was defined as η = ρ/20. η = 0 means there are no classes for any student, and η = 1 means students have no empty timeslot throughout the timetable.

4.2 Quality of timetables

In order to say something about the quality of the timetables that these algorithms construct, one has to specify what a good and a bad timetable means. Among tables that do not violate any of the hard constraints, the table with the lowest cost is regarded as the best solution. For this task a cost function was used to determine the quality of each table. The cost function considered all constraints and evaluated how many constraints were violated for a given table. Since violating different constraints has different impacts on the quality of the table, a cost parameter λc was associated with each constraint c to differentiate their impact. A higher cost parameter contributes a higher cost if the constraint is violated, making such a timetable less favored. The cost function was defined as

C(x) = \sum_{c} \lambda_c \cdot \Lambda_c(x) \qquad (1)

where Λc(x) is the total number of violations against constraint c and λc is the corresponding weight of that constraint. The output of the cost function was thus a weighted linear sum of the cost from each constraint and used as a measure of quality. This way of defining the objective function for timetabling problems is one of the more common approaches [2].

While the relation between the cost parameters for the hard and soft constraints affects the construction of the timetable, little effort was spent on tuning these parameters. Higher costs were assigned to the hard constraints so that these constraints would be prioritized more often than the soft constraints, which were given low costs.
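To make equation (1) concrete, the following sketch shows how such a weighted cost function might be computed. It is an illustration only; the class and method names, and the representation of a timetable as an int array (index = event, value = scheduled class or -1 for an empty event), are assumptions and not the thesis's actual implementation.

    import java.util.List;

    // Illustrative sketch of equation (1): C(x) = sum over c of lambda_c * Lambda_c(x).
    // The int[] timetable representation is an assumption, not the thesis's data structure.
    interface Constraint {
        int violations(int[] timetable);   // Lambda_c(x): number of violations of constraint c
        double weight();                   // lambda_c: the cost parameter of constraint c
    }

    final class CostFunction {
        private final List<Constraint> constraints;

        CostFunction(List<Constraint> constraints) {
            this.constraints = constraints;
        }

        // Weighted linear sum of the violation counts, used as the measure of quality.
        double cost(int[] timetable) {
            double total = 0.0;
            for (Constraint c : constraints) {
                total += c.weight() * c.violations(timetable);
            }
            return total;
        }
    }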

4.2.1 Hard constraints

A solution was regarded as feasible if there were no violations of any of the hard constraints. This is the reason for the high cost parameter of each hard constraint. The hard constraints used in this timetabling problem were the following:

H1: Classes of the same curriculum cannot be scheduled in different rooms at the same timeslot. Violation of this constraint would give the table a cost of λH1 = 100 for each class that was scheduled on the same timeslot.

H2: Classes of the same type, of the same course, cannot occur more than once each day. Time for the students to prepare for new material is needed, which is the reason for this constraint. Each class that violated this constraint would contribute to the cost with λH2 = 75.

H3: Each teacher has unavailable timeslots on which classes cannot be given. Each scheduled class which violated this constraint would contribute with a cost of λH3 = 100.

H4: Obey the maximum number of classes of each type per week for each course. This constraint considered the students' workload and evened out the load uniformly over the whole week. Violation of this constraint contributed with a cost of λH4 = 100.

H5: All classes must be scheduled at a distinct time and room. This constraint was automatically met since the initialization process of each table made sure that all classes were scheduled somewhere in the table.

H6: Two classes cannot be scheduled in the same room at the same timeslot. This constraint was automatically met since each room at any timeslot could only have one class scheduled at a time because of the implementation of the problem.
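As an illustration of how a violation counter for one of these constraints might look, the sketch below counts violations of H3 (teacher unavailability). It reuses the assumed int[] timetable representation from the earlier sketch, assumes events are ordered timeslot-major so that timeslot(e) = e / R, and uses illustrative names and an ad-hoc encoding of (teacher, timeslot) pairs; none of this is taken from the thesis's code.

    import java.util.Set;

    // Sketch of a violation counter for hard constraint H3: a class scheduled on a
    // timeslot where its teacher is unavailable counts as one violation, weighted by
    // lambda_H3 = 100 in equation (1).
    final class TeacherAvailability {
        static int h3Violations(int[] classOfEvent, int rooms,
                                int[] teacherOfClass, Set<Long> unavailablePairs) {
            int violations = 0;
            for (int event = 0; event < classOfEvent.length; event++) {
                int cls = classOfEvent[event];
                if (cls < 0) {
                    continue;                      // empty event, nothing to check
                }
                int timeslot = event / rooms;      // assumed timeslot-major event ordering
                int teacher = teacherOfClass[cls];
                long key = ((long) teacher << 32) | timeslot;   // encode (teacher, timeslot)
                if (unavailablePairs.contains(key)) {
                    violations++;
                }
            }
            return violations;
        }
    }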

4.2.2 Soft constraints

The soft constraints are the ones that should be violated first, if any. Ideally a solution should not violate any constraint, but such solutions are often impossible to construct and violations will therefore occur. The soft constraints used in this timetabling problem were the following:

S1: Minimize consecutive classes of the same type for each course. Theoretical and practical classes should be taught in phase with each other. For each pair of classes that violated this constraint, a cost of λS1 = 2 was added to the cost function.

S2: For classes of the same type within a course, prefer to schedule the same room. Students often prefer to have classes of the same type in the same room. The number of different rooms used for classes of the same type within a course was used as the value of the cost. This constraint therefore had a weight of only λS2 = 1.

S3: For each teacher, minimize consecutive classes on any given day. Since each teacher has more than one course, this could happen, and lecturing for long hours is tiring. For each consecutive class scheduled, a cost of λS3 = 2 was added to the cost function.

S4: For each curriculum, even out the classes throughout the week over morning and afternoon timeslots. Classes starting at 8 am and 3 pm contributed equally much but in opposite directions; the same was done for classes given at 10 am and 1 pm. The difference in the number of classes in these two time periods contributed to the cost function. Classes at 10 am and 1 pm contributed with λS4 and classes scheduled at 8 am and 3 pm with 2 · λS4.

4.3 Tabu Search

Tabu Search (TS) is a local search metaheuristic algorithm that utilizes a list, called the tabu list, which contains previously visited tables. When exploring the solution space, the list is used to make sure that already visited tables are not revisited repeatedly. This is the diversification feature of the algorithm, since it prevents the algorithm from being stuck in the same area of the solution space. The length of the tabu list may vary depending on the implementation. Common dependencies are the cost of the current timetable or the number of iterations performed, but sometimes it has no dependency at all [6]. The length of the list determines how soon the algorithm may cross an old path later on. Since the size of the list affects the algorithm (consider the limiting cases of no length and infinite length), an optimal size can be determined depending on the problem [12]. Determination of an optimal length of the list was left outside the scope of this report, and the length was set to a maximum of 100. A pseudo code of the algorithm is shown below.

Algorithm 1: Pseudo code of Tabu Search

xBest ← initial table
tabuList ← empty list

while (not stop condition met)

    x0 ← null
    for (x in NeighborhoodsOf(xBest))
        if (x not in tabuList and (x0 = null or costFunction(x) < costFunction(x0)))
            x0 ← x
        end
    end

    if (costFunction(x0) < costFunction(xBest))
        xBest ← x0
    end

    put x0 in tabuList
    if (size of tabuList > allowed size)
        remove first element in tabuList
    end

end
return xBest
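A minimal sketch of how the tabu list in Algorithm 1 might be maintained is shown below, using a FIFO queue capped at the maximum length of 100 used in this thesis. Class and method names are illustrative; the thesis's actual bookkeeping may differ.

    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.Deque;

    // Sketch of the tabu-list bookkeeping: previously visited timetables are kept in a
    // bounded FIFO, and the oldest entry is dropped when the maximum size is exceeded.
    final class TabuList {
        private final Deque<int[]> visited = new ArrayDeque<>();
        private final int maxSize;

        TabuList(int maxSize) {
            this.maxSize = maxSize;        // set to 100 in this thesis
        }

        boolean contains(int[] timetable) {
            for (int[] stored : visited) {
                if (Arrays.equals(stored, timetable)) {
                    return true;
                }
            }
            return false;
        }

        void add(int[] timetable) {
            visited.addLast(timetable.clone());
            if (visited.size() > maxSize) {
                visited.removeFirst();     // "remove first element in tabuList"
            }
        }
    }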

4.4 Simulated Annealing

Simulated Annealing (SA) has a thermodynamic analogy where feasible solutions are regarded as states of a system. The system in this case is a piece of imperfect material, often thought of as a composite metal. The goal is to reduce the defects in this material by minimizing its internal energy. This is done by heating it up, overcoming potential barriers in the microscopic structure, and then cooling it down to find lower, more stable energy states. This process is controlled by the temperature of the system and is repeated until the desired property of the metal is reached. Energy in this analogy is the cost of a state, and a state is a scheduled timetable. The analogy to the heating process is that the algorithm may accept worse solutions when exploring the solution space. This is the diversification feature of SA and can be seen on line 11 of the pseudo code of Simulated Annealing below.

Algorithm 2: Pseudo code of Simulated Annealing

01 xBest ← initial table
02 iterations ← 0
03 while (not stop condition met)
04
05     T ← Temperature(iterations)
06     bestCandidate ← null
07     for (x in NeighborhoodsOf(xBest))
08         if (costFunction(x) < costFunction(xBest))
09             xBest ← x
10         else
11             if (A(x, xBest, T) > random number between 0 and 1)
12                 xBest ← x
13             end
14         end
15     end
16     iterations ← iterations + 1
17 end
18
19 return xBest

4.4.1 Acceptance function

The acceptance function determines whether a worse timetable should be accepted in the process of finding the best solution. This function was defined to depend on the temperature and on the costs of the tables being considered. Other implementations vary this dependency, sometimes being independent of the costs of the tables considered [3]. The acceptance function in this timetabling problem was defined as

A(x, x_0, T_{SA}) = e^{-\Delta C / T_{SA}} = e^{-(C(x) - C(x_0)) / T_{SA}} \qquad (2)

where C(x) is the cost function, x_0 is the current solution, x is the candidate solution and T_SA is the temperature (see 4.4.2 for an informative description of the temperature). Since (2) is only used when a worse candidate is found, the cost difference in the exponent is strictly positive, ensuring that 0 ≤ A ≤ 1.

The structure of this acceptance function has its origins in statistical mechanics. Equation (2) can be viewed as the ratio between two Boltzmann factors. These factors depend on the energy of a particular state and determine the probability that a system is found in that particular state.
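A direct sketch of equation (2) follows; the class and method names are illustrative, not the thesis's code. It would only be called for a candidate that is worse than the current solution, so the cost difference is positive and the returned value lies in (0, 1].

    import java.util.Random;

    // Sketch of the acceptance function in equation (2): A = exp(-(C(x) - C(x0)) / T_SA).
    final class Acceptance {
        static double probability(double candidateCost, double currentCost, double temperature) {
            double deltaC = candidateCost - currentCost;   // strictly positive for worse candidates
            return Math.exp(-deltaC / temperature);
        }

        // Worse candidates are accepted when A exceeds a uniformly drawn number in [0, 1),
        // as on line 11 of Algorithm 2.
        static boolean acceptWorse(double candidateCost, double currentCost,
                                   double temperature, Random rng) {
            return probability(candidateCost, currentCost, temperature) > rng.nextDouble();
        }
    }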

4.4.2 Temperature

While SA explores the solution space in a similar way as TS, it instead makes use of a parameter often referred to as the temperature, T_SA, rather than a list. T_SA is a parameter which decreases in value as the number of iterations increases. The temperature may also be implemented so as to rapidly increase in value at certain times or iterations, mimicking a process of cooling and reheating. Depending on the implementation, the function that defines the temperature can therefore have a significant impact on the efficiency of the algorithm [11]. Much effort was not spent on finding an optimal temperature function, since that was not the intention of this report. Instead, the temperature was defined in the following way:

T_{SA}(i) = T_0 \cdot e^{\mu i} \qquad (3)

where T_0 and µ are constants and i is the iteration count. These constants were determined by considering the probability of accepting worse tables in situations involving only the soft constraints. Initially, the algorithm should accept worse solutions more frequently. And as the cost of the current timetable decreases, the algorithm should have a lower probability of moving away from the minimum it is approaching in the solution space. Considering an arbitrary increase in the cost between two neighboring tables regarding only soft constraints, a typical increase was observed to be ∆C ≈ 10. A probability of 10 % was set as the threshold for accepting worse solutions in the beginning. Thus, calculating backwards, one could compute the constant T_0 to be

T_{SA}(0) = T_0 \cdot e^{\mu \cdot 0} = T_0 \;\rightarrow\; 0.1 = e^{-\Delta C / T_0} \;\rightarrow\; T_0 = -\Delta C / \ln(0.1) \approx 4.3 \qquad (4)

The same was done for µ, but now considering the probability of accepting worse solutions when i = i_max. Setting an acceptance probability of 1 %, one gets a similar result:

0.01 = e^{-\Delta C / T_{SA}(i_{max})} \;\rightarrow\; T_{SA}(i_{max}) = -\Delta C / \ln(0.01) \qquad (5)

And by using equation (3) with i_max = 10000 as the argument, we get

T_{SA}(i_{max}) = T_0 \cdot e^{\mu \cdot i_{max}} \;\rightarrow\; \mu = \ln\!\left( -\Delta C / (\ln(0.01) \cdot T_0) \right) / i_{max} \approx -6.83 \cdot 10^{-5} \qquad (6)

Having µ < 0 ensured that the temperature decreased as the iteration count increased.
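The boundary conditions above can be reproduced in a few lines of code. The sketch below, with illustrative names, recomputes T_0 and µ from ∆C ≈ 10, the 10 % and 1 % acceptance probabilities, and i_max = 10000, and prints the resulting schedule.

    // Sketch of the derivation in equations (4)-(6): the constants of the exponential
    // temperature schedule T_SA(i) = T_0 * e^(mu * i) follow from two boundary conditions.
    final class TemperatureSchedule {
        public static void main(String[] args) {
            double deltaC = 10.0;    // typical soft-constraint cost increase observed in the thesis
            double pStart = 0.10;    // desired acceptance probability at i = 0
            double pEnd = 0.01;      // desired acceptance probability at i = iMax
            int iMax = 10_000;

            double t0 = -deltaC / Math.log(pStart);    // eq. (4): about 4.34 (rounded to 4.3 in the text)
            double tEnd = -deltaC / Math.log(pEnd);    // eq. (5): about 2.17
            double mu = Math.log(tEnd / t0) / iMax;    // eq. (6): about -6.9e-5 (-6.83e-5 with T_0 = 4.3)

            for (int i : new int[] {0, 5_000, 10_000}) {
                double t = t0 * Math.exp(mu * i);      // eq. (3)
                System.out.printf("i = %5d   T_SA = %.3f%n", i, t);
            }
        }
    }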

4.5 The standard setting

Since the timetabling problem was to be solved under various conditions, it was necessary to define a standard setting which would act as an anchor point. The standard setting was defined as:

• 240 timeslots. This corresponded to 60 days of scheduling

• 6 curriculums

• 6 available rooms

• 12 classes per week for each curriculum

• 4 unavailable timeslots per week for each teacher

Through these values, the ratios defined in section 4.1 could be computed for the standard setting as shown in the table below.

Ratio    ρ     σ     Ω     ζ     η
Value    12    4     0.6   0.2   0.6

Table 1: Ratio quantities for the standard setting of the timetabling problem.

Note that C1 = R = 6. This was intentional, since each curriculum can only occupy one room at any given timeslot. But since η is less than 1.0, not all 20 timeslots of a week will be occupied for every curriculum, so scheduling with R < C1 was possible and preferred from an economical point of view.
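The Table 1 values follow directly from the standard setting. The short sketch below, with illustrative names and the computation done per week, reproduces them.

    // Sketch verifying the ratios in Table 1 from the standard setting in section 4.5,
    // computed over one week (20 timeslots), assuming every class occupies exactly one event.
    final class StandardSettingRatios {
        public static void main(String[] args) {
            int timeslotsPerWeek = 20;
            int rooms = 6;            // R
            int curriculums = 6;      // C1
            int rho = 12;             // classes per week per curriculum
            int sigma = 4;            // unavailable timeslots per week per teacher

            int eventsPerWeek = timeslotsPerWeek * rooms;     // events per week = 120
            int classesPerWeek = rho * curriculums;           // classes per week = 72

            double omega = (double) classesPerWeek / eventsPerWeek;   // 0.6
            double zeta = (double) sigma / timeslotsPerWeek;          // 0.2
            double eta = (double) rho / timeslotsPerWeek;             // 0.6

            System.out.printf("Omega = %.1f, zeta = %.1f, eta = %.1f%n", omega, zeta, eta);
        }
    }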

4.6 Variation of parameters

Variations in four different parameters of this timetabling problem were made. T and R scaled E, which scaled the problem in size. ρ and σ scaled the difficulty of finding a solution: increasing ρ meant more classes to handle, while increasing σ meant fewer possibilities for where to assign classes. The parameters and their variations are presented in the table below:

Parameter   minimum   maximum   interval step
T           120       360       40
R           4         8         1
ρ           10        20        2
σ           2         6         1

Table 2: The table shows the variations in each parameter. These parameters were the number of timeslots T, the number of available rooms R, the number of classes per week for each curriculum ρ, and the number of unavailable timeslots per week for each teacher σ.

Although it was the number of timeslots that was varied, it was easier to represent this as a variation in days; it corresponded to 30 days of scheduling up to 90 days with an interval step of 10 days. Also, at the upper limit when varying ρ, the critical ratios η = Ω = 1 were reached, corresponding to a completely full timetable.

4.7 Initialization

Because both algorithms are trajectory based, an initial timetable had to be generated. This generation was a mapping between C3 and E. It was done by randomly assigning classes to events without any consideration of the hard or soft constraints. The process only made sure that every element in C3 had a mapping to an element in E, which ensured that constraint H5 was met.
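A possible implementation of this initialization is sketched below, with the same assumed int[] representation as earlier (index = event, value = class or -1) and illustrative names. The loop places every class on a distinct, randomly chosen event, which is what guarantees H5 (and, with this representation, H6).

    import java.util.Arrays;
    import java.util.Random;

    // Sketch of the random initialization in section 4.7: every class is mapped to a
    // distinct event, ignoring all other constraints. Requires numClasses <= numEvents.
    final class Initialization {
        static int[] randomTimetable(int numEvents, int numClasses, Random rng) {
            int[] classOfEvent = new int[numEvents];
            Arrays.fill(classOfEvent, -1);               // -1 marks an empty event
            for (int cls = 0; cls < numClasses; cls++) {
                int event;
                do {
                    event = rng.nextInt(numEvents);      // pick a random event
                } while (classOfEvent[event] != -1);     // retry until an empty one is found
                classOfEvent[event] = cls;
            }
            return classOfEvent;
        }
    }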


4.8 Finding neighboring timetables

While there are many ways of producing neighboring tables, only one operation that handled the finding of these neighboring tables was implemented. By randomly picking two distinct elements in E, a swap was made between their mappings to C3. Two outcomes could occur: in some cases one element lost its mapping to the other, and in other cases the two events mutually swapped classes.
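The single neighborhood operator described above can be sketched as follows, again with illustrative names and the same assumed representation: two distinct events are drawn at random and their contents exchanged, so a class either moves to an empty event or two classes swap places.

    import java.util.Random;

    // Sketch of the neighborhood operator in section 4.8: swap the class mappings of
    // two randomly chosen, distinct events.
    final class Neighborhood {
        static int[] randomNeighbor(int[] classOfEvent, Random rng) {
            int[] neighbor = classOfEvent.clone();
            int first = rng.nextInt(neighbor.length);
            int second;
            do {
                second = rng.nextInt(neighbor.length);
            } while (second == first);                   // ensure two distinct events
            int swapped = neighbor[first];
            neighbor[first] = neighbor[second];
            neighbor[second] = swapped;
            return neighbor;
        }
    }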

4.9 Implementation

The problem was implemented in Java and run on a PC with an Intel Core 3.10 GHz processor. Both algorithms were implemented, and for each execution when varying parameters, relevant data were extracted. Each algorithm was set to solve for a timetable 8 times for a given instance. The stop conditions were set to i_max = 10000 iterations or a maximum time limit of t_max = 600 s.
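The combined stop condition might look like the sketch below (illustrative names): the search halts after 10000 iterations or 600 seconds, whichever comes first.

    // Sketch of the stop condition in section 4.9: at most 10000 iterations or 600 seconds.
    final class StopCondition {
        private final long startNanos = System.nanoTime();
        private static final int MAX_ITERATIONS = 10_000;
        private static final long MAX_SECONDS = 600;

        boolean met(int iterations) {
            long elapsedSeconds = (System.nanoTime() - startNanos) / 1_000_000_000L;
            return iterations >= MAX_ITERATIONS || elapsedSeconds >= MAX_SECONDS;
        }
    }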

5 Evaluation

The first complication that was observed, which can be seen in figures 1-4, was that the hard constraints were violated in almost every instance. Stop conditions for both the computational capacity limit and the time limit were set to see if the algorithms solved the problems within a limited amount of time or number of iterations, which they did not. Despite this, comparisons of the performance of these algorithms could still be made.

[Figure 1: bar charts of mean total cost, mean hard constraint costs, time elapsed (s) and iterations versus the number of days (30-90), for TS and SA.]

Figure 1: The graphs show the mean values of different instances taken when varying the number of days. The top left graph shows the mean total cost of the timetables that were constructed for each variation. The top right graph shows the mean hard constraint costs of the timetables that were constructed. The bottom left graph shows the total time elapsed in seconds and the bottom right shows the total amount of iterations for each variation. The red horizontal line indicates the limit set in this timetabling problem.

Figure 2 indicates that there was no difference with respect to the cost of the tables when varying the number of rooms. Although there were some fluctuations in the cost, these may have been caused by the randomness in the process of generating solutions. Each bar represents the mean value of 8 runs, which might not have been enough to eliminate deviations of this kind.

[Figure 2: bar charts of mean total cost, mean hard constraint costs, time elapsed (s) and iterations versus the number of available rooms (4-8), for TS and SA.]

Figure 2: The graphs show the mean values of different instances taken when varying the number of available rooms. The top left graph shows the mean total cost of the timetables that were constructed for each variation. The top right graph shows the mean hard constraint costs of the timetables that were constructed. The bottom left graph shows the total time elapsed in seconds and the bottom right shows the total amount of iterations for each variation. The red horizontal line indicates the limit set in this timetabling problem.

[Figure 3: bar charts of mean total cost, mean hard constraint costs, time elapsed (s) and iterations versus the number of unavailable timeslots (2-6), for TS and SA.]

Figure 3: The graphs show the mean values of different instances taken when varying the number of unavailable timeslots per week for each teacher. The top left graph shows the mean total cost of the timetables that were constructed for each variation. The top right graph shows the mean hard constraint costs of the timetables that were constructed. The bottom left graph shows the total time elapsed in seconds and the bottom right shows the total amount of iterations for each variation. The red horizontal line indicates the limit set in this timetabling problem.

[Figure 4: bar charts of mean total cost, mean hard constraint costs, time elapsed (s) and iterations versus the number of classes per week (10-20), for TS and SA.]

Figure 4: The graphs show the mean values of different instances taken when varying the number of classes per week for each curriculum. The top left graph shows the mean total cost of the timetables that were constructed for each variation. The top right graph shows the mean hard constraint costs of the timetables that were constructed. The bottom left graph shows the total time elapsed in seconds and the bottom right shows the total amount of iterations for each variation. The red horizontal line indicates the limit set in this timetabling problem.

The resulting costs of the tables from each instance were summed and divided by the total number of runs. This gave the total mean value for each algorithm. These values were compared and viewed as a measure of how well the algorithms performed across varying sets of resources. The mean values are presented in Table 3.

Mean values of the total cost

Parameter   µ_TS    µ_SA    µ_TS − µ_SA    rµ (%)
T           2133    2052     81             3.9
R           1398    1439    −41            −2.8
ρ           6842    6744     98             1.5
σ           1461    1395     66             4.7

Mean values of the hard constraint costs

Parameter   µ_TS    µ_SA    µ_TS − µ_SA    rµ (%)
T           1196    1042    154            14.8
R            456     439     17             3.9
ρ           5643    5509    134             2.4
σ            536     408    128            31.4

Table 3: The table shows the summed mean values of each algorithm, µ_TS and µ_SA, for all variations of parameters. rµ shows the relative difference between the mean values with respect to µ_SA.

Despite SA having a lower cost in almost all of the instances, the summed mean values in Table 3 show small differences between the algorithms in terms of absolute mean values. The deviations when varying T and σ, represented as error bars in figure 5, show larger variations when both the scale and the difficulty of the problem were increased. This suggests that different paths in a more complex solution space were explored, which gave rise to different end results.

[Figure 5: mean total cost versus the number of days (30-90) and versus the number of unavailable timeslots (2-6), shown separately for Tabu Search and Simulated Annealing, with error bars.]

Figure 5: Total cost of the constructed timetables when varying the number of days and the number of unavailable timeslots per week for each teacher. The red error bars represent the deviations from each set of resources.

6 Discussion

These sets of data were synthetically crafted and are not benchmarking sets. Comparisons with other results from similar setups were therefore hard to make. A big problem in this field is that there is no general definition of the timetabling problem and hence no smooth and solid way of comparing results. Without a common ground, valid comparisons across different experiments cannot be made. This is the main reason why MN was founded and has grown as a community.

But real-world problems are seldom similar to each other, which still makes evaluations of different problems valuable. Particular problems may also underline the difficulties in comparing and solving different timetabling problems.


6.1 Control parameters

In this evaluation, efforts were made to establish a common ground for a better comparison.

The initially generated timetables for every execution and the procedure for finding neighboring tables were the same for both algorithms. Therefore the difference between these two approaches to solving for a timetable lay mainly in their metaheuristic features and the parameters that controlled these procedures. While little attention and effort was spent on tuning these parameters, they still contributed in their own way to the outcome of the results.

6.1.1 Temperature

The determination of TSA was posed as a boundary problem. An initial temperature, T0, was set according to the desired probability of accepting worse solutions in the beginning of the run.

The temperature was defined to decrease exponentially and reach a value such that the desired probability of accepting a worse solution would be 1 % at the end of the run. Varying these boundaries, or defining the temperature in another way, drastically changed how SA solved for solutions.

6.1.2 Tabu list

It was observed during the extraction of data that TS rarely encountered the solutions stored in the tabu list again. This could be due to the fact that the number of neighbors of each solution was significantly higher than the maximum number of timetables the tabu list could contain. The metaheuristic guidance of TS was therefore not utilized to its full potential, which may be the reason why this approach lagged behind in almost all of the variations that were made.

6.2 Finding neighboring timetables

The operator that was used to generate neighboring tables may have affected the generation of feasible solutions. There may have been situations where a single exchange of mappings between C3 and E was not enough to escape a local minimum in the solution space. Other operators, such as multiple remappings in the same table, ordered or random, may have been required to overcome local minima and find feasible solutions. Both algorithms generated neighbors in the same way and suffered equally, but ultimately neither of them solved for feasible solutions when the varied parameters were increased.

6.3 Cost parameters

The cost parameters played a big role when calculating the initial and final outputs of (2). Only the soft constraints were considered in these calculations. Since the acceptance function depended on the cost difference ∆C, big differences between the hard and soft constraint parameters were notable in these calculations. Fine tuning of these cost parameters was not considered; markedly higher values were instead assigned to the cost parameters of the hard constraints than to those of the soft constraints. This made the output of (2), when reconsidering a violation of a hard constraint, negligibly small. A smaller difference between the cost parameters of the hard and soft constraints would have given SA more opportunities to reconsider violating hard constraints and in this way maybe find its way to feasible solutions.

7 Conclusion

Comparisons between Tabu Search and Simulated Annealing were made on the problem of curriculum-based course timetabling. A simulated problem was constructed and both algorithms were implemented as fairly as possible. Four parameters were varied and the algorithms solved for the best possible solutions under a limited amount of time and computational capacity. While Simulated Annealing performed better in the majority of these runs, the overall difference in mean performance across all sets of resources that were used was small. A typical difference in the mean total cost of the two algorithms was 72, which in this problem corresponded to around one hard constraint violation. The relative difference between the mean total costs with respect to the slightly better algorithm (SA) was not more than 5 %, while the corresponding quantity for the hard constraint costs varied considerably.

Due to the time limitations of this evaluation, more data for each set of resources could not be extracted. If more data had been acquired, a proper statistical analysis could have been performed, which would have given credibility to the conclusions. Despite this, this thesis shows the difficulties in timetabling problems. Much can be learned and used to improve future work on these kinds of problems. To make the analysis valuable for future work, common and established benchmarking sets of resources should instead be used as an anchor point. This way, comparisons with other participants can be made, and varying parameters from these sets may better show how they affect the results of different algorithms.


References

[1] Burke, Edmund, et al. "Automated university timetabling: The state of the art." The Computer Journal 40.9 (1997): 565-571.

[2] Lewis, Rhydian. "A survey of metaheuristic-based techniques for university timetabling problems." OR Spectrum 30.1 (2008): 167-190.

[3] Elmohamed, M. A. Saleh, Paul Coddington, and Geoffrey Fox. "A comparison of annealing techniques for academic course scheduling." Practice and Theory of Automated Timetabling II. Springer Berlin Heidelberg, 1998. 92-112.

[4] Bonutti, Alex, et al. "Benchmarking curriculum-based course timetabling: formulations, data formats, instances, validation, visualization, and results." Annals of Operations Research 194.1 (2012): 59-70.

[5] Rossi-Doria, Olivia, et al. "A comparison of the performance of different metaheuristics on the timetabling problem." Practice and Theory of Automated Timetabling IV. Springer Berlin Heidelberg, 2003. 329-351.

[6] Lü, Zhipeng, and Jin-Kao Hao. "Adaptive tabu search for course timetabling." European Journal of Operational Research 200.1 (2010): 235-244.

[7] Helena S. Interviewed by: Daniil B. Royal Institute of Technology, March 6, 2015.

[8] Blum, Christian, and Andrea Roli. "Metaheuristics in combinatorial optimization: Overview and conceptual comparison." ACM Computing Surveys (CSUR) 35.3 (2003): 268-308.

[9] Bettinelli, Andrea, et al. "An overview of curriculum-based course timetabling." TOP (2015): 1-37.

[10] Aladag, Cagdas Hakan, Gulsum Hocaoglu, and Murat Alper Basaran. "The effect of neighborhood structures on tabu search algorithm in solving course timetabling problem." Expert Systems with Applications 36.10 (2009): 12349-12356.

[11] Van Laarhoven, Peter J. M., and Emile H. L. Aarts. Simulated annealing. Springer Netherlands, 1987.
