
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2016

Task Scheduling in Distributed Systems

Model and prototype

EMIL ÅSTRÖM

KTH

SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY


Abstract

A distributed system is a collection of entities that cooperate to solve a problem that a single entity would not be able to solve on its own. The use of computationally heavy algorithms has been growing, and distributed computation systems have arisen to meet this demand. To make the best use of a distributed computation system, task allocation and task scheduling algorithms have been developed. Task allocation is the process of assigning tasks to the most suitable processors of the system, while task scheduling determines the execution order of tasks; in some cases task scheduling also incorporates the allocation process. A task is the smallest entity that can be scheduled and can be a process or a thread. Tasks are often bunched together into jobs, which can include an arbitrary number of tasks. In this thesis a general model and prototype have been created that can be used in a variety of distributed systems and handle a wide range of tasks, including tasks with execution order and deadlines. The model has been created using the best methods examined during a literature study, which include batch sampling, late binding and fair scheduling. A bare-bones prototype implementing the core functionality of the model has been created and tested with regard to throughput and load balance. The throughput tests showed that the model does not lose throughput when tasks with an execution order are scheduled, and the load test showed that the system is fairly balanced. An evaluation of the model shows that it works well in distributed systems and can handle a variety of tasks. The prototype has been tested in a small test environment, which shows that it works well in smaller environments with similar requirements.


Abstract

A distributed system is a collection of entities that work together to solve a problem that a single entity could not solve on its own. The use of heavy algorithms has increased, and distributed computation systems have therefore been created. To make use of the resources available in distributed computation systems, task allocation and scheduling algorithms have been created. Task allocation attempts to allocate tasks to the best processors, while scheduling determines the execution order of the tasks. In some cases the scheduling also handles the allocation of tasks. A task is the smallest entity that can be scheduled and can be a process or a thread. Tasks can be grouped into a so-called job, which contains a set of tasks. In this thesis a general model and prototype have been created that can be used in a variety of distributed systems and can handle the different properties that tasks may have, for example an execution order or a deadline. The model has been created from the scheduling methods that were reviewed in a literature study and shown to have the best properties, i.e. adaptable to a general solution, scalable, fault tolerant and fast.

Some of the methods used include batch sampling, late binding and fair scheduling. From the created model a prototype has been implemented that contains the most essential parts of the model. The prototype has been tested by measuring the throughput of tasks and how well the tasks are distributed among the workers of the system. The throughput test showed that the throughput does not decrease when tasks with an execution order are used, and the load-balancing test showed that the workers carry approximately the same load throughout a run. An evaluation of the model has been performed, which shows that the model works well in distributed systems and can handle a variety of tasks. The prototype has been tested in a small environment and shown to work well for smaller environments with similar needs.


Table of Contents

1 Introduction
1.1 Background
1.2 Problem
1.3 Purpose
1.4 Goal
1.4.1 Benefits, Ethics and Sustainability
1.5 Methodology / Methods
1.5.1 Philosophical assumptions
1.5.2 Research Methods
1.5.3 Research Approaches
1.5.4 Literature study
1.5.5 Model requirements
1.6 Delimitations
1.7 Outline
2 Scheduling in Distributed Systems
2.1 Task Scheduling
2.1.1 Tasks
2.1.2 Scheduling classifications
2.1.3 Scheduling schemes and policies
2.1.4 Scheduling approaches
2.2 Distributed systems
2.2.1 Fault tolerance
2.3 Related work
2.3.1 Sparrow
2.3.2 MPQGA
2.3.3 Apache Hadoop Schedulers
2.3.4 Uses of Theory and Related Work
3 Methodologies and Methods
3.1 Research strategies / Design
3.2 Data collection
3.3 Data analysis
3.4 Quality assurance
3.5 Software development methods
3.5.1 Waterfall model
3.5.2 Spiral model
3.5.3 Scrum
3.5.4 Kanban
3.5.5 Choice of software development model
3.6 Uses of chosen methodologies and methods
4 Task scheduling model
4.1 Scheduler model
4.1.1 Global scheduling
4.1.2 Local scheduling
4.1.3 Fault tolerance
5 Task scheduling prototype
5.1 Prototype use of global scheduling
5.2 Prototype use of local scheduling
5.2.1 Use of software development model
6 Results
6.1 Results of the model
6.2 Prototype Test
6.2.1 Evaluation of model and prototype
7 Conclusion and future work
7.1 Evaluation
7.2 Discussion
7.3 Future work
References
Appendix A


1 Introduction

General scheduling is a process that can be used in many areas. The process is used to allocate resources to a number of tasks within a time period. The resources can be anything, for example processing units or people [1]. Tasks can be anything that needs to be performed, for example lectures, applications and algorithms [1]. The goal of scheduling is to optimize this allocation according to some predefined objectives [1], for example minimizing the average execution time of submitted tasks. With the great advancements in technology during the last decades, more and more demanding algorithms are being computed. In many cases it is not possible to compute these algorithms on a single processing unit, and therefore distributed computation systems have been developed. To make the best use of the hardware available in a distributed computation system, some sort of task scheduling must be applied.

1.1 Background

Task scheduling and task allocation algorithms are used in computer systems to determine in which order tasks are to be executed and on which processor [2][3]. Task scheduling determines the execution order and task allocation determines which processor to use [2][3]. In some cases task scheduling performs both the allocation and determines the execution order [4][5]. A task is the smallest entity that can be scheduled and consists of a thread or a process [6]. Searching for the optimal task schedule in a heterogeneous distributed system with a discrete number of processors is an NP-hard problem [7], that is, a problem whose solution can only be found in exponential time [8].

NP-hard is a complexity class of problems that are at least as hard as the problems in NP [9]. NP stands for nondeterministic polynomial time; problems belonging to this complexity class are not known to be solvable in polynomial time [9]. To find a solution in a feasible time, approximate algorithms are often used, for example heuristic and genetic algorithms [7]. Heuristic means that solutions are found by using qualified guesses or trial and error [10]. Task scheduling exists both as a static and as a dynamic process, where static schedulers perform their computation at compile time and dynamic schedulers perform the scheduling during run time [4].

The process of scheduling tasks is used in a distributed computation system to utilize the hardware associated with the system. A distributed system is a collection of entities that cooperate to solve a problem. Distributed systems are often geographically separated, do not share a common physical clock, and are often used to solve problems that would otherwise be impossible to perform on a single entity [11][12][13]. Distributed computing is a type of distributed system that is used for high-performance computing. This type of distributed system can be divided into two subgroups: cluster computing systems and grid computing systems [14].

1.2 Problem

One of the biggest difficulties in task scheduling is the complexity of creating good solutions in a feasible time: because the problem is NP-hard, optimal solutions can only be found in exponential time [7]. To counter this, approximation methods have been developed, which require processes that can determine whether a calculated solution is feasible. When dynamic scheduling is used, little information about the tasks is known in advance, which makes it hard to schedule them well [4]. Some tasks can also have constraints attached to them [15], which makes the scheduling harder. The problems of scheduling are often linked with the problems of distributed computation systems. Many problems arise when performing computation in a distributed environment where tasks are computed in parallel [11]. One of the core problems of distributed computation is propagating information through the system, where the speed and magnitude of communication play a crucial part [16]. Knowledge of the system is another big problem: do the nodes have complete, partial or no knowledge [16]? Another problem is eliminating the possibility of processors performing the same task, which could slow down the process or even produce a faulty result [17]. When these algorithms are computed in an asynchronous environment, the presence of a coordinator is important [17]. Distributed computation systems are a subset of distributed systems [14]. Distributed systems exist in many forms and can have different architectural structures [18].

How can a general scheduler be created that can handle a variety of tasks for jobs consisting of multiple tasks?

1.3 Purpose

The purpose of this degree project is to propose a model and a prototype of a general task scheduler. The general task scheduler should be usable in a variety of distributed systems and schedule tasks efficiently. The model and prototype will be based upon a combination of the best methods examined during a literature study.


1.4 Goal

The goal is to create a general task scheduling model and prototype. The model and prototype should be able to handle tasks both with and without constraints attached to them. They should also be able to handle different distributed system architectures, such as client-server and peer-to-peer. Finally, the model should be resilient to failures, including crashes.

1.4.1 Benefits, Ethics and Sustainability

The outcome of this degree project will be a new scheduling model for distributed computation systems, constructed using the best alternatives from a literature study of existing schedulers. This will be beneficial to both researchers and developers interested in this area, because the model can be used in many different kinds of distributed environments. The role of ethics in software development has gained greater importance in the last few years because of the increasing use of computer software [19].

Ethical aspects of software development can for example concern privacy, accuracy, property, accessibility, use of knowledge and quality of service [19]. Ethics studies the terms good, bad, right and wrong in the application of norms and rules [19]. This project will not have a big ethical impact because it does not directly handle critical information and no one's safety is at risk; the only ethical aspect is that it needs to uphold a good quality of service.

Sustainable development is development that meets the needs of the present generation without compromising the needs of future generations [20]. It is a process that values both present and future generations in regard to a sound environment, a just society and a healthy economy [20]. Since this master thesis is focused on creating a model and prototype with high throughput, it will be efficient in both economical and environmental aspects: if a high throughput is achieved, more jobs can be processed per second, which leads to less power consumption and in turn to lower costs. The project will not have a big impact on society.

1.5 Methodology / Methods

To assure the quality of a degree project it is important to follow methods and methodologies. They help to plan and steer the project in the right direction [21]. Quantitative and qualitative research are the two main categories of research methods [21].

• Quantitative research is oriented towards experiments and tests to investigate theories and hypotheses. It requires a large amount of data and statistics to draw conclusions from these investigations [21].

• Qualitative research is oriented towards understanding meaning, opinions and behaviors [21]. This type of research is often used to form a tentative hypothesis or to develop applications and systems [21]. In contrast to quantitative research, this method uses smaller data sets to draw conclusions [21].

This degree project will use the qualitative research method because the work and research of others will be used to develop the prototype [21]. By going through existing work, opinions on what works and what does not will influence the prototype greatly [21].

1.5.1 Philosophical assumptions

At the beginning of a degree project a philosophical assumption must be made [21]. It holds the view and standpoint of the project and describes how collected data should be viewed [21]. There exist a variety of core assumptions that should be considered [21].

• Positivism is an objective assumption that reality cannot be influenced by the researchers [21]. Projects that have an experimental approach should consider this view [21].

• Realism is a realistic assumption where data is collected through observation of a phenomenon [21]. The data is then interpreted to develop an understanding of the phenomenon [21].

• Interpretivism is used to understand the meaning of a phenomenon in a subjective way, where opinions and perspectives carry meaning [21]. This assumption works well when developing a computer system [21].

• Criticalism assumes that reality is constituted, produced and reproduced by people [21]. Criticalism can for example be used to find out the social, historical and cultural effects of using a computer system [21].

Interpretivism will be used as the philosophical assumption during this degree project because, in the end, a prototype will be developed based on a literature study in which papers about models are examined. In these papers, opinions and perspectives carry great weight in judging whether a model works or not.

1.5.2 Research Methods

This section describes the research methods that can be used as a framework during the research [21]. The following are the most common research methods [21].


• Experimental research is often used to investigate a system's performance [21]. The relationship between variables and the cause of that relationship is investigated [21]. In other words, it is a study to find cause and effect [21].

• Non-experimental research is used for investigating the subjective thoughts on a system from a user's perspective [21]. Using this method, behavior can be described and predicted [21].

• Descriptive research is used to study situations and phenomena and to describe their characteristics without explaining their causes or occurrences [21]. Observations, surveys and case studies are often used to find out the characteristics of a situation or phenomenon [21].

• Analytical research uses already existing data to draw conclusions on whether a hypothesis is true or false [21]. It can be used in product development because of its ability to find what works and what does not [21].

• Fundamental research focuses on learning new information out of curiosity by observing a phenomenon [21]. This research method can be used in all fields where new theories, principles and innovations are developed [21].

• Applied research focuses on solving known practical problems or answering specific questions by using preexisting data from research [21]. The practical problems can for example concern practical applications and inventions [21].

• Conceptual research interprets used concepts by doing a literature study and analyzing it [21]. It is commonly used for developing new concepts or interpreting existing ones [21].

• Empirical research uses experiments, observations or experiences focusing on people and situations to gain new knowledge [21].

The research method used during this degree project is fundamental research, because current solutions will be examined and reviewed. The information gathered will then be used to create a new model for a scheduling system.

1.5.3 Research Approaches

Research approaches are used to derive what is true or false [21]. There are three research approaches: inductive, deductive and abductive [21].

• Inductive approach uses collected and analyzed data to create theories and propositions and, in some cases, to develop artifacts [21]. This approach is commonly used with qualitative methods because the outcome is based on behavior, opinions and experiences [21].

• Deductive approach uses big data sets to test whether theories are true or false [21]. The data is collected using a quantitative method because of the need for big data sets [21]. The hypothesis must take a form that is measurable and must explain how everything should be measured [21].

• Abductive approach is a mix between the inductive and deductive approaches [21].

The inductive research approach will be used during the degree project because an artifact in the form of a prototype will be created.

1.5.4 Literature study

To perform a qualitative investigation, information needs to be gathered. To gather this data, a literature study will be performed where the main source of information will be the internet. The internet will be the main source because almost everything can be found through it, and it contains new information that has not yet been printed in books and papers. The information considered needs to be related to the process of scheduling or to distributed systems. Gathered information will be reviewed against the purpose and goal of the project, namely to create a general model and prototype. From the purpose and goal, four requirements will be followed: general solution, reliable, scalable and time efficient.

1.5.5 Model requirements

This section presents the requirements after which the model will be created:

• General solution: the model needs to be usable in many types of distributed systems and able to schedule a wide range of tasks.

• Reliable: able to withstand crashes. Without this requirement the scheduler could lose valuable data and, in the worst case, go down.

• Scalable: able to handle an arbitrary number of workers in the system. The scale and structure of different distributed systems vary in size and layout; without a scalable model the scheduler might not be able to carry out the scheduling in an efficient manner.


• Time efficient, which in this case means that the average turnaround time per job needs to be minimized, as formalized below.
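As a hedged formalization of this last requirement (the notation below is ours, not taken from the thesis): if job $j$ arrives at time $a_j$ and its last task completes at time $c_j$, the scheduler should minimize the average turnaround time over the $n$ submitted jobs,

\[
\bar{T} = \frac{1}{n} \sum_{j=1}^{n} \left( c_j - a_j \right).
\]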

1.6 Delimitations

This report only covers the scheduling aspect; message ordering of results is therefore not included. The prototype that is based upon the model will not cover fault tolerance, even though the model includes it. The model will explain how to handle multiple tasks and tasks with different constraints; the prototype, however, will only cover multiple tasks and tasks with and without an execution order. The prototype will be tested in a small environment using a client-server architecture, while the model will incorporate all architectures.

1.7 Outline

Chapter 2 introduces and explains the background knowledge, theories and related work. This information is later used to propose a general task scheduler model and prototype. Chapter 3 introduces the research methods that will be used during the project; the chapter also explains the different software development models and which of them is used. Chapter 4 presents and explains the proposed model. Chapter 5 describes the prototype built upon the proposed model. Chapter 6 presents the results of the project, including an evaluation of both the model and the prototype, as well as the tests performed on the prototype with regard to throughput and load balance. Chapter 7 presents discussion, evaluation and future work related to the model and prototype.


2 Scheduling in Distributed Systems

Task scheduling is the process of determining the order in which tasks will be performed. Distributed systems utilize task schedulers to make sure that the hardware is used to its maximum, in other words creating a schedule that leaves no processor idle while there are tasks to be performed. This section explains the information needed to understand the model that will be proposed. First, task scheduling is explained, along with related topics such as tasks and policies; in this report, task scheduling covers both the scheduling and the allocation aspect. Thereafter, distributed systems and how they utilize task schedulers are explained. Lastly, related works are presented together with an explanation of how they can be used.

2.1 Task Scheduling

In a distributed computation system, the main purpose of task allocation and scheduling algorithms is to utilize the maximum performance of the system [2][22][5]. Task allocation is about assigning each task to the most suitable processor or worker [2][3], while task scheduling algorithms determine in which order the tasks should be executed on the processors and workers [3], creating a so-called schedule. In other words, a schedule is tasks allocated to processors over time [6]. In some cases the task scheduler takes care of both allocating tasks to processors and determining the execution order [4][5]. Scheduling can be performed on both a global and a local scale. Global scheduling refers to the process of allocating tasks to the processors of the system. Global scheduling is performed before local scheduling, and one of its main purposes is to share the load evenly among the processors of the system. The purpose of local scheduling is to determine which of the available tasks to run next on the assigned processor [23].

2.1.1 Tasks

A process or a thread can be called a task, which is the smallest entity that can be scheduled [6]. A collection of tasks that is sent in to the schedulers is called a job. A job can contain an arbitrary number of tasks. Both tasks and jobs can have constraints attached to them; the most common constraints are [15]:

• Per-job constraint - A constraint that is attached to the whole job, for example that all the tasks within the job need to be executed on a machine with a GPU [22].


• Per-task constraint - A constraint that is associated with a single task, for example that a specific task needs to be executed on a machine holding its input data [22].

• Timing constraints - The most common timing constraint is a task's deadline, the time by which the task should have finished executing [15].

• Precedence constraints - Tasks with precedence constraints cannot be executed in an arbitrary order [15]; each task can only be executed in a predefined order. The execution order is often represented using a directed acyclic graph (DAG), also called a task graph [6]; a sketch of such a representation follows this list.

• Resource constraints - A constraint under which a task can only be executed on a specific type of resource, such as a worker with a GPU [15].
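To make the DAG representation concrete, here is a minimal sketch in Python (an illustration of the idea, not code from the thesis prototype; the Task fields and function names are our assumptions). It represents a job's precedence constraints as edges and derives one valid execution order by topological sorting:

from collections import deque

class Task:
    # Smallest schedulable entity; the optional fields correspond to
    # the constraint types listed above.
    def __init__(self, name, deadline=None, resource=None):
        self.name = name
        self.deadline = deadline    # timing constraint, if any
        self.resource = resource    # resource constraint, e.g. "gpu"

def topological_order(tasks, edges):
    # Return one execution order that respects the precedence edges
    # (u, v), meaning u must finish before v may start.
    indegree = {t: 0 for t in tasks}
    successors = {t: [] for t in tasks}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(tasks):
        raise ValueError("precedence constraints contain a cycle")
    return order

# Example: task a must finish before b and c may start.
a, b, c = Task("a"), Task("b"), Task("c")
print([t.name for t in topological_order([a, b, c], [(a, b), (a, c)])])

Raising an error on a cycle reflects the acyclicity requirement: a cyclic precedence graph admits no valid execution order.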

2.1.2 Scheduling classifications

Task scheduling can be performed by a centralized or a decentralized unit. Because a centralized scheduler contains global knowledge of all participating workers it is easier to implement, but it lacks the scalability and fault tolerance a decentralized scheduler would have [24]. This is because the single scheduling machine is also a single point of failure. Because of this single point of failure, centralized schedulers are not optimal for big distributed systems, where a decentralized scheduler works much better [24].

Task scheduling can be divided into two distinct parts: static scheduling and dynamic scheduling [4]. Static scheduling is performed at compile time and dynamic scheduling during run time [4]. Static scheduling assumes that there is a finite number of tasks to be scheduled, while dynamic scheduling assumes a continuous flow of incoming tasks [4]. When static scheduling is used, all information about the tasks, for example their size, dependencies and demands, needs to be known in advance. Dynamic scheduling, on the contrary, knows very little or nothing about the tasks in advance. Dynamic scheduling is therefore more complex to create, but achieves a better throughput than static scheduling [4].

Preemptive and non-preemptive is a classification of task scheduling algorithms which determines whether a task can be paused and restarted at a later stage [15]. In preemptive algorithms, any running task can be interrupted to let another task execute on the processing unit [15]. The preemption is determined by a predefined policy [15]. Non-preemptive algorithms are the exact opposite: once a task has begun to execute, it will continue until it is done [15]. A drawback of preemptive scheduling is that it requires more resources to maintain state [24].

2.1.3 Scheduling schemes and policies

A scheme is a plan or some form of structured action that is performed. In this section two commonly used scheduling schemes are explained: FIFO and earliest deadline first. In FIFO, first in first out, the first task to arrive is the first task to be executed on the machine [6]. A big drawback of FIFO is that the processing units get a lot of idle time, which is a big waste of resources [6].

Earliest deadline first (EDF) is another commonly used scheduling scheme. It is a priority-driven algorithm where every task in the system has a priority [25]. The priority is derived from how close a task is to its deadline [25]. The task with the highest priority is executed first [15]. It has been shown that EDF is optimal when used in a uniprocessor system in combination with preemption [25].
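The two schemes can be sketched in a few lines of Python (a hedged illustration under the assumption that each task is a dict with arrival and deadline fields; this is not the thesis prototype):

import heapq

def fifo_order(tasks):
    # First in, first out: run tasks in arrival order.
    return sorted(tasks, key=lambda t: t["arrival"])

def edf_order(tasks):
    # Earliest deadline first: the heap keeps the closest deadline on
    # top; the index i breaks ties between equal deadlines.
    heap = [(t["deadline"], i, t) for i, t in enumerate(tasks)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(tasks))]

# Example: t2 arrived later but has the tighter deadline.
tasks = [{"name": "t1", "arrival": 0, "deadline": 10},
         {"name": "t2", "arrival": 1, "deadline": 5}]
print([t["name"] for t in fifo_order(tasks)])  # ['t1', 't2']
print([t["name"] for t in edf_order(tasks)])   # ['t2', 't1']

Note that this sketch only orders a fixed set of tasks; a preemptive EDF scheduler would additionally re-evaluate priorities whenever a new task arrives.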

Scheduling policies determine how the resources of the system are utilized and allocated [25].

2.1.4 Scheduling approaches

The scheduling process can have different outcomes: the solution can be optimal or an approximation. To classify a resulting schedule as optimal, it needs to meet all the specified requirements of the scheduling and attain the smallest value of a criterion function [26]. Finding an optimal solution to a scheduling problem has been shown to be NP-hard, and approximation methods are therefore a more promising approach [7].

P is the class of problems that can be solved in polynomial time [9]. NP stands for nondeterministic polynomial time; problems in NP are not known to be solvable in polynomial time, and known algorithms for them require exponential time [9]. It has not been shown that P = NP, but the contrary has not been proved either [9]. If a problem is classified as NP-hard, it is at least as hard as NP [27]: any problem in NP can be translated into an instance solvable by an algorithm for an NP-hard problem [27].

Another feasible way to calculate the schedule is to use a heuristic algorithm. The word heuristic comes from the Greek language and means "to find out, discover" [28]. Heuristics are more or less qualified guesswork or trial and error [10]. Heuristic algorithms are guided by a so-called heuristic function, which steers the algorithm towards the goal [15] by more or less telling the algorithm how close to the goal it is [29]. Heuristic algorithms work towards an optimal schedule but cannot guarantee optimality, because they do not explore all solutions; the algorithm simply takes a feasible solution if one can be found [15]. This type of algorithm is often used in time-critical systems, where optimal solutions cannot be found within a feasible time because of the exponential time it takes to find them [2]. Heuristic algorithms usually belong to the list scheduling class, which is divided into two phases [30]. In the first phase a priority is assigned to each task, and the tasks are added to a waiting list in order of priority [30]. In the second phase each task is assigned, in list order, to the first processor that becomes available [30]. This class of heuristic algorithms is often efficient and practical because it has a narrow space of solutions [30]. A drawback is that the performance of the algorithms depends heavily on the heuristic function [30]. Finding an optimal solution to scheduling and allocation problems is considered NP-hard, and the optimal solution can therefore only be found in exponential time [2].
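A minimal sketch of the two-phase list scheduling class described above (the priority and cost functions are placeholders we assume are supplied; this is an illustration of the class, not any specific published algorithm):

import heapq

def list_schedule(tasks, num_processors, priority, cost):
    # Phase 1: order the waiting list by priority, highest first.
    waiting = sorted(tasks, key=priority, reverse=True)
    # Phase 2: a min-heap of (time the processor becomes free, id);
    # each task goes to the processor that becomes available first.
    processors = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(processors)
    schedule = []
    for task in waiting:
        free_at, proc = heapq.heappop(processors)
        schedule.append((task, proc, free_at))    # task starts at free_at
        heapq.heappush(processors, (free_at + cost(task), proc))
    return schedule

# Example: four unit-cost tasks on two processors.
print(list_schedule(["a", "b", "c", "d"], 2,
                    priority=lambda t: 0, cost=lambda t: 1.0))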

Two subcategories of heuristic algorithms are greedy and meta-heuristic algorithms [2]. Greedy heuristics are algorithms that create solutions from scratch, work towards fast progression and choose options that are local optima, in the hope that this will give a global optimum [31][2], which in this case is an optimal schedule. A local optimum is a point that is the best in its closest vicinity, although a better global optimum exists [32]. There can exist many local optima, but there can only be one global optimum [32]. Greedy heuristics can in some cases find the optimal (globally optimal) solution, but the quality of the solution can vary [31][2]. Meta-heuristics are a general class of algorithms and techniques that search for an optimal, or as close to optimal as possible, solution [33]. A meta-heuristic algorithm either starts from a blank state, if no previous solutions have been found, or improves an already calculated solution [2]. This means that the meta-heuristic improves its solutions over time [2]. To find a solution, meta-heuristic algorithms use some degree of randomness [33]. A common use of heuristic and meta-heuristic algorithms is in evolutionary algorithms [2].

Evolutionary algorithms are heuristic- and meta-heuristic-based algorithms that try to mimic the natural concept of evolution [34]. Because of this property of evolution and improvement, evolutionary algorithms are used where there are possibilities for improvement [34]. In scheduling, evolutionary algorithms are used to quickly explore and find possible schedules [35]. One type of evolutionary algorithm, the genetic algorithm (GA) [36], consistently creates more efficient solutions than any other type of evolutionary algorithm [35]. A GA uses parts of the theory of natural selection, such as inheritance, mutation, selection and crossover [36]. A population of nodes with random characteristics is created as the initial population; each node is evaluated, and the most successful nodes are merged to create a new node, a child [36]. The child has a combination of the parent nodes' characteristics [36]. Some of the advantages of a GA are that it is easy to implement and adapt, creates good solutions and can handle constraints with ease [36]. The downside of evolutionary algorithms is the high computational cost of the search, because of the large solution space being covered [37].
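As a hedged illustration of the GA concepts above, the following sketch evolves a mapping of tasks to processors using selection, one-point crossover and mutation, with the makespan as the quantity to minimize. The population size, mutation rate and encoding are our assumptions, not values from the cited works:

import random

def makespan(chromosome, costs, num_procs):
    # Total load of the most loaded processor under this mapping.
    load = [0.0] * num_procs
    for task, proc in enumerate(chromosome):
        load[proc] += costs[task]
    return max(load)

def evolve(costs, num_procs, pop_size=20, generations=100):
    n = len(costs)
    # Initial population: random task-to-processor mappings.
    pop = [[random.randrange(num_procs) for _ in range(n)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the half with the smallest makespan.
        pop.sort(key=lambda c: makespan(c, costs, num_procs))
        survivors = pop[:pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.1:             # mutation
                child[random.randrange(n)] = random.randrange(num_procs)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda c: makespan(c, costs, num_procs))

# Example: map six tasks with known costs onto three processors.
costs = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
best = evolve(costs, num_procs=3)
print(best, makespan(best, costs, 3))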

2.2 Distributed systems

A distributed system is a collection of autonomous processors/entities that cooperate to solve a problem that a single entity would not be able to solve on its own [11][12]. The entities communicate over a network, do not have a common physical clock or shared memory, are often geographically separated and do not have the same specifications [13]. Distributed systems exist in many forms, for example computing systems and wide-area networks [11][12]. Distributed systems can be structured in different ways, such as client-server and peer-to-peer. Client-server is the most common architecture used in distributed systems [18]. The users of the system, the clients, communicate with service providers called servers; a server can also be a client of another server [18]. In peer-to-peer, all nodes of the system, so-called peers, have similar roles; a peer can be both a server and a client at the same time [18].

The client-server model scales poorly because the service is located on one computer [18]. When the requests to the server exceed what the computer and bandwidth can handle, it becomes a bottleneck [18]. In peer-to-peer, all participating peers run the same service and present the same interface to each other [18]. Because of this, the system can share the load between the peers, which enables much better scalability [38]. Peer-to-peer also lacks the single point of failure that a client-server model has.

Since the advancement of the Internet, the need for distributed computations for applications has been growing [13]. These distributed computations of algorithms and applications are important because they can solve problems that cannot be solved by one machine in a reasonable time [12]. The field of distributed computations covers everything between computation and information access over a network [13]. Distributed computing is a type of distributed system that is used for high-performance computing and can be divided into two subgroups: cluster computing systems and grid computing systems [14].

A cluster computing system is a distributed computation system consisting of worker nodes that have the same type of hardware and run the same type of operating system [14]; in other words, it is a homogeneous system [14]. This type of system is usually used to run parallel applications on all the nodes included in the cluster to speed up the run time [14]. A cluster consists of nodes governed by a master node, which provides an interface for the user and handles the scheduling and job assignment [14]. The nodes of the system are tightly coupled in the network and usually lie in the same subnet or domain [39]. Cluster computing is most beneficial for tasks with precedence constraints, where tasks may depend on the results of previous tasks, because the network has low latency [39].

Grid computing systems, in contrast to cluster computing systems, do not have common hardware or operating systems on the connected entities [14]. A grid computation system is a heterogeneous computation system that tries to incorporate the mixture of hardware, software and operating systems that the workers of the system are using [14]. Grid computation systems are dynamic computation systems that can handle nodes coming and leaving; because of this, a grid system is very scalable [39]. The nodes of a grid can be distributed over many domains and subnets and often use the existing network [39]. As a result, grids are usually more suitable for tasks that do not require much coordination, in other words tasks that can be executed independently of other tasks [39].

One of the big motivations for using distributed computing is to speed up extensive algorithms and computations. In theory, the total time a computation takes on a single processor can be divided by the total number of processors it can be divided between [40]. To make the best use of a distributed system, no processor should be idle while other processors are overloaded; this can be solved by using a task scheduler. When jobs are submitted to a distributed system, the scheduler's job is to decide which of the workers in the system should execute the submitted job [5]. Scheduling in distributed systems usually focuses on the global scheduling aspect and relies on the architecture of the underlying system to perform the local scheduling [23].
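The "in theory" claim above can be stated more precisely, under the idealizing assumption of perfectly divisible work and no communication overhead (our formalization, not a result from the cited sources): a computation taking time $T_1$ on one processor takes

\[
T_p = \frac{T_1}{p}, \qquad S(p) = \frac{T_1}{T_p} = p
\]

on $p$ processors, i.e. linear speedup. In practice, coordination and communication costs make the achieved speedup smaller.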

2.2.1 Fault tolerance

As explained in the previous chapter, a distributed system is a collection of autonomous processors and entities that cooperate to solve problems [41]. These systems have a huge number of hardware components and software running at the same time [41]. When a multitude of hardware and software is used, there is a big risk of crashes and other failures [41]. Failures and crashes can in some systems lead to catastrophic results, and therefore crashes and failures must be handled [41]. Some systems have been developed to be fault tolerant, which means that the system can either take care of the failures or mask them [41]. Fault-tolerant systems have become important when creating computer systems in order to maintain the correctness of the produced results [42]. A fault is a defect in either the hardware or the software of the computer system, which may create errors and other faults [42]. Faults often lead to failures, which are deviations from the specification of the system [42]. Fault tolerance is what enables a system to function normally even when faults and failures occur [42]. A main key to achieving a fault-tolerant system is redundancy [42].

A common use of redundancy in computer systems is to mirror the operations performed on other systems, which means that if a failure occurs on one of the systems, the others can take over [42]. Quality of service, or so-called dependability, is a measure of reliability and availability [42]. Reliability is the probability that the system will be free of failures over time, in other words operate correctly [42]. Availability is how long the system will be available for operation [42]. Redundancy, the key technique of fault tolerance, can be achieved in many ways, for example through information and software redundancy [42]. Information redundancy simply means sending redundant information to other systems to check for faults and to create a backup, making the system more fault tolerant [42]. Software redundancy can be everything from exact replications of the software to small programs that check for faults and errors [42]. Exact replication of the system can be used for error recovery and complete redundancy; error recovery tries to create a safe point that the failed system can restore to [42].

2.3 Related work

In this section the related work of this project is explained briefly, to give an understanding of how it works and what its functionality is. The related work covered is Sparrow [22], the multiple priority queues genetic algorithm (MPQGA) [30] and some schedulers used in Hadoop. Sparrow is a scheduler that uses the power of random choices to assign tasks to workers [22]. MPQGA is a scheduling algorithm that combines a heuristic approach with a genetic algorithm [30]. Lastly, Hadoop's schedulers are covered: the FIFO scheduler, which works just as it sounds, first in first out [43]; the fair scheduler, which tries to give each job an equal amount of execution time [44]; and the capacity scheduler, which gives a fixed amount of capacity per user of the system [45].

2.3.1 Sparrow

Sparrow is a decentralized, randomized-sampling task scheduler that provides a near-optimal solution [22]. Contrary to traditional task schedulers, Sparrow does not keep any state about worker load and uses many schedulers that work in parallel [22]. Any task can be scheduled by any scheduler in the system [22]. The workers in the system execute the assigned tasks, and if more tasks are assigned to a worker than it can handle simultaneously, it places the tasks in a queue [22]. Sparrow uses a combination of batch sampling and late binding to perform its scheduling [22]. Batch sampling is an improved version of the power of two choices technique [22]. In the power of two choices technique, two workers are probed for each task and the task is enqueued on the worker with the smallest queue [22]. In batch sampling, the probe information for all tasks belonging to the same job is saved, and the tasks are then enqueued on the workers with the shortest queues [22]. Two problems with this technique are that queue length is a bad estimate of execution time and that race conditions can occur when multiple schedulers work in parallel [22]. To solve this, the creators of Sparrow used a method called late binding: instead of the worker answering the probe right away, a reservation is placed in the queue [22]. When the reservation reaches the front of the queue, the probe is answered and the worker gets a task, if there are any left [22]. When all the tasks a job contains have been launched, the scheduler proactively sends a cancellation message to the remaining workers [22]. One downside of late binding is that the worker is idle while answering the probe [22]. Sparrow only handles per-job constraints (for example, all tasks should run on workers with a GPU) and per-task constraints (for example, some tasks need to run on a machine with input data) [22]. Sparrow can use multiple queues to handle different resource allocation policies [22].
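The sampling step of batch sampling can be sketched as follows (a hedged illustration, not Sparrow's actual implementation; in real Sparrow, late binding would replace the immediate placement below with reservations that are resolved when they reach the front of a worker's queue):

import random

def batch_sample(job_tasks, workers, queue_length, probe_ratio=2):
    # Probe probe_ratio * m workers for a job of m tasks and place the
    # tasks on the m probed workers with the shortest queues.
    m = len(job_tasks)
    probed = random.sample(workers, min(probe_ratio * m, len(workers)))
    best = sorted(probed, key=queue_length)[:m]
    return list(zip(job_tasks, best))

# Example: queue lengths supplied by a lookup table.
queues = {"w1": 3, "w2": 0, "w3": 1, "w4": 5}
print(batch_sample(["t1", "t2"], list(queues), queues.get))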

Sparrow will be considered when creating the model because of its simplicity and scalability. The drawback of Sparrow is that it does not handle crashes of the schedulers or the workers. If a scheduler crashes, all the tasks associated with it are lost, and if too many of the schedulers crash, the remaining schedulers might be overloaded and also crash or perform suboptimally. Sparrow does not handle precedence constraints, which means it only works for tasks that lack this property, limiting its usefulness.

2.3.2 MPQGA

The authors of this paper propose a task scheduling scheme for heterogeneous computing systems using a multiple priority queues genetic algorithm (MPQGA) [30]. MPQGA combines a genetic algorithm (GA) with a heuristic-based earliest finish time (EFT) algorithm [30]. MPQGA tries to exploit the advantages of both GA and heuristic approaches while avoiding their drawbacks [30]. A big advantage of heuristic-based algorithms is that they can find near-optimal solutions in polynomial time [30]. The GA part of the algorithm assigns the tasks' execution priorities, and the heuristic is used to search for solutions mapping the tasks to processors [30]. The combination of the two approaches, GA and heuristics, makes a trade-off between makespan and convergence speed [30]. The algorithm uses crossover, mutation and fitness functions specifically designed for scheduling with directed acyclic graphs [30]. The crossover and mutation operators take the precedence relationships of the tasks into consideration when generating new offspring, to make sure that the offspring is valid [30].

2.3.3 Apache Hadoop Schedulers

Apache Hadoop is a framework used for distributed computing. The framework is designed to distribute the processing of data sets to multiple distributed processing units connected in a cluster [46]. For scheduling, Hadoop uses a framework called Yarn, in which the scheduler can be swapped for other schedulers [47]. These are some examples of existing schedulers that can be used in Hadoop:

• FIFO scheduler - The original scheduling scheme, where the oldest task is executed first [43]. This is a simple approach with no concept of priority or task size [43].

• Fair scheduler - This scheduler tries to give every job an equal amount of execution time [44]. This means that smaller tasks can be executed intermixed with large tasks, which gives greater responsiveness to the Hadoop cluster [44]. It is implemented using pools to which the users of the system are assigned [44]. Initially all pools have the same share of the resources; this can be changed if needed [44]. Fairness is ensured by assigning each user to a pool regardless of the number of submitted jobs, which gives all users the same priority [44]. The scheduler checks the run-time deficit of each job, and the job with the highest deficit is the next to get CPU time [44]; a sketch of this deficit idea follows the list.

• Capacity scheduler - This scheduler is designed for large clusters where the excess capacity of the cluster can be shared with other users of the system [45]. Instead of using pools like the fair scheduler, the capacity scheduler uses queues, where each queue has a minimum capacity [45]. When a queue does not use all of its assigned capacity, the excess can temporarily be given to another queue [45]. The capacity scheduler can also handle job priorities, where a job with higher priority gets to a processing unit before jobs with lower priority [45].
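As a hedged sketch of the deficit idea from the fair scheduler above (the bookkeeping is our simplification, not Hadoop's implementation): each pool's deficit is its equal fair share minus the CPU time it has received, and the pool with the largest deficit runs next.

def next_pool(pools, elapsed):
    # pools maps pool name -> CPU time received so far; elapsed is the
    # total CPU time handed out. With an equal share per pool, the pool
    # furthest below its fair share (largest deficit) runs next.
    fair_share = elapsed / len(pools)
    deficits = {name: fair_share - used for name, used in pools.items()}
    return max(deficits, key=deficits.get)

# Example: pool "b" has received the least time, so it runs next.
print(next_pool({"a": 30.0, "b": 10.0, "c": 20.0}, elapsed=60.0))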

2.3.4 Uses of Theory and Related Work

As stated in the introduction of the thesis, the scheduler is supposed to be a general solution. In section 2.1.2, six different scheduling classifications were stated: centralized, decentralized, static, dynamic, preemptive and non-preemptive. The decentralized approach will be used because it ensures that the scheduler can be deployed in distributed systems with different architectural layouts, such as client-server and peer-to-peer. This is more beneficial than the centralized approach because there is no need for full knowledge of the system, and multiple schedulers can be used at the same time. The dynamic approach will be used because it enables scheduling to be performed during run time, which is much more beneficial than static scheduling, where scheduling is only performed at compilation. Both non-preemptive and preemptive scheduling are good approaches, but non-preemptive scheduling will be used because it requires fewer resources.

Two scheduling schemes were explained in section 2.1.3: FIFO and EDF. Both will be used because they give different possibilities for what kinds of tasks can be scheduled. EDF, for example, enables scheduling of tasks with timing constraints because it schedules tasks with regard to their deadlines.

All of the approaches stated in section 2.1.4 are feasible to use. However, algorithms that find the optimal solution are not feasible in this project, because finding the optimal solution has been shown to be NP-hard and such algorithms will perform badly. The approximation algorithms are more feasible because they are much faster and usually create good solutions.

From the Sparrow scheduler, batch sampling in combination with late binding and queues will be used, because batch sampling does not need any knowledge of the system except which workers are connected to the schedulers. Late binding in combination with batch sampling makes sure that the fastest worker executes tasks first, which speeds up the process. Queues are chosen as a consequence of choosing batch sampling and late binding.

The method used in MPQGA is a really interesting approach to the scheduling problem; however, it is hard to fit into this project because of the complexity of combining a heuristic algorithm with a genetic algorithm.

Two of the Apache Hadoop schedulers will be used: FIFO and fair scheduling. FIFO is a simple solution that works well in combination with another method such as fair scheduling. Fair scheduling will also be used because it is an interesting method that gives each job an equal amount of execution time; this can for example be used to counteract starvation in the system.


3 Methodologies and Methods

This section presents the remaining methods that can be used during this thesis.

3.1 Research strategies/ Design

Research strategies are guidelines that describe the steps needed to conduct the research. These steps include organizing, planning, designing and conducting the research [21].

• Experimental research is used to check the correctness of a hypothesis and to find relationships between variables. This strategy is used in experiments with huge data sets [21]. It controls all the variables that can influence the outcome of the experiment and therefore requires a huge amount of data [21].

• Ex post facto research is similar to experimental research, but instead tries to find relationships between variables in already collected data [21].

• Surveys describe phenomena that are not directly observed [21].

• Case study examines phenomena in a real-life context through an empirical study with multiple sources that explain the examined phenomena [21]. This type can be based on both qualitative and quantitative studies [21].

• Action research is used to improve how people address and solve problems. This method includes cyclic steps of planning, taking action, observing, evaluating and reflecting [21].

• Exploratory research tries to find as many relationships between variables as possible and uses surveys to understand the problem [21]. Qualitative data collection is used to identify key issues and variables, but usually no definite answer is found [21].

• Grounded theory is a method that collects and analyses data [21]. This data is used to develop theory [21].

• Ethnography is a study that seeks to place a phenomenon in a social and cultural perspective by investigating culture and people [21].

Grounded theory will be the research strategy used during this master thesis because, to develop a new model, a solid foundation of good information is needed. Data will therefore be gathered and analyzed throughout this degree project.


3.2 Data collection

To create a feasible solution, data needs to be collected. The following methods describe how data can be collected.

• Experiments are used to collect huge data sets for variables [21].

• Questionnaires use questions to collect data [21].

• Case study is used within the case study research method and performs an in-depth investigation of a small set of participants [21].

• Observations are used to observe the behavior of a situation or culture [21].

• Interviews are used to capture a participant's point of view and understand their opinion [21]. This is usually done in interviews where the questions either follow a predefined structure or are open-ended [21].

• Language and text is used to interpret the content of conversations, texts and documents [21].

Because the main source of information/data will come from a literature study, text interpretation will be used to create a good knowledge base. From this knowledge, a model for a task scheduler will be created using the best of the methods examined in the literature study. All gathered information will be reviewed against a set of requirements to make sure that the chosen methods are the best. A prototype based upon the model will be created and tested using experiments. Experiments belong to quantitative research and are fitting for testing the prototype, because numbers and graphs make it easy to see that something testable works. The tests will be done to find out the prototype's effectiveness compared to other schedulers.

3.3 Data analysis

These methods are used to analyze the gathered data [21]. Statistics and computational mathematics are the most common quantitative data analysis methods [21]. Coding, analytic induction, grounded theory and narrative analysis are the most common qualitative data analysis methods [21].

• Statistics is used to analyze data and evaluate the results to find their significance [21].

• Computational mathematics is used for calculation methods, modeling and simulation [21]. This is done to analyze algorithms and numerical and symbolic methods [21].


• Coding transforms qualitative data into quantitative data by analyzing interviews and observations [21].

• Analytic induction and grounded theory are iterative methods that continue until a valid theory can be established [21]. The iterations consist of alternating between collection and analysis [21].

• Narrative Analysis is used in discussions and analysis of text [21].

The data analysis in this degree project will follow the analytic induction and grounded theory approach, because information will be gathered and then analyzed to see whether it can be used in the model. This will be performed iteratively until a solid model has been established.

3.4 Quality assurance

This section explains quality assurance for both quantitative and qualitative research. Validity, reliability, replicability and ethics belong to quantitative research [21]. Validity, dependability, confirmability, transferability and ethics belong to qualitative research [21].

• Validity in quantitative research means making sure that the instruments used in the tests actually measure the right things [21]. In qualitative research, validity is how trustworthy the conducted research is [21].

• Reliability is the consistency of the measured results, i.e. how stable the measurements are [21].

• Replicability is whether it is possible for other researchers to replicate the results of the research [21].

• Ethics is the morals of all the stages of the research; this applies to both quantitative and qualitative research [21].

• Dependability is to check whether the drawn conclusions are correct, by using auditing [21].

• Confirmability is that the research has been performed without letting personal interests change the results [21].

• Transferability is to create documentation good enough that other researchers can use it [21].

Although this thesis follows a qualitative approach, the experiments are quantitative, and the quantitative quality assurance methods will therefore be applied to them. All instruments used during the tests will be validated, and to ensure replicability all tests performed will be documented in high detail. All experiments performed will include the deviation, to show how accurate the results are. The results of the tests will be used without any tampering, to ensure that the thesis project follows an ethical and moral path.

3.5 Software development methods

In this section the software development methods/models waterfall, spiral, Scrum and Kanban are explained, and thereafter a software development model is chosen according to what fits this project best.

3.5.1 Waterfall model

The waterfall model is a software development framework consisting of phases that are performed sequentially, where the next phase can only be started once its predecessor is completed [48]. The project starts with a requirements phase, where requirements are gathered and documented [48]. These requirements are then discussed with the stakeholders, the people and groups that have an interest in the project or its outcome [48]. The next phase is the design phase, where a detailed design of the complete system is created [48]. When the design phase is completed, the implementation phase begins [48]; here the developers translate the created design into the modeled components and systems, building up towards the final product [48]. When the implementation phase is done, the testing phase begins [48]; here the created components and systems are integrated into one system [48]. In some cases the testing phase is split into an integration phase and a testing phase [48]. During the testing phase the team works towards identifying bugs in the software, in order to correct them before the release [48]. After the release of the product, a support phase is started [48].

3.5.2 Spiral model

The spiral model is a refinement of the waterfall model and uses an iterative approach in the form of a spiral [49]. The spiral consists of cycles that are iterated until the project is done [49]. The first step in a cycle is to identify the objectives of the product, the implementation alternatives and the constraints on those alternatives [49]. In the next step, risks are analyzed for the alternatives from the previous step [49]. The analyzed risks influence how the following step is performed [49]. During that step the construction of the product begins, and the approach to constructing it depends on the risks analyzed during the previous step [49]. The alternative with the least risk is chosen [49]. In the last step the stakeholders of the product review the cycle [49].


3.5.3 Scrum

Scrum is an agile software development framework that uses small development teams and sprints [50]. The teams, or so called scrum teams, consist of a product owner, a development team and a scrum master [50]. The product owner creates visions and priorities, the development team implements the product, and the scrum master ensures that the project runs as smoothly as possible [50]. The team is self-organized and cross-functional, which means that the team can rely on itself instead of needing resources and decisions from outside the team [50]. The job to be performed is split into a list of small, concrete results that should be achieved [50]. The list is sorted by priority, and each entry in the list has an estimated resource requirement [50]. The total available time is divided into so called sprints, which usually are about 1-4 weeks long [50]. Before each sprint starts, a plan is created containing the tasks to be developed [50]. The tasks are chosen depending on the product owner's priorities and the capacity of the development team [50]. When a sprint is done, runnable code should be presented, which in the long run builds up towards the final product [50]. During the whole development, tight cooperation with the customer is maintained to ensure that the process follows their needs [50].

3.5.4 Kanban

Kanban is a software development framework that uses a visual control mechanism to follow the work during each step of the process [50]. A board and paper notes, or an electronic note system, are commonly used to create the visual control of the work flow [50]. The idea of Kanban is to minimize the number of tasks that are concurrently being worked on; new tasks can only be started once old tasks are done [50]. When starting a new project, the work is divided into small tasks that are written down on notes and added to the board; the sizes of the tasks should be approximately the same so that the work flow does not stall on some tasks [50]. The maximum number of concurrent tasks that can be worked on is also decided at the beginning of the project [50]. The board is divided into sections, containing for example to do, in process and done [50]. In the beginning all the tasks are located in the to do section and are then moved to the in process section [50]. When a task is done it is moved to the done section [50]. A small drawback of Kanban is that it limits the number of concurrent tasks; however, for the same reason this is also a positive aspect.


3.5.5 Choice of software development model

The software development model that will be used during this project is Kanban. Kanban will be used because of the visual feedback, the prioritization of tasks, and the possibility to limit the number of concurrent tasks that are active during development and testing. The visual feedback gives a good estimate of how much work needs to be done and which tasks might need to be cut to be able to create a working prototype. Because the tasks are prioritized by importance, the tasks with the lowest priority will not impact the project if they are not implemented; these tasks are more like helper functions that would increase the ease of use. The ability to limit the number of concurrent tasks will also be useful, because it helps maintain focus on one task instead of starting multiple tasks and losing track of what needs to be done. Scrum could also have been used, considering its ordering by priority, but there is a slight risk of overestimating the capacity and starting too many tasks at the same time. The waterfall model will not be used because it requires all components of the system to be done before anything else can be performed, which might negatively impact the end result. The spiral model will not be used because it requires a lot of experience in making risk assessments at each step.

3.6 Uses of chosen methodologies and methods

During this thesis, grounded theory has been used to collect and analyze texts and data. All information was gathered through a literature study whose main sources of information are research articles and books concerning the subject. The gathering and analysis of information has been conducted iteratively during the whole project.


4 Task scheduling model

In this section a model for a general task scheduler will be presented, one that can be used in a variety of distributed systems and handle a variety of tasks. This means that the model must be able to have an arbitrary number of schedulers and workers working at the same time.

The model will also be able to handle a variety of tasks, including tasks with precedence, per-job, per-task, timing and resource constraints. All information was gathered iteratively and reviewed against the requirements stated in section 1.5.5. This means that the model grew and changed during each iteration: methods that were first considered good might have been discarded because new and better methods and models were found. The information was gathered mostly from research papers and books found using Bilda's search engine; a few sources are web pages. All information was, as explained, reviewed against the requirements. Owing to these iterations, the model is a product of the best methods examined.

4.1 Scheduler model

The model of the general scheduler will use the methods explained in chapter 2.3.4. The scheduler will be a decentralized and dynamic solution that uses a greedy approach. Because of the greedy approach, the scheduler will be non-preemptive, which means that once a task has begun to execute it cannot be replaced by another task until the execution is done. The decentralized approach has been chosen because it can be derived directly from the requirement that the model must be scalable. A decentralized approach is much more scalable than a centralized one, because it gives the possibility to run multiple schedulers at the same time and does not need full knowledge of the system.

A centralized approach could only run one scheduler and would need complete knowledge of the system. To make sure that the scheduler can run at all times without having to be recompiled, the dynamic approach will be used. The dynamic approach ensures that all scheduling is performed at run time and does not need recompilation for each new ”run” like a static scheduler does. If the model is both scalable and can run without interruption, it can be used in a more general fashion, because one or multiple schedulers can run at the same time. This ensures that the distributed system the model is used in can be of varying size.

The scheduler will be divided into two distinct parts: a global part and a local part. The global part will run as standalone software and will make the decisions about which node/worker to place each task on. The local part is located on the nodes/workers of the system and determines the execution order of the allocated tasks. The separation into a global and a local part is done to ensure that resources are utilized well: the global part focuses only on the overall scheduling, and the local part only tries to optimize the usage of resources. When the term scheduler is used without further explanation, it refers to the global scheduling. There can also be multiple schedulers in use at the same time. When the term worker is used, it refers to the local part unless otherwise stated.

4.1.1 Global scheduling

The global scheduling will follow in Sparrow's footsteps by using batch sampling with late binding, see chapter 2.3.1. Batch sampling was chosen because it does not need any more knowledge of the system than the connected workers. This ensures that the scheduler maintains the decentralized approach, and as explained earlier, a decentralized approach is more flexible than a centralized one. Late binding was chosen because, in combination with batch sampling, it speeds up the execution of scheduled tasks: with late binding the fastest worker executes the tasks first. In the global part, batch sampling will be used to send probes to the known workers. The number of sent probes will be two times the number of tasks in the submitted job; that is, if there are n tasks in a submitted job, 2*n probes will be sent. The probes contain information about the task and the scheduler it was sent from. Late binding will be used in both the global and the local part of the scheduler; when a probe arrives at the local part, the probe is placed in a queue as a reservation. When this reservation reaches the front of the queue, the local part located on a worker will request to execute the task.

The global part of the system then decides whether this specific worker may execute it. The decision depends on what type of task it is and on whether the task has already been executed.

If there are fewer workers than the calculated number of probes, the probes will be sent to all the known workers. For each task there will exist two reservations on the workers, and the first of the two reservations that requests to execute will get the task. A sketch of this probing step follows.
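To make the probing step concrete, the following sketch (written in Python purely for illustration; it is not the prototype's code) shows how a global scheduler could place two reservations per task on randomly sampled workers. The Worker class, its send method and the message fields are assumptions made for this example.

    import random

    class Worker:
        """Illustrative stand-in for a connected worker node."""
        def __init__(self, name):
            self.name = name
            self.inbox = []  # stands in for a network transport

        def send(self, message):
            self.inbox.append(message)

    def submit_job(job_id, task_ids, workers, scheduler_addr):
        """Place two probes (reservations) per task on sampled workers,
        i.e. 2*n probes in total for a job of n tasks."""
        for task_id in task_ids:
            # Two distinct workers per task; if fewer than two workers
            # are known, every known worker is probed instead.
            targets = random.sample(workers, min(2, len(workers)))
            for worker in targets:
                worker.send({"type": "probe", "job": job_id,
                             "task": task_id, "scheduler": scheduler_addr})

    # Example: a job with three tasks yields six probes in total.
    workers = [Worker("w" + str(i)) for i in range(5)]
    submit_job("job-1", ["t1", "t2", "t3"], workers, "scheduler-0")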

Figure 1 illustrates an example of the global scheduling using batch sampling and late binding.

In this example one task is to be scheduled. The scheduler contacts two workers and sends the task to both of them. The workers place the task in the right queue; once the task is placed in a queue, it is seen as a reservation. How tasks are placed will be explained in more depth in the next section. Task 1 in the figure is the task that was submitted to the scheduler and then scheduled to the workers. Tasks are consumed from the queues, and when a reservation reaches the front of the queue the worker contacts the scheduler and requests to execute the task corresponding to the reservation. This will also be explained in more depth in the next section.

Figure 1: Batch sampling and late binding

To make sure that the model can handle tasks with resource, per-task and per-job constraints, the global part of the scheduler will select the subset of the known workers of the system that have the parts that the task or job requires. These requirements can, for example, be that some tasks need to be executed on machines that have a GPU or some specific input data. This method is used because the scheduler only needs to know about its workers and does not need to communicate with other schedulers to know what to do. This also ensures that the model is a general solution, because the scheduler model can schedule many kinds of tasks. Tasks with precedence and timing constraints do not affect the global part of the scheduler and will therefore be explained under the local part. The filtering step is sketched below.
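As an illustration of this filtering step, the sketch below selects the subset of known workers that satisfy a task's or job's requirements; the capabilities field and the requirement strings are assumed names, not terminology from the model.

    def eligible_workers(workers, required):
        """Return the workers that advertise every capability the task
        or job requires, e.g. {"gpu"} or {"input:dataset-A"}."""
        return [w for w in workers if required <= w["capabilities"]]

    # Example: only the worker with a GPU qualifies for a GPU-bound task.
    workers = [{"name": "w0", "capabilities": {"gpu", "input:dataset-A"}},
               {"name": "w1", "capabilities": {"input:dataset-A"}}]
    print(eligible_workers(workers, {"gpu"}))  # -> [{"name": "w0", ...}]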

The global part of the system will not have any communication about scheduling decisions between the deployed schedulers, if there is more than one, which means that an arbitrary number of schedulers can be used in parallel. This ensures that the system is scalable and can be customized to different architectures, such as client-server and grid computation systems.

4.1.2 Local scheduling

The local scheduling will be located on the nodes/workers of the system and will govern the execution order of tasks. Because the scheduler is supposed to be a general solution, it needs to be able to handle all kinds of properties that tasks can have. In the previous section, tasks with resource, per-task and per-job constraints were explained. In this section, methods for taking care of tasks with precedence and timing constraints are presented.

To ensure that the scheduler can handle tasks with precedence constraints, in other words execution order, the local part of the scheduler will use two types of queues: one for tasks with execution order and one for tasks without it. When tasks are submitted to the worker, the type of each task is determined and it is placed in the right queue: tasks with precedence constraints in one queue and tasks without them in the other. With two queues in use, it must be determined which of the two queues to execute a task from. This is done either by combining fair scheduling with FIFO, or by using EDF. Fair scheduling ensures that the two queues get the same amount of execution time: the queue with the lowest accumulated execution time is selected. FIFO then determines which task from the selected queue is executed. If EDF is used, it replaces both fair scheduling and FIFO; this is done to take care of tasks with timing constraints. EDF governs both queues, and the task with the earliest deadline, i.e. the highest priority, from either queue is executed. When EDF is in use, both tasks with precedence constraints and tasks with timing constraints are handled. A sketch of the queue selection is shown below.
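The following is a minimal sketch of this queue selection, assuming fair scheduling over accumulated execution time combined with FIFO within each queue; an EDF variant would instead pick the earliest deadline across both queues. All field and method names are illustrative assumptions.

    from collections import deque

    class LocalQueues:
        """Two FIFO queues: one for tasks with precedence constraints
        (execution order) and one for tasks without."""
        def __init__(self):
            self.queues = {"ordered": deque(), "unordered": deque()}
            self.time_used = {"ordered": 0.0, "unordered": 0.0}

        def enqueue(self, reservation):
            kind = "ordered" if reservation["has_order"] else "unordered"
            self.queues[kind].append(reservation)

        def pick(self):
            """Fair scheduling: of the non-empty queues, choose the one
            that has received the least execution time so far, then take
            its front element (FIFO)."""
            candidates = [k for k, q in self.queues.items() if q]
            if not candidates:
                return None
            kind = min(candidates, key=lambda k: self.time_used[k])
            return kind, self.queues[kind].popleft()

        def account(self, kind, runtime):
            """Called when a task finishes, so that fair scheduling can
            balance execution time between the two queues."""
            self.time_used[kind] += runtime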

As explained in the previous section, late binding is used in both the global and the local part of the scheduler. In the local part, late binding is utilized such that when a probe arrives at the node the local part runs on, it is placed into a queue. The probe contains information about the task it is associated with and the address of the scheduler that sent it. Depending on the type of task the probe represents, it is placed either in the queue with execution order or in the one without. Once the probe is located in a queue, it is seen as a reservation. When the reservation is chosen for execution, the local scheduler contacts the original scheduler and requests to execute the task that the reservation refers to. Depending on the answer, the task will either be executed, removed or put on hold. When the chosen reservation is associated with a task that has execution order, multiple entries from that queue are sent with the request. Without this, deadlock could occur, because the queue can contain tasks that are earlier in the execution order than the chosen task. This handshake is sketched below.
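The late-binding handshake on the worker side could look roughly like the sketch below. Here request_task stands in for a hypothetical remote call to the scheduler that sent the probe, and the reply values "execute", "done" and "hold" are assumed names for the three outcomes described above (executed, removed or put on hold).

    def claim_next(reservation, successors, request_task, run_task, requeue):
        """Ask the originating scheduler whether this worker may run the
        task whose reservation reached the front of a queue. For ordered
        tasks, 'successors' carries further entries from the same queue
        so that the scheduler can avoid the deadlock described above."""
        reply = request_task(reservation["scheduler"],
                             reservation["task"], successors)
        if reply == "execute":
            run_task(reservation["task"])  # the first reservation wins
        elif reply == "done":
            pass  # the task's other reservation was faster; discard this one
        else:  # "hold": an earlier task in the execution order is pending
            requeue(reservation)

    # Example wiring with trivial stand-ins for the real transport:
    claim_next({"scheduler": "s0", "task": "t1"}, [],
               request_task=lambda s, t, succ: "execute",
               run_task=print, requeue=lambda r: None)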

4.1.3 Fault tolerance

The model is a decentralized and dynamic solution in which many schedulers and workers may be in use. This means that faults and crashes will occur. To make sure that the system can withstand crashes of both the schedulers and the workers of the system, some sort of fault tolerance must be established. As explained in section 2.2.1, fault tolerance can
