# Task Scheduling In Distributed Systems A Survey Computer Science Essay

Published:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Task scheduling is one of the most challenging problems in distributed computing. This paper summarizes algorithms in the field of real-time task scheduling in distributed systems and discusses the scheduling paradigms. The paradigms are static vs. Dynamic, priority vs. Preemption, assignment-oriented vs. Sequence-oriented. Various scheduling algorithms are also discussed based on the environment in which they are used.

arise in distributed systems. Task scheduling is the most important issue because inappropriate scheduling of tasks can fail to exploit the potential of a distributed system. The problem of scheduling task can also be referred as task allocation problem.

Many scheduling algorithms exist for specific instance of task scheduling problem. Scheduling involves the allocation of resources and time to task to meet its performance requirements. Scheduling is a well-structured and conceptually demanding problem. Task scheduling in distributed computing system can be local scheduling and global scheduling.

This paper is survey of several approaches that are designed to schedule real-time tasks in distributed systems.

Task-Scheduling Problem

A scheduling system model consists of an application, a target computing environment, and performance criteria for scheduling. Here an application can be represented by a directed acyclic graph.

In a given graph, a task without any parent is called an entry task and task without any child is called an exit task.

Task Characteristics

An instance of a task is the basic object of scheduling. Its fundamental properties are arrival time and approximate execution time. A real-time task also specifies a deadline by which it must complete its execution. Real time tasks are of three major categories:

1) Periodic Task: A periodic tasks is one that repeats after a certain fixed time interval. The precise time at which periodic task recur are set by clock interrupts. So periodic task are also referred as clock-driven tasks. The time interval by which task repeats is called period of the task.

2) Sporadic task: A sporadic task is one that repeats at random instants. The minimum time separation between two consecutive instances of the task implies that once a sporadic task occurs then it then next instance will occur only after the time limit.

3) Aperiodic Task: An aperiodic task is similar to sporadic task. An aperiodic task can arise at any random interval. In case of an aperiodic task when compared to sporadic task the minimum separation between two consecutive events can be 0.It means that two or more events of aperiodic task can occur at same time instant.

B. Inter-Task Dependency

Depending on system's definition of task, there may be dependencies among the task. A task j is dependent on a set of task S if j cannot start execution until all task in S have been completed there execution. Scheduling independent task is much simpler. Schedulers that have to deal with dependent task usually construct a directed acyclic graph. A task represented by a node j can be scheduled to start execution only after all tasks represented by a parent node of j have completed their execution.

C. Deadline

All real-time tasks define a deadline by which they have to complete executing. They can be scheduled based on the deadline of each task. Some algorithms assign priority to the task based on the deadline. In such algorithms task having shortest deadline is scheduled first. Thus it's an important characteristic of real time task which affect scheduling. Task is schedulable under such algorithms if maximum utilization of process due to task set should be less than one.EDF is one the algorithm based on deadline.

Scheduling Paradigms

Several scheduling paradigms emerge, depending on a) whether the system performs schedulability analysis b) or whether it has been done statically or dynamically and c) whether result of schedulability analysis will produce a schedule or plan with which tasks are dispatched. Based on this we can identify following choices: should the scheduling be performed statically or dynamically, or it takes into account priority of task and allow preemption.

Static vs. Dynamic

Schedulers can be categorized as either static or dynamic depending on the time at which necessary information for the successful completion of application is available. If application can be divided into tasks each having well defined execution times, scheduling can be done statically at compile time, all necessary information for schedule such as structure of parallel program, processing time ,inter -dependent task must be determined primarily. Aim of schedule in distributed systems is optimal allocation of available hardware resources to input task so that time of program execution can be decreased. In static task scheduling, an application is represented by a directed acyclic graph. When prior knowledge of task completion is unavailable we go for dynamic scheduling. A dynamic scheduler uses current allocation information to determine feasibility of application.

Static scheduling algorithms can be classified into following categories: priority based, clustering based, duplication based. Priority based approach task allocation is based on the priority assigned which are based on their deadline or arrival time. Clustering based approach groups task with data dependencies. This algorithm minimize inter process communication. In duplication algorithm task can be duplicated to take benefit of available processors. More than that static schedulers are supposed to have more time and space at their disposal and therefore they can afford more expensive technique producing better schedule.

In dynamic scheduling tasks feasibility is checked at run time i.e.. a dynamically arrived task is accepted for execution only if it is found feasible. In this scheme task are assigned at run time. When scheduling is made we consider current and future availability of system resources. A dynamic scheduler uses this information to determine whether new task can be scheduled, or current task to be dropped or priorities assigned to task need to be re-evaluated. In dynamic best effort approaches no feasibility checking is done .Although scheduler can make optimizations, limits should be placed on complexity of algorithm. Additional benefit of dynamic scheduling is that it's able to adjust changes in the processing environment, so that it will provide greater system availability.

Priority an d Preemption

Real time task are prioritized based on their deadlines. Priorities can also be assigned based on their processing time, frequency of execution or by the severity of deadlines. Priorities may be fixed or dynamic. Fixed priorities do not change over time and can be assigned statically. On the other hand dynamic priority may change at run time.

A useful mechanism in supporting priority policies is preemption. Preemtive scheduling is widespread in operating systems and in parallel processing. Preemtive algorithms provide good throughputs in managing distributed computation. Being able to preempt a low priority task in order to schedule a higher priority task will allow more flexibility in constructing a schedule. Even if preemtive schedulers allow the scheduler more flexibility, they are also more complex than non-preemtive schedule and require more resources. Poorly designed preemption strategies can easily lead to starvation of low priority task.

In the simplest case, it is possible to take benefit from preemtive scheduling by use of round robin scheduling. If some information about execution time of task is available we can perform scheduling by optimization algorithm. Here time slice is an important parameter.

C . Assignment-Oriented vs. Sequence-Oriented

The problem of scheduling can be viewed as a search for feasible schedule. Feasibility of a schedule is defined as meeting a set of constraints such as deadline compliance, task dependencies. The search space can be assigned as a tree where each node is a task-processer assignment. A schedule may be any path from root to leaf, although they may not be a feasible one. A partial schedule is path between any two nodes. Most scheduling algorithms traverse such a tree in Depth-First order(DFS).In case of multiple feasible schedule exists our goal is to find optimal one where optimality is achieved by maximizing or minimizing some metric function.

The search through task- processer tree can be categorized as assignment-oriented or sequence-oriented. In assignment-oriented search for each level of tree an algorithm chooses a task and tries to assign some processors.

## .....

T1

Pn-1

P

T1

Pn

T1

P1

Fig. 1 Assignment-Oriented search tree

In sequence-oriented search, an algorithm picks a processor and then tries to find a task for it. The algorithm runs in exponential time and space.

If the scheduling algorithm is static and has no time constraint it can search tree in one attempt and can find all feasible schedules. In case of dynamic algorithms it will produce partial schedules at specified interval. In between intervals new jobs will be added to the scheduler and jobs with expired deadlines are removed.

Fig. 2 Sequence-Oriented search tree

Sequence-oriented approach of scheduling real-time task in distributed system is scalable. It's because it chooses a new task to run on next processor during each iteration from the queue.

Algorithms for Scheduling Paradigms

The variety of performance metrics, scheduling approaches and processing resources used by each task imply a wide variety of algorithm. Here each algorithm is described in terms of problem it tries to solve.

Rate Monotonic Scheduling

The problem of scheduling periodic task with hard deadlines in which deadline is equal to period of the task was studied by Liu and Layland. The processor executes with task having highest priority. The first optimal fixed priority algorithm they introduced was rate-monotonic algorithm. Here task having shorter period is given higher priority and vice versa. Even though they addressed only single processor, they are applicable to distributed systems.

Rate Monotonic algorithm is fixed priority, static scheduler which assigns priority to the task by a monotonic function of its rate. Once priority is assigned processor will select the task with higher priority from available task.

In rate monotonic algorithm if total processor utilization is higher than .7 then only task can be scheduled successfully. For large periodic task having total utilization .9 can be scheduled using rate monotonic algorithm. But deadline driven algorithm will provide full processor utilization. So we assign priority based on deadline. Priority assignments are adjusted every time a new task arrives.

Real-Time Self-Adjusting Dynamic Scheduling

RT-SADS (Real Time Self-Adjusting Dynamic Scheduling) make use of current state of the system to tune itself. This algorithm was proposed by Atif and HamidZadeh.

RT-SADS schedules non-preemtable, aperiodic, soft real time tasks and tasks which are independent with deadline on a set of processors with distributed memory. The primary goal of this algorithm is scalability of deadline compliance. This algorithm makes use of dedicated processor and it performs search through assignment-oriented search space.

RT-SADS based on processor load, task arrival rate and slack to adjust the scheduling stage duration. If the duration is longer then it can traverse greater part of search tree and can find out more feasible solutions. At the end of each scheduling stage more than one feasible partial schedule will be available.

Tasks which are independent, non-preemtable are scheduled using this algorithm, this simplifies design of scheduler .Here processor need not worry about precedence constraints as the tasks are independent.

RT-SADS can be made distributed. Here multiple processors can perform scheduling stage simultaneously, because any sub tree search is independent of its sibling. One processor will be centralized processor in charge of assigning sub-trees and choosing the best scheduling stage from the result returned by each scheduling processor.

SP3

SP2

SPk

T1

Pn-1

T1

P1

T1

Pn

n

Fig. Distributed RT-SADS

Preemptive Scheduling

All multitasking operating systems use preemptive scheduling. But preemptive scheduling in distributed system is rare. In parallel application it divides itself into set of tasks. The scheduling algorithm provides good throughputs in managing distributed computations.

Theoretical study on parallel computing shows that attaining optimality in preemtive scheduling is polynomial. Work in distributed scheduling is scanty. Static scheduling schemes are used by distributed parallel processing. Preemption is actually feasible and an attractive scheduling policy.

In the list of preemption algorithms, the first algorithm is targeted for the situation where number of arrived task is greater than number of processor available. This algorithm pre-computes optimal schedule .It requires task execution time to be known in advance.

The second algorithm is a variation of round-robin algorithm. This algorithm is known as Distributed, Fault tolerant round robin algorithm. Here a set of x task is scheduled on y processors where x is greater than y.After some interval tasks are pre-empted and x tasks are assigned. This scheduler yields a fault-tolerant system as the tasks are pre-empted.

The third algorithm is the best algorithm of compared to the other algorithms. This is the preemtive task bunching algorithm. All x tasks are first bunched into y bunches and then y bunches are assigned to y processors. When a machine finishes its assigned bunch then it pre-empts all other tasks and all remaining tasks are collected and rebunched and assigned again. This algorithm works well in machines with varying speeds.

The above schemes are suited for distributed computing system. They are also suited in case where independent task

have to be scheduled. But the implementation of preemptive scheduling was very time consuming.

Epoch Task Scheduling

Epoch scheduling is a special type of scheduling for distributed processor. In this scheme, processor queues are rearranged at the end of predefined intervals. This time interval between successive queue rearrangement is called an epoch. To efficiently schedule it is necessary to divide programs into tasks. At the end of an epoch, the scheduler recalculates the priority of each task in the queue using Shortest Task First (STF) criteria.

In this scheduling we consider the parallel program as simple fork-join. Here rearrangement of queues takes place at the end of predefined intervals. We can compare the performance of the epoch in relation to STF method for different epoch sizes.

Dynamic Scheduling Using Genetic Algorithm

Task scheduling is one of the important issues in distributed computing. This is because inappropriate task scheduling can exploit true potential of a distributed system. To find a good solution we can use Holland's genetic algorithm. This algorithm applies evolutionary strategies to allow fast exploration of search space.

Some assumptions are made that reduce generality of the solutions obtained, which assumes that scheduling can be calculated off-line in advance and cannot be change. Genetic algorithm works dynamically such that it allows tasks to arrive for processing continuously and also considers variable resources, which has not been considered by other Genetic Algorithm (GA) schedulers.

A generation of a GA contains population of individuals which are possible solutions from a work space. Using fitness function each individual in the population is evaluated which produce a value that indicates goodness of the solution. Certain numbers of individuals are selected from the population which brings them forward to next generation. Crossover takes each pair from the population and use parts of each to produce new individuals.

Dynamic schedules make use of the state of task and system resources known in prior to create schedules at runtime. Dynamic schedulers are more practical than static schedule in distributed computing systems. The GA scheduling algorithm was developed based on homogeneous GA scheduler. Here the goal of algorithm is to schedule total number of tasks for processing in a distributed system and should have minimal total execution time. In distributed system the processors are heterogeneous. The resources available for a processor can vary on time. Tasks are independent and we cannot partition and can be processed by any processor in distributed systems.

When tasks arrive they are placed in a queue and then batches of tasks are scheduled on processors during each invocation of schedule. During invocation of scheduler each idle processor requests task from scheduler and after processing returns its result. When processor requests for waiting task, task at the head of the queue is send for processing.

The queue of unscheduled task contains large number of tasks and to schedule all at once it requires long time to select efficient schedule. To speed up the schedule and to reduce the chance the processor become idle, we consider only the subset of unscheduled task queue.

D. Duplication based Scheduling

Duplication based scheduling algorithms is to schedule a task set by mapping redundantly some of its task. This reduces inter process communication. According to the selection strategy of task set for duplication, the algorithm will vary. The algorithm in this scheduling are usually for an unbounded number of identical processors and this algorithm has higher complexity value than other groups.

In case of real time application it can be scheduled through extra or idle processors with task duplication. The duplication based algorithm is actually static scheduling algorithms designed to minimize the execution time of real time task by taking advantage of idle processors. In case of task set that are partitioned this algorithm produces efficient schedules. Here these tasks are considered to be non-preemtive processes running on homogeneous, connected, unbounded set of processors.

Since application is partitioned main task is to identify critical task among the task set. Here we draw a directed acyclic graph (DAG) to show the dependency among tasks. In DAG critical node is that node having more than one parent.Inorder to identify completion time of predecessors we identify critical task .If the predecessor has latest running time then it has to run in the same processor in which the critical process run. This reduces inter-process communication cost.

Duplication based algorithm process in three steps. In first step it computes earliest start time and earliest completion time. In step two it rewrites task dependency as an inverted tree if there exist any node with more than one child. Finally this inverted tree is assigned to processor.

One of the drawbacks of this algorithm is that it assumes that there are enough processors available to handle any amount of task duplication. A modified version is introduced to address this issue. The new version takes four steps to determine optimal schedule. In step one it computes earliest start time and earliest completion time as in duplication based scheduling. Step two computes latest allowable start end completion time for each task. Here in step three it traverse the graph, thus dividing task into clusters. Step four uses the information of possible improvements in completion time. Both the schemes will emphasis strict constraints for producing optimal schedule with computational cost reasonable.

Conclusions

This paper introduces the problem of scheduling real time task in distributed system. The general task scheduling problem includes assigning task of an application to suitable processors and to perform event ordering. The intension here is to compare work in resource management. Here it discusses various options available to the problem of resource management. Analysis of existing algorithm will focus on problems on choosing solution and point out its shortcomings.