This research paper discusses the CPU scheduler introduced in version 2.6 of the Linux kernel. The paper describes and explores the algorithms and mechanisms that play a vital role in scheduling processes and tasks in Linux. It also discusses the constraints and characteristics of the 2.6 CPU scheduler and compares it with previous versions.
Linux is a highly popular operating system today, freely distributed over the internet. Development of the Linux kernel is fast paced, with a steady stream of patches and upgrades.
This paper provides thorough documentation of the Linux 2.6 CPU scheduler. Linux kernels are numbered in the form W.X.Y.Z, where W changes only when a major, backwards-incompatible change takes place in kernel development, X denotes the kernel series, Y is the version number, and Z is incremented when a bug from the previous version is resolved.
The kernel is responsible for keeping track of thread details such as a thread's state and the information necessary to carry out its task. Only one thread can run on a CPU at any given point in time, which is why the CPU scheduler plays a pivotal role in any operating system.
The CPU scheduler is responsible for determining which thread executes next. A thread can be CPU bound or I/O bound. Schedulers tend to give priority to I/O-bound threads: while such a thread waits for data from an I/O device, the CPU would otherwise sit idle, so the scheduler lets the next thread in the queue run in the meantime.
Scheduling in Linux
In Linux, threads are simply processes that share resources. A process in Linux is a group of threads that share a TGID (Thread Group ID).
Scheduler Design Considerations
- Prevent starvation of tasks.
- Keep scheduling latency low, reducing the delay between a task waking up and that task running.
- Make scheduling decisions that are close to optimal without introducing delay.
- Give important tasks higher priority than less important tasks.
- Leave no CPU sitting idle while tasks are waiting in the queue.
Being freely available, Linux has always stood apart from other operating systems on the market. It is important to note that scheduling goals and algorithms are determined by the market an operating system targets. Linux was earlier positioned as a high-end operating system aimed mostly at the server market, but the newly written scheduler in the 2.6.x kernels has broadened its audience, targeting both the server market and the desktop market.
A few of the scheduler's goals are discussed below:
Efficiency is the most important goal of a scheduler. A scheduler must prioritize tasks so that as much real work as possible gets done while staying within the constraints of the requirements. If context switching is costly, then tasks should be allowed to run for longer stretches. The scheduler's own speed also matters, since scheduler code runs very frequently: scheduling decisions should be made as quickly and efficiently as possible. Once all the other goals are met, efficiency becomes the deciding factor.
The scheduler should provide good interactive performance even under high load. It should be capable of responding quickly to I/O-bound processes: if the user types or clicks, the system must react instantly and execute user tasks smoothly, even under considerable background load.
No process should go without a timeslice for an indefinite amount of time, and no process should receive an unjustly large amount of CPU time. Tasks must be treated with a certain degree of fairness so that no thread ever starves.
Tasks should be handled with proper priority: more important tasks receive higher priority, while less important tasks receive lower priority.
Linux supports multiprocessing, so its scheduler must handle tasks efficiently across several CPUs, maintaining task and CPU status at all times. Since all CPUs access shared memory and resources, the scheduler's primary job is to make the best use of processor time; no CPU should stay idle while there is work to be done.
Tasks that run on one CPU should stay affine to that CPU and should not bounce between CPUs too frequently unless required.
The Linux kernel supports multiple threads on a single Symmetric Multi-Threading (SMT) chip. Each physical SMT chip can present more than one virtual processor. Some care is needed, however, because the virtual processors share resources such as caches; since resources are shared, virtual processors should not be treated the same as regular physical processors.
The SMP load balancer can be extended or switched for additional parallel-computing and cache-hierarchy concepts such as Non-Uniform Memory Access (NUMA) scheduling, which the Linux kernel supports. When supporting hardware is present, the kernel runs a single system image across more than one node (where a node can be a motherboard). At the hardware level, a node resembles a traditional uniprocessor or multiprocessor machine in that it has its own CPU(s) and memory, but NUMA treats each node as part of a single system because one image runs across all of them. Therefore any CPU can execute any thread, and memory on any node is usable throughout the system. Perhaps the biggest issue a NUMA-aware scheduler must tackle is the possibility of having far more CPUs to schedule than most SMP systems have: common SMP systems might have anywhere from 2 to 8 processors, but NUMA systems can have hundreds. The Linux 2.6 kernel offers extended support for NUMA scheduling.
Soft RT Scheduling
Soft real-time scheduling can be used to schedule tasks with strict timing constraints. The Linux 2.6 kernel manages tight RT scheduling deadlines well, but it does not guarantee that deadlines will be met. Real-time tasks use two different scheduling policies: FIFO (First In First Out), which serves tasks on a first-come-first-served basis, and Round Robin, which cycles through tasks of equal priority.
There are different scheduling performance perspectives depending on the audience, so there is no single goal for the Linux scheduler to strive for; improving performance in one respect often decreases it in another. For desktop users the most important issue is perceived performance: how quickly the machine responds to mouse clicks or key presses. Context switching plays an important role here, since fewer context switches mean better efficiency, while more context switches make the machine feel more responsive to the user. A server, on the other hand, focuses less on perceived performance than a desktop system and more on actual throughput, whereas high-performance computing systems care least about immediate response times, as they handle larger problems than desktops or servers. There is, then, no single universal goal for scheduler performance, and the Linux scheduler strives to balance these competing demands.
Linux Scheduler 2.6 CPU Scheduler
Early Linux schedulers used minimal designs, not massive architectures built around multiple processors and hyper-threading. The 1.2 Linux scheduler, for example, used a circular queue for runnable-task management and a round-robin scheduling policy. The 2.2 scheduler introduced the idea of scheduling classes, permitting separate policies for real-time, non-preemptible, and non-real-time tasks, and it also included support for symmetric multiprocessing (SMP). The 2.4 kernel included a relatively simple scheduler that operated in O(n) time: it divided time into epochs, and within each epoch every task was allowed to execute up to its timeslice. While the 2.4 CPU scheduler was widely used and very reliable, it had a few undesirable characteristics that needed addressing. Ingo Molnar tackled these by completely replacing the Linux scheduler rather than making incremental changes to the earlier one.
O(1) Algorithm
The O(1) algorithm is a scheduling algorithm whose running time is independent of its input. Big-O notation denotes the growth rate of an algorithm's execution time as a function of input size: the running time of an O(n) algorithm increases linearly as the input size n grows, while an algorithm that maintains a constant upper bound on its running time is O(1). This guarantees that the algorithm completes within a fixed amount of time regardless of the size of its input; it is also called a constant-time algorithm.
- The highest priority level with at least one task in it is selected. This takes a fixed time (say t1).
- The first task (the head of that level's doubly linked list) is allowed to run. This also takes a fixed time (say t2).
- The total time taken to select a new process is t = t1 + t2.
- The time taken to select a new process is therefore fixed: constant, irrespective of the number of tasks.
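The constant-time selection above can be sketched as follows. This is an illustrative model, not the kernel's actual code: the real 2.6 scheduler keeps a bitmap with one bit per priority level and locates the first set bit with a dedicated helper, whereas here the bitmap is modeled as a plain array and a list head as an integer task id.

```c
#include <assert.h>
#include <string.h>

#define MAX_PRIO 140  /* 140 priority levels in the 2.6 scheduler */

/* Illustrative sketch: one flag and one list head per priority level.
 * Field names are hypothetical, not the kernel's. */
struct prio_array_sketch {
    unsigned char level_nonempty[MAX_PRIO]; /* 1 if any task queued at level */
    int head_task[MAX_PRIO];                /* task id at the head of the list */
};

/* Step 1 (time t1): find the highest (numerically lowest) non-empty level.
 * The loop is bounded by the fixed constant 140, so this is O(1). */
static int highest_level(const struct prio_array_sketch *a)
{
    for (int p = 0; p < MAX_PRIO; p++)
        if (a->level_nonempty[p])
            return p;
    return -1; /* no runnable task */
}

/* Step 2 (time t2): take the task at the head of that level's list. */
static int pick_next(const struct prio_array_sketch *a)
{
    int p = highest_level(a);
    return p < 0 ? -1 : a->head_task[p];
}
```

Because both steps take bounded time, the total selection time t = t1 + t2 is constant no matter how many tasks are runnable.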
The runqueue is the most basic structure on which the whole algorithm is built. A runqueue keeps track of all runnable tasks assigned to a particular CPU; one runqueue is created and maintained for each CPU in the system.
Each runqueue contains two priority arrays. All tasks start in the active priority array, and when they run out of timeslice they are moved to the expired priority array. When no active tasks remain to run in the active priority array, it is simply swapped with the expired priority array. The main job of the runqueue is to manage the CPU's special thread information and the two priority arrays.
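The swap itself is a constant-time pointer exchange. A minimal sketch, with illustrative names rather than the kernel's:

```c
#include <assert.h>

/* Minimal model of a priority array: the real structure holds
 * per-level task lists, here only a count of runnable tasks. */
struct prio_array { int nr_active; };

struct runqueue_sketch {
    struct prio_array arrays[2];
    struct prio_array *active;   /* tasks still holding timeslice */
    struct prio_array *expired;  /* tasks that used up their timeslice */
};

/* When the active array is empty, exchange the two pointers:
 * the expired tasks become the new active set in O(1). */
static void maybe_swap(struct runqueue_sketch *rq)
{
    if (rq->active->nr_active == 0) {
        struct prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
    }
}
```

No task is ever copied during the swap; only the two pointers change, which is what keeps the operation constant time.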
Priority arrays are arrays of linked lists, one list for each priority level; the Linux 2.6 scheduler has 140 priority levels. This data structure underpins the most important and advantageous behavior of the 2.6 scheduler, its O(1) time performance. The scheduler always runs the highest-priority task, and if more tasks are present at the same level they are handled in round-robin fashion. To find the highest-priority task, and to achieve round robin in constant time, priority arrays are used.
The interactivity estimator is used to dynamically scale a task's priority based on its interactivity. Interactive tasks receive a priority bonus, and with it a larger timeslice, whereas CPU-bound tasks receive a priority penalty. The bonus is a number in [-5, 5] that reflects what percentage of the time the process was sleeping recently: 0 is neutral, 5 helps, and -5 hurts. The interactivity estimator does not apply to RT tasks.
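A hypothetical sketch of how such a bonus could feed into a task's dynamic priority. The mapping from sleep percentage to bonus and the clamping range [100, 139] (assumed here to be the non-RT levels) are illustrative assumptions; the kernel's actual formula differs in detail.

```c
#include <assert.h>

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Map the percentage of recent time spent sleeping (0..100) to a
 * bonus in [-5, +5]: sleeping 50% of the time is neutral, heavy
 * sleepers (interactive tasks) get +5, pure CPU hogs get -5. */
static int interactivity_bonus(int sleep_pct)
{
    return clampi((sleep_pct - 50) / 10, -5, 5);
}

/* Lower numbers mean higher priority, so the bonus is subtracted.
 * The result stays inside the assumed non-RT range [100, 139]. */
static int dynamic_prio(int static_prio, int sleep_pct)
{
    return clampi(static_prio - interactivity_bonus(sleep_pct), 100, 139);
}
```

The subtraction direction matters: in the kernel's numbering a smaller priority value is scheduled first, so a positive bonus moves an interactive task toward the front.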
Recalculation of Priorities
When a task finishes its timeslice, its interactivity is estimated. A task judged interactive can be re-inserted into the active priority array; otherwise its priority is recalculated and it is inserted at the new priority level in the expired array.
Re-Inserting Interactive Tasks
To avoid delays, interactive tasks may be re-inserted into the active array after their timeslice expires. This is done only if the tasks in the expired array have run recently, which prevents starvation. The decision to re-insert also depends on the task's priority level.
Without fine-grained timeslice distribution, priority would be recalculated only after a timeslice expires. Interactive tasks could then become non-interactive during their large timeslices, making other processes wait and starve. To prevent this, timeslices are divided into small chunks of 20 ms: a task of equal priority may preempt the running task every 20 ms, and the preempted task is re-queued in round-robin fashion at its priority level. Priority is also recalculated every 20 ms.
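The 20 ms granularity implies a fixed number of round-robin requeue points within any timeslice; a trivial sketch (the constant comes from the text above, the function name is illustrative):

```c
#include <assert.h>

#define GRANULARITY_MS 20  /* chunk size described in the text */

/* Number of 20 ms chunks (potential preemption/requeue points) a
 * timeslice of the given length is divided into, rounding up. */
static int timeslice_chunks(int timeslice_ms)
{
    return (timeslice_ms + GRANULARITY_MS - 1) / GRANULARITY_MS;
}
```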
The schedule() function is the main scheduler function. It picks a new task to run and sticks with it until that task yields or is preempted. It is invoked whenever a task wishes to give up the CPU voluntarily or runs out of timeslice. First, schedule() makes sure it is not being called unnecessarily. It then disables preemption and determines how long the task being scheduled out has been running. If the task has high interactivity credit, its recorded run time is reduced, since it would be undesirable to penalize a task that spends long periods waiting on I/O. If the function was entered off of a kernel preemption, interruptible tasks with a signal pending are given the state TASK_RUNNING, and uninterruptible tasks are removed from the runqueue: a task with a pending signal that can be interrupted needs to handle that signal, and non-interruptible tasks should not be present in the runqueue.
Next comes the lookup for the task to run. If there are no runnable tasks in the runqueue, load balancing occurs; if there are still none after load balancing, the scheduler switches to the idle task. If there are runnable tasks in the runqueue but none in the active array, the active and expired arrays are swapped. At this point there is a runnable task in the active priority array, which is scanned for the next task at the highest priority level. After that, dependent sleeper tasks on virtual CPUs are given a chance to run.
If no switch to the idle task occurs, a check is performed to see whether the task to run next is a non-RT task that has just been woken up; if so, its priority is dynamically recalculated. After the check, the wakeup flag is cleared. schedule() is then ready to make the actual task switch, and whichever task is pointed to next begins execution.
At times tasks stay on one particular CPU far more than on the other CPUs in the system, which would leave those other CPUs sitting idle. It is necessary to move load from one CPU to idle CPUs to increase efficiency and balance the system; load balancing is an important part of any kernel that runs on more than one CPU. Since there is one runqueue per active CPU, it makes sense for that data structure to keep track of each CPU's load: each runqueue maintains a variable called cpu_load that stores a value representing the CPU's load.
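The balancing decision can be sketched as scanning the per-CPU load figures for the busiest queue, a candidate to pull tasks from when another CPU goes idle. Names here are illustrative, not the kernel's:

```c
#include <assert.h>

/* Given one load figure per CPU (as each runqueue's cpu_load field
 * would provide), return the index of the most loaded CPU. A real
 * balancer would then migrate tasks from it toward an idle CPU. */
static int find_busiest_cpu(const int cpu_load[], int ncpus)
{
    int busiest = 0;
    for (int i = 1; i < ncpus; i++)
        if (cpu_load[i] > cpu_load[busiest])
            busiest = i;
    return busiest;
}
```

In practice migration is worthwhile only when the imbalance exceeds a threshold, since moving a task sacrifices its CPU affinity and warm caches.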
Soft RT Scheduling
Linux 2.6 supports soft RT scheduling. This scheduling does a good job of meeting scheduling deadlines but never guarantees that they will be met. RT and non-RT tasks are prioritized within the range 0-139. While RT tasks are runnable, no other tasks run; RT tasks operate under the scheduling policies SCHED_FIFO and SCHED_RR, whereas non-RT tasks are marked SCHED_NORMAL, the default scheduling policy.
SCHED_FIFO tasks are scheduled in first-in-first-out manner. Multiple SCHED_FIFO tasks are scheduled according to priority, and a higher-priority task always preempts a lower-priority one.
SCHED_RR tasks have timeslices and are always preempted by SCHED_FIFO tasks. The scheme is otherwise similar to SCHED_FIFO: SCHED_RR tasks are scheduled by priority, and within a given priority they are scheduled in round-robin fashion.
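From user space, these policies are selected through the standard sched_setscheduler(2) interface. The sketch below only queries the RT priority range and attempts the switch, since actually changing to SCHED_FIFO or SCHED_RR normally requires root privileges; error handling is deliberately minimal.

```c
#include <assert.h>
#include <sched.h>

/* Request the SCHED_FIFO policy for the calling process at the given
 * RT priority. Returns 0 on success, -1 on failure (for example when
 * the caller lacks the root privileges this call normally requires). */
static int request_fifo(int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    return sched_setscheduler(0 /* 0 means the calling process */,
                              SCHED_FIFO, &sp);
}

/* The valid RT priority range is queried at run time rather than
 * hard-coded; on Linux it is typically 1..99 for SCHED_FIFO/SCHED_RR. */
static int rt_prio_span(void)
{
    return sched_get_priority_max(SCHED_FIFO)
         - sched_get_priority_min(SCHED_FIFO);
}
```

An unprivileged caller will see request_fifo() fail with EPERM, which is itself a useful illustration of how tightly the kernel guards the RT policies.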
NUMA architectures differ from uniprocessor and SMP systems in that a NUMA system can have multiple nodes. The scheduler domain system is a critical part of the 2.6 kernel's NUMA support. A NUMA system allows a CPU to use memory on any node, but accessing local memory banks is always faster than reaching memory on a remote node. The scheduler domain system addresses these proximity issues by grouping nearby resources together.
In a NUMA system the top-level scheduler domain contains all the CPUs in the system, with one group per node whose CPU mask contains all the CPUs on that node. The top-level domain has one child scheduler domain per node, and each child has one group per physical CPU.
Schedulers can be tuned as needed by developers, whether for a desktop machine or a server machine. A user with development skills might optimize the CPU scheduler to sacrifice efficiency for response time, while a system administrator might sacrifice a server's response time for efficiency. This can be achieved by increasing or decreasing the timeslice values, the starvation limit, the max_sleep_avg value, and prio_ratio_bonus as required.
Linux 2.4 CPU Scheduler vs. Linux 2.6 CPU Scheduler
Advantages of Linux 2.4 CPU Scheduler
It works well and runs on any system from servers to supercomputers, hence its great demand.
The Linux 2.4 scheduler is robust and efficient enough, compared with many schedulers of the past, to have made Linux a major player in the computing world.
The Linux 2.4 kernel/sched.c file is one third the size of its Linux 2.6 counterpart, and its logic is relatively simple.
Tweaking scheduler behavior for specific situations is fairly easy in Linux 2.4.
Disadvantages of Linux 2.4 CPU Scheduler
The Linux 2.4 scheduler runs an O(n) algorithm, so its scheduling overhead grows with the number of tasks: for every schedule() call, every task must be iterated over at least once for the scheduler to do its job. There are frequent long stretches of time during which no real work is done, which hurts scalability.
The average timeslice assigned by this scheduler is very large. Such large timeslices can cause the time between runs of low-priority tasks to grow quite long, which in turn hurts scheduler performance.
The scheduler's preference for I/O-bound tasks has flaws. First, a priority boost is given to I/O-bound non-interactive tasks even when it is not needed; second, CPU-bound interactive tasks can have their boost cancelled because of their high CPU usage, despite their interactivity.
This scheduler supports RT applications only weakly, because the kernel is not preemptible.
Advantages of Linux 2.6 CPU Scheduler
The O(1) algorithm for selecting the next task to run provides scheduling in constant time per task. The scheduler recalculates each task's timeslice as the task uses it up, and the priority arrays make it possible to find high-priority tasks with very simple logic.
There is immediate response to interactive processes even when there is considerable load.
There is reasonable prevention of starvation; tasks are not left waiting indefinitely.
It provides good symmetric multiprocessing scalability and task affinity.
Linux 2.6 is a preemptible kernel, so RT application support is considerably better than in previous versions.
Disadvantages of Linux 2.6 CPU Scheduler
The two-array system is at times not the ideal solution for interactivity: when an interactive task must finally be expired, there can be a long period of unresponsiveness while the rest of the array is cleared out.
There are still certain times when the Linux 2.6 kernel cannot be preempted, which affects RT support; RT support is therefore not yet perfect.
Future Modifications and Enhancements to Linux 2.6 CPU Scheduler
The Linux 2.6 CPU scheduler is quite solid and hence unlikely to go through a major change; it is more likely to see minor tweaks that improve what is already implemented. It also has a strong set of data structures that are unlikely to change. New features will be added, but the overall structure should remain the same.
The SMT support added to the scheduler is not perfect and could be modified for better load balancing between CPUs. Using shared runqueues would allow load to be balanced across physical CPUs before virtual CPUs.
Swappable schedulers, giving the ability to switch between scheduler implementations, are the most likely future enhancement to Linux 2.6.
The Linux 2.6 scheduler provides strong improvements over its earlier versions. It maintains better scalability on larger machines for most workloads and has improved performance. The 2.6 scheduler resolves the primary issues found in the earlier scheduler (the O(n) behavior and the SMP scalability problems), as well as other issues. Processes are scheduled more efficiently, and the scheduler has been redesigned to scale as the number of processes on a machine increases. The improvements also allow more efficient use of resources such as forked processes; with a default kernel configuration, more processes can run on a system than before.