Abstract- To Schedule the Simultaneous Multithreading (SMT) scheduling to improve CPU performance by using Hybrid Genetic Algorithm thereby improving the processor's execution resources...
Keywords- Cross-over, Mutation, Fitness value, schedules, superscalar architecture, multiprocessors, kernel level external module.
The ratio between main memory access time and core clock rates continues to grow. As a result, a processor pipeline may be idle during much of a programs execution. A multithreading processor can maintain a high throughput despite large relative memory latencies by executing instructions from several programs.
Simultaneous Multithreading is the least restrictive model from multiple threads can execute in the same cycle.
SMT or Multithreaded Superscalar Architectures is a technique that permits high processor utilization by allowing multiple independent threads  to coexist in the processor pipeline and share a group of execution resources with support of multiple hardware contexts. It improves the Utilization of processor's execution resources and providing latency tolerance, parallel workloads   by exploiting Thread level Parallelism in hardware without the need to context switch, in addition to the exploitation of Instruction Level parallelism in modern wide issue superscalar processors.. It improves processor throughput (both fine-grain multithreading and single-chip multiprocessors) by processing instructions from multiple threads each cycle
Scheduling with SMT requires two decisions:
- Which threads to run simultaneously.(the co-schedule)
- How to share processor resources among co-scheduled threads.
SMT scheduling or Resource sharing involves the following criteria .First (co-schedule selection) ,when the number of available tasks is larger than the hardware contexts supported by the SMT processor, we need to determine which tasks to co-schedule. Second(Resource Sharing) ,we need to determine how to partition processor resources among co-scheduled tasks- in an SMT processor, this partitioning can change each cycle
Materials & Methodology
Genetic Algorithm, the method/material is used in this project for LINUX platform, Open Source software.
Genetic Algorithm Methodology
Genetic algorithm, powerful and broadly applicable optimization technique and the most widely known type of evolutionary computation methods can be used for optimization.
It is a stochastic search algorithm, which borrows some concepts from nature. GA maintains a population pool of candidate solutions called strings or chromosomes. Each chromosome is a collection of building blocks called genes. A fitness value is associated with each chromosome. A user-defined function called fitness function determines the fitness value.
The function returns a magnitude that is proportional to the candidate solution's suitably and/or optimality. At the start of the algorithm, an initial population is generated randomly or according to some rules. The reproduction operator selects chromosomes  from the population to be parents for a new chromosome. The crossover operator oversees the mating process of two chromosomes. It decides what genetic material from each parent is passed on to the child chromosome. The mutation operator takes each chromosome in the offspring pool and randomly changes part of its contents. Thus each new generation of chromosomes are formed by the action of genetic operators on the older population. The chromosomes are compared via their fitness value to derive a new population, where the weaker chromosomes may be eliminated. 
GA uses selection and recombination operators to generate new sample points in the solution space. GA encodes a potential solution to a specific problem on a chromosome-like data structure and applies recombination operators to these structures in a manner that preserves critical information. Reproduction opportunities are applied in such a way that those chromosomes representing a better solution to the target problem are given more chances to reproduce than chromosomes with poorer solutions. GA is a promising heuristic approach to locate near-optimal solutions in large search spaces.
Selection is the process of keeping and eliminating chromosomes in the population based on their relative quality or fitness. In most practices, a roulette wheel approach, rank-based or value-based, is adopted as the selection procedure.
New chromosomes, called off springs , are formed by (a) merging two chromosomes from the current population together using a crossover operator or by (b) modifying a chromosome using a mutation operator. Crossover, the main genetic operator, generates valid offspring by combining features of two parent chromosomes. Chromosomes are combined together at different crossover rate, which is defined as the ratio of the number of offspring produced in each generation to the population size.
Mutation, a background operator, produces spontaneous random changes in various chromosomes. Mutation serves the critical role of either replacing the chromosomes lost from the population during the selection process or introducing new chromosomes that were not present in the initial population. The mutation rate controls the rate at which new chromosomes are introduced into the population. In this paper, results are based on the implementation of a set of crossover operators, which is mostly used in scheduling problems, and insertion mutation operator that selects a gene randomly and inserts it in a random position.
Linux Kernel & HT
Linux is a Unix-like computer operating system. Linux is one of the most prominent examples of free software and open source development: typically all underlying source code can be freely modified, used, and redistributed by anyone. The name "Linux" comes from the Linux kernel started in 1991 by Linus Torvalds. The system's utilities and libraries usually come from the GNU operating system, announced in 1983 by Richard Stallman. The GNU contribution is the basis for the alternative name GNU/Linux. Predominantly known for its use in servers
Overview of HT
Hyper-Threading Technology (HT) is Intel's implementation of Simultaneous Multi-Threading .SMT Machines increase utilization of the execution resources in a processor package and speedup the execution of jobs, by fetching and executing multiple instruction streams.
Generation of Initial population
For each genetic-based scenario, random schedules are generated for the initial population. The poorest schedules are eliminated from the initial population, and the GA population size is kept a constant. Natural selection scheme selects the top performers and eliminates the worst performers. The various recombination operators are used at various crossover rates.
Finding fitness values
This fitness routine helps balance out the idea of pure throughput. In the throughput phenotype, the fitness simply looks at the number of sectors read or written in during a child's lifetime. This phenotype makes sure data is actually moving, and not just servicing a lot a small request.
Selection is the process of keeping and eliminating chromosomes in the population based on their relative quality or fitness
- Rank based Selection: In a rank based selection scheme, the population is sorted according to the fitness values. Each chromosome is assigned a sector of the roulette wheel based on its ranked-value and not the actual fitness value.
- Roulette wheel selection: A value-based selection scheme assigns roulette wheel sectors proportional to the fitness value of the chromosomes.
- Natural selection: Natural selection based fitness assignment provides uniform scaling across chromosomes in the population and is less sensitive to probability-based selections.
Crossover, the main genetic operator, generates valid offspring by combining features of two parent chromosomes. Chromosomes are combined together at different crossover rate, which is defined as the ratio of the number of offspring produced in each generation to the population size as shown in fig 1.3.
Mutation operator Shift, Inverse
Mutation, a background operator, produces spontaneous random changes in various chromosomes. Mutation serves the critical role of either replacing the chromosomes lost from the population during the selection process or introducing new chromosomes that were not present in the initial population. The mutation rate controls the rate at which new chromosomes are introduced into the population. In this paper, results are based on the implementation of a set of crossover operators, which is mostly used in scheduling problems, and insertion mutation operator that selects a gene randomly and inserts it in a random position as shown in fig1.2.
Developing Kernel Level External Module
Linux kernel 2.6.18 is used for our project. The proc file system, which is a pseudo-file system rooted at /proc contains user-accessible objects that pertain to the runtime state of the kernel and, by extension, the executing processes that run on top of it. The proc file system exists only as a reflection of the in-memory kernel data structures it displays. In our work, we propose to develop Kernel level external modules in the proc file system.
SMT vs. the Superscalar
Superscalar processor executes a single program or thread. It attempts to issue multiple instructions each cycle of a program. Due to low instruction-level parallelism, it may not find sufficient number of instructions to issue, causing horizontal and vertical wastes.
SMT processor fetches instructions from all threads for execution each cycle. It exploits both instruction-level parallelism and thread-level parallelism. Thus, it recovers issue slots lost to both horizontal and vertical wastes.
SMT vs. the Multiprocessors
The fixed partitioning of the multiprocessors hardware resources prevents them from responding well to changes in instruction and thread level parallelism. Processors can be idle when thread level parallelism is insufficient; and processors have trouble exploiting large amounts of instruction level parallelism in the unrolled loops of individual threads. An SMT processor, on the other hand, dynamically partitions its resources among threads, and therefore can respond well to variations in both types of parallelism, exploiting them interchangeably. When only one thread is executing, (almost) all machine resources can be dedicated to it; and additional threads can compensate for a lack of instruction level parallelism in any single thread.
From the graph, it is clear that Simultaneous Multithreaded processor shows better results than either of the other two architectures. The SMT performance peaks at 6.2 instructions per cycle as shown in fig1.4
Thus CPU performance can be improved by scheduling the Simultaneous multithreading using Hybrid Genetic Algorithm Implementation.
- D.Tullsen, S.Eggers, J.Emer, H.Levy, J.Lo and R.Stamm.Exploitting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor. In international Symposium on Computer Architecture, May 1996.
- J.Lo, S.Egger .Converting thread-level parallelism into instruction-level parallelism via simultaneous multithreading. ACM Transactions on Computer Systems, August 1997.
- T.Ohsawa, M.Takagi.S.Kawahara and S.Matsushita. Pinot: Speculative Multi-threading processor architecture exploiting parallelism over a wide range of granularities. In Proceedings of 38th International Symposium on Micro Architecture [Micro'05], 2005.
- R.Sasanka, S.V. Adve, Y.Chen and E.Debes.Comparing the energy efficiency of cmp and smt architecture for workloads. In UIUC CS Technical report, 2003.
- R.Alverson, D.Callahan, D.cummings, B.Kobelnz. The Tera Computer System, In Proc. Of the 1990 Intl.Conf. On Supercomputing.
- H.Levy,S.Parekh and D.Tullsen . Tuning compiler Optimization for simultaneous multithreading. In International Symposium on Microarchitecture.2003.
- Genetic Algorithms in Search, Optimization and Machine Learning, David E.Goldberg.
- Genetic Algorithms, Mitsuio Gen, Ranwei Cheng.