The Technique Of Virtualization Computer Science Essay


Virtualization is a technique that divides a computer, or a cluster of computers, into multiple execution environments. Each execution environment can host a guest operating system, which runs in its own isolated, container-like environment. These environments are also called virtual machine images, and they are both portable and secure.

The use of virtual machines has increased over the past decade. In the 1960s the main purpose of virtual machines was to divide expensive computer hardware into multiple execution environments. Today, virtualization lets us run more than one server on a single computer, thereby reducing space and power consumption. Among the most valuable uses of virtual machines today: users can try new operating systems with ease, test new software without any impact on the host operating system, and run legacy applications on old, virtualized hardware.

There are several methods of implementing virtualization; operating system-level virtualization, kernel-level virtualization and hypervisor-based virtualization are a few of them. Hypervisor-based virtualization, which is implemented on top of the hardware, presents the guest operating systems with a virtualized interface to the underlying hardware, handles resource allocation and manages the execution of virtual machines. There are two types of hypervisors: those that run on native hardware without operating system support, and those that run on top of a host operating system. Examples of hosted hypervisors include VMware Workstation and Microsoft Virtual PC; examples of native hypervisors include VMware ESXi and Xen. The main focus here is hypervisor-based virtualization, with Xen as the running example.

For a VM to run at full speed, both the VMM and the guest operating system should be made aware of virtualization. This is called hybrid virtualization. The VMM is made aware of virtualization by running it on a processor that supports hardware-assisted virtualization, and the virtual machine is made aware of virtualization through paravirtualization.


The remainder of this article is organized as follows.

Section 2 gives details about the current I/O problem in hypervisors.

Section 3 gives details about how I/O is performed in the hypervisor. Xen is used as an example.

Section 4 gives details about the three virtualization techniques and the associated I/O problems.

Section 5 gives details of various techniques used to improve I/O performance.

Finally, Section 6 gives the conclusions.

\section{Current Issues with I/O virtualization in hypervisors}

Instructions that a CPU executes can be divided into two classes: privileged and non-privileged. Privileged instructions are captured by the virtual machine monitor and processed further, while non-privileged instructions execute on the native CPU. Since most of the instructions the CPU executes are non-privileged, the performance degradation is minimal. This is not the case for I/O operations. Because an I/O device is shared among several virtual machines, the hypervisor has to make sure that no I/O instruction is harmful, so every one of them has to pass through the hypervisor. This can affect I/O performance severely, and the extra processing inside the hypervisor for each I/O instruction increases CPU overhead as well.

I/O virtualization is a well-known bottleneck in a virtualization system \cite{xen} \cite{bridgegapsofthard}. When virtualization first appeared, devices did not offer high performance: hard disks had very high latency and very low bandwidth, and networking devices offered speeds measured in baud. Data storage and transmission have since changed completely. Today we have hard disk interfaces capable of delivering data at 6 Gbps and network cards offering speeds in the 10 Gbps range. To realize these benefits the hypervisor has to virtualize hardware I/O devices with the least possible overhead; otherwise it cannot meet the low latency and high throughput of today's high-end devices. As reported in \cite{cloudbottle}, the Eucalyptus cloud framework experiences such degradation: virtualized disk writes and reads reached only 51\% and 77\% of non-virtualized performance respectively, and virtualized network transmit and receive workloads reached only 71\% and 45\% of non-virtualized performance respectively.

\section{How I/O is performed in the Xen Hypervisor}

The hypervisor plays an important role in managing virtual machines. Several researchers use Xen as their preferred hypervisor. Figure 1 shows the architecture of the Xen hypervisor. Its lowest level, the hypervisor itself, has direct access to the hardware, controls the virtual machines and runs at the most privileged processor level.




\begin{figure}[ht]
\centering
\caption{The Xen Architecture \cite{softvio}}
\end{figure}


Xen supports both paravirtualization and full virtualization. Operating systems that cannot be ported to Xen (e.g., Microsoft Windows) have to be run in full virtualization mode. This is slower than the paravirtualized model because all I/O operations have to go through the hypervisor. If the underlying processor and chipset support hardware virtualization, the performance level comes close to that of the paravirtualized model \cite{highperf}.

\subsection{Xen domains}

Xen runs virtual machines in environments known as domains, which run above the level of the hypervisor. Communication from a domain to the hypervisor is done through synchronous hypercalls; communication in the reverse direction happens through a mechanism similar to hardware interrupts \cite{osvmmbypass}. Domain0, also acting as the isolated driver domain (IDD), is the first domain to boot when Xen starts, and it attaches directly to the control interface of the hypervisor. Virtual machine management tasks such as create, terminate and migrate are handled by domain0. DomainU (the unprivileged domains) run at a lower privilege level than domain0.

Xen's first implementation hosted its device drivers within the hypervisor, but today Xen uses a split driver model. This model has several advantages. Firstly, it separates the hypervisor address space from the device driver address space, preventing hypervisor code from crashing due to buggy device drivers. Secondly, it allows guest-VM-transparent services such as live migration \cite{bridgegapsofthard}. If device driver functionality were implemented inside the hypervisor, then the hypervisor would have to be modified for every new device, which is not an easy process. Device manufacturers usually build devices that can be driven by the current major operating systems such as Linux and Windows. Since Xen's device access code resides in domain0, which hosts a general-purpose operating system like Linux, Xen supports a wide range of devices \cite{highperf}.

The VM guest domains (domainU) do not have direct access to the hardware; they have to go through domain0 and then through the hypervisor. This is a performance bottleneck in systems that depend heavily on I/O operations. Hypervisor bypass is a solution that skips the hypervisor for non-privileged I/O operations \cite{highperf} \cite{osvmmbypass}.

DomainU guests need to implement the front end of the split device driver model, while the back end is implemented in domain0. Domain0 is the domain that performs I/O operations on hardware devices on behalf of the domainUs. Interrupts from a device are received by domain0, and the guest operating system receives virtual interrupts through the front-end driver. These virtual interrupts travel through I/O channels, where they remain until delivered to the target domainU. The front-end and back-end drivers communicate through shared-memory I/O channels.

Timesharing and space partitioning are two methods of multiplexing/demultiplexing an I/O device; which to choose depends on the type of device. Timesharing is suited to network devices and space partitioning to storage \cite{highvmm}. For transferring data between domains, Xen uses shared memory and asynchronous descriptor rings \cite{xen}.

\subsection{I/O Rings}




\begin{figure}[ht]
\centering
\caption{The Xen I/O ring \cite{xen}}
\label{fig:ioring}
\end{figure}



Figure~\ref{fig:ioring} shows the structure of the asynchronous I/O rings. The hypervisor uses these rings for asynchronous inter-domain and domain-to-device communication. If data is to flow from DomainA to DomainB (both belonging to domainU), DomainA places a request in the ring and DomainB removes it. DomainA is termed the producer and DomainB the consumer. There are five parts in all: the producer, the consumer, the start and end pointers, and the ring itself, which is called the buffer.
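The producer/consumer protocol on such a ring can be sketched as a fixed-size circular buffer with producer and consumer indices. The following is a minimal illustrative sketch, not Xen's actual ring layout (which also keeps shared request/response counters in the ring page); all class and method names are illustrative.

```python
class IORing:
    """Toy model of a Xen-style asynchronous I/O ring.

    A fixed-size circular buffer: the producer (e.g. a front-end
    driver) advances `prod`, the consumer (e.g. the back end)
    advances `cons`. Entries between cons and prod are outstanding
    requests.
    """

    def __init__(self, size=8):
        self.buf = [None] * size   # the shared ring page
        self.prod = 0              # producer (request) index
        self.cons = 0              # consumer index

    def put(self, req):
        if self.prod - self.cons == len(self.buf):
            return False           # ring full: producer must wait
        self.buf[self.prod % len(self.buf)] = req
        self.prod += 1             # publish the request
        return True

    def get(self):
        if self.cons == self.prod:
            return None            # ring empty: nothing to consume
        req = self.buf[self.cons % len(self.buf)]
        self.cons += 1
        return req


# DomainA (producer) queues requests; DomainB (consumer) drains them
# in FIFO order.
ring = IORing(size=4)
for block in ["read sector 0", "read sector 1"]:
    ring.put(block)
print(ring.get())   # -> read sector 0
```

In the real ring, both indices live in shared memory and each side raises a virtual interrupt (event channel notification) after updating its index, so the other side need not poll.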

\section{Hypervisor based I/O virtualization}

Virtualizing the CPU and main memory comes very close to native performance, since there is little overhead in doing so \cite{bridgegapsofthard}. But virtualizing I/O with current methods consumes a very high proportion of system resources. The hypervisor acts as the multiplexer/demultiplexer of I/O operations and also provides security isolation.

The hypervisor may use one of the following techniques to make I/O access possible for virtual machines \cite{intelvtd}.


\begin{itemize}
\item Emulation (full virtualization)
\item Paravirtualization
\item Hypervisor bypass
\end{itemize}


\subsection{Full virtualization}

This technique emulates device behavior so that a device can be shared among multiple virtual machines \cite{vmwareio}. The hypervisor usually emulates legacy devices in software over the physical device and performs the multiplexing and demultiplexing of all data flowing to and from the device. An advantage of this approach is that existing device drivers can be used in guest operating systems without modification. Virtualizing hardware in this manner is complex, because one has to know every hardware command the guest operating system may issue \cite{highperf}.

For the hypervisor to virtualize an I/O device, it has to intercept all the I/O operations issued by the guest operating system \cite{vmwareio}. An operating system uses the privileged IA-32 IN and OUT instructions to perform port I/O. The hypervisor can detect and trap those IN and OUT instructions and then handle them through binary translation. However, this results in world switches, a world switch being the context switch between the host operating system and the hypervisor.

Full virtualization offers several advantages. The main one is that the guest operating system needs no modification to run. It also offers the strongest isolation and security, properties that systems such as paravirtualization have to take additional steps to achieve.

The emulation techniques are \cite{highvmm}:


\begin{itemize}
\item Binary rewriting: the hypervisor scans the instructions issued by the virtual machine to see whether any privileged instructions are present. If it finds any, it rewrites them into compatible emulated versions, because privileged instructions are not virtualizable. Privileged instructions include I/O device access, BIOS access, memory access and so on. The hypervisor traps and rewrites these instructions so that the virtual machine accesses their virtual implementations rather than the real native hardware. There is a large overhead when privileged instructions are issued during an I/O-intensive task.
\item Hardware-assisted virtualization: requires explicit support in the host CPU, e.g. Intel VT and Intel VT-d \cite{intelvtd}.
\end{itemize}
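The binary rewriting idea can be reduced to a toy sketch: walk the guest's instruction stream and redirect privileged operations to emulation routines, leaving everything else to run natively. The instruction names and the `EMU_` handler convention below are hypothetical, not real IA-32 mechanics.

```python
# Toy illustration of binary rewriting: privileged instructions in a
# guest's instruction stream are replaced with calls into the
# hypervisor's emulation routines; everything else runs unchanged.

PRIVILEGED = {"IN", "OUT", "LGDT"}   # ops that must be emulated (illustrative)

def rewrite(instructions):
    """Return the instruction stream with privileged ops redirected
    to emulated versions (prefixed 'EMU_')."""
    rewritten = []
    for op in instructions:
        if op in PRIVILEGED:
            rewritten.append("EMU_" + op)   # trap into the hypervisor
        else:
            rewritten.append(op)            # runs natively on the CPU
    return rewritten

stream = ["MOV", "ADD", "OUT", "MOV", "IN"]
print(rewrite(stream))
# -> ['MOV', 'ADD', 'EMU_OUT', 'MOV', 'EMU_IN']
```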


\subsection{Paravirtualized IO}

Paravirtualization refers to changing the guest operating system so that it supports virtualization: through this change the guest is made aware that it is running in a virtualized environment. The modification makes the guest operating system communicate with the underlying paravirtualized interface through what are known as hypercalls. A hypercall is the mechanism through which a virtual machine accesses hypervisor services such as device access and the memory management unit. Paravirtualization is an improvement over the fully emulated I/O access technique: the drivers are explicitly designed for virtualized environments, minimizing slow I/O operations and improving performance.

Even though paravirtualization improves I/O operations, its performance depends on CPU speed, since its I/O operations need CPU intervention \cite{highquality}. However, the CPU usage is still lower than with the fully emulated device approach.
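The hypercall mechanism is essentially a numbered dispatch table in the hypervisor, called explicitly by the guest instead of being reached by a trap. The sketch below is a deliberately simplified model; the hypercall numbers and handler names are hypothetical, not Xen's actual hypercall ABI.

```python
# Minimal sketch of the hypercall idea: instead of issuing privileged
# instructions and being trapped, a paravirtualized guest calls the
# hypervisor explicitly through a numbered hypercall table.

def do_mmu_update(args):
    return f"mmu updated: {args}"

def do_event_send(args):
    return f"event sent to domain {args}"

HYPERCALL_TABLE = {
    1: do_mmu_update,    # e.g. page-table manipulation
    2: do_event_send,    # e.g. inter-domain notification
}

def hypercall(number, args):
    """Dispatch a guest request to the hypervisor service routine."""
    handler = HYPERCALL_TABLE.get(number)
    if handler is None:
        raise ValueError("bad hypercall number")
    return handler(args)

print(hypercall(2, 0))   # -> event sent to domain 0
```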

\subsubsection{Overheads found in the paravirtualized driver in Xen}

Compared to direct I/O, the paravirtualized driver model has more overhead. This is because the direct I/O model involves only one driver domain, whereas the paravirtualized driver model involves two \cite{bridgegapsofthard}.


\begin{itemize}
\item Copy overhead: usercopy and grantcopy are the two copies that occur on the paravirtualized I/O path. In paravirtualized driver model hypervisors these are found to consume more CPU cycles than in native Linux. The main reason is the different alignment of the source and destination memory addresses: on Intel processors, source and destination addresses should share 64-bit word alignment. Proper alignment can reduce this overhead, and is achieved by modifying the front-end driver so that it copies the data to a properly aligned memory address. Experimental results show that proper alignment can reduce this overhead by a factor of two; performance improves further if the data is properly aligned in the cache as well.
\item Kernel overhead: the main cause of high kernel overhead is the use of fragments in guest socket buffers. Xen attempts to improve packet-processing efficiency by using large packets that span multiple pages.
\item Hypervisor overhead: most of the CPU overhead caused by the hypervisor is due to domain0 execution, grant operations and schedule functions, with the schedule functions the largest contributor. This is due to increased cache misses: since the front-end and back-end drivers can run on two different CPU cores, domain-related data structures move between cores.
\item Driver domain overhead: the driver domain kernel consumes twice as many CPU cycles as the native Linux kernel for network device usage. The driver domain processes the lower end of the network stack, but two device drivers are in use, the front end and the back end. The netfilter and bridge functions are the main reason for the high CPU usage; disabling the bridge and netfilter rules can increase network performance, but doing so requires the kernel to be configured correctly.
\end{itemize}


\subsection{Hypervisor bypass IO}

In virtual machines the hypervisor manages I/O communication; this ensures that I/O accesses are performed safely and do not compromise the security of the system. Device access therefore requires a context switch between the guest virtual machine and the hypervisor, which can lead to longer latency and higher CPU usage compared with non-virtualized environments. In some hypervisor architectures there is another virtual machine along the I/O access path, e.g. domain0 in the Xen hypervisor. This can degrade system performance considerably, because such implementations require context switches between the host and two other virtual machines in addition to a world switch. To share a single physical device among multiple guests the hypervisor has to multiplex and demultiplex its accesses, and for large I/O tasks this can become CPU bound, so I/O performance ends up depending on the speed of the CPU. These drawbacks can be eliminated if we bypass the hypervisor for non-privileged I/O operations \cite{highperf}; in addition, hypervisor bypass eliminates context switches between two virtual machines and between a virtual machine and the hypervisor \cite{highperf}.

Hypervisor bypass implemented without device support has the following problems \cite{bridgegapsofthard}:


\begin{itemize}
\item it lacks device driver isolation for safe access
\item it lacks support for guest-VM-transparent services
\end{itemize}


\subsubsection{Paravirtualized Hypervisor bypass}

Liu, Huang, Abali and Panda have come up with a technique that uses a paravirtualization-based driver model to bypass the hypervisor when accessing an InfiniBand device \cite{highperf}. There are two components in this system: the guest (front-end) module and the back-end module. The guest module is implemented in the guest OS, and the back-end module is implemented in domain0 (it can also be implemented inside the hypervisor). The front-end module interfaces with the guest OS, the back-end module interfaces with the underlying hardware, and the back-end module acts as the proxy for hardware access (Figure~\ref{fig:vmmbypassio}).




\begin{figure}[ht]
\centering
\caption{Hypervisor bypass I/O \cite{highperf}}
\label{fig:vmmbypassio}
\end{figure}



Since this runs on the split driver model, it poses some difficulties for user-level direct access to the host channel adapter (HCA) of the InfiniBand device from a guest domain. To resolve this, special HCA resources such as UARs and QP/CQ buffers are used.

Communication in InfiniBand devices uses a queue-based approach. The queue pair (QP), consisting of a receive queue and a send queue, holds receive and send instructions respectively. The QP and CQ buffers are made accessible to the guest domain by allocating them in the domainU, registering them through the IDD, and finally having the front end send the QP/CQ creation commands to the IDD.
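The queue-based model can be sketched as send and receive queues plus a completion queue (CQ). This is a deliberately simplified picture of the InfiniBand verbs model, not a real HCA interface; the `process` function stands in for what the hardware does.

```python
# Toy model of InfiniBand-style queue-pair communication: work
# requests are posted to a send or receive queue, and completions
# appear on a completion queue (CQ).
from collections import deque

class QueuePair:
    def __init__(self, cq):
        self.send_q = deque()      # send work requests
        self.recv_q = deque()      # receive work requests
        self.cq = cq               # associated completion queue

def post_send(qp, data):
    qp.send_q.append(data)

def post_recv(qp, buf):
    qp.recv_q.append(buf)

def process(src, dst):
    """Pretend-hardware: move one message from src's send queue into
    dst's posted receive buffer and post completions on both CQs."""
    msg = src.send_q.popleft()
    buf = dst.recv_q.popleft()
    buf.append(msg)                          # "DMA" into the posted buffer
    src.cq.append(("send complete", msg))
    dst.cq.append(("recv complete", msg))

cq_a, cq_b = deque(), deque()
qp_a, qp_b = QueuePair(cq_a), QueuePair(cq_b)
inbox = []
post_recv(qp_b, inbox)          # receiver posts a buffer first
post_send(qp_a, "hello")        # sender posts a send work request
process(qp_a, qp_b)
print(inbox)                    # -> ['hello']
```

Note the order: the receive buffer must be posted before the message arrives, which is why the guest pre-registers QP/CQ buffers through the IDD.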

Two types of access method are used here: privileged I/O access and VMM-bypass access. The front-end module handles all the privileged accesses, which go through the back end and the hypervisor (see Figure~\ref{fig:vmmbypassio}). All I/O operations that pass through the back-end module are subjected to access checks to maintain system safety.

This method requires hardware support, so it cannot be made to work with devices that lack onboard resources such as UARs and QP/CQ buffers. Lei Xia, Jack Lange and Peter Dinda \cite{towardsvpass} have come up with a technique that does not require hardware support, but with an exception: it requires a completely new hypervisor architecture, and they have developed their own hypervisor for it.

\subsubsection{Direct I/O}

The paravirtualized bypass I/O model only benefits operating systems that can be ported to it, e.g. Linux. Operating systems like Windows, which we cannot modify, cannot take advantage of this technique.

In the direct access approach a virtual machine is given dedicated access to an I/O device. The guest virtual machine is then able to control the device without going through the hypervisor. This gives near-native performance, but the usefulness of virtualization is lost because the device can only be used by one virtual machine. Direct I/O can be achieved by a software-based approach \cite{towardsvpass} \cite{highperf} or by a hardware-based approach \cite{standardized} \cite{sriovnet}. The software-based approach has high performance overheads, but it will become a better option as more and more CPU cores become available in computer systems, since separate cores can then be assigned to perform I/O processing.

In the hardware-based approach the device presents multiple logical interfaces, allowing a virtual machine to access it directly without going through the virtualizing interface \cite{bridgegapsofthard}.

Even with direct I/O there are performance overheads \cite{highquality} \cite{bridgegapsofthard}. Most of the overhead is due to time-related functions, hardware interrupt processing, and hypervisor entries and exits caused by interrupts and hypercalls \cite{bridgegapsofthard}. The authors of \cite{bridgegapsofthard} propose techniques for improving direct I/O performance; device semantic preservation, techniques for avoiding virtualization holes and hypervisor domain scheduler extensions are a few of them, and they also provide the foundation for SR-IOV.

Direct I/O overhead reduction methods include:


\begin{itemize}
\item Device semantic preservation
\item Resetting device state
\item Interrupt sharing
\item Caching of shared memory
\end{itemize}


\subsubsection{Self-virtualizing devices: a hardware-based approach to direct I/O}

In current hypervisors the I/O virtualization work is done in a separate driver domain (the Xen approach). Devices that have onboard processing and memory can instead perform the I/O virtualization on the device itself, offloading this processing from the hypervisor.

The I/O virtualization functions for a network interface card are \cite{tnic}:


\begin{itemize}
\item NIC virtualization
\item Packet switching
\item Data transfer
\item Traffic management
\end{itemize}


The above are a few of the I/O virtualization tasks performed in the hypervisor. If such processing can be removed from the hypervisor and placed on the device, I/O performance improves \cite{standardized} \cite{sriovnet} \cite{tnic}. Devices that support onboard processing of such operations are termed self-virtualizing devices.


PCI-SIG is the group that proposed extensions to the PCI Express specification to allow a PCI Express device to be shared among multiple virtual machines; this specification is called single root I/O virtualization (SR-IOV), see Figure~\ref{fig:sr-iov}.

The SR-IOV function types are \cite{intelvtd} \cite{sriovnet}:


\begin{itemize}
\item Physical function: a PCI Express function, termed a physical function in the SR-IOV context to differentiate it from a virtual function. Physical functions are discovered, managed and configured as normal PCI devices.
\item Virtual function: these functions process I/O. Each virtual function is derived from a physical function, and there is usually a limit on the number of virtual functions a device may support. A single I/O port may be mapped to several virtual functions. Virtual functions perform better than paravirtualized or fully emulated solutions; their performance approaches native performance.
\end{itemize}


With direct device assignment an I/O device is directly attached to one virtual machine and thereafter cannot be used by other virtual machines. SR-IOV is a standardized method of sharing the I/O port of a physical device without emulating it in software; in other words, it allows an I/O device to be shared while still giving more than one virtual machine direct access to it. A number of virtual functions are assigned to a physical function, and each virtual function is directly assigned to a virtual machine, giving the illusion of having more than one physical device. The core of SR-IOV is implemented in the PCI subsystem; software support is needed for the physical and virtual functions.
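The SR-IOV resource model can be sketched as follows: one physical function (PF) exposes a bounded number of virtual functions (VFs), and each VF is assigned to exactly one virtual machine for direct access. Class and method names here are illustrative, not a real driver API.

```python
# Sketch of the SR-IOV resource model: a PF derives VFs up to a
# hardware limit; each VF is directly assigned to one VM, giving the
# illusion of multiple physical devices.

class PhysicalFunction:
    def __init__(self, name, max_vfs):
        self.name = name
        self.max_vfs = max_vfs       # hardware limit on VFs
        self.vfs = []

    def create_vf(self):
        if len(self.vfs) >= self.max_vfs:
            raise RuntimeError("VF limit reached")
        vf = f"{self.name}-vf{len(self.vfs)}"
        self.vfs.append(vf)
        return vf

assignments = {}                      # VF -> owning virtual machine

pf = PhysicalFunction("nic0", max_vfs=2)
for vm in ["domU1", "domU2"]:
    assignments[pf.create_vf()] = vm  # direct assignment to one VM

print(assignments)
# -> {'nic0-vf0': 'domU1', 'nic0-vf1': 'domU2'}
```

The `max_vfs` check mirrors the real constraint that a device supports only a fixed number of VFs; once they are exhausted, further guests must fall back to a software-virtualized path.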

A huge benefit of virtual functions is that they form the basis for direct I/O: each virtual machine gets increased I/O throughput with lower CPU utilization compared with software-based approaches to I/O virtualization. Another benefit is that reads and writes to onboard device registers need not be trapped and emulated in software; the device MMIO space is mapped into guest memory space using CPU paging features \cite{sriovnet}, which avoids CPU intervention and boosts performance.

When a hardware device needs the attention of the operating system it issues an interrupt, and the same has to happen in virtualized environments. The hypervisor acts as the interrupt manager: it forwards interrupts to the correct virtual machine. A primary I/O bottleneck in hypervisors is interrupt remapping latency \cite{sriovnet}; with virtual functions and IOMMU support, this bottleneck can be eliminated.




\begin{figure}[ht]
\centering
\caption{SR-IOV architecture \cite{sriovnet}}
\label{fig:sr-iov}
\end{figure}



Xen can give domains access to PCI devices through paravirtualized drivers. Even though direct I/O provides better performance than the paravirtualized device driver model, its performance is still lower than native device access \cite{bridgegapsofthard}.

\section{Other I/O subsystem optimization techniques}

\subsection{Sidecore approach}

Current hypervisors are monolithic: all the cores in the system execute the same hypervisor functionality. A few noted overheads of the monolithic architecture are cache pollution, TLB thrashing, frequent processor state changes and costly core synchronization operations.

VMEntry is the transition from hypervisor to guest and VMExit is the transition from guest to hypervisor. During a VMExit the processor has to save the state of the exiting virtual machine and load the state of the hypervisor; during a VMEntry it has to restore the virtual machine's state and save the state of the hypervisor. With the sidecore technique we can largely avoid these costly VMExit and VMEntry operations \cite{highvmm}. A hypervisor consists of multiple components, and these components are assigned to separate cores. The component that services hypervisor calls runs on its own dedicated core; since that core always runs in hypervisor mode, there is no need for VMExit and VMEntry transitions.

Future processors will consist of many more cores, and each core can be assigned to execute a subset of the hypervisor functionality. Moving functions such as device polling or the virtualization of device interrupts (used by hypervisor-bypass I/O) onto separate CPU cores will further improve system performance. The sidecore concept effectively puts the hypervisor and its virtual machines into a client-server paradigm: the hypervisor acts like a server, responding to and servicing guest requests. A guest virtual machine's request for service from a sidecore is called a sidecall.
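The sidecall protocol can be sketched with two shared queues: the guest posts a request without any mode switch, and the dedicated sidecore polls the request queue and posts results. This is an illustrative single-threaded simulation; a real sidecore would poll (or use monitor/mwait) on its own core while staying in hypervisor mode, and the operation names are hypothetical.

```python
# Sketch of the sidecall idea: instead of a VMExit/VMEntry round
# trip, the guest places a request in a shared queue and a dedicated
# "sidecore" services that queue.
from collections import deque

requests = deque()    # shared memory: guest -> sidecore
responses = deque()   # shared memory: sidecore -> guest

def guest_sidecall(op, arg):
    """Issued by the guest: no trap, no mode switch, just a store."""
    requests.append((op, arg))

def sidecore_poll():
    """Runs on the dedicated core, always in hypervisor mode."""
    while requests:
        op, arg = requests.popleft()
        if op == "map_page":
            responses.append(("ok", f"page {arg} mapped"))
        else:
            responses.append(("err", op))

guest_sidecall("map_page", 42)
sidecore_poll()
print(responses.popleft())   # -> ('ok', 'page 42 mapped')
```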

One disadvantage of the sidecore approach is that it requires modification to the guest operating system, though the modification is not as extensive as that required by the paravirtualized approach \cite{rearc}. Another drawback is that the CPU has to be continuously polled to see whether there are any I/O requests \cite{rearc}; this can be mitigated if the CPU supports special instructions such as monitor/mwait, which recent CPUs do. Also, the CPU core that executes sidecore functions cannot execute normal instructions \cite{rearc}.

\subsection{Domain0 optimizations}

\subsubsection{Packet Switching Optimization}

Experiments have shown that CPU consumption for small-packet processing in domain0 is higher than in domainU \cite{softvio}. This is due to the packet-switching technique employed by the Xen hypervisor: Xen uses the Linux bridge component of the Linux kernel to perform multiplexing and demultiplexing, which removes the need to implement its own and results in a simplified virtualization design.




\begin{figure}[ht]
\centering
\caption{Linux bridge vs.\ tailored bridge \cite{softvio}}
\label{fig:netbridge}
\end{figure}



Figure~\ref{fig:netbridge} shows the simplified bridge. Its kernel-user interface has to remain the same as the original one. As seen in the figure, most of the functions of the netfilter interface are bypassed; this keeps the new design very simple to implement, and CPU usage is reduced by a factor of 10.

\subsubsection{Separating the driver domain from domain0}

Domain0 hosts a general-purpose operating system, so its kernel is a full-fledged one: it supports not only device driver functionality but also the running of standard Linux utilities. A full-fledged kernel in the driver domain is not necessary, because only a subset of it is required, and it increases memory consumption and processing time, resulting in increased overhead. Separating the driver domain from domain0 is therefore seen as an improvement \cite{bridgegapsofthard}.

\subsection{DomainU Scheduling}

The hypervisor manages the execution of the domainUs, so there has to be a scheduling mechanism to make this possible. Researchers have found that I/O performance can be further improved by optimizing the scheduling mechanism employed in the hypervisor \cite{highquality} \cite{sheio}.

The scheduling techniques used in Xen are \cite{sheio}:


\begin{itemize}
\item Borrowed virtual time
\item Simple earliest deadline first (SEDF) scheduler
\item Credit scheduler
\end{itemize}


\subsubsection{A brief overview of the Xen credit scheduler}

The credit scheduler is a proportional fair-share CPU scheduler and is the most widely used scheduling algorithm in Xen. The system administrator assigns credits to each domain and can change each domain's priority; giving a domain more credits increases the fraction of time it executes on the processor. With respect to credits a domain can be in one of two states, UNDER and OVER: UNDER means the domain has credits remaining, OVER means it has consumed more credits than it was allocated. As a domain runs it consumes credits. Domains are run in FIFO order by state, and a domain in the UNDER state is chosen to run before a domain in the OVER state; OVER domains are only selected when there are no UNDER domains in the run queue.

Still, there is a problem of high I/O response latency. To minimize it, an additional state is introduced: the BOOST state. A domain in the BOOST state is given higher priority by the scheduler, thereby reducing I/O response latency.
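The run-queue ordering just described (BOOST before UNDER before OVER, FIFO within a state) can be sketched as follows. Credit accounting is reduced to a single counter here; the real scheduler's accounting and load balancing are considerably more involved.

```python
# Toy version of the credit scheduler's run-queue ordering: domains
# in BOOST run first, then UNDER (credits remaining), then OVER
# (credits exhausted), FIFO within each state.

BOOST, UNDER, OVER = 0, 1, 2   # lower value = higher priority

class Domain:
    def __init__(self, name, credits):
        self.name = name
        self.credits = credits

    def state(self, boosted=()):
        if self.name in boosted:
            return BOOST                       # woken by an I/O event
        return UNDER if self.credits > 0 else OVER

def pick_next(run_queue, boosted=()):
    """Pick the first domain in the highest-priority state.
    min() keeps FIFO order among domains with equal state."""
    return min(run_queue, key=lambda d: d.state(boosted))

queue = [Domain("dom1", credits=0),    # OVER
         Domain("dom2", credits=30),   # UNDER
         Domain("dom3", credits=10)]   # UNDER

print(pick_next(queue).name)                      # -> dom2
print(pick_next(queue, boosted={"dom3"}).name)    # -> dom3 (I/O boost)
```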

\subsubsection{Scheduling problems in Xen}

Processor scheduling is usually treated as the top requirement in hypervisors, with I/O scheduling a secondary concern. This can lead to poor I/O performance and an overall reduction in application performance, because for an application to perform well both its processor and I/O tasks have to be scheduled well. This is seen most strongly in applications that demand high I/O bandwidth and low latency.

The Xen scheduler does not treat domain0 as a special domain \cite{sheio}; however, the fraction of processor time that domain0 receives can be increased by giving it more credits.

\subsubsection{Scheduler enhancement methods for I/O}

The scheduler enhancement methods are \cite{sheio}:


\begin{itemize}
\item Fixing event channel notification
\item Minimizing preemptions
\item Ordering the run queue
\item Scheduling based on deadlines
\end{itemize}


\subsubsection{Task aware virtual machine scheduling}

A hypervisor selects a domainU to run based on its scheduling policy; once a domainU is selected, the scheduler in its operating system selects which process to run. The domainU operating system scheduler has knowledge of its workload, so it can schedule without degrading system performance. The hypervisor, however, has no such knowledge, so it may context-switch a domainU out in the middle of a critical I/O operation and load another domainU. To avoid this the hypervisor has to be made aware of the guest-level tasks \cite{task}. One way is to make the guest-level scheduler inform the hypervisor about its I/O-bound tasks, but this has two drawbacks. First, the guest operating system scheduler needs to be modified, which is not feasible given the large number of operating systems, and operating systems whose source code is unavailable cannot be modified at all. Second, the domains would have to be trusted.

During execution, the guest's scheduler accesses the MMU when switching tasks, because an operating system keeps tasks in private virtual address spaces provided by the paging facility of the MMU. Since the hypervisor virtualizes the MMU, it can monitor MMU accesses at the virtualization layer in a non-invasive manner. Because this is implemented in the virtualization layer, various domainU operating systems can take advantage of it.
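The gray-box idea of inferring guest tasks from MMU activity can be sketched as below: every page-table-base switch (e.g. a CR3 write on x86) traps to the hypervisor, which correlates the active address space with delivered I/O events. The class and the threshold-based correlation are simplified assumptions, not the paper's exact algorithm:

```python
# Sketch: the hypervisor virtualizes the MMU, so every guest page-table-base
# switch is visible to it. By recording which address space is current when
# I/O events arrive, it can mark guest tasks as I/O-bound without modifying
# the guest OS (non-invasive, gray-box inference).
class TaskTracker:
    def __init__(self):
        self.current = None   # page-table base of the currently running task
        self.io_counts = {}   # page-table base -> observed I/O events

    def on_mmu_switch(self, page_table_base):
        # Trapped by the hypervisor when the guest scheduler switches tasks.
        self.current = page_table_base

    def on_io_event(self):
        # An I/O completion delivered while this task is current.
        if self.current is not None:
            self.io_counts[self.current] = self.io_counts.get(self.current, 0) + 1

    def io_bound_tasks(self, threshold=2):
        # Tasks that repeatedly coincide with I/O events are inferred I/O-bound.
        return {b for b, n in self.io_counts.items() if n >= threshold}

t = TaskTracker()
t.on_mmu_switch(0x1000); t.on_io_event(); t.on_io_event()
t.on_mmu_switch(0x2000)  # CPU-bound task: no I/O events observed
assert t.io_bound_tasks() == {0x1000}
```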

~\\Grey box criteria \cite{task}:

\begin{itemize}

\item The kernel policy for I/O-bound tasks

\item The characteristic of I/O-bound tasks

\end{itemize}

\subsection{Hardware Assisted Virtualization}

Full virtualization and paravirtualization are software techniques for making virtualization possible. With hardware-assisted virtualization, the hypervisor layer can be made thinner by moving hypervisor functions that were performed in software into the CPU, making the hypervisor a simpler and more robust implementation. Usually an operation that the hypervisor performs in software translates into multiple CPU instructions; with hardware-assisted virtualization, the hypervisor only has to issue the special CPU instruction to perform the job. Hardware-assisted virtualization does not include I/O device support; however, there are techniques to accelerate I/O using direct I/O \cite{intelvtd} and hardware IOMMUs \cite{iommu}.

Compared to paravirtualization, hardware-assisted virtualization has some trade-offs. It allows unmodified operating systems to run, and thus lets us run proprietary operating systems at increased speed. However, an unmodified operating system is not aware that it is running in a virtualized environment, which makes exploiting special virtualized hardware features in I/O devices difficult. Paravirtualized systems can take advantage of hardware-assisted virtualization too; this combines the advantages of both approaches and is called hybrid virtualization.


Before hardware support for virtualizing the memory management unit existed, the MMU was virtualized by the hypervisor. Today the MMU can be virtualized in hardware \cite{intelvtd}, offloading memory protection and translation tasks from the software-based MMU to hardware. Support from the CPU, chipset and BIOS is needed to make this possible. By moving this functionality out of the hypervisor, the hypervisor can be kept small and need not be concerned with low-level hardware details except those that are absolutely necessary.
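Hardware MMU virtualization performs a two-stage walk: guest-virtual to guest-physical through the guest's own page tables, then guest-physical to host-physical through tables maintained by the hypervisor. A minimal sketch, in which flat dictionaries stand in for real multi-level page tables:

```python
# Two-stage address translation as performed by hardware MMU virtualization
# (e.g. nested paging). Flat dicts stand in for real multi-level tables.
guest_page_table = {0x4000: 0x7000}   # guest-virtual page  -> guest-physical page
nested_page_table = {0x7000: 0x9000}  # guest-physical page -> host-physical page

def translate(gva, gpt, npt, page_size=0x1000):
    page, offset = gva & ~(page_size - 1), gva & (page_size - 1)
    gpa_page = gpt[page]      # stage 1: walked from the guest's page tables
    hpa_page = npt[gpa_page]  # stage 2: hypervisor-maintained tables, in hardware
    return hpa_page | offset

assert translate(0x4abc, guest_page_table, nested_page_table) == 0x9abc
```

The hypervisor only populates the second-stage tables; the per-access walk is done by the MMU hardware, which is exactly the work that previously had to be emulated in software (e.g. with shadow page tables).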

\subsubsection{How IOMMU helps the hypervisor}

IOMMUs are hardware devices that translate device DMA addresses to physical memory addresses and provide the basis for direct I/O. They also provide isolation between virtual machines with direct device access. This is achieved as follows: when a device attempts to access main memory, the IOMMU uses I/O page tables to verify whether the access is legitimate. To reduce overhead, the hardware may cache translations in IOTLBs \cite{iommu}.
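The translate-check-cache cycle described above can be sketched as follows. The class structure and single-level table are simplifying assumptions; real IOMMUs use multi-level I/O page tables kept per device:

```python
# Sketch of an IOMMU translating a device DMA address: look up the I/O page
# tables, check the access is legitimate, cache the result in an IOTLB, and
# fault (rather than corrupt memory) on an unmapped or forbidden access.
class IOMMU:
    def __init__(self, io_page_table):
        self.table = io_page_table  # dma_page -> (phys_page, writable)
        self.iotlb = {}             # translation cache to reduce overhead

    def translate(self, dma_addr, write, page_size=0x1000):
        page, offset = dma_addr & ~(page_size - 1), dma_addr & (page_size - 1)
        entry = self.iotlb.get(page) or self.table.get(page)
        if entry is None or (write and not entry[1]):
            # Illegitimate DMA: the access is blocked, preserving isolation.
            raise PermissionError("DMA fault: illegitimate access")
        self.iotlb[page] = entry    # cache for subsequent accesses
        return entry[0] | offset

mmu = IOMMU({0x1000: (0x8000, True)})
assert mmu.translate(0x1010, write=True) == 0x8010
# A DMA outside the VM's mapped region raises PermissionError instead of
# touching another VM's memory.
```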

\subsubsection{Reasons for using IOMMU}


\begin{itemize}

\item To achieve memory isolation - The hypervisor must at all times prevent a virtual machine from causing a device to DMA into a memory region that it does not have the authority to access.

\item To achieve fault isolation - Due to a translation error in the IOMMU, a virtual machine may write into a memory region owned by another virtual machine. It is advisable to take the virtual machine that caused the error offline, but it is not acceptable to bring down other virtual machines.

\end{itemize}

\subsection{Software I/O improvements to the Xen hypervisor}



\begin{itemize}

\item Move Data Copy to Guest - P98

\item Extending Grant Mechanism

\item Support for Multi-queue Devices

\item Caching and Reusing Grants

\end{itemize}


\subsection{Virtual Machine Disk Scheduling and its Effect on Overall VM Performance}

Several disk scheduling methods are in use today. In Linux, for example, the Noop, Completely Fair Queuing (CFQ), Anticipatory, and Deadline schedulers are commonly used \cite{diskpasse}.
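The deadline scheduler's core idea can be sketched as below. This is a simplified model under stated assumptions: the real Linux deadline scheduler keeps separate read and write queues plus per-queue FIFO lists, which are omitted here:

```python
# Toy sketch of a deadline-style disk scheduler: serve requests in sector
# order (elevator-style, to limit seeks) unless a request's deadline has
# expired, in which case it is served first to bound latency.
def next_request(requests, now):
    # requests: list of (sector, deadline) tuples
    expired = [r for r in requests if r[1] <= now]
    if expired:
        return min(expired, key=lambda r: r[1])  # most overdue first
    return min(requests, key=lambda r: r[0])     # else lowest sector number

reqs = [(900, 108), (120, 110), (500, 101)]
assert next_request(reqs, now=100) == (120, 110)  # nothing expired: sector order
assert next_request(reqs, now=105) == (500, 101)  # expired deadline wins
```

The trade-off this illustrates is the one the section is concerned with: pure seek optimization maximizes throughput, while the deadline check bounds the worst-case latency a VM's I/O request can suffer.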







HCA - host channel adapter

UAR - user access region