Cloud computing offers extensive computation and resource facilities for executing application workflows. Many different resources are involved in the execution of a single workflow. The cloud is a highly dynamic environment in which system load and resource status change frequently. As the workload grows with the number of cloud services and clients, the incoming requests or jobs must be handled, and they must first be scheduled to execute on the available VMs. The execution of cloud workflows faces many uncertain factors in allocating and scheduling the workload. The first step is to provide an efficient workflow allocation model that takes the client's requirements into account. The workflow scheduling model then schedules jobs so that all of them execute in the minimum possible time while maintaining QoS and satisfying the client's requirements.
Cloud computing is a large-scale distributed computing paradigm driven by economies of scale, in which a pool of abstracted, virtualized, dynamically scalable, managed computing power, storage, platforms, and services is delivered on demand to external customers over the Internet. Cloud computing can also be described as a new way of using hardware and software resources. A user sitting at his or her computer, with an Internet connection and an application (commonly a browser), can access a number of services offered by various cloud providers.
Among the many definitions of workflow, one defines a workflow as the automation of a business process, in whole or in part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules.
A workflow models a process as a series of steps, which simplifies the execution and management of applications.
Scheduling is the mapping of a set of tasks onto a set of processors, and workflow scheduling can be defined as the automation of scheduling a workload. Scheduling falls into two categories: Job Scheduling, and Job Mapping and Scheduling. In Job Scheduling, independent jobs are scheduled among the processors of a distributed computing system for optimization. Job Mapping and Scheduling requires allocating the multiple interacting tasks of a single parallel program so as to minimize the completion time on a parallel computer system.
A task is a (sequential) activity that uses a set of inputs to produce a set of outputs. Processes in a fixed set are statically assigned to processors, either at compile time or at start-up. There are two types of scheduling: static and dynamic. In static load balancing, all information is known in advance; tasks are allocated according to this prior knowledge, and the allocation is not affected by the state of the system. A dynamic load-balancing mechanism has to allocate tasks to processors as they arrive, and tasks must be redistributed when some processors become overloaded.
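The difference between static and dynamic allocation can be illustrated with a minimal sketch. The function names, the round-robin rule for the static case and the least-loaded rule for the dynamic case are assumptions chosen for illustration; they are not taken from any of the surveyed algorithms.

```python
import heapq


def static_schedule(task_costs, n_procs):
    """Static allocation (assumed round-robin): the mapping is fixed in
    advance and ignores the load that builds up at run time."""
    loads = [0.0] * n_procs
    for i, cost in enumerate(task_costs):
        loads[i % n_procs] += cost
    return loads


def dynamic_schedule(task_costs, n_procs):
    """Dynamic allocation (assumed least-loaded rule): each arriving task
    goes to the processor with the smallest current load."""
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    loads = [0.0] * n_procs
    for cost in task_costs:
        load, p = heapq.heappop(heap)
        loads[p] = load + cost
        heapq.heappush(heap, (loads[p], p))
    return loads


costs = [8, 1, 8, 1, 8, 1]
print(static_schedule(costs, 2))   # per-processor load, static mapping
print(dynamic_schedule(costs, 2))  # per-processor load, dynamic mapping
```

With this alternating workload the static mapping places every long task on the same processor, while the dynamic rule spreads them, which is the overload situation that redistribution is meant to avoid.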
There are various algorithms and models defined for workflow scheduling.
POSEC and Pareto Analysis
In the cloud, two algorithms are used to optimize Job Mapping and Scheduling: one based on the POSEC method and the other based on Pareto analysis. POSEC stands for Prioritize by Organizing, Streamlining, Economizing and Contributing. The POSEC method prioritizes jobs according to these parameters and then sorts them into four levels, using urgency and importance as the criteria. Scheduling is then applied to these levels for optimal execution. The objective of this algorithm is efficient time management and load balancing. The decision needs two priority scores: an Urgency Score, given by a Cluster Member of the cloud, and an Importance Score, given by the Cloud Resources Manager. Together they define the Four Quadrants of Decision Making:
Level 1: Low Urgency & Low Importance
Level 2: Low Urgency & High Importance
Level 3: High Urgency & Low Importance
Level 4: High Urgency & High Importance
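The four levels above can be expressed as a small classification function. This is only a sketch: the score range (0 to 1), the 0.5 threshold separating "low" from "high", and the choice to run higher levels first are assumptions, since the text does not specify them.

```python
def posec_level(urgency, importance, threshold=0.5):
    """Map an (urgency, importance) score pair to a level from 1 to 4.
    The 0..1 score range and the threshold are illustrative assumptions."""
    high_urgency = urgency >= threshold
    high_importance = importance >= threshold
    if high_urgency and high_importance:
        return 4
    if high_urgency:
        return 3
    if high_importance:
        return 2
    return 1


def order_jobs(jobs):
    """Order (name, urgency, importance) tuples so that higher levels come
    first (assumed policy); ties keep their arrival order."""
    return sorted(jobs, key=lambda j: posec_level(j[1], j[2]), reverse=True)


jobs = [("backup", 0.2, 0.1), ("billing", 0.3, 0.9),
        ("alert", 0.9, 0.2), ("outage", 0.95, 0.9)]
for name, u, i in order_jobs(jobs):
    print(name, posec_level(u, i))
```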
According to Pareto analysis, 80% of tasks complete their execution in 20% of the time, while the remaining 20% of tasks take the other 80% of the time. This principle is used to sort tasks into two parts, and tasks that fall into the first category should be assigned a higher priority. The 80-20 rule can also be applied to increase productivity: it is assumed that 80% of the productivity can be achieved by doing 20% of the tasks. If productivity is the aim of time management, these tasks should be given higher priority. When the higher-priority jobs are placed in the first (80%) category, overall execution takes much less time, because the important or prioritized jobs execute first, and the resulting model is closer to optimal. The algorithm based on Pareto analysis has been found to take less time than the algorithm based on the POSEC method to execute the same set of jobs.
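A minimal sketch of the two-way split follows. It assumes each task carries an estimated execution time and that the shortest 80% of tasks form the first (higher-priority) category; the function name and the tuple layout are hypothetical.

```python
def pareto_split(tasks, fraction=0.8):
    """Split (name, estimated_time) tuples into two categories.
    The first category holds the shortest `fraction` of the tasks and is
    meant to be scheduled with higher priority (assumed interpretation)."""
    ordered = sorted(tasks, key=lambda t: t[1])
    cut = round(len(ordered) * fraction)
    return ordered[:cut], ordered[cut:]


tasks = [("t1", 2), ("t2", 40), ("t3", 1), ("t4", 3), ("t5", 2),
         ("t6", 1), ("t7", 2), ("t8", 55), ("t9", 3), ("t10", 1)]
first, second = pareto_split(tasks)
print([name for name, _ in first])   # scheduled first
print([name for name, _ in second])  # the few long-running tasks
```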
Hierarchical cloud workflow scheduling schema
A cloud workflow system can coordinate multiple job submissions over cloud services. The goal of a cloud workflow scheduling schema is to make sure that the proper activities are executed by the right service at the right time. One way to achieve an optimal schedule is the hierarchical cloud workflow scheduling schema, which divides workflow scheduling into three stages. In the first stage, the incoming job requests are examined for parallel jobs, and the parallel jobs are split. In the second stage, each job is matched with candidate services, which means that resources are assigned to the jobs according to their requirements. In the third and last stage, a scheduling algorithm is applied to execute the jobs; any algorithm may be used here, depending on the requirements and the optimality sought. The schema covers hierarchical cloud service workflow scheduling, the parallel split of cloud workflow tasks, a syntax- and semantics-based algorithm for matching cloud workflow tasks, and cloud workflow scheduling and optimization under multiple QoS constraints; the cited work also reports experiments that evaluate the efficiency of the algorithm. Using a heuristic generic algorithm, this scheme can achieve an optimal workflow schedule.
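The first stage, splitting out jobs that can run in parallel, can be sketched by grouping tasks into dependency levels. This is an illustrative assumption about how such a split might be computed, not the cited schema's actual procedure; the `deps` layout (every task appears as a key, mapped to the set of tasks it waits for) is hypothetical.

```python
def parallel_levels(deps):
    """Group tasks into levels; all tasks in one level have no unmet
    dependencies on each other and can therefore run in parallel.
    `deps` maps each task to the set of tasks it depends on."""
    remaining = {task: set(pre) for task, pre in deps.items()}
    levels = []
    while remaining:
        ready = sorted(t for t, pre in remaining.items() if not pre)
        if not ready:
            raise ValueError("cyclic or unresolved dependency")
        levels.append(ready)
        for t in ready:
            del remaining[t]
        for pre in remaining.values():
            pre.difference_update(ready)
    return levels


deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(parallel_levels(deps))
```

Each resulting level could then be passed to the matching and scheduling stages as a batch of independent jobs.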
One-port model and Multi-port model
A linear workflow is one in which the dependencies between stages can be represented by a linear graph. Two models are used to schedule such a workflow: the one-port model and the multi-port model. In the one-port model, each processor can perform only one operation at a time: computing, receiving an incoming task, or sending output. There is no parallel processing in the one-port model. In the multi-port model, a processor can perform several operations, such as computation and receiving input, at the same time, and multiple incoming and outgoing communications are allowed simultaneously. These two algorithmic models are useful for optimizing linear workflows and help minimize latency, which leads toward an optimal schedule for a linear workflow.
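The effect of the two models on a single processor can be sketched with a simplified cost model. The assumptions here are not from the source: each stage runs on its own processor, each stage has fixed receive, compute and send times, the one-port model serializes the three operations, and the multi-port model overlaps them completely.

```python
def one_port_cycle(receive, compute, send):
    """One operation at a time: the three phases are serialized."""
    return receive + compute + send


def multi_port_cycle(receive, compute, send):
    """Assumed full overlap: the slowest phase bounds the cycle."""
    return max(receive, compute, send)


def pipeline_period(stages, cycle):
    """Time between consecutive outputs of a linear workflow, taken as
    the cycle time of the slowest stage (simplified model)."""
    return max(cycle(r, c, s) for r, c, s in stages)


# (receive, compute, send) times per stage, arbitrary units
stages = [(1, 4, 1), (1, 2, 2), (2, 3, 1)]
print(pipeline_period(stages, one_port_cycle))
print(pipeline_period(stages, multi_port_cycle))
```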
Activity based costing in cloud computing
Activity-based costing in cloud computing is another model for optimal workflow scheduling. Activity-based costing is a way to measure both the cost of an object and its performance. Under this model, a task can be evaluated separately on the basis of the resources, space and time it needs to execute completely, as shown in the figure below. A job is categorized as either Available or Partially Available.
An Available job is one for which all the resources required for execution are available at a single data center. A Partially Available job is one whose required resources are not present at a single data center, which means that they are scattered among different data centers and must be collected from them before execution. Jobs are further subdivided by their dependencies into Dependent and Independent jobs. Scheduling can then be done at this bottom level to optimize the workflow schedule.
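The Available / Partially Available distinction can be sketched as a set-containment check. The function name, the resource names and the data-center layout below are hypothetical and serve only to illustrate the categorization.

```python
def classify_job(required, datacenters):
    """Return ("Available", [dc]) if one data center holds every required
    resource, else ("Partially Available", [dc, ...]) listing the data
    centers the resources must be collected from.
    `datacenters` maps a data-center name to its set of resources."""
    for name, resources in datacenters.items():
        if required <= resources:
            return "Available", [name]
    needed = set(required)
    sources = []
    for name, resources in datacenters.items():
        found = needed & resources
        if found:
            sources.append(name)
            needed -= found
    if needed:
        # Not covered by the text: a resource exists in no data center.
        raise ValueError(f"resources not found anywhere: {sorted(needed)}")
    return "Partially Available", sources


datacenters = {"dc1": {"cpu", "disk"}, "dc2": {"gpu", "disk"}}
print(classify_job({"cpu", "disk"}, datacenters))
print(classify_job({"cpu", "gpu"}, datacenters))
```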
Two levels task scheduling mechanism based on load balancing
A two-level task scheduling mechanism based on load balancing is another way to optimize task scheduling in a cloud computing system. In this method, tasks are scheduled by a scheduling optimizer, which takes its information from a system model and a predicted execution time model; these keep track of all the resources and of the predicted execution information. The scheduling optimizer itself checks whether the currently prepared schedule is optimal and, if it is not, regenerates the schedule. With this model the execution time decreases and resource utilization increases.
A cloud is a very large network in which millions of users access thousands of servers at all times. These servers may be located in a single place or in different geographical locations. Users send their requests to a cloud server for processing, and the tasks/jobs are executed at the cloud server. Because the number of users is very large, the number of requests, in the form of tasks/jobs, is also very large. Scheduling the tasks/jobs at the server end is difficult, because the requesting tasks/jobs are numerous and each of them needs some computing or storage resources in order to execute. It is the work of the scheduler to allocate the required resources to each requesting job/task. The schedule built by the server must be good enough that each user request gets a response in time and every task/job gets the proper resources for its execution.
To overcome this scheduling problem and provide each job/task with a suitable resource and minimum execution time, a Prioritized Workflow Scheduling Algorithm is proposed and implemented. The algorithm works in the following three steps:
Step 1: Cluster the incoming jobs/tasks on the basis of their attributes.
Step 2: Apply priorities within each cluster: every job/task is assigned a priority, and higher-priority jobs/tasks are executed first.
Step 3: After prioritizing the jobs/tasks within the clusters, assign them to a number of VMs that are capable of performing the operations, and execute them.
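The three steps above can be sketched as follows. This is a minimal illustration rather than the implementation used in the simulation: the job fields (`kind`, `priority`, `length`), the use of `kind` as the clustering attribute, and the least-loaded rule for choosing a VM are all assumptions.

```python
from collections import defaultdict


def prioritized_schedule(jobs, n_vms):
    """Return one queue of job ids per VM.
    Step 1: cluster jobs by an attribute (assumed: job["kind"]).
    Step 2: order each cluster by descending priority.
    Step 3: assign jobs to VMs (assumed: least-loaded VM first)."""
    clusters = defaultdict(list)
    for job in jobs:
        clusters[job["kind"]].append(job)

    vm_queues = [[] for _ in range(n_vms)]
    vm_load = [0] * n_vms
    for kind in sorted(clusters):
        by_priority = sorted(clusters[kind], key=lambda j: -j["priority"])
        for job in by_priority:
            vm = vm_load.index(min(vm_load))  # least-loaded VM so far
            vm_queues[vm].append(job["id"])
            vm_load[vm] += job["length"]
    return vm_queues


jobs = [
    {"id": "a", "kind": "io", "priority": 1, "length": 10},
    {"id": "b", "kind": "io", "priority": 5, "length": 10},
    {"id": "c", "kind": "cpu", "priority": 2, "length": 30},
]
print(prioritized_schedule(jobs, 2))
```

In a CloudSim experiment the job dictionaries would correspond to cloudlets and the queues to per-VM submission lists; that mapping is left out here to keep the sketch independent of any simulator API.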
Fig 4.1: A proposed model (Prioritized Workflow Scheduling)
SIMULATION AND RESULT
CloudSim v3.0.2 is used to implement workflow scheduling in cloud computing. The simulation is performed on a computer running Windows 7 with the following configuration: Processor: Intel® Core™ i3-2350M CPU @ 2.30GHz; RAM: 4GB DDR3 main memory; HDD: 500GB 5400RPM hard drive.
Virtual Machine: A virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes programs like a physical machine. A virtual machine was originally defined as "an efficient, isolated duplicate of a real machine". Current use includes virtual machines which have no direct correspondence to any real hardware.
Table 5.1: Configuration of VMs
No. of Processors
Cloudlet: A Cloudlet serves as the input job/task to the cloud environment. In addition to the information that describes the job, it stores the ID of the VM running it.
Table 5.2: Comparison of Execution Time with constant number of VMs
No. of Cloudlets
Execution Time using Proposed approach
Execution Time using Simple approach
Table 5.2 compares the execution time of cloudlets under the prioritized workflow algorithm with that under the sequential workflow execution algorithm, keeping the number of VMs constant at 50. Fig 5.1 shows this comparison between the two algorithms graphically.
Fig 5.1: Comparison of Execution Time with constant number of VMs
Table 5.3: Comparison of Execution Time with constant number of Cloudlets
No. of VM
Execution Time using Proposed Approach
Execution Time using Simple Approach
Table 5.3 compares the execution time of cloudlets under the prioritized workflow algorithm with that under the sequential workflow execution algorithm, keeping the number of cloudlets constant at 500. Fig 5.2 shows this comparison between the two algorithms graphically.
Fig 5.2: Comparison of Execution Time with constant number of Cloudlets
The simulation shows that prioritized workflow scheduling in cloud computing improves the execution time of jobs/tasks compared with the simple first-come, first-served (sequential) approach. The improvement in execution time grows as the number of requesting jobs/tasks increases.