Enhanced Critical Path Method for Scheduling Workflow Applications in Cloud Computing


Abstract—Cloud computing is a pool of virtualized resources shared over the Internet. It relies on sharing resources among different tasks or applications to achieve economies of scale and maximum resource utilization. In today's cloud computing environment, managing tasks and utilizing resources optimally are of utmost importance: the goal is to complete tasks on time while reducing the cost of their completion. In the existing critical-path scheduling algorithm, the total execution cost of the workflow is reduced because the most cost-efficient virtual machine is picked every time a task is scheduled. This greatly reduces the execution cost, but it also creates a bottleneck, since all tasks are queued for the same VM. To reduce this trade-off between cost and time, the Enhanced Critical Path Method (Enhanced CPM) is proposed, which reduces the cost while also improving the execution time by load-balancing tasks across the available virtual machines in order of increasing execution cost. The method uses the knapsack concept: it picks the VM VMi, whose cost of executing a task is wi, such that the tasks at the same depth execute optimally within the total weight capacity (deadline) of the knapsack, where each depth of the workflow becomes a knapsack. Experimental results show that the enhanced algorithm reduces the execution cost compared to the existing Min-Min and Max-Min algorithms.

Keywords—Critical Path Method, WorkflowSim, Knapsack, Cloud Computing.

I. INTRODUCTION

Cloud computing is the delivery of computing services (servers, storage, databases, networking, software, analytics, intelligence and more) over the Internet ("the cloud") to offer faster innovation, flexible resources and economies of scale [1]. According to the National Institute of Standards and Technology (NIST): "Cloud computing is an emerging technology for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction" [2]. Cloud service providers supply resources to users on request, following a pay-as-you-go model; they include Amazon, Microsoft, Google, IBM and Citrix. The main goal of virtualization is to manage workloads by radically transforming traditional computing to make it more scalable. Virtualization allows multiple virtual machine instances to run concurrently on a single computer [3]. The basic concept of virtualization is implemented using software called a hypervisor [3]. Hypervisor technologies such as Xen, KVM and VMware have become important in the cloud computing environment. With virtualization, we can achieve the maximum possible utilization of the available hardware resources by optimally scheduling the tasks submitted by users. The main objectives considered during task scheduling are meeting deadlines, load balancing, and reducing execution cost and makespan.

The outline of the paper is as follows: Section II describes basic scheduling and its types. Section III details the related work in this area. Section IV explains our system model. The proposed scheduling algorithm is discussed in Section V. Performance evaluation and results are presented in Section VI. Section VII discusses the conclusion and future scope of the proposed work.

II. BACKGROUND

A. Scheduling

Scheduling is the process of mapping the tasks of a workflow application onto distributed resources [5]. The tasks should be scheduled in such a way that they complete within the deadline specified by the customer. The performance of task scheduling largely depends on the scheduling approach used, and this approach can have a significant impact on the performance of the cloud. In the cloud computing paradigm, task scheduling can be categorized into two types: independent and dependent task scheduling. These can be further classified as static or dynamic scheduling.

1. Independent Scheduling

Tasks run independently in this type of scheduling. There are no data dependencies between the tasks in the application, so the tasks can be scheduled in any order as long as they complete within the deadline.

2. Dependent Scheduling

In workflow applications, tasks are dependent on each other. The inter-dependency of tasks arises from the data transfer that takes place between them.

3. Static and Dynamic Scheduling

Static scheduling uses a priori knowledge about deadlines and arrival times; tasks are dispatched according to a pre-defined mapping, which makes the schedule predictable. Dynamic scheduling uses whatever information is available at run time and reacts to unpredictable events.

4. Preemptive vs non-preemptive Scheduling

Because tasks are scheduled according to priority, preemptive scheduling can introduce unpredictable delays while a task is suspended and other tasks execute. In non-preemptive scheduling, the running task is never preempted by another task.

III. RELATED WORK

Yu and Buyya [4] worked on meeting a user-specified deadline while aiming to reduce execution cost. Their algorithm uses a divide-and-conquer approach to solve the scheduling problem and, according to their results, performs better than a greedy cost-based algorithm. Rahman and Buyya [5] proposed an algorithm that efficiently allocates resources to the tasks on the critical path within a given deadline. According to their experiments, the algorithm performs better than heuristic and meta-heuristic scheduling algorithms irrespective of the size of the workflow application; however, the cost of the workflow is not considered. Amudha and Dhivyaprabha [6] focused on scheduling high-priority tasks on the most efficient resources to maximize resource utilization. Their algorithm avoids the resource load-balancing problem and reduces makespan, but ignores the cost of resources; experiments show that it performs better than the Max-Min and Min-Min algorithms in terms of makespan. Calheiros and Buyya [7] replicate tasks on resources and thereby reduce the idle time of the resources, which increases the chance of meeting the deadline. As the budget for workflow execution increases, their algorithm performs better than the IC-PCP algorithm. In another paper [8], an algorithm is proposed for selecting the best resource from a public or private cloud to save cost; the method gives better results than a greedy algorithm, but real scientific workflow applications are not considered. In [1], a critical-path-based scheduling algorithm is proposed with which the execution cost of a scientific workflow application is greatly reduced; however, the execution time increases and load balancing is not achieved.

IV. SYSTEM MODEL

The system model contains the definitions and the problem statement. We have one datacenter in our cloud computing environment. The datacenter comprises various resources; a resource can be a physical machine or a virtual machine hosted on a physical machine. A workflow is a set of dependent tasks with a parent-child relationship. As shown in Figure 1, the nodes represent the tasks T = {T1, T2, T3, ..., Tn} in the workflow, and the edges denote the data dependencies between tasks.

Fig.1. Example DAG of Epigenomics workflow [15]

 

Task length is the length of a task, given in Million Instructions (MI). The total cost, C, is the overall execution cost of the workflow application; we calculate it by adding the execution cost of each virtual machine (VM) used to execute some task of the workflow. Each cloud service provider prices its resources differently depending on their characteristics, configuration and type [1]. For the i-th task Ti, let ECi be the execution cost of the task on VMi, let TLi be the task length of the task scheduled for execution on VMi, and let PCi be the computing power of the VM. The execution cost is calculated as follows:

    ECi = TLi / PCi               (1)

where TLi is in Million Instructions (MI) and PCi is in Million Instructions Per Second (MIPS). Execution time is the current simulation time (clock) of the workflow together with the queueing delay of a particular task, based on its depth (level). Execution time and cost are conceptually similar, but they differ when tasks are processed in parallel on different resources and queueing time is taken into account: cost is simply the sum of the maximum amount required for the workflow to complete in the absence of scheduling, whereas execution time reflects parallel processing. Critical path strategies have been used to schedule interdependent tasks on available resources [9]. The main goal of the critical path method is to find the longest execution path (the critical path) from the entry task to the exit task of the workflow application and to schedule the critical tasks on the most efficient VM, which ensures timely completion of the entire workflow [10]. A critical path is shown in Figure 2 below.
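As a quick illustration of equation (1), the following minimal sketch computes the execution cost of a task on a VM; the numbers and the class name ExecutionCostExample are hypothetical and are not taken from the paper's implementation.

```java
public class ExecutionCostExample {

    // EC_i = TL_i / PC_i, with TL_i in MI and PC_i in MIPS (equation (1)).
    static double executionCost(double taskLengthMI, double vmPowerMIPS) {
        return taskLengthMI / vmPowerMIPS;
    }

    public static void main(String[] args) {
        // Hypothetical example: a 20,000 MI task on a 1,000 MIPS VM.
        double ec = executionCost(20_000, 1_000);
        System.out.println("EC = " + ec); // prints 20.0
    }
}
```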

Fig.2. Critical path in a workflow [16]

To calculate the critical path, we have to calculate the Early Start Time (ES), Early Finish Time (EF), Late Start Time (LS) and Late Finish Time (LF) for each task in the given workflow application [1].

To see how the knapsack concept is used in the algorithm, consider how it maps to the problem statement. A knapsack item is defined by its type, weight and value. For our problem, we define a scheduling knapsack item SKIj = (VMTj, NTj, Cj), where the item type corresponds to a VM type VMTj, the weight is the maximum number of tasks NTj that can run on a VM of type VMTj before their deadline, and the value is the associated cost Cj of running NTj tasks on a VM of type VMTj. We define the knapsack weight limit as the deadline of a depth of the workflow (each depth becomes a knapsack). The goal is to find a set of items SKI whose combined weight (the time required to execute the tasks at that depth) is at least as large as the knapsack weight limit (the depth deadline) and whose combined value (the cost of running the tasks) is minimum. Formally, the task scheduling problem can be expressed as:

Minimise the cost ∑ vi xi subject to ∑ wi xi ≤ W, where i = 1 to n tasks (the inverse of the classical knapsack objective of maximising profit).
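The item definition above can be captured in a small data structure. The sketch below is illustrative only: the names SchedulingKnapsackItem, DepthKnapsack and maxTasks are assumptions for this example rather than the paper's code, and the covering check is one plausible reading of the constraint that the combined weight must reach the knapsack limit while the combined value (cost) is minimised.

```java
import java.util.List;

/** Illustrative scheduling knapsack item SKIj = (VMTj, NTj, Cj). */
record SchedulingKnapsackItem(String vmType, int maxTasks, double cost) {}

class DepthKnapsack {

    /**
     * Checks whether a candidate set of items "covers" a depth: the combined
     * weight (tasks the chosen VM types can finish before the depth deadline)
     * must be at least the number of tasks waiting at that depth. A caller
     * would search for the covering set whose total cost is minimum.
     */
    static boolean covers(List<SchedulingKnapsackItem> items, int tasksAtDepth) {
        int combinedWeight = items.stream()
                                  .mapToInt(SchedulingKnapsackItem::maxTasks)
                                  .sum();
        return combinedWeight >= tasksAtDepth;
    }

    static double totalCost(List<SchedulingKnapsackItem> items) {
        return items.stream().mapToDouble(SchedulingKnapsackItem::cost).sum();
    }
}
```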

V. EXISTING AND PROPOSED ALGORITHM

In this section, the existing algorithm [1] for finding the critical path and scheduling the workflow application tasks on resources is given in parts A and B; the proposed enhancement of its resource-selection step follows.

 

A) Find the critical path

Step 1: Give a workflow application graph as input to the algorithm.

Step 2: Do the following steps to calculate the critical path:

1) For i = 1 to n in the forward direction (the forward pass):

   a) If task i has no predecessor, set
      ES(i) = 0
      EF(i) = ES(i) + t(i), where t(i) is the time duration of task i

   b) Else pick the highest early finish time (EF):
      ES(i) = max[ES(j) + t(j)], where j ranges over the predecessor (parent) tasks of task i
      EF(i) = ES(i) + t(i)

2) For i = n to 1 in the backward direction (the backward pass):

   a) If task i has no successor, set
      LF(i) = EF(i)
      LS(i) = LF(i) - t(i)

   b) Else pick the lowest late finish time:
      LF(i) = min[LF(j) - t(j)], where j ranges over the successor (child) tasks of task i
      LS(i) = LF(i) - t(i)

3) Condition for identifying critical path tasks:

   a) If (LF(i) - EF(i) == 0)   // there is no slack
      Return critical path tasks
   b) Else
      Return non-critical path tasks with their slack

The above algorithm performs a forward and a backward pass over the workflow. In the forward pass, ES and EF are calculated in the forward direction; similarly, LF and LS for the same task are calculated in the backward direction. After these values have been calculated, we determine whether a task is critical by comparing EF and LF (or ES and LS): if the difference is 0, the task is critical. The algorithm thus decomposes the workflow into critical path tasks and non-critical path tasks, which helps in scheduling. After finding the critical and non-critical tasks, part B of the algorithm can be applied for scheduling. Tasks with non-zero float can be relaxed; delaying them by up to their float value will not delay the complete workflow. Currently, for evaluation purposes, only the computational cost is considered.
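A minimal sketch of the forward and backward passes described above is shown below, assuming the workflow is given as a DAG with per-task durations and parent/child lists; the class and field names (WorkflowTask, duration, parents, children) are illustrative and are not taken from the paper's or WorkflowSim's code.

```java
import java.util.ArrayList;
import java.util.List;

class WorkflowTask {
    double duration;                                 // t(i)
    List<WorkflowTask> parents = new ArrayList<>();  // predecessor tasks
    List<WorkflowTask> children = new ArrayList<>(); // successor tasks
    double es, ef, ls, lf;                           // ES, EF, LS, LF
    boolean critical;
}

class CriticalPath {

    /** Expects the tasks in topological order (parents before children). */
    static void compute(List<WorkflowTask> tasks) {
        // Forward pass: ES(i) = max over parents of EF(j); EF(i) = ES(i) + t(i).
        for (WorkflowTask t : tasks) {
            t.es = t.parents.stream().mapToDouble(p -> p.ef).max().orElse(0.0);
            t.ef = t.es + t.duration;
        }
        // Backward pass: LF(i) = min over children of LS(j); LS(i) = LF(i) - t(i).
        for (int i = tasks.size() - 1; i >= 0; i--) {
            WorkflowTask t = tasks.get(i);
            t.lf = t.children.isEmpty()
                    ? t.ef
                    : t.children.stream().mapToDouble(c -> c.ls).min().getAsDouble();
            t.ls = t.lf - t.duration;
            t.critical = (t.lf - t.ef) == 0.0;       // zero slack means critical
        }
    }
}
```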

Part B of the existing algorithm assigns resources to the tasks: the virtual machine with the lowest cost is assigned the tasks of the critical path, which ensures the completion of the critical path tasks on an optimal machine.

B) Assign the most cost-efficient VM to every task.

 

The proposed algorithm enhances part B of the existing algorithm as follows:

B) Selecting resources

Step 1: For every task t(i) in the task list T = {t1, t2, t3, ..., tn}:

    If the task is critical,
        the execution time of t(i) remains the same.
    Else
        add the slack of task t(i) to the execution time of t(i).

Step 2: For all VMs in the VM list VM = {v1, v2, v3, ..., vm}:

Step 3: Search the VM list for a VM v(j) whose execution time (ET) is equal to or less than the duration of task t(i) given by the user, and add that VM to the selected VM list for task t(i).

    Else
        there is no VM that can process the task within the allowed duration.

C) Create a deadlineMap for every depth in the workflow.

For every task t(i) in the workflow:

1) Sort the VM list of task t(i) in increasing order of cost.

2) If the task is critical, the deadline becomes the minimum cost from the sorted VM list.

3) If there is more than one critical task at the same depth:
   deadline = Math.max(current_task_minimum_cost, previous_entry_for_this_depth)

4) Add the deadline to the map as (depth, deadline).

The largest minimum cost among the critical tasks at each level becomes the deadline by which all tasks on that level must be completed.

D) Allocate VM to every task

For every depth of the workflow:

Let currentIndex = 0

Filter the siblings at the same depth from the task list

For every sibling:

1) weight = deadline at that depth

2) Sort costsVm in increasing order of cost.

3) If the task is critical, assign the first virtual machine (the one with minimum cost);
   else

4) select the VM at currentIndex from the sortedVMList of that task

5) remainingWt = weight - cost of the task on VM[currentIndex]

6) If (remainingWt < 0), increment currentIndex   // knapsack full for item type VMi

7) If (currentIndex >= sortedMap.size() - 1), currentIndex = 0   // select another item in round-robin fashion

8) Re-select the VM at the new currentIndex from the sortedVMList

9) Assign the selected VM to the task

10) Add the task's cost to totalCost

Parts B) to D) detail the enhanced procedure for scheduling tasks on the available resources. In part B) we select candidate virtual machines: if a task is critical, its execution time remains unchanged, while for non-critical tasks the slack is added to the execution time. We then check each task and find the VMs whose execution time is equal to or less than the specified task duration (the duration of each task of the workflow application given in the XML file), and store the selected VMs in a temporary VM list specific to that task.
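A minimal sketch of this VM pre-selection step is given below, assuming each VM exposes a MIPS rating and a cost rate, and each task a length in MI and a user-given duration; the names Vm, mips, costPerSecond and selectVMsByExecutionTime are assumptions for this illustration, not the paper's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative VM description: id, computing power (MIPS), cost rate. */
record Vm(int id, double mips, double costPerSecond) {}

class VmSelection {

    /**
     * Returns the VMs that can execute a task of the given length (MI) within
     * the allowed duration, sorted by increasing execution cost.
     */
    static List<Vm> selectVMsByExecutionTime(List<Vm> vms, double taskLengthMI,
                                             double allowedDuration) {
        List<Vm> selected = new ArrayList<>();
        for (Vm vm : vms) {
            double execTime = taskLengthMI / vm.mips();      // equation (1)
            if (execTime <= allowedDuration) {
                selected.add(vm);
            }
        }
        // Cheapest first: execution time multiplied by the VM's cost rate.
        selected.sort(Comparator.comparingDouble(
                vm -> (taskLengthMI / vm.mips()) * vm.costPerSecond()));
        return selected;
    }
}
```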

Part C) explains the creation of a deadlineMap for every depth of the workflow. There can be more than one critical task at the same depth; the critical task with the maximum duration becomes the deadline for that depth, and all tasks at that depth must complete before this critical task completes.
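Under the same illustrative model, the deadlineMap construction of part C) might look as follows; the merge() call keeps the maximum entry per depth, mirroring the Math.max step, and the TaskInfo interface is a hypothetical stand-in for whatever task view the scheduler holds.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical task view used by this sketch; not a WorkflowSim type.
interface TaskInfo {
    boolean isCritical();
    int depth();
    double durationOnCheapestFeasibleVm();  // minimum entry of its sorted VM list
}

class DeadlineMapBuilder {

    /**
     * Builds depth -> deadline, where the deadline of a depth is the largest
     * "cheapest-VM" duration among the critical tasks found at that depth.
     */
    static Map<Integer, Double> findDeadlines(List<TaskInfo> tasks) {
        Map<Integer, Double> deadlineMap = new HashMap<>();
        for (TaskInfo t : tasks) {
            if (!t.isCritical()) {
                continue;
            }
            deadlineMap.merge(t.depth(), t.durationOnCheapestFeasibleVm(), Math::max);
        }
        return deadlineMap;
    }
}
```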

Part D) explains the use of the knapsack concept for scheduling. If a task is critical, we assign the most cost-efficient (fastest) VM to it; otherwise we start assigning the next most efficient virtual machines in the list.

If the same VM can execute more than one task within the deadline, it is assigned further tasks; otherwise the next most efficient free VM in the queue is selected. After all tasks have been assigned VMs, the total cost is calculated.
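The per-depth allocation of part D) can be sketched as below. Treating the depth deadline as a capacity that tasks assigned to the current VM "item type" consume is one plausible reading of steps 5)-7); the names DepthAllocator and SchedTask are assumptions, and the Vm record is the illustrative one from the VM pre-selection sketch above.

```java
import java.util.List;

// Hypothetical view of a task after parts B and C have run.
interface SchedTask {
    boolean isCritical();
    List<Vm> sortedVmsByCost();   // assumed non-empty (produced by part B)
    double costOn(Vm vm);
    void assign(Vm vm);
}

class DepthAllocator {

    /** Allocates VMs to the sibling tasks at one depth of the workflow. */
    static double allocate(List<SchedTask> siblings, double depthDeadline) {
        double totalCost = 0.0;
        int currentIndex = 0;
        double remainingWt = depthDeadline;            // capacity left on the current VM type
        for (SchedTask task : siblings) {
            List<Vm> sorted = task.sortedVmsByCost();
            if (currentIndex >= sorted.size()) {
                currentIndex = 0;                      // guard: this task has a shorter VM list
            }
            int chosen;
            if (task.isCritical()) {
                chosen = 0;                            // critical task gets the cheapest feasible VM
            } else {
                double costHere = task.costOn(sorted.get(currentIndex));
                if (remainingWt - costHere < 0) {      // knapsack full for this VM item type
                    currentIndex++;
                    if (currentIndex >= sorted.size() - 1) {
                        currentIndex = 0;              // wrap around in round-robin fashion
                    }
                    remainingWt = depthDeadline;       // fresh capacity on the newly selected VM
                }
                chosen = currentIndex;
                remainingWt -= task.costOn(sorted.get(chosen));
            }
            Vm vm = sorted.get(chosen);
            task.assign(vm);
            totalCost += task.costOn(vm);
        }
        return totalCost;
    }
}
```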

VI. PERFORMANCE ANALYSIS

 

A. Simulation Setup

 

WorkflowSim is an open source workflow simulator that was originally developed by Weiwei Chen. In WorkflowSim, workflow applications are represented with the help of DAG files and most of the popular static and dynamic workflow schedulers (e.g., HEFT, Min-Min) and task clustering algorithms are already implemented in this simulator [12].

Workflow Mapper: imports DAG files represented in XML (Extensible Markup Language) format, along with other metadata, from the Workflow Generator. The Workflow Mapper creates a list of tasks and allocates these tasks to an execution site [12].

Workflow Engine: manages execution of the tasks of the given workflow based on their inter-dependencies, ensuring that a child task is executed only after all of its parent tasks have completed their execution [13].

Workflow Planner: calls the workflow parser to parse an XML file and create a list of tasks, and then submits the list to the Clustering Engine [13].

Clustering Engine: groups tasks into jobs; a job can contain multiple tasks that are executed sequentially or in parallel [14].

Workflow Scheduler: schedules jobs onto resources based on the criteria specified by users [14].

B. Analysis and Results

 

The figures below show the execution time and cost comparison for the scientific workflows Montage_25.xml and Inspiral_30.xml. Both execution cost and execution time are lower than with the Max-Min and Min-Min algorithms. The execution cost increases compared to the Critical Path Method, since the Enhanced Critical Path Method schedules tasks on less cost-efficient machines in order to load-balance them, but this prevents tasks from queueing for the same VM. The enhanced method therefore reduces wait times, which results in a lower execution time.

 

Fig.3. Execution time and cost comparison for Montage_25.xml

Fig.4. Execution time and cost comparison for Inspiral_30.xml

C. Time Complexity

The time complexities of the modules are as follows:

1. forward pass: O(total_tasks * p)

2. backward pass: O(total_tasks * c)

   where p = number of parent tasks of a task and c = number of child tasks of a task

3. checkCriticalPath: O(total_tasks)

4. selectVMsByExecutionTime: O(total_tasks * v), where v = number of VMs

5. findDeadlines: O(total_tasks)

6. allocateVMs: O(total_depth * k), where k = number of siblings per depth

VII. CONCLUSION AND FUTURE WORK

The Enhanced Critical Path Method thus improves the existing algorithm using the knapsack concept. In future work, the algorithm can be modified to take into account the transfer cost of a task along with its execution cost on a VM in the total cost comparison. The simulator currently allows only static virtual machines to be created for testing the workflows; from a practical standpoint, if no VM matches a task, new VMs could be created dynamically to execute the task within the specified duration. Workflows with a larger number of tasks can also be tested with the improved algorithm.

REFERENCES

[1] Jailalita, S. Singh and M. Dutta, "Critical path-based scheduling algorithm for applications in cloud computing," 2016 International Conference on Advances in Computing, Communication, & Automation (ICACCA) (Spring).

[2] P. Mell and T. Grance, "The NIST Definition of Cloud Computing," http://www.nist.gov/itl/cloud, accessed 10 February 2015.

[3] N. Chopra and S. Singh, "Deadline and Cost based Workflow Scheduling in Hybrid Cloud," IEEE International Conference on Advances in Computing, Communications and Informatics (ICACCI), Mysore, pp. 840-846, August 2013.

[4] Y. Jia, R. Buyya and K. T. Chen, "Cost-based Scheduling of Scientific Workflow Applications on Utility Grids," IEEE 5th International Conference on e-Science and Grid Computing, Melbourne, Australia, pp. 140-147, July 2005.

[5] M. Rahman, S. Venugopal and R. Buyya, "A Dynamic Critical Path Algorithm for Scheduling Scientific Workflow Applications on Cloud Grids," IEEE 3rd International Conference on e-Science and Grid Computing, pp. 35-42, December 2007.

[6] T. Amudha and T. T. Dhivyaprabha, "QoS Priority Based Scheduling and Proposed Framework for Task Scheduling in a Grid Environment," IEEE International Conference on Recent Trends in Information Technology, Chennai, Tamil Nadu, pp. 650-655, July 2011.

[7] R. N. Calheiros and R. Buyya, "Meeting Deadlines of Scientific Workflows in Public Clouds with Tasks Replication," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 7, pp. 1787-1796, June 2013.

[8] N. Chopra and S. Singh, "HEFT based Workflow Scheduling Algorithm for Cost Optimization within Deadline in Hybrid Clouds," IEEE 4th International Conference on Computing, Communications and Networking Technologies, Tiruchengode, Tamil Nadu, pp. 1-6, July 2013.

[9] Y. Zhang and Y. Mao, "A SCP Based Critical Path Scheduling Strategy for Data-Intensive Workflows," 7th International Conference on Fuzzy Systems and Knowledge Discovery, pp. 1735-1739, 2010.

[10] S. Abrishami and M. Naghibzadeh, "Budget Constrained Scheduling of Grid Workflows using Partial Critical Paths," International Conference on Cloud Computing and Technology, pp. 124-129, 2012.

[11] M. A. Rodriguez and R. Buyya, "A Responsive Knapsack-based Algorithm for Resource Provisioning and Scheduling of Scientific Workflows in Clouds," 44th International Conference on Parallel Processing, 2015.

[12] W. Chen, "WorkflowSim," http://www.workflowsim.org/, accessed 10 March 2015.

[13] W. Chen and E. Deelman, "WorkflowSim: A Toolkit for Simulating Scientific Workflows in Distributed Environments," IEEE 8th International Conference on E-Science, Chicago, US, pp. 1-8, October 2012.

[14] B. Berman, "WorkflowSim: A Toolkit for Simulating Scientific Workflows in Distributed Environments," http://pegasus.isi.edu/projects/workflowsim, accessed 20 June 2015.

[15] http://gooa.las.ac.cn/external/index?type=-1&pid=1437698

[16] https://www.softwareadvice.com/resources/what-is-critical-path/

[17] https://www.projectmanager.com/blog/understanding-critical-path-project-management

 
