# Developing Algorithm to Schedule Workflow onto Distributed Resources


1. Introduction

Distributed computing technologies such as cluster, grid, and now cloud computing have all aimed at providing access to large amounts of computing power in a fully virtualized manner, by aggregating resources and offering a single system view. An important further aim of these technologies has been to deliver computing as a utility. Utility computing is a business model for providing services and computing power on demand; consumers pay providers based on usage ("pay-as-you-go"), much as we obtain services from traditional public utilities such as water, electricity, gas, and telephony. Clouds are a large pool of easily usable and accessible virtualized resources (such as hardware, development platforms, and/or services) [3].

Today's applications consist of immense numbers of interacting tasks. They include scientific workflows and big-data applications that often operate on extremely large datasets in a broad range of domains, such as astronomy, bioinformatics, and climate science [1, 2, 14]. These tasks generally require processing power beyond the ability of a single machine; distributed and cloud computing provide the computing power needed to process such datasets. A popular representation of a workflow application is the directed acyclic graph (DAG). Workflow scheduling is a general form of task scheduling in which tasks are mapped onto distributed resources for execution by a workflow management system [10]. The main challenge, however, is how to schedule dynamic workflows with heavy fluctuations onto distributed resources (e.g., clouds) efficiently, since the underlying resources are highly dynamic, failure-prone, and heterogeneous [8].
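Such a workflow maps directly onto a DAG whose vertices are tasks and whose edges are data dependencies; any valid execution order is a topological order of that graph. A minimal sketch (the four-task fan-out/join workflow and its names are invented for illustration):

```python
from collections import deque

# Hypothetical 4-task workflow: t1 fans out to t2 and t3, which join at t4.
# Edge (u, v) means task v consumes the output of task u.
edges = [("t1", "t2"), ("t1", "t3"), ("t2", "t4"), ("t3", "t4")]

def topological_order(edges):
    """Kahn's algorithm: return a valid execution order, or raise on a cycle."""
    succ, indeg = {}, {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        indeg[v] = indeg.get(v, 0) + 1
        indeg.setdefault(u, 0)
    ready = deque(t for t, d in indeg.items() if d == 0)  # entry tasks
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for v in succ.get(t, []):
            indeg[v] -= 1
            if indeg[v] == 0:          # all of v's parents have completed
                ready.append(v)
    if len(order) != len(indeg):
        raise ValueError("not a DAG: cycle detected")
    return order

order = topological_order(edges)
```

Any scheduler discussed below must emit task start times consistent with such an order.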

Estimates of task runtime, disk space usage, and memory consumption are commonly used by scheduling and resource provisioning algorithms to support efficient and reliable workflow executions. Most of the proposed duplication heuristics assume the existence of accurate estimates of task resource requirements such as execution and communication times, disk space, or memory usage. In practice, however, it is not ordinarily possible to know the task execution requirements a priori; at best one has only probability distributions for the estimates. In this work, we introduce a different approach for task duplication scheduling: we construct a prediction model based on observed correlations between input data size and task requirements [16, 17, 18, 19].
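As a sketch of this correlation-based idea, one could fit a least-squares line relating input data size to observed runtime from profiling history and use it to estimate the runtime of a new task instance. The sample figures below are invented for illustration, not taken from any benchmark:

```python
# Hypothetical profiling history: (input size in MB, observed runtime in s).
history = [(10, 5.2), (20, 9.8), (40, 20.5), (80, 40.1)]

def fit_line(samples):
    """Ordinary least squares for y = a*x + b."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

a, b = fit_line(history)
predicted = a * 60 + b  # estimated runtime for a new 60 MB input
```

A production predictor would of course use richer features (CPU, I/O, memory) and a proper regression library; the point is only that input size alone already carries usable signal when the correlation is strong.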


The aim of task duplication scheduling is to achieve a shorter makespan. However, it complicates the scheduling problem: the algorithm must track the priority of tasks, identify those tasks that need to be duplicated, and detect idle time slots in which to place the duplicates. To do this, we need to capture workflow profiling data such as runtime, I/O, CPU utilization, memory usage, and data dependencies among tasks. This information can be extracted using profiling and monitoring tools such as the Pegasus workflow management system [10, 11] (Pegasus is open source and available under the Apache V2 license) and Kickstart [17]; Pegasus uses Kickstart to monitor the workflow. Alternatively, the information can be captured by instrumenting the tasks with Linux system calls. These tools record profiling data such as runtime, CPU and memory usage, and I/O operations, and we can evaluate the significance of each of these parameters using a feature selection process [16].

1.1.  Problem definition: Duplication-based workflow scheduling in distributed systems

To plan the execution of a workflow in a cloud environment, we need to consider two sub-problems:

1. Resource provisioning: selecting and provisioning the compute resources that will run the tasks. This means having heuristics in place that can determine how many VMs to lease, their type, and when to start and shut them down.
2. Scheduling, or task mapping: assigning each task to the best-suited resource.

The term scheduling is often used to refer to the combination of these two sub-problems. The main issue is to schedule parallel applications, represented as directed acyclic graphs (DAGs), onto the processing elements of parallel and distributed systems efficiently and cost-effectively. The goals of the scheduling process are to utilize resources efficiently and to achieve the performance objectives of the application (e.g., to minimize the parallel execution time of the program). Communication overhead in distributed systems remains an inevitable penalty; because of it, parallel speedup may be limited and may not scale well with the size of the system. This can be expressed as two primary optimization problems: minimizing cost under a deadline constraint, or minimizing schedule length (makespan) under a budget constraint.
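The two optimization problems can be made concrete with a toy evaluator: given an assignment of tasks to VM types, compute the makespan and cost and check them against a deadline or budget. All prices, speeds, and runtimes below are invented, and tasks are assumed independent (no precedence) with one VM instance per type, purely to keep the sketch small:

```python
# Hypothetical VM catalogue: type -> (speed factor, price per hour).
vm_types = {"small": (1.0, 0.10), "large": (2.0, 0.40)}

# Task base runtimes in hours on a speed-1.0 machine.
tasks = {"t1": 2.0, "t2": 4.0, "t3": 1.0}
assignment = {"t1": "small", "t2": "large", "t3": "large"}

def evaluate(tasks, assignment, vm_types):
    """Return (makespan, cost): tasks on the same VM run back to back,
    VMs of different types run in parallel, billing is per busy hour."""
    busy = {}  # vm type -> total busy hours on that (single) instance
    for t, vm in assignment.items():
        speed, _ = vm_types[vm]
        busy[vm] = busy.get(vm, 0.0) + tasks[t] / speed
    makespan = max(busy.values())
    cost = sum(hours * vm_types[vm][1] for vm, hours in busy.items())
    return makespan, cost

makespan, cost = evaluate(tasks, assignment, vm_types)
deadline_ok = makespan <= 3.0  # "minimize cost under a 3-hour deadline"
```

A real provisioner would additionally search over assignments and VM counts; here the evaluator only shows which quantities the deadline and budget constraints bind.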

1.2.  Motivation

Many scientific areas, such as astronomy, bioinformatics, and climate science, have adopted workflows as a way of expressing multi-step computational problems that can be efficiently processed and analyzed on distributed infrastructures. The ever-growing volume of data generated by powerful scientific instruments requires enormous processing power to analyze, and hence a distributed platform, if relevant results are to be obtained in a reasonable amount of time. The latest distributed computing paradigm, cloud computing, offers several key benefits for the deployment of large-scale scientific workflows. However, present scheduling algorithms suffer from the lack of precise information about workflows and resources.

1.3.  Challenges

1. Dynamic and fluctuating workflows – the major challenge to elasticity in cloud computing is that workloads are unpredictable.
2. Heterogeneous physical nodes in cloud datacentres – tasks are allocated (or scheduled) across available nodes that are widely distributed at different locations and vary in computational power, architecture, memory, and even network performance; the same task can perform differently on different nodes.
3. Highly dynamic, failure-prone cloud systems – while cloud service offerings present a simplified view of IT (in the case of IaaS), the underlying systems-level support challenges are huge and highly complex.

1.4.  Aims

1. We aim to develop an efficient algorithm to schedule workflows onto distributed resources (e.g., clouds).
2. We aim to present an analytical model to achieve auto-scaling in a resource provisioning scheme for scientific applications on a cloud platform. This can be expressed as two primary optimization problems: minimizing cost under a deadline constraint, or minimizing schedule length (makespan) under a budget constraint.
3. We aim to develop an algorithm that predicts task execution runtime and determines which resource a task should be allocated to, using the previous execution history of workflow processes and resources, since in practice it is difficult to estimate these requirements accurately due to the highly dynamic nature of underlying cloud systems and workflows.

2. Survey

2.1.  Workflow Scheduling Approaches

Workflow scheduling algorithms can be classified into three categories according to the information available about the workflow and resources and the time at which the task-to-resource mapping is made: 1) static scheduling, 2) dynamic scheduling, and 3) hybrid scheduling, as shown in Fig. 1.

Fig. 1 Workflow scheduling classification

Workflow applications require a scheduling strategy that should take into account the precedence constraints among their component tasks. The workflow scheduling heuristics are classified into the following general categories:

• List scheduling algorithms
• Clustering algorithms
• Guided random search algorithms

2.1.1.     List scheduling

The basic idea of list scheduling is to build a scheduling list (a sequence of tasks for scheduling) by assigning the tasks priorities and sorting them accordingly, and then to repeat the following two steps until all the tasks in the DAG are scheduled:

1. Task selection: select the first (highest-priority) unscheduled task from the list.
2. Resource selection: allocate the task to the selected resource.

Some of the most important list scheduling heuristics are modified critical path (MCP), mapping heuristic (MH), insertion scheduling heuristic, earliest time first (ETF), and heterogeneous earliest-finish-time (HEFT) [5].
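A minimal sketch of the list-scheduling loop with an earliest-finish-time resource selection in the spirit of HEFT. The priority list is assumed precomputed (e.g. by upward rank), the runtimes and the uniform communication delay are invented, and insertion-based idle-slot search is omitted for brevity:

```python
# Tasks listed in precedence-respecting priority order (assumed precomputed).
priority_list = ["t1", "t2", "t3"]
parents = {"t1": [], "t2": ["t1"], "t3": ["t1"]}
runtime = {"t1": [3, 2], "t2": [4, 5], "t3": [2, 6]}  # per-processor times
comm = 1  # uniform transfer delay between different processors

procs_free = [0, 0]   # earliest free time of each of the two processors
placed = {}           # task -> (processor, finish_time)

for t in priority_list:
    best = None
    for p in range(len(procs_free)):
        # Task starts once the processor is free and all parent data arrived.
        ready = procs_free[p]
        for par in parents[t]:
            pp, pf = placed[par]
            ready = max(ready, pf + (0 if pp == p else comm))
        finish = ready + runtime[t][p]
        if best is None or finish < best[1]:
            best = (p, finish)        # greedy: minimize earliest finish time
    placed[t] = best
    procs_free[best[0]] = best[1]

makespan = max(f for _, f in placed.values())
```

On this toy instance the fan-out task t1 lands on the faster processor and the two children split across processors, giving a makespan of 8.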

2.1.2.     Clustering scheduling

The main idea of clustering algorithms is the minimization of the communication cost between the tasks of a DAG, by grouping heavily communicating tasks into the same cluster and assigning all of the tasks in the cluster to the same processor.
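A rough sketch of the edge-zeroing idea behind clustering: repeatedly merge the clusters joined by the heaviest communication edges, so that cost is "zeroed" by co-location. The edge weights and the merge threshold are invented, and this greedy simplification skips the makespan check that a full algorithm (e.g. Sarkar's) would perform at each merge:

```python
# comm[(u, v)] = communication cost if u and v run on different processors.
comm = {("t1", "t2"): 10, ("t1", "t3"): 2, ("t2", "t4"): 8, ("t3", "t4"): 1}

cluster = {t: t for t in {"t1", "t2", "t3", "t4"}}  # task -> cluster parent

def find(t):
    """Follow parent links to the representative of t's cluster."""
    while cluster[t] != t:
        t = cluster[t]
    return t

# Zero the heaviest edges first; merge only edges above a hypothetical
# "heavy" threshold instead of checking the makespan after each merge.
for (u, v), w in sorted(comm.items(), key=lambda kv: -kv[1]):
    if w >= 5:
        cluster[find(u)] = find(v)  # u's cluster joins v's

same = find("t1") == find("t2")  # heavily communicating pair co-located
```

All tasks in one cluster are then assigned to the same processor, so the zeroed edges cost nothing at runtime.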

2.1.3.     Duplication scheduling

Duplication-based scheduling aims to optimize the communication time between data-dependent tasks. The duplication method is often used together with clustering or list scheduling. The main idea behind this type of scheduling is to duplicate a task onto the same resource as its dependents in order to avoid the transmission time between the duplicated task and the tasks that consume its output. The earliest start time of those tasks on that resource is thereby reduced, which can result in a better makespan.

Duplication heuristics rely on the fact that, even under an optimized task scheduling algorithm, some resources remain idle during certain time intervals because the tasks allocated to them are waiting for the output of tasks assigned to other resources. The main motivation of duplication-based scheduling is therefore to employ these idle slots effectively by discovering the critical tasks and redundantly assigning them to those slots, thereby further decreasing the overall parallel execution time.

The following two concerns are essential when designing an optimized task duplication-based algorithm:

1. Identifying tasks to be duplicated: minimizing the start time of a child task by selecting which parent tasks to duplicate.
2. Discovering idle time slots: locating an idle time slot on a resource in which to duplicate a parent task.
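The second concern can be sketched as a scan over a resource's existing schedule for a gap large enough to hold the duplicated parent. The interval values are illustrative:

```python
# Existing schedule on one resource: sorted, non-overlapping (start, finish).
schedule = [(0, 3), (5, 9), (12, 15)]

def find_idle_slot(schedule, duration, earliest=0):
    """Start time of the first idle gap of at least `duration` that begins
    no earlier than `earliest`."""
    prev_end = 0
    for start, finish in schedule:
        gap_start = max(prev_end, earliest)
        if start - gap_start >= duration:
            return gap_start              # gap before this task fits
        prev_end = finish
    return max(prev_end, earliest)        # after the last task, always room

slot = find_idle_slot(schedule, duration=2)  # gap between times 3 and 5
```

A full duplication algorithm would call this per candidate resource and accept the duplicate only if it actually lowers the child's earliest start time.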

In the literature [4, 28], DBS algorithms are classified into two categories according to the task duplication approach used: Scheduling with Partial Duplication (SPD) and Scheduling with Full Duplication (SFD). Full duplication algorithms attempt to duplicate all the parents of a join node and apply the task duplication algorithm to all the processors that hold any of the parents of the join node. A join node is defined as a node with an in-degree greater than one (i.e., a node with more than one incoming edge). DSH [20], BTDH [21], LCTD [22], CPFD [23], TCSD [24] and PY [25] belong to this category. Partial duplication algorithms do not duplicate a parent of a join node unless that parent is critical. Instead, they try to find the critical parent, defined as the immediate parent that gives the largest start time to the join node. The join node is scheduled on the processor where the critical parent has been scheduled. Because of the limited task duplication, algorithms in this category have low complexity, but they may not be appropriate for systems with high communication overhead; they typically produce good schedules for input DAGs in which computation cost is strictly larger than communication cost. CPM [26], SDBS [27], DFRN [28], TDS [29], LWB [30], PLW [31] and FSS [32] belong to this category. SFD algorithms have higher complexity but typically show better performance than SPD algorithms. A trade-off thus exists between the two categories: performance (better parallel execution time of the application) versus time complexity (longer time to run the scheduling algorithm itself).
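The critical parent defined above falls out directly from the parents' finish times and communication costs: it is the parent whose data arrives at the join node last. A sketch with invented values:

```python
# For a join node, each parent p contributes a data-arrival time
# finish[p] + comm[p] (finish time plus transfer cost of p's output).
finish = {"p1": 4, "p2": 6, "p3": 3}
comm = {"p1": 5, "p2": 1, "p3": 4}

def critical_parent(finish, comm):
    """The parent whose data arrives last: it determines the join node's
    earliest start time and is the candidate for duplication."""
    return max(finish, key=lambda p: finish[p] + comm[p])

cp = critical_parent(finish, comm)  # p1: arrives at 4 + 5 = 9, the latest
```

Scheduling the join node on p1's processor (or duplicating p1) zeroes that largest communication term, which is exactly the SPD strategy.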

As noted above, most duplication heuristics assume accurate estimates of task resource requirements, even though such requirements are not ordinarily known a priori and at best only probability distributions are available. Moreover, some task duplication scheduling algorithms have been extensively explored in the context of Grid systems [20, 21, 22, 23, 24] without addressing the cost of resource utilization.

Table 1 summarizes the time complexity of duplication-based algorithms and indicates the class each belongs to (SPD or SFD). Note that, for a DAG with $v$ nodes, most of the SFD algorithms have a complexity of $O(v^4)$, while most of the SPD algorithms have a complexity of $O(v^2)$.

| Scheduling algorithm | Classification | Time complexity | Description |
|---|---|---|---|
| TDS | SPD | $O(v^2)$ | Only the critical parent is duplicated |
| PLW | SPD | $O(v(e + v \log v))$ | Lower bound of start time is approximated |
| LWB | SPD | $O(v^2)$ | Lower bound of start time is approximated; node weights are strictly larger than any edge weight |
| DFRN | SPD | $O(v^2)$ | Duplication first and reduction next |
| CPM | SPD | $O(v^2)$ | Limited duplication |
| SDBS | SPD | $O(v^2)$ | Limited duplication |
| FSS | SPD | $O(v^2)$ | Limited duplication |
| LCTD | SFD | $O(v^4)$ | High time complexity; optimization of linear clustering |
| CPFD | SFD | $O(v^4)$ | High time complexity; tasks on the critical path are considered first |
| DSH | SFD | $O(v^4)$ | High time complexity; considers only the idle time slot between the finish time of the last node scheduled on a processor and the earliest start time of the candidate node, so the degree of duplication is likely to be small and duplication may not always be effective |
| BTDH | SFD | $O(v^4)$ | High time complexity; does not indicate any preference as to which parent node to consider for duplication, and the duplication process does not stop even if the start time of the candidate node is increased |
| PY | SFD | $O(v^2(e + v \log v))$ | Lower bound of start time is approximated |
| TCSD | SFD | $O(v^3 \log v)$ | Lower bound of start time is approximated |

Table 1. Comparison of scheduling algorithms

In this research, we aim to introduce a new DBS algorithm that duplicates the parents of a join node as SFD algorithms do, but with reduced time complexity: we select the critical node for duplication based on the predicted output size of the node, aiming to achieve the performance of SFD algorithms with a computational complexity close to that of SPD algorithms.

2.1.4.     Dynamic scheduling

3. System Architecture and Modelling

3.1.  Workflow Management System Architecture

Several scientific workflow management systems have been developed to help scientists, analysts, and developers in different domains create and execute scientific workflows and analyses across broad areas of the scientific community. Kepler [39] is a free and open-source workflow management system that operates on data locally and globally in a variety of formats; using Kepler's GUI, users can create and execute scientific workflows. Pegasus [40] is a workflow management system that runs over a variety of hardware, including a laptop, a campus cluster, a grid, or a commercial or academic cloud such as Amazon EC2 [43] or Nimbus. Triana [42] is an environment for workflow and data analysis that provides a graphical user interface for developing and running programs; it was developed at Cardiff University, originating as part of the GEO600 gravitational wave detector software and more recently serving a wider range of users. Taverna [41] is a powerful, open-source, domain-independent tool for designing and executing workflows; it uses the textual language SCUFL as the mechanism for specifying Taverna workflows. A workflow management system architecture is shown in Fig. 2; it consists of workflow and clustering engines, workflow and resource monitoring tools, and a user interface.

Fig. 2 Workflow management system architecture

The workflow and resource engine is the core of the workflow management system. Its main responsibilities are scheduling, data management, task execution management, and resource provisioning.

3.2.   Workflow modelling and definition

The notation and terminology used here are identical to [38]. A task is a unit of computational activity in the sequencing problem. It might, for example, be a job, a program, or an instruction. A task is specified in terms of its external behavior, e.g., the inputs it requires, the outputs it generates, its action or function, and its execution time. If T is a task, two events are associated with T: initiation $\overline{T}$ and termination $\underline{T}$. Let t(e) denote the time of occurrence of an event e. We assume that $t(\underline{T}) - t(\overline{T})$ is nonzero and finite as long as all resources required by T are available.

In general, a resource is any (not necessarily physical) device used by tasks in performing certain functions. Resources include control and input/output devices (tapes, disk drives, card readers, …), processors, storage media (drums, disks, …), programs, procedures, and data files. Sets of state transitions are defined for the initiation and termination events of tasks, covering both the function of the tasks and the system's capabilities. An initiation event corresponds to a state transition that reflects: 1) the acquisition and assignment of resources, 2) the initialization of the resource state, and 3) the reading of input values. A termination event corresponds to a state transition that reflects: 1) the release or discontinued use of resources, 2) the writing of output values, and 3) if values or internal states are associated with resources, the "saving" of the resource state.

Parallelism in task execution computational model

Let $v = \{T_1, T_2, \dots, T_n\}$ be a set of tasks, and let → be a partial order (precedence relation) on v. The pair C = (v, →) is called a task system. The partial order represents operational precedence; i.e., T → T′ means that task T is to be completed before task T′ is begun.

The precedence graph corresponding to the task system C has v as its vertices and the following set of directed edges: the edge (T, T′) from T to T′ is in the graph if and only if T → T′ and there exists no T″ such that T → T″ → T′. This definition ensures that there is no redundant specification of precedence in the graph.
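The edge condition above amounts to a transitive reduction of the relation: an edge is kept only when no longer alternative path already implies it. A sketch on a three-task relation (task names invented):

```python
# Direct precedence relations; ("a", "c") is redundant because a -> b -> c.
relations = {("a", "b"), ("b", "c"), ("a", "c")}

def reachable(relations, src, dst, skip):
    """Is dst reachable from src without using the edge `skip`?"""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u in seen:
            continue
        seen.add(u)
        stack.extend(v for (x, v) in relations if x == u and (x, v) != skip)
    return False

# Keep (T, T') only if no alternative path T -> ... -> T' exists, i.e.
# only if no intermediate T'' satisfies T -> T'' -> T'.
precedence_graph = {e for e in relations
                    if not reachable(relations, e[0], e[1], e)}
```

The resulting edge set is exactly the minimal graph the definition describes: the same partial order, with no redundant edges.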

Fig. 3 A precedence graph

A (directed) path $(x_1, x_2)(x_2, x_3)\dots(x_{k-1}, x_k)$ connects the vertices $x_1, \dots, x_k$ in the graph C. The length of this path is k, i.e., the number of vertices in the path. For i and j such that 1 ≤ i < j ≤ k, $x_j$ is a successor of $x_i$ and $x_i$ is a predecessor of $x_j$. If j = i + 1, we use the terms immediate successor and immediate predecessor, respectively. A task with no successor is an exit task, and a task with no predecessor is an entry task. If task T is neither a successor nor a predecessor of task T′, then T and T′ are independent. The critical path (CP) of a task graph is a set of nodes and edges forming a path from an entry node to an exit node for which the sum of computation cost and communication cost is maximal [23].
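The critical path can be computed by a longest-path pass over the DAG in topological order. The node and edge weights below are illustrative:

```python
# Computation cost of each node and communication cost of each edge.
node_w = {"t1": 2, "t2": 3, "t3": 1, "t4": 2}
edge_w = {("t1", "t2"): 4, ("t1", "t3"): 1, ("t2", "t4"): 2, ("t3", "t4"): 6}
topo = ["t1", "t2", "t3", "t4"]  # a topological order of the DAG

# dist[t] = maximum total cost of any entry-to-t path, including t's own
# computation cost; processed in topological order so preds are ready.
dist = {}
for t in topo:
    preds = [(u, w) for (u, x), w in edge_w.items() if x == t]
    dist[t] = node_w[t] + (max(dist[u] + w for u, w in preds) if preds else 0)

exit_nodes = [t for t in topo if not any(u == t for (u, _) in edge_w)]
cp_length = max(dist[t] for t in exit_nodes)  # critical path length
```

Here the critical path is t1 → t2 → t4 with total cost 2 + 4 + 3 + 2 + 2 = 13; these are exactly the tasks a duplication heuristic would examine first.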


Since we interpret → as a (temporal) precedence ordering on tasks, T and T′ may be concurrent if and only if they are independent (otherwise, T and T′ must occur temporally in some fixed order).

An execution sequence of an n-task system C = (v, →) is any string $\alpha = a_1 a_2 \dots a_{2n}$ of task initiation and termination events satisfying the precedence constraints of C. Stated precisely:

1. For each T in v, the symbols $\overline{T}$ and $\underline{T}$ each appear exactly once in α.
2. If $a_i = \overline{T}$ and $a_j = \underline{T}$, then i < j.
3. If $a_i = \underline{T}$ and $a_j = \overline{T'}$, where T → T′, then i < j.
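The three conditions can be checked mechanically. Here initiation and termination of a task T are encoded as ("init", T) and ("term", T), and the example sequences are invented, not taken from Fig. 3:

```python
def is_valid_sequence(alpha, tasks, prec):
    """Check conditions 1-3: each event appears exactly once, initiation
    precedes termination, and T terminates before T' initiates when T -> T'."""
    pos = {}
    for i, ev in enumerate(alpha):
        if ev in pos:
            return False                          # an event appears twice
        pos[ev] = i
    for t in tasks:
        if ("init", t) not in pos or ("term", t) not in pos:
            return False                          # condition 1
        if pos[("init", t)] >= pos[("term", t)]:
            return False                          # condition 2
    for (t, t2) in prec:
        if pos[("term", t)] >= pos[("init", t2)]:
            return False                          # condition 3
    return True

tasks = {"T1", "T2"}
prec = {("T1", "T2")}  # T1 must complete before T2 begins
good = [("init", "T1"), ("term", "T1"), ("init", "T2"), ("term", "T2")]
bad = [("init", "T2"), ("init", "T1"), ("term", "T1"), ("term", "T2")]
```

The `bad` sequence starts T2 before T1 has terminated, violating condition 3.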


3.3.  Proposed Framework

In this project, a novel framework is proposed for mapping workflows onto the underlying resources of a distributed system using a layered approach. The main aim of the mapping is to achieve auto-scaling in resource provisioning. The proposed framework, shown in Fig. 5, consists of two layers: a logical layer and a physical layer. In the first stage of the logical layer, the workflow process and workflow resources are modelled, and a logical connection relation is established between workflow process activities and distributed resources. The modelling is based on dynamic information, such as the set of available tasks, the location of data, and the state of the resources, together with static information, such as task priorities computed from the whole task graph. The algorithm also exploits the previous execution history of workflow processes and resources in order to predict task execution runtimes and to determine into which idle time slot of a resource a work item should be duplicated. In the second stage of the logical layer (the running stage), the logical connections are searched to find the best scheduling solution; the algorithm learns from historical data in order to establish the connection faster and more accurately.

Fig. 5 Layered framework for workflow process and resource modelling

In the physical layer, as resources become available, the workflow management system allocates them to the corresponding workflow processes. If more resources are available than the minimum necessary for meeting the deadline (i.e., budget remains), they can be used for duplicate execution. In practice, it is difficult to estimate resource requirements accurately, and existing approaches often do not take memory usage into account [6]. We can optimize resource provisioning by estimating the characteristics of the resources required for workflow execution, which can have a significant impact on costs and resource utilization, for example when using a cloud infrastructure.

4. Current Status of Research and Future Plan

At this stage, we have designed a framework for mapping workflows onto heterogeneous and dynamic distributed resources (Section 3.3). For this project, we consider the Montage workflow, created by the NASA Infrared Processing and Analysis Center (IPAC) as an open-source toolkit for generating custom mosaics of astronomical images in the Flexible Image Transport System (FITS) format. Montage is an I/O- and data-intensive workflow, and is therefore a suitable case study for our project.

The next stage of this work will be to carry out experimental studies of the proposed framework and of workflow and resource requirements such as task runtime, disk space usage, and memory consumption. We shall also incorporate additional considerations, such as multi-tenant workflows, into the model.

Table 2 outlines the schedule for writing the thesis proposal.

References

1. R. Ferreira da Silva, R. Filgueira, I. Pietri, M. Jiang, R. Sakellariou, and E. Deelman, “A Characterization of Workflow Management Systems for Extreme-Scale Applications,” Future Generation Computer Systems, vol. 75, p. 228–238, 2017.
2. E. Deelman, T. Peterka, I. Altintas, C. D. Carothers, K. K. van Dam, K. Moreland, M. Parashar, L. Ramakrishnan, M. Taufer, and J. Vetter, “The future of scientific workflows,” The International Journal of High Performance Computing Applications, vol. 32, iss. 1, p. 159–175, 2018.
3. R. Buyya, J. Broberg, and A. Goscinski, Cloud computing: Principles and paradigms. Wiley, 2010.
4. F. Wu, Q. Wu, Y. Tan, Workflow scheduling in cloud: a survey, J. Supercomput. 71, 2015, pp. 3373–3418.
5. H. Topcuouglu, S. Hariri, M.-y. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Trans. Parallel Distrib. Syst. 13 (3) (2002) 260–274.
6. R. F. da Silva, G. Juve, M. Rynge, E. Deelman, and M. Livny, “Online Task Resource Consumption Prediction for Scientific Workflows,” Parallel Process. Lett., vol. 25, no. 3, p. 1541003, Sep. 2015.
7. F. Zhang, J. Cao, W. Tan, S.U. Khan, K. Li, A.Y. Zomaya, Evolutionary scheduling of dynamic multitasking workloads for big-data analytics in elastic cloud, IEEE Trans. Emerg. Top. Comput. 2, 2014, pp. 338–351.
8. F. Zhang, J. Cao, K. Hwang, K. Li, S.U. Khan, Adaptive workflow scheduling on cloud computing platforms with iterative ordinal optimization, IEEE Trans. Cloud Comput. 3, 2015, pp. 156–168.
9. M. Maheswaran, S. Ali, H. J. Siegel, D. Hensgen, R. F. Freund, Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems, in: 8th Heterogeneous Computing Workshop, HCW ’99, 1999.
10. E. Deelman et al., “Pegasus, a workflow management system for science automation,” Futur. Gener. Comput. Syst., vol. 46, pp. 17–35, 2015.
11. A. Khan, X. Yan, S. Tao, and N. Anerousis, “Workload characterization and prediction in the cloud: A multiple time series approach,” in 2012 IEEE Network Operations and Management Symposium, 2012, pp. 1287–1294.
12. T. Shibata, S. Choi, and K. Taura, “File-access Patterns of Data-intensive Workflow Applications and Their Implications to Distributed Filesystems,” in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010, pp. 746–755.
13. I. Guyon and A. Elisseeff, “An Introduction to Variable and Feature Selection,” J. Mach. Learn. Res., vol. 3, pp. 1157–1182, Mar. 2003.
14. G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta, and K. Vahi, “Characterizing and profiling scientific workflows,” Futur. Gener. Comput. Syst., vol. 29, no. 3, pp. 682–692, 2013.
15. Sonmez O, Yigitbasi N, Abrishami S, Iosup A, Epema D (2010) Performance analysis of dynamic workflow scheduling inmulticluster grids. In: Proceedings of the 19th ACM international symposium on high performance distributed computing, ACM, pp 49–60.
16. G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta, K. Vahi, Characterizing and profiling scientific workflows, Future Generation Computer Systems 29 (3) (2014) 682–692.
17. F. Nadeem, M. Yousaf, R. Prodan, T. Fahringer, Soft benchmarks-based application performance prediction using a minimum training set, in: 2nd IEEE International Conference on e-Science and Grid Computing, 2006.
18. W. Tang, J. Bischof, N. Desai, K. Mahadik, W. Gerlach, T. Harrison, A. Wilke, F. Meyer, Workload characterization for mg-rast metagenomic data analytics service in the cloud, in: IEEE International Conference on Big Data, 2014.
19. T. Shibata, S. Choi, K. Taura, File-access patterns of data-intensive workflow applications and their implications to distributed filesystems, in: 19th ACM International Symposium on High Performance Distributed Computing (HPDC), 2010.
20. Kruatrachue B, Lewis T (1988) Grain size determination for parallel processing. IEEE Softw 5(1):23–32
21. Chung YC, Ranka S (1992) Applications and performance analysis of a compile-time optimization approach for list scheduling algorithms on distributed memory multiprocessors. In: Proceedings of supercomputing ’92. IEEE, pp 512–521.
22. Chen H, Shirazi B, Marquis J (1993) Performance evaluation of a novel scheduling method: linear clustering with task duplication. In: Proceedings of the 2nd international conference on parallel and distributed systems.
23. Ahmad I, Kwok YK (1998) On exploiting task duplication in parallel program scheduling. IEEE Trans Parallel Distrib Syst 9(9):872–892.
24. Li G, Chen D, Wang D, Zhang D (2003) Task clustering and scheduling to multiprocessors with duplication. In: Proceedings of the parallel and distributed processing symposium, IEEE.
25. Papadimitriou CH, Yannakakis, M (1988) Towards an architecture-independent analysis of parallel algorithms. In: Proceedings of the twentieth annual ACM symposium on theory of computing, STOC ’88, ACM, New York.
26. J. Y. Colin and P. Chretienne, “C.P.M. Scheduling with Small Communication Delays and Task Duplication,” Operations Research, 1991, pp. 680-684.
27. S. Darbha and D. P. Agrawal, “SDBS: A task duplication based optimal scheduling algorithm,” Proc. of Scalable High Performance Computing Conf., May 1994, pp. 756-763.
28. Park GL, Shirazi B, Marquis J (1997) Dfrn: a new approach for duplication based scheduling for distributed memory multiprocessor systems. In: Proceedings of 11th international parallel processing symposium, pp 157–166.
29. Darbha S, Agrawal DP (1998) Optimal scheduling algorithm for distributed-memorymachines. IEEE Trans Parallel Distrib Syst 9(1):87–95.
30. Colin J, Chretienne P (1991) C.p.m. scheduling with small computation delays and task duplication. In: Proceedings of operations research, pp 680–684.
31. Palis MA, Liou JC, Wei DS (1996) Task clustering and scheduling for distributed memory parallel architectures. IEEE Trans Parallel Distrib Syst 7(1):46–55.
32. S. Darbha and D. P. Agrawal, “A Fast and Scalable Scheduling Algorithm for Distributed Memory Systems,” Proc. of Symp. On Parallel and Distributed Processing, Oct. 1995, pp. 60-63.
33. W. Cirne, F. Brasileiro,D. Paranhos, L.F.W. Go´es, and W. Voorsluys, ‘‘On the Efficacy, Efficiency and Emergent Behavior of Task Replication in Large Distributed Systems,’’ Parallel Comput., vol. 33, no. 3, pp. 213-234, Apr. 2007.
34. G. Kandaswamy, A. Mandal, and D.A. Reed, ‘‘Fault Tolerance and Recovery of Scientific Workflows on Computational Grids,’’ in Proc. 8th Int’l Symp. CCGrid, 2008, pp. 777-782.
35. R. Sirvent, R.M. Badia, and J. Labarta, ‘‘Graph-Based Task Replication for Workflow Applications,’’ in Proc. 11th Int’l Conf. HPCC, 2009, pp. 20-28.
36. M. Dobber, R. van der Mei, and G. Koole, ‘‘Dynamic Load Balancing and Job Replication in a Global-Scale Grid Environment: A Comparison,’’ IEEE Trans. Parallel Distrib. Syst., vol. 20, no. 2, pp. 207-218, Feb. 2009.
37. X. Tang, K. Li, G. Liao, and R. Li, ‘‘List Scheduling with Duplication for Heterogeneous Computing Systems,’’ J. Parallel Distrib. Comput., vol. 70, no. 4, pp. 323-329, Apr. 2010.
38. Coffman, E. G., and Denning, P. J. Operating Systems Theory. Prentice-Hall, Englewood Cliffs, N.J., 1973.
39. Kepler Scientific Workflow System. https://kepler-project.org/
40. Pegasus Workflow Management System. https://pegasus.isi.edu/
41. Apache Taverna. http://www.taverna.org.uk/
42. Triana Scientific Workflow. http://www.trianacode.org/
43. Amazon Web Services. https://aws.amazon.com/

Our proposed project aims to engineer intelligence and sustainability into the next generation of scientific workflow scheduling and management for distributed systems (i.e., cloud computing). These systems are characterized by scale, dynamism, uncertainty, and hyper-connectivity in their operations. Workload fluctuation is central to the scheduling problem: for example, the websites of Amazon and eBay receive unprecedented numbers of requests during the holiday season, after which traffic drops dramatically. Mapping performance requirements onto the underlying resources in the cloud is the key problem [4]. Over-provisioning resources to assure performance may result in underutilized instances and unnecessary costs, while under-provisioning will undoubtedly hurt performance [7]. Clouds typically face highly dynamic demands for resources along with highly heterogeneous and dynamic workflows. For example, the workflows associated with an application can be quite dynamic in terms of both the number of tasks processed and the computation requirements of each task. Furthermore, different applications may have very different and dynamic quality of service (QoS) requirements: one application may require high throughput, another may be constrained by a budget, and a third may have to balance both. The performance of a cloud service can also vary with these varying loads, as well as with failures, network conditions, and so on, resulting in different QoS delivered to the application.

Workflow management and scheduling for these systems require a high level of autonomy that enables dynamic application scale-out to address dynamic workflows, spikes in demand, and other extreme requirements. Scheduling should consider (i) the software world, including functional requirements, logical flow, priorities, and constraints of the workflow, and (ii) the physical world, including the computational resources, the dependability of the distributed resources, and the end-user quality of experience (QoE), among others. Our project builds on a successful combination of research areas such as AI, machine learning, dynamic optimization, and distributed computing to investigate the requirements of scientific workflow scheduling and management in distributed systems. It will develop novel intelligent algorithms that leverage learning from history [6, 11, 12, 13], situation, context, and/or various interactions between the physical and software worlds to provide seamless, dependable, and more effective scheduling. The comparative evaluation of our algorithm against related work is based on metrics such as schedule length ratio, speedup, and the running time of the algorithm [5].
