Grid Monitoring



Novel Resource Allocation Strategy Using Network Metrics in GRID


Grid monitoring covers both the available resources and the network that connects them. Monitoring resource metrics helps the grid middleware decide which job should be submitted to which resource. Resource metrics alone, however, are not enough to make that decision; studying and analyzing the network metrics contributes equally to deciding where to submit a job.


Keywords: Grid Monitoring; Resource Metrics; Network Metrics; Resource Selection.

1 Introduction

A Grid platform is an extended distributed environment composed of loosely coupled computers acting in concert to perform very large tasks. The computers in a grid may run different operating systems on different hardware, which makes the environment heterogeneous, and they are often spread across a decentralized network rather than housed in a single location. Grid computing is used in a large number of academic research projects and has proved to be a collaborative way of solving a problem using shared high-end computational resources.

2 Need for Grid Monitoring

High-level computing jobs can be performed efficiently only if the parameters that affect the computation are analyzed. This process of analyzing the various parameters of the grid setup is known as monitoring [11]. Shirose and Matsuoka describe the need to maintain the quality of the grid setup [26]. Several factors affect that quality, such as network faults and component interdependencies. Quality can be maintained by regularly monitoring the activities within the grid setup. Monitoring gives the details of the current execution scenario and can also help predict the future performance of the setup, which is useful for estimating the time required to complete jobs, as specified in [31].

After submitting jobs to the grid setup, users often experience delays in job completion. These delays can be reduced to a great extent if the grid setup is monitored regularly and the performance problems detected [2] are rectified promptly. Grid monitoring therefore extends naturally to performance prediction and performance tuning. Issues that arise during job execution, such as delayed or blocked jobs, may be caused by either the resource metrics or the network metrics. They should be identified by monitoring and corrected by tuning. Identifying and tuning grid performance is challenging because the resources are distributed across different geographical locations and connected by network links [3]. Chung and Chang have proposed an efficient protocol, the Grid Resource Information Retrieving (GRIR) protocol [10], which is based on the push data delivery model, to obtain accurate network status.

A monitoring and information system (MIS) is a key component of a distributed system or Grid. It provides information about the available resources and their status. An MIS can be used in a variety of ways: a resource broker may query the MIS to locate computing elements that meet the CPU and memory requirements of a job submitted by an end user; a program may collect a stream of data generated by the MIS to steer an application; or a system administrator may be sent a notification when system load or disk space availability changes, helping to identify possible performance anomalies [4].
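The resource broker use case can be illustrated with a minimal sketch. It assumes a hypothetical in-memory snapshot of MIS records; the `ResourceRecord` type, its field names and the sample values are invented for illustration and do not correspond to any particular MIS.

```python
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    """One entry as a hypothetical MIS might report it (illustrative fields)."""
    name: str
    free_cpus: int
    free_memory_mb: int


def find_candidates(records, cpus_needed, memory_needed_mb):
    """Return the names of resources that satisfy a job's CPU and memory needs."""
    return [
        r.name
        for r in records
        if r.free_cpus >= cpus_needed and r.free_memory_mb >= memory_needed_mb
    ]


# Illustrative snapshot; the values are made up.
snapshot = [
    ResourceRecord("node-a", free_cpus=8, free_memory_mb=16384),
    ResourceRecord("node-b", free_cpus=2, free_memory_mb=4096),
    ResourceRecord("node-c", free_cpus=4, free_memory_mb=2048),
]

candidates = find_candidates(snapshot, cpus_needed=4, memory_needed_mb=8192)
print(candidates)
```

A real broker would query a live information service rather than a static list, but the matching step would look broadly similar.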

There are two types of monitoring.

Active monitoring -

A few test packets are injected into the live data channel and their performance is observed. This measures the behavior of packets on the network.

Passive monitoring -

Observation points are set up to monitor the flow of data packets without disturbing the actual traffic. This measures the behavior of the application while it uses the network.

3 Network Monitoring

Reuther and Goodman describe the use of networked computational resources for implementing sensor-fusion applications [28]. These applications require parallel computing, in which network performance is vital and needs to be monitored regularly. Network monitoring involves evaluating the performance of the links between the clients and the head node of the grid setup. Common metrics identified for network monitoring [6] are latency, jitter, packet loss, throughput, link utilization, availability and reliability.

Latency in a network may vary because of congestion in the channel or router, the load on the end hosts, and the path followed by the packet during its round trip. Jitter is the short-term variation of delay over time and is also known as latency variation. Packet loss may be caused by a hardware fault, congestion in the channel, or corruption of the data packet sent. Throughput is influenced by several parameters, namely packet loss ratio, latency, jitter, delay, round trip time and available bandwidth. Link utilization is the throughput divided by the access rate, expressed as a percentage. For some types of link, the service provider may specify a Committed Information Rate (CIR). Availability refers to whether the channel is available for a particular application to use at a given point in time. Reliability is related to the packet loss ratio and availability, and also involves the retransmission rate.

System administrators and application developers need a variety of monitoring tools to analyze network metrics such as round trip time, packet loss, bandwidth, jitter, latency and throughput. Various network monitoring tools are available that help in monitoring the network efficiently [20].
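Several of these metrics can be computed directly from raw measurements. The sketch below is illustrative only: the function names are hypothetical, the sample values are made up, and jitter is taken as the mean absolute difference between consecutive latency samples, which is one simple definition among several in use.

```python
from statistics import mean


def mean_latency_ms(rtts_ms):
    """Average of the measured round trip times, in milliseconds."""
    return mean(rtts_ms)


def jitter_ms(rtts_ms):
    """Mean absolute difference between consecutive latency samples (assumed definition)."""
    diffs = [abs(later - earlier) for earlier, later in zip(rtts_ms, rtts_ms[1:])]
    return mean(diffs) if diffs else 0.0


def packet_loss_ratio(sent, received):
    """Fraction of sent packets that never arrived."""
    return (sent - received) / sent


def link_utilization_percent(throughput_mbps, access_rate_mbps):
    """Throughput divided by the access rate, expressed as a percentage."""
    return 100.0 * throughput_mbps / access_rate_mbps


# Made-up probe results for one link.
rtts = [20.0, 24.0, 22.0, 30.0]
print("latency (ms):", mean_latency_ms(rtts))
print("jitter (ms):", jitter_ms(rtts))
print("loss ratio:", packet_loss_ratio(sent=100, received=97))
print("utilization (%):", link_utilization_percent(40.0, 100.0))
```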

4 Resource Monitoring

By maintaining the resource status constantly, the necessary information can be provided quickly when requested. However, the cost of maintaining resource status grows with the number of resources and the frequency of status updates, so the trade-off between maintenance cost and data accuracy should be considered. Resource monitoring involves checking the available resources whenever a job is submitted to the grid middleware. A submitted job is often executed only once a sufficient amount of resources has been freed by other jobs. Buyya et al. enumerate various issues in Grid resource management [27]. The resources are geographically distributed and have their own scheduling mechanisms, prices and access permissions. All these factors need to be managed properly to provide better system performance and user satisfaction. Li et al. describe an agent-based resource management system [29] in which agents are designed to locate the largest available computation power within the grid setup and provide proper load balancing. Cao et al. also describe the advantages of an Agent-Based Resource Management Infrastructure [30], which addresses the two major challenges of adaptability and scalability in a grid environment.

In [10], Chung and Chang introduce a Grid Resource Information Monitoring (GRIM) prototype. To account for the dynamic nature of resources in a grid, it uses a push-based data delivery protocol called Grid Resource Information Retrieving (GRIR). Resource information is updated based on resource availability and on the resource metrics that are required. One of the prominent techniques for resource monitoring is the Grid Monitoring Architecture (GMA). Resources in the grid network are represented by Producer services, which make their monitoring data available. Monitoring of the resources is done by Consumer services. The Directory Service acts as the bridge between the consumers and the producers [22].
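The three GMA roles and the push delivery model can be sketched in a few lines. This is a toy, in-process illustration under simplifying assumptions: the class and method names are hypothetical, there is no networking, and it is not the GMA specification's interface nor the GRIR protocol itself.

```python
class DirectoryService:
    """Lets producers advertise themselves and consumers find them."""

    def __init__(self):
        self._producers = {}

    def register(self, metric, producer):
        self._producers.setdefault(metric, []).append(producer)

    def lookup(self, metric):
        return list(self._producers.get(metric, []))


class Producer:
    """Publishes one metric for one host and pushes updates to subscribers."""

    def __init__(self, host, metric, directory):
        self.host = host
        self.metric = metric
        self._subscribers = []
        directory.register(metric, self)

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, value):
        # Push model: the producer sends the update without being asked.
        for callback in self._subscribers:
            callback(self.host, self.metric, value)


class Consumer:
    """Finds producers through the directory and records what they push."""

    def __init__(self, directory):
        self._directory = directory
        self.latest = {}

    def watch(self, metric):
        for producer in self._directory.lookup(metric):
            producer.subscribe(self._on_event)

    def _on_event(self, host, metric, value):
        self.latest[(host, metric)] = value


directory = DirectoryService()
producer_a = Producer("node-a", "cpu_load", directory)
producer_b = Producer("node-b", "cpu_load", directory)

consumer = Consumer(directory)
consumer.watch("cpu_load")

producer_a.publish(0.35)
producer_b.publish(0.80)
print(consumer.latest)
```

The directory is only consulted to discover producers; the monitoring data itself flows directly from producer to consumer.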

Resource monitoring identifies the bottlenecks among the available resources in the grid network. The Globus Alliance developed the Globus Toolkit, in which the Monitoring and Discovery System (MDS) is the most prominent monitoring software. The status of the resources is gathered by the Information Provider [23]. Swift Scheduler allocates jobs in a computational grid by considering the length of the jobs, their processing time, and their memory and CPU requirements with respect to the priority of resources [23].

5 Resource Monitoring vs Network Monitoring

In general, nodes are selected for job execution based on resource parameters such as computational speed, CPU usage and memory. These parameters decide which node is capable of performing the computation efficiently. In some cases, node selection also takes into account the network performance of the links to the nodes. A node may have high resource availability and still delay the execution of a job because of degraded network performance. To reduce this kind of issue, the network parameters should be considered along with the resource parameters during node selection. Job submission is then based on both the CPU loads of the servers and the latencies in the network [24].

Though the resource parameters may be monitored well, the network parameters also play a role in the effective transfer of data between the compute and head nodes. Network monitoring in a grid matters most when parallel jobs are submitted to compute nodes with equal resource performance. In this case, although the resource performance of the compute nodes is equal, the responses from the nodes may not be received at the same time. This effect is due to the variation in network performance between the links to the compute nodes. Network performance monitoring becomes more crucial as the network size of the grid setup increases, since it involves monitoring a larger number of links, estimating their performance and supporting fault detection [6]. Collecting, relating and analyzing network information is one of the important aspects of effective grid applications and services. GMMPro, a grid network monitoring system, provides basic support for monitoring the grid network and uses SNMP as its lower-layer protocol [5].
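The effect of unequal links on otherwise equal nodes can be shown with a back-of-the-envelope estimate. The model below is an assumption made for illustration: response time is taken as compute time plus data size divided by throughput plus one latency term, ignoring protocol overhead and contention. The function name and all numbers are hypothetical.

```python
def estimated_response_time_s(compute_time_s, data_mb, throughput_mbps, latency_ms):
    """Compute time plus a simple transfer estimate (data / throughput + latency)."""
    transfer_s = (data_mb * 8.0) / throughput_mbps  # megabytes -> megabits
    return compute_time_s + transfer_s + latency_ms / 1000.0


# Two compute nodes with identical resource performance (60 s of computation each)
# that must receive 500 MB of input over links of different quality.
fast_link = estimated_response_time_s(60.0, 500.0, throughput_mbps=800.0, latency_ms=2.0)
slow_link = estimated_response_time_s(60.0, 500.0, throughput_mbps=80.0, latency_ms=40.0)

print(f"node on fast link: {fast_link:.3f} s")
print(f"node on slow link: {slow_link:.3f} s")
```

Under these assumed figures the node behind the slower link responds noticeably later even though both nodes compute equally fast.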

Table 1 - Comparison of existing monitoring systems

Information Collector: Information Provider
Information Server: Producer Servlet
Aggregate Information Server: Producer and Consumer
Directory Server:
Use Case: Single Query; Streaming Network Data; Single Query
Push/Pull model: Push and Pull; Triggers Push

GRIS - Grid Resource Information Service

GIIS - Grid Index Information Service

6 Research Issues

There are various research issues in improving the efficiency of a grid setup. Though the resource metrics contribute to resource selection, the network metrics also play a significant role in deciding the hosts for job execution. The network metrics give more detailed information about the quality and performance of the hosts, whereas the resource metrics describe only the local system's efficiency in terms of CPU utilization, concurrent processing and memory utilization. Considering only the resource metrics therefore overlooks the degradation caused by the network. Hence it is essential to analyze the network metrics along with the resource metrics when selecting compute nodes during job submission. The strategy for resource allocation can be designed in the form of an algorithm, and the algorithm can be optimized to handle both the resource metrics and the network metrics.
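One possible shape for such an algorithm is a weighted score that combines both kinds of metric. The sketch below is a hypothetical illustration, not a strategy taken from the cited work: the `NodeMetrics` fields, the equal weights, the latency cap and the sample values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    """Illustrative per-node measurements (all fields are assumed)."""
    name: str
    cpu_idle: float     # fraction of CPU currently idle, 0..1
    mem_free: float     # fraction of memory currently free, 0..1
    latency_ms: float   # latency of the link to the head node
    loss_ratio: float   # packet loss ratio on that link, 0..1


def score(node, max_latency_ms=200.0, w_resource=0.5, w_network=0.5):
    """Weighted sum of a resource score and a network score, each in 0..1."""
    resource = (node.cpu_idle + node.mem_free) / 2.0
    latency_quality = max(0.0, 1.0 - node.latency_ms / max_latency_ms)
    network = latency_quality * (1.0 - node.loss_ratio)
    return w_resource * resource + w_network * network


def rank_nodes(nodes, **weights):
    """Return nodes ordered from most to least suitable."""
    return sorted(nodes, key=lambda n: score(n, **weights), reverse=True)


nodes = [
    NodeMetrics("node-a", cpu_idle=0.9, mem_free=0.8, latency_ms=150.0, loss_ratio=0.05),
    NodeMetrics("node-b", cpu_idle=0.7, mem_free=0.7, latency_ms=10.0, loss_ratio=0.0),
]

# Resource metrics only: node-a looks best.
print([n.name for n in rank_nodes(nodes, w_resource=1.0, w_network=0.0)])
# Resource and network metrics together: node-b overtakes it.
print([n.name for n in rank_nodes(nodes)])
```

With these made-up numbers the ranking flips once the network is taken into account, which is the degradation a resource-only selection would overlook.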

One of the key issues is to consider the network metrics, in addition to the regular resource metrics, when monitoring and predicting the behavior of the grid setup. The efficiency of a grid setup can be estimated more accurately when the network metrics are considered. Existing resource selection and job scheduling algorithms can be modified to include the network metrics and so improve efficiency. Such improved algorithms would also take into account dynamic changes in network load and other bottleneck situations. Another research issue is to handle the dynamically varying network load through tuning. Network monitoring and tuning must themselves be designed so that they do not cause a bottleneck while they run.


References

[1] Tommaso Coviello, Tiziana Ferrari, Kostas Kavoussanakis, Loukik Kudarimoti, Mark Leese, Alistair Phipps, Martin Swany, Arthur S. Trew, "Bridging Network Monitoring and the Grid", CESNET Conference 2006.

[2] Dan Gunter, Brian Tierney, Keith Jackson, Jason Lee, Martin Stoufer, "Dynamic Monitoring of High-Performance Distributed Applications", 11th IEEE International Symposium on High Performance Distributed Computing, July 2002, Edinburgh, Scotland.

[3] A.P. Millar, "Grid monitoring: a holistic approach", GridPP UK Computing for Particle Physics (2006).

[4] Xuehai Zhang, Jeffrey L. Freschl, Jennifer M. Schopf, "Scalability analysis of three monitoring and information systems: MDS2, R-GMA, and Hawkeye", Journal of Parallel and Distributed Computing, Volume 67, Issue 8, 2007.

[5] Wang Junfeng, Zhou Mingtian, Zhou Hongxia, "Providing Network Monitoring Service for Grid Computing", Proceedings of the 10th IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS'04).

[6] Mark Leese, Rik Tyer and Robin Tasker, "Network Performance Monitoring for the Grid", UK e-Science All Hands Meeting, 2005.

[7] UDPmon webpage,

[8] TCPmon webpage,

[9] R. E. Hughes-Jones, "Writeup for UDPmon: A Network Diagnostic Program", 2004.

[10] Wu-Chun Chung and Ruay-Shiung Chang, "A new mechanism for resource monitoring in Grid computing", 2008.

[11] Serafeim Zanikolas, Rizos Sakellariou, "A taxonomy of grid monitoring systems", Future Generation Computer Systems 21 (2005) 163-188.

[12] A.C. Davenhall and M.J. Leese, "An Introduction to Computer Network Monitoring and Performance", 2005.

[13] Pinger webpage,

[14] David Medinets and David A. Cafaro, "Monitoring and scheduling", IBM, 2007.

[15] V. Jacobson, "Traceroute: A tool for printing the route packets take to a network host", available from

[16] V. Jacobson, C. Leres, S. McCanne, tcpdump, available at

[17] B. Mah, "pchar: A tool for measuring Internet path characteristics", bmah/Software/pchar/.

[18] J. Goujun, "Methods for Network Analysis and Troubleshooting".

[19] IPerf: http://openmaniak.com/iperf.php.

[20] Thomas J. Hacker, Brian D. Athey, Jason Sommerfield, Pittsburgh, Deborah S. Walker, "Experiences Using Web100 for End-to-End Network Performance Tuning for Visible Human Testbeds".

[21] List of network measurement tools:

[22] Brian Tierney, Ruth Aydt, Dan Gunter, Warren Smith, Martin Swany, Valerie Taylor, Rich Wolski, "A grid monitoring architecture", The Global Grid Forum Draft Recommendation (GWD-Perf-16-3), August 2002.

[23] Monitoring and discovery system,

[24] K. Somasundaram, S. Radhakrishnan, "Task Resource Allocation in Grid using Swift Scheduler", Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844, Vol. IV (2009), No. 2, pp. 158-166.

[25] Supranamaya Ranjan, Edward Knightly, "High Performance Resource Allocation and Request Redirection Algorithms for Web Clusters", IEEE Transactions on Parallel and Distributed Systems.

[26] Ken'ichiro Shirose, Satoshi Matsuoka, "Autonomous Configuration of Grid Monitoring Systems", Proceedings of the 2004 International Symposium on Applications and the Internet Workshops (SAINTW'04).

[27] Rajkumar Buyya, David Abramson, and Jonathan Giddy, "Grid Resource Management, Scheduling and Computational Economy", Proceedings of the 2nd International Workshop on Global and Cluster Computing (WGCC'2000).

[28] Albert Reuther and Joel Goodman, "Dynamic Resource Management for a Sensor-Fusion Application via Distributed Parallel Grid Computing".

[29] Fufang Li, Deyu Qi, Limin Zhang, Xianguang Zhang, and Zhili Zhang, "Research on Novel Dynamic Resource Management and Job Scheduling in Grid Computing", Proceedings of the First International Multi-Symposiums on Computer and Computational Sciences (IMSCCS'06).

[30] Junwei Cao, Darren J. Kerbyson, and Graham R. Nudd, "Performance Evaluation of an Agent-Based Resource Management Infrastructure for Grid Computing".