High Performance Computing Servers By Utilization Of Free Systems Using Clustering Computer Science Essay

Clustering is the use of multiple computers, typically PCs or UNIX workstations, multiple storage devices, and redundant interconnections, to form what appears to users as a single, highly available system. The concept of a cluster was born when people first tried to spread different jobs over more computers and then gather back the data those jobs produced. The performance gap between a supercomputer and a cluster of multiple personal computers has become smaller. The openMosix software package turns networked computers running GNU/Linux into a cluster. It is a kernel extension for single-system image clustering. openMosix is a tool for a Unix-like kernel, such as Linux, consisting of adaptive resource sharing algorithms. It automatically balances the load between different nodes of the cluster, and nodes can join or leave the running cluster without disruption of the service. However, openMosix tightly couples all the systems, so that they cannot be used for other purposes. This disadvantage is overcome by our implementation.


In college networks, the client systems are not always in use. Consider a set of systems that is free at a moment when the server is busy: why can't we make the server use the resources of these systems and balance the load? The main point to be noted here is that the server itself should not be burdened with extra work such as load balancing and resource allocation. For this purpose my plan is to design an Intelligence system that will monitor all the systems in the network (at specific intervals) and report to the server which systems are free at that moment. Our Intelligence system will then migrate processes to those systems and get the work done.

An Introduction to Clustering:

Clustering is the use of multiple computers, typically PCs or UNIX workstations, multiple storage devices, and redundant interconnections, to form what appears to users as a single, highly available system. Clustering is the term used to describe the means of connecting multiple computers so that they behave as a single unit. In a computing cluster, individual processors function as an integrated computing resource. Combining multiple computers into one computing unit has the benefits of increased computing power, high availability of computing resources through the elimination of single points of failure, and the ability to balance loads across the available resources. To users and applications, the cluster feels and behaves like a single, very powerful computer. Cluster computing is therefore the use of clustered systems to run applications. Key components of a clustered system are the number of nodes (computers) involved, the means of connecting them together (the hardware and communication protocols used), and the manner in which the nodes access resources (memory, disk) within the cluster.

Supercomputers vs. clusters

Traditionally, supercomputers have only been built by a select number of vendors: a company or organization that required the performance of such a machine had to have a huge budget available for its supercomputer. Many universities could not afford the cost of a supercomputer by themselves, so they researched other alternatives. The concept of a cluster was born when people first tried to spread different jobs over more computers and then gather back the data those jobs produced. With cheaper and more common hardware available to everybody, results similar to real supercomputers were only to be dreamed of during the first years, but as the PC platform developed further, the performance gap between a supercomputer and a cluster of multiple personal computers became smaller.

Types Of Clusters:

Basically, there are three types of clusters:

Fail-over clusters

Load-balancing clusters

High Performance Computing clusters

The most widely deployed are probably the fail-over cluster and the load-balancing cluster.

Fail-over clusters: A fail-over cluster consists of two or more network-connected computers with a separate heartbeat connection between the hosts. The heartbeat connection between the machines is used to monitor whether all the services are still running: as soon as a service on one machine breaks down, the other machines try to take over.
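
As an illustration of the heartbeat idea, the following is a minimal Python sketch of one node probing its peer and deciding when to take over; the peer address, port, and thresholds are hypothetical.

    import socket
    import time

    PEER_HOST = "node2.example.local"   # hypothetical peer address
    PEER_PORT = 5499                    # hypothetical heartbeat port
    CHECK_INTERVAL = 2                  # seconds between heartbeat checks
    MAX_MISSES = 3                      # missed beats before takeover

    def peer_alive(host, port, timeout=1.0):
        """Return True if the peer answers a TCP connect on the heartbeat port."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def monitor():
        misses = 0
        while True:
            if peer_alive(PEER_HOST, PEER_PORT):
                misses = 0
            else:
                misses += 1
                if misses >= MAX_MISSES:
                    print("Peer unreachable - taking over its services")
                    break
            time.sleep(CHECK_INTERVAL)

    if __name__ == "__main__":
        monitor()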

Load-balancing cluster: With load-balancing clusters the concept is that when a request, say for a web server, comes in, the cluster checks which machine is the least busy and then sends the request to that machine. In practice a load-balancing cluster is usually also a fail-over cluster, with the extra load-balancing functionality and often with more nodes.
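
To make the "least busy" decision concrete, here is a small Python sketch; the node names and request counts are invented for illustration.

    # Illustration only: the balancer forwards each request to the node that is
    # currently serving the fewest requests.
    active_requests = {
        "web1": 12,    # hypothetical node names and counts
        "web2": 4,
        "web3": 9,
    }

    def least_busy(nodes):
        """Return the node currently serving the fewest requests."""
        return min(nodes, key=nodes.get)

    target = least_busy(active_requests)
    print("Forwarding request to", target)
    active_requests[target] += 1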


High Performance Computing cluster: the machines are configured specially to give data centers that require extreme performance what they need. Beowulf clusters, for example, have been developed especially to give research facilities the computing speed they need. These kinds of clusters also have some load-balancing features; they try to spread different processes over more machines in order to gain performance. But what it mainly comes down to in this situation is that a process is parallelized, so that routines that can be run separately are spread over different machines instead of having to wait until they finish one after another.

The most commonly known examples of load-balancing and fail-over clusters are web farms, databases and firewalls.

Cluster Components:

The key components comprising a commodity cluster are the nodes performing the computing and the dedicated interconnection network providing the data communication among the nodes. They include the following:

Cluster Node Hardware:

A node of a cluster provides the system's computing and data storage capability. The node integrates several key subsystems in a single unit; these are:

Processor

Memory

Secondary Storage

Cluster Network Hardware:

Installing a basic cluster requires at least two network-connected machines, either using a crossover cable between the two network cards or using a switch or hub (a switch is much better than a hub, though, and only costs a few bucks more). Of course, the faster your network cards, the more easily you will get better performance from your cluster. These days Fast Ethernet (100 Mbps) is standard; putting multiple ports in a machine isn't that difficult, but make sure to connect them through separate physical networks in order to gain the speed you want.

openMosix's Role in Clustering

The openMosix software package turns networked computers running GNU/Linux into a cluster. It automatically balances the load between different nodes of the cluster, and nodes can join or leave the running cluster without disruption of the service. The load is spread out among nodes according to their connection and CPU speeds.

openMosix is a kernel extension for single-system image clustering. openMosix is a tool for a Unix-like kernel, such as Linux, consisting of adaptive resource sharing algorithms. It allows multiple uniprocessors (UP) and symmetric multiprocessors (SMP nodes) running the same kernel to work in close cooperation.

Since openMosix is part of the kernel and maintains full compatibility with Linux, a user's programs, files, and other resources will all work as before without any further changes. The casual user will not notice the difference between a Linux and an openMosix system. To her, the whole cluster will function as one (fast) GNU/Linux system.

The openMosix resource sharing algorithms are designed to respond on-line to variations in the resource usage among the nodes. This is achieved by migrating processes from one node to another, preemptively and transparently, for load-balancing and to prevent thrashing due to memory swapping. The goal is to improve the cluster-wide performance and to create a convenient multi-user, time-sharing environment for the execution of both sequential and parallel applications. The standard runtime environment of openMosix is a computing cluster, in which the cluster-wide resources are available to each node.

openMosix Technology

The openMosix technology consists of two parts:

Preemptive Process Migration (PPM) mechanism.

A set of algorithms for adaptive resource sharing.

Both parts are implemented at the kernel level, using a loadable module, such that the kernel interface remains unmodified. Thus they are completely transparent to the application level.

The PPM can migrate any process, at any time, to any available node. Each process has a Unique Home-Node (UHN), the node where it was created; normally this is the node to which the user has logged in. The PPM is the main tool for the resource management algorithms. As long as the requirements for resources such as CPU or main memory are below a certain threshold, the user's processes are confined to the UHN. When the requirements for resources exceed the threshold levels, some processes may be migrated to other nodes to take advantage of available remote resources. The overall goal is to maximize performance by efficient utilization of the network-wide resources.
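
openMosix makes this decision inside the kernel; the short Python sketch below only illustrates the threshold idea in user space, with made-up thresholds and load figures, and is not the actual openMosix algorithm.

    # Threshold idea behind preemptive process migration (illustrative values).
    CPU_THRESHOLD = 0.80     # fraction of local CPU considered "full"
    MEM_THRESHOLD = 0.90     # fraction of local memory considered "full"

    def should_migrate(local_cpu, local_mem, remote_nodes):
        """Suggest a remote node if local usage exceeds the thresholds, else None."""
        if local_cpu < CPU_THRESHOLD and local_mem < MEM_THRESHOLD:
            return None                      # stay on the Unique Home-Node
        candidates = [n for n in remote_nodes if n["cpu"] < CPU_THRESHOLD]
        if not candidates:
            return None
        return min(candidates, key=lambda n: n["cpu"])["name"]

    nodes = [{"name": "node2", "cpu": 0.35}, {"name": "node3", "cpu": 0.60}]
    print(should_migrate(local_cpu=0.95, local_mem=0.70, remote_nodes=nodes))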


The main resource sharing algorithms of openMosix are load-balancing and memory ushering. The dynamic load-balancing algorithm continuously attempts to reduce the load differences between pairs of nodes by migrating processes from higher-loaded to less-loaded nodes. The scheme is decentralized: all the nodes execute the same algorithms, and the reduction of the load differences is performed independently by pairs of nodes. The number of processors at each node and their speed are important factors for the load-balancing algorithm. The algorithm responds to changes in the loads of the nodes or in the runtime characteristics of the processes. It prevails as long as there is no extreme shortage of other resources such as free memory or empty process slots.
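
The pairwise idea can be sketched as follows; this is an illustration with invented loads and a made-up migration margin, not the real kernel algorithm.

    # Pairwise load-balancing sketch: each node compares its load with one
    # partner and suggests a migration if the normalized gap is large enough.
    MIGRATION_MARGIN = 0.25   # only migrate if the normalized load gap exceeds this

    def normalized_load(run_queue_len, cpu_speed):
        """Load relative to CPU speed, so faster nodes appear less loaded."""
        return run_queue_len / cpu_speed

    def balance_pair(node_a, node_b):
        la = normalized_load(node_a["queue"], node_a["speed"])
        lb = normalized_load(node_b["queue"], node_b["speed"])
        if la - lb > MIGRATION_MARGIN:
            return "migrate one process %s -> %s" % (node_a["name"], node_b["name"])
        if lb - la > MIGRATION_MARGIN:
            return "migrate one process %s -> %s" % (node_b["name"], node_a["name"])
        return "already balanced"

    a = {"name": "node1", "queue": 8, "speed": 1.0}
    b = {"name": "node2", "queue": 2, "speed": 2.0}
    print(balance_pair(a, b))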

Implementation Part:

Utilizations of College Lab Networks

Consider the networks in our colleges: the client systems are not always in use. Suppose a set of systems is free at a moment when the server is busy. Why can't we make the server use the resources of these systems and balance the load? The main point to be noted here is that the server itself should not be burdened with extra work such as load balancing and resource allocation. For this purpose my plan is to design an Intelligence system that will monitor all the systems in the network (at specific intervals) and report to the server which systems are free at that moment. Our Intelligence system will then migrate processes to those systems and get the work done.

Thus the load on the server is easily managed using resources that are already free. With this, our server can be made a high performance computing system, like a supercomputer, at a very low budget. The only cost is for the Intelligence system to be designed.

This project can be used not only to enhance the server, but also to relieve the load on the clients. For example, installing software on a client needs much of its resources; the resources of other systems which are free at that moment can be used, so that the system currently in use will be much faster.

Kernel-level clustering such as openMosix is not suitable here. It allows any parallel task to be distributed, but it introduces very tight coupling between machines, and systems based on process migration may not perform well with the many short-lived processes generated during compilation.

Algorithms:

1. Remotely log in to the local hosts.

2. Append each host name to a table-like data structure.

3. Maintain and refresh the table by checking the hosts periodically.

4. Run a separate thread for discovering newly available systems.

5. On a request to the server, take the first available remote host to serve the request.

6. Update the resource entry in the table (increment that host's serving count by one).

7. When the request terminates, decrement the serving count of the appropriate host in the table.

8. Loop back to Step 3.

A sketch of the host table used by these steps is given below.
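
This is a minimal, illustrative sketch of that host table in Python, assuming the free lab machines are already reachable (for example over SSH); the host names and the discovery function are placeholders.

    import threading
    import time

    class HostTable:
        """Table of free client machines and how many requests each is serving."""

        def __init__(self):
            self.lock = threading.Lock()
            self.serving = {}                 # host name -> current serving count

        def add_host(self, host):
            with self.lock:
                self.serving.setdefault(host, 0)

        def remove_host(self, host):
            with self.lock:
                self.serving.pop(host, None)

        def acquire(self):
            """Steps 5-6: take an available host and increment its serving count."""
            with self.lock:
                if not self.serving:
                    return None
                host = min(self.serving, key=self.serving.get)
                self.serving[host] += 1
                return host

        def release(self, host):
            """Step 7: decrement the serving count when the request terminates."""
            with self.lock:
                if host in self.serving and self.serving[host] > 0:
                    self.serving[host] -= 1

    def refresh_loop(table, discover, interval=30):
        """Steps 3-4: periodically re-check which lab machines are free."""
        while True:
            for host in discover():
                table.add_host(host)
            time.sleep(interval)

    if __name__ == "__main__":
        discover = lambda: ["lab-pc-01", "lab-pc-02"]     # placeholder discovery
        table = HostTable()
        for h in discover():                              # initial population (step 2)
            table.add_host(h)
        threading.Thread(target=refresh_loop, args=(table, discover), daemon=True).start()
        host = table.acquire()
        print("Serving request on", host)
        table.release(host)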

Design of the Model

[Diagram: the free machines (clients), the server, and the load-balancing server that distributes work among them.]

Scheduler in the Load Balancing Server

This idea can make use of a basic load-balancing algorithm to choose a volunteer to run each particular job. Scheduling is managed by small state files kept in each user's home directory. In addition, each server imposes a limit on the number of jobs that will be accepted at any time. This prevents any single server from being overloaded and provides basic coordination over servers being used by several clients. By default, the limit on the number of jobs is set to two greater than the number of processors in the server, to allow for up to two jobs being in transit across the network at any time.
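
A minimal sketch of that per-server job limit, assuming a threaded Python server; the limit formula follows the text above, everything else is illustrative.

    import os
    import threading

    # Accept at most (number of CPUs + 2) jobs at once, leaving room for up to
    # two jobs that are still in transit across the network.
    MAX_JOBS = (os.cpu_count() or 1) + 2
    job_slots = threading.Semaphore(MAX_JOBS)

    def try_accept_job():
        """Return True if a new job may be accepted, False if the server is full."""
        return job_slots.acquire(blocking=False)

    def finish_job():
        job_slots.release()

    if try_accept_job():
        print("Job accepted (limit %d)" % MAX_JOBS)
        finish_job()
    else:
        print("Server busy; the client should try another volunteer")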

The server uses a pre-forking design similar to Apache: the maximum number of client processes are forked at startup, and all of them wait to accept connections from a client. This puts a limit on concurrency, and slightly reduces the latency of accepting a new connection.
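
The pre-forking pattern can be sketched as follows; this is a Unix-only illustration with a hypothetical port, not the project's actual server code.

    import os
    import socket

    NUM_WORKERS = 4          # forked up front; caps concurrency
    LISTEN_PORT = 5499       # hypothetical port

    def worker(listener):
        """Each child blocks in accept() on the shared listening socket."""
        while True:
            conn, addr = listener.accept()
            conn.sendall(b"handled by worker %d\n" % os.getpid())
            conn.close()

    def main():
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("0.0.0.0", LISTEN_PORT))
        listener.listen(128)
        for _ in range(NUM_WORKERS):
            if os.fork() == 0:          # child process
                worker(listener)
                os._exit(0)
        os.wait()                       # parent waits; workers run until killed

    if __name__ == "__main__":
        main()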

Clients keep a temporary note of machines which are unreachable or failing. These notes time out after sixty seconds, so machines which reboot or are temporarily unreachable will shortly rejoin the cluster. Machines which are abruptly terminated or shut down while running jobs may cause those particular compilations to fail, which can normally be addressed by re-running the job.
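
The temporary notes about failing machines could look like the following sketch; the sixty-second timeout comes from the text, and the rest is illustrative.

    import time

    BACKOFF_SECONDS = 60
    bad_hosts = {}                     # host -> time the failure was recorded

    def mark_failed(host):
        bad_hosts[host] = time.monotonic()

    def is_usable(host):
        failed_at = bad_hosts.get(host)
        if failed_at is None:
            return True
        if time.monotonic() - failed_at > BACKOFF_SECONDS:
            del bad_hosts[host]        # the note has timed out; let the host rejoin
            return True
        return False

    mark_failed("lab-pc-02")           # hypothetical host name
    print(is_usable("lab-pc-02"))      # False until the note times out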

SSH

The project has the option of using a helper program such as ssh to open connections rather than simply opening a TCP socket.

When OpenSSH is used to open connections, all data is strongly encrypted. The volunteer machines do not need any additional listening ports or long-running processes. All remote compilations run under the account of the user that initiated them.

Using OpenSSH or another pluggable connection method is a powerful Unix design pattern. Just a few hundred lines of code (as in distcc) to fork the connection program and open pipes allow the system to use world-class encryption and authentication that integrates easily into existing networks.
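
The helper-program pattern looks roughly like this in Python: fork ssh and treat its pipes as the data channel. The remote command name and host are hypothetical, and the sketch assumes the host accepts SSH logins.

    import subprocess

    def open_ssh_channel(host, remote_command="remote-exec-server"):
        """Start ssh to `host`; its stdin/stdout pipes become the data channel."""
        return subprocess.Popen(
            ["ssh", host, remote_command],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )

    if __name__ == "__main__":
        proc = open_ssh_channel("lab-pc-01")      # hypothetical host
        proc.stdin.write(b"PING\n")
        proc.stdin.flush()
        print(proc.stdout.readline())
        proc.terminate()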

Any connection that reaches the load balancing server is by definition authorized either by ssh opening the socket, or because the administrator allowed connections from that address.

LZO compression

This project can make use of the LZO fast compressor to compress requests and replies. This option is explicitly set by the client in the host list. The preprocessed source is typically very gassy because of long runs of whitespace and repeated identifiers, and compresses well. Object files also compress moderately well.
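
The framing idea is simply to compress the request body before sending it and decompress it on the other side. The project uses LZO; the sketch below uses Python's built-in zlib only because it needs no extra packages, so it is a stand-in rather than the project's actual codec.

    import zlib

    def pack_request(source_text):
        """Compress a request body before it is sent over the network."""
        return zlib.compress(source_text.encode("utf-8"))

    def unpack_request(blob):
        """Decompress a received request body."""
        return zlib.decompress(blob).decode("utf-8")

    # Repetitive, whitespace-heavy input compresses very well.
    preprocessed = "int  main ( void ) {    return 0 ;   }\n" * 1000
    blob = pack_request(preprocessed)
    print("%d bytes -> %d bytes compressed" % (len(preprocessed), len(blob)))
    assert unpack_request(blob) == preprocessed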

Characteristics of the proposed idea

A performance model

Performance is predicted by the network bandwidth, the time to run the preprocessor and the compiler, the relative speeds of the client and volunteer computers, and the overhead of the client and server. The speed of executing a job on a particular machine depends on available memory, contention for the CPU, and the speed of the CPU, RAM, and bus.
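
As a back-of-envelope version of this model, the sketch below compares an assumed local run time with the estimated cost of shipping a job to a faster volunteer; every number in it is invented for illustration.

    def remote_time(job_bytes, result_bytes, bandwidth_bps, remote_cpu_seconds,
                    overhead_seconds=0.05):
        """Estimated wall time for a remote run: overhead + transfer + remote CPU."""
        transfer = (job_bytes + result_bytes) / bandwidth_bps
        return overhead_seconds + transfer + remote_cpu_seconds

    local_seconds = 4.0                          # assumed local run time
    remote_seconds = remote_time(
        job_bytes=2_000_000,                     # e.g. preprocessed source
        result_bytes=500_000,                    # e.g. object file
        bandwidth_bps=100e6 / 8,                 # Fast Ethernet, 100 Mbps
        remote_cpu_seconds=2.0,                  # assumed faster volunteer CPU
    )
    print("remote ~ %.2fs vs local %.2fs" % (remote_seconds, local_seconds))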

Is it safe?

Any bugs reported on the volunteer systems will be appended to the log. The errors or bugs can be detected just as they would be on the local system. This enables easy access to, and control of, the executions on those systems. The log files can be read for detailed explanations of the reported bugs.

Scalability

This project is nearly linearly scalable for small numbers of CPUs. Execution across three identical machines is typically 2.5 to 2.8 times faster than local execution. Executions across sixteen machines have been reported at over ten times faster than a local execution. These numbers include the overhead of execution, and the time for non-parallel or non-distributed tasks.

Lifting compilation

It is fairly common for a developer to be working on a relatively slow machine, perhaps a laptop or an embedded system, while a large, faster server machine is available. It is quite possible for the benefit of running on a faster CPU to be greater than the overhead of remote execution and of transferring the build across the network.

Security

Remote execution of compile jobs introduces a trust relationship between the client and server machines. The client completely trusts the server to compile code correctly. A corrupt server could introduce malicious code into the results, or attack the client in other ways. The server completely trusts an authorized client. A malicious client could execute arbitrary commands on the server.

In TCP mode network transmissions are not encrypted or signed. An attacker with passive access to the network traffic can see the source or object code. An attacker with active access can modify the source or object code, or execute arbitrary commands on the server.

The TCP server can limit clients by either checking the client address, or listening only on particular IP interfaces. However, in some circumstances an attacker can spoof connections from a false address. On untrusted networks SSH is the only safe choice.

In SSH mode the server is started by each user under their own account on the server. There is no long-lived daemon. Connections are authenticated and encrypted. Every user is allowed only the privileges that their normal account allows.

Because the server and client necessarily trust each other, there has been no security audit of the code that runs after a connection is established. It is possible that a hostile server could gain control of a client directly, as well as modifying the object code.

Good logging

Errors and trace messages from the server are written both into a log file and into the compiler error stream passed back to the client. This makes it more likely that if a problem occurs on the server, it will be obvious to the remote user without their having to look in the server's log file. This pattern is very useful, though it might not be appropriate for software that handles requests from untrusted users. Errors during server startup are written to error output as well as to the log file, to make it more obvious to the administrator that the server did not actually start.
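
With Python's standard logging module, this dual-destination pattern can be set up in a few lines; the logger name and log file path are hypothetical.

    import logging
    import sys

    logger = logging.getLogger("loadbalancer")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.FileHandler("/tmp/loadbalancer.log"))   # server-side log
    logger.addHandler(logging.StreamHandler(sys.stderr))              # echoed to the caller

    logger.error("volunteer lab-pc-03 rejected the job: out of slots")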

Conclusion:

The availability of low-cost, high-performance systems and network technologies, coupled with the availability of the required software from commercial vendors as well as academic and open sources, makes it very feasible for any organization to set up a cluster and enjoy the resulting benefits of high availability, scalability and robustness. A common use of cluster computing is to load balance traffic on high-traffic websites. Cluster computing can also be used as a relatively low-cost form of parallel processing for scientific and other applications that lend themselves to parallel operations. The main advantages of these clusters are:

High availability:

If one node of the cluster fails, the other nodes start providing the service, keeping mission-critical applications continually available to clients. This process is called failover.

Scalability:

Using clustering technology, less powerful computing systems can be enhanced by grouping them into clusters.

Manageability:

Since the nodes of the cluster are managed as a single system, administrative tasks are made easier.