Study On Shared Memory And Cluster Computing Computer Science Essay


Cache memory is random access memory (RAM) that a computer's microprocessor can access more quickly than it can access regular main memory. As the microprocessor processes data, it looks for it first in the cache; if the data is found there, the processor saves time, which makes cache memory very useful. What happens if memory is shared in a cluster computing system? What impact does shared memory have on exchange speed in cluster computing? That will be my principal line of inquiry, with a focus on architectural impacts. In this analysis I will define cluster computing, then focus on the impact of shared memory on the architecture. Afterwards, I will attempt to compare architectural performance. Finally, I will draw some conclusions from the research.
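This cache-first lookup can be sketched in a few lines of Python. Everything here (the `Cache` class, the `main_memory` dictionary, the FIFO eviction policy) is invented for illustration and is not how any real processor is implemented; it only shows the check-cache-then-fall-back behaviour described above.

```python
# Illustrative sketch of a look-aside cache in front of slower "main memory".
# All names and the FIFO eviction policy are assumptions for this example.

main_memory = {addr: addr * 2 for addr in range(1000)}  # pretend backing store

class Cache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.store = {}          # address -> value
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        if addr in self.store:   # cache hit: fast path, no main-memory access
            self.hits += 1
            return self.store[addr]
        self.misses += 1         # cache miss: go to the slower main memory
        value = main_memory[addr]
        if len(self.store) >= self.capacity:
            self.store.pop(next(iter(self.store)))  # evict oldest entry (FIFO)
        self.store[addr] = value
        return value

cache = Cache()
for addr in [1, 2, 1, 1, 3]:
    cache.read(addr)
print(cache.hits, cache.misses)  # 2 3 — repeated addresses hit the cache
```

The repeated reads of address 1 are served from the cache, which is exactly the "wins time" effect: the processor avoids the slower trip to main memory.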

Cluster computing is the use of multiple computers, usually PCs or UNIX workstations, with multiple storage devices and redundant interconnections. The system appears as a single, highly available system. The computers are linked in order to take advantage of their parallel processing power. There are three types of cluster computing. High-availability clusters are designed to ensure constant access to service applications. Load-balancing clusters operate by routing all work through one or more load-balancing front-end nodes, which then distribute the workload efficiently between the remaining active nodes. High-performance clusters are designed to exploit the parallel processing power of multiple nodes. Computer clusters offer a number of benefits, such as reduced cost, processing power, improved network technology, scalability and high availability.

In order to learn more about clusters, I will compare a shared-everything architecture with a shared-nothing architecture.

1.2. Analysis of different architectures

In this part, I want to describe two architectures: SMP (Symmetric Multiprocessors) and MPP (Massively Parallel Processors). In a computer with more than one processor, the memory may or may not be shared, so I want to identify the differences between these two types of architecture.

1.2.1 SMP Symmetric Multiprocessors (share-everything)

An SMP architecture is a multi-processor machine with a set of identical processors that share physical memory, inputs and outputs. Machines in which every processor accesses any memory area at the same speed are also called UMA (Uniform Memory Access) machines. An SMP machine generally has a single operating system that manages the entire architecture. Unlike traditional parallel computers, SMP machines do not require a special-purpose operating system: common operating systems like Linux, Windows NT or Sun Solaris now support SMPs.

Schematic architecture of SMP

Often, these operating systems implement the principle of a single system image: they present the complex architecture of an SMP to the user as if it were a simple desktop computer.

As the operating system is unique and coordination (communication, synchronisation, ...) is done via memory shared between processors, an SMP machine is easy to program. Particular attention must be paid to concurrency concerns, since with shared memory, concurrent access to the same data is possible. A big disadvantage of SMP is its limited extensibility: the number of processors cannot be increased indefinitely, because they all access the same memory.
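The concurrency concern mentioned above can be illustrated with a minimal Python sketch (the worker function and counts are invented for this example): several threads share one counter, and a lock serialises their updates so no increment is lost.

```python
# Sketch of shared-memory coordination: threads share `counter` directly,
# and a lock protects concurrent access to the same data.
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # serialise access to the shared datum
            counter += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 — correct only because the lock prevents lost updates
```

Without the lock, two threads could read the same old value of `counter` and one update would overwrite the other; that is precisely the concurrent-access hazard of shared memory.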

1.2.2 MPP Massively Parallel Processors (share-nothing)

In an MPP-type multi-processor architecture, unlike in SMP machines, the processors do not share a single memory or inputs and outputs. Each processor has its own memory and a fast interconnection with the other processors. Each processor in an MPP has its own operating system, making it more difficult to achieve a single system image.

Schematic architecture of MPP

The use of standard components results in a good cost/performance ratio. MPP machines have no limit on the number of processors and are easily extensible. However, given the lack of a single system image, their programming is more difficult than that of SMP machines: communication and coordination between the different processors must be handled explicitly.

Unlike in SMPs, access time to the virtually shared memory depends on the physical location of the processor and the memory in the parallel machine. Such a machine is called NUMA (Non-Uniform Memory Access). As if that were not enough, the programmer's job also requires that the different levels of processor cache be kept synchronised with the memory addresses they represent. Such a machine is called ccNUMA (cache-coherent NUMA).

1.2.3 Limits

I have already touched on limits. I showed that the limits of the SMP architecture are a problem: because the memory is shared, the number of processors is limited. NUMA (Non-Uniform Memory Access) may help solve this problem. In an SMP architecture it is easy to improve performance, because we just have to add processors. However, if the application running on the cluster was not developed for this architecture, its performance will not improve; the cluster will only be faster in the sense that two different applications can run at the same time. And as the system is managed by a single operating system, if just one part fails, the whole system breaks down.

For MPP, as for SMP, applications have to be developed with the architecture in mind.

1.3. Managers for SMP and MPP

In order to manage memory in SMP or MPP systems, new memory-management schemes have been created to push back the performance limits of both architectures.

1.3.1 NUMA (Non Uniform Memory Access)

To solve the problem of simultaneous access to shared memory in SMPs, there is a scheme named NUMA (Non-Uniform Memory Access). NUMA is a multi-processor design that separates the memory and places it in different locations, each close to a group of processors.

Schematic architecture of NUMA

1.3.2 UMA (Uniform Memory Access)

UMA means Uniform Memory Access. It is a shared memory architecture used in parallel computers. All the processors in the UMA model share the physical memory uniformly. In a UMA architecture, access time to a memory location is independent of which processor makes the request or which memory chip contains the transferred data. Uniform Memory Access computer architectures are often contrasted with Non-Uniform Memory Access (NUMA) architectures.

Schematic architecture of UMA

In the UMA architecture, each processor may use a private cache, and peripherals are also shared in some fashion. The UMA model is suitable for general-purpose and time-sharing applications used by multiple users. It can also be used to speed up the execution of a single large program in time-critical applications. (Note that the same acronym is also used for Unified Memory Architecture, a design in which graphics chips are built into the motherboard and part of the computer's main memory is used as video memory.)

1.4. Memory Hierarchy

Exchange speed also depends on the memory hierarchy: when the processor can find data in cache memory, the query is quicker, but it takes longer if the processor has to access the hard drive.

The term memory hierarchy is used in the theory of computation when discussing performance issues in computer architectural design, algorithm predictions, and lower-level programming constructs involving locality of reference. A memory hierarchy in computer storage distinguishes each level in the hierarchy by response time.

There are four major storage levels:

- Internal - processor registers and cache.

- Main - the system RAM and controller cards.

- On-line mass storage - secondary storage.

- Off-line bulk storage - tertiary and off-line storage.

Schematic of computer memory hierarchy

This picture shows the memory hierarchy. We can see that the closer the memory is to the processor, the smaller its capacity and the more expensive it is to produce; in return, its response time is better.
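The trade-off between hierarchy levels can be made concrete with a toy cost model. The latency figures below are rough order-of-magnitude values commonly quoted for modern hardware, not measurements from the essay; the function name is invented for the sketch.

```python
# Toy cost model of the memory hierarchy. The latencies are assumed
# order-of-magnitude figures, chosen only to illustrate the relative scale.
latency_ns = {
    "register": 0.3,
    "cache": 1,
    "main_memory": 100,
    "disk": 10_000_000,
}

def access_cost(level, n_accesses):
    """Total time in nanoseconds for n accesses served by one level."""
    return latency_ns[level] * n_accesses

# The closer the level is to the processor, the cheaper each access:
print(access_cost("cache", 1000))        # 1000 ns for a thousand cache hits
print(access_cost("main_memory", 1000))  # 100000 ns if every access misses
print(access_cost("disk", 1))            # one disk access dwarfs both
```

Even a single disk access costs more than thousands of cache accesses, which is why a query that stays in cache is so much quicker than one that reaches the hard drive.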

Obviously, the type of architecture chosen is a factor in exchange speed, as is the level accessed in the memory hierarchy. As we will see, the programming model also plays a role.

1.5. Models of programming

Nowadays, there are two models of parallel processing, named SIMD and MIMD.

SIMD (Single Instruction stream, Multiple Data streams), where the same operation is performed on different sets of data.

MIMD (Multiple Instruction streams, Multiple Data streams), where different programs are executed on different sets of data.

There is a particular case of MIMD named SPMD (Single Program, Multiple Data): the case where the same program is executed in parallel on different sets of data. In terms of expressive power, MIMD and SPMD are equivalent.

There are two families of programming models describing how processes and threads communicate:

- communication by shared memory

- communication by messages

The shared-memory model corresponds to multiprocessor architectures. It is the more efficient of the two, but it has a drawback: if a processor fails, the whole system must be restarted.

Communication by messages is used by shared-nothing architectures. There is a standard named PVM (Parallel Virtual Machine), which is used to program scientific applications, and there is now a newer tool named MPI (Message Passing Interface). These two tools make it possible to program against a standard interface.

2. Performances

As we have seen, architecture plays a role in performance. We need to detail this point to understand how it works.

2.1. Architectural impacts on query performance

Three fundamental operations compose the steps of query execution plans: table scans, joins, and index lookups. Since decision-support-system performance depends on how well each of these operations is executed, it is important to consider how their performance varies between SMP and MPP architectures.

2.1.1 Table scans

On MPP systems where the database records happen to be uniformly partitioned across nodes, good performance on single-user batch jobs can be achieved because each node/memory/disk combination can be fully utilised in parallel table scans, and each node has an equivalent amount of work to complete the scan. When the data is not evenly distributed, or less than the full set of data is accessed, load skew can occur, causing some nodes to finish their scanning quickly and remain idle until the processor having the largest number of records to process is finished. Because the database is statically partitioned, and the cost of eliminating data skew by moving parts of tables across the interconnect is prohibitively high, table scans may or may not equally utilise all processors depending on the uniformity of the data layout. Thus the impact of database partitioning on an MPP can allow it to perform as well as an SMP, or significantly less well, depending on the nature of the query.
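The load-skew effect can be sketched with a toy model (the `scan_time` function and the record counts are assumptions for illustration): since nodes scan their partitions in parallel, the elapsed time is governed by the node holding the most records.

```python
# Sketch of load skew in an MPP-style partitioned table scan: parallel
# elapsed time is bounded by the largest partition, not the average.
def scan_time(partitions, cost_per_record=1):
    # All nodes scan simultaneously, so the slowest node sets the pace.
    return max(len(p) for p in partitions) * cost_per_record

even = [list(range(250)) for _ in range(4)]          # 1000 records, balanced
skewed = [list(range(700)), list(range(100)),
          list(range(100)), list(range(100))]        # same 1000, skewed

print(scan_time(even))    # 250 — every node finishes together
print(scan_time(skewed))  # 700 — three nodes sit idle waiting for one
```

Both layouts hold the same 1000 records, yet the skewed layout scans almost three times slower, which is the idle-node effect described above.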

On an SMP system, all processors have equal access to the database tables, so consistent performance is achieved regardless of the database partitioning. The database query coordinator simply allocates a set of processes to the table scan based on the number of processors available and the current load on the system. Table scans can be parallelised by dividing the table's records between processes and having each processor examine an equal number of records, avoiding the load-skew problems that can cripple MPP architectures.

2.1.2 Join operations

Consider a database join operation in which the tables to be joined are equally distributed across the nodes in an MPP architecture. A join of this data may have very good or very poor performance depending on the relationship between the partition key and the join key:

If the partition key is equal to the join key, a process on each of the MPP nodes can perform the join operation on its local data, most effectively utilising the processor complex.

If the partition key is not equal to the join key, each record on each node has the potential to join with matching records on all of the other nodes. When the MPP hosts N nodes, the join operation requires each of the N nodes to transmit each record to the remaining N-1 nodes, increasing communication overhead and reducing join performance. The problem gets even more complex when real-world data having an uneven distribution is analyzed. Unfortunately, with ad hoc queries predominating in decision support systems, the case of partition key not equal to the join key can be quite common.
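The communication overhead of the mismatched-key case can be counted with a one-line model (the function name and the node/record counts are invented for this sketch): each of the N nodes ships each of its records to the other N-1 nodes.

```python
# Toy count of inter-node record transfers for an MPP join when the
# partition key does not match the join key: every node must send each
# of its records to all N-1 other nodes.
def transfer_count(n_nodes, records_per_node):
    return n_nodes * records_per_node * (n_nodes - 1)

print(transfer_count(4, 1000))  # 12000 transfers across the interconnect
print(transfer_count(8, 1000))  # 56000 — overhead grows with node count
```

Note that doubling the node count more than quadruples the traffic, which is why this case scales so poorly on an MPP while the matching-key case needs no transfers at all.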

To make matters worse, MPP partitioning decisions become more complicated when joins among multiple tables are required. For example, consider the schema down, where the DBA must decide how to physically partition three tables: Supplier, PartSupp, and Part. It is likely that queries will involve joins between Supplier and PartSupp, as well as between PartSupp and Part. If the DBA decides to partition PartSupp across MPP nodes on the Supplier key, then joins to Supplier will proceed optimally and with minimum inter-node traffic. But then joins between Part and PartSupp could require high inter-node communication, as explained above. The situation is similar if instead the DBA partitions PartSupp on the Part key.

The database logical schema may make it impossible to partition all tables on the optimum join keys. Partitioning causes uneven performance on MPP, while performance on SMP is optimal.

For an SMP, the records selected for a join operation are communicated through the shared memory area. Each process that the query coordinator allocates to the join operation has equal access to the database records, and when communication is required between processes it is accomplished at memory speeds that are two orders of magnitude faster than MPP interconnect speeds. Again, an SMP has consistently good performance independent of database partitioning decisions.

2.1.3 Index lookups

The query optimizer chooses index lookups when the number of records to retrieve is a small portion (significantly less than one percent) of the table size. During an index lookup, the table is accessed through the relevant index, thus avoiding a full table scan. In cases where the desired attributes can be found in the index itself, the query optimizer will access the index alone, perhaps through a parallel full-index scan, without examining the base table at all. For example, assume that the index is partitioned evenly across all nodes of an MPP, using the same partition key as the data table. All nodes can be equally involved in satisfying the query to the extent that matching data rows are evenly distributed across all nodes. If a global index (one not partitioned across nodes) is used, then the workload distribution is likely to be uneven and scalability low.
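The difference between a full scan and an index lookup can be sketched with an in-memory table (the table contents, column names and hash-index structure are all invented for this example; a real DBMS would typically use a B-tree rather than a hash map):

```python
# Sketch contrasting a full table scan with an index lookup.
table = [{"id": i, "name": f"part{i}"} for i in range(10000)]

def scan_lookup(key):
    # Full scan: every record must be examined.
    return [row for row in table if row["id"] == key]

# Build a simple hash index once; thereafter lookups are direct.
index = {row["id"]: row for row in table}

def index_lookup(key):
    # Index lookup: reaches the row without touching the other 9999.
    return index.get(key)

print(index_lookup(42)["name"])  # part42
```

Both functions return the same row, but the scan touches all 10000 records while the index lookup touches one, which is why the optimizer prefers the index when only a tiny fraction of the table is needed.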

On SMP architectures, performance is consistent regardless of the placement of the index. Index lookups are easily parallelized on SMPs because each processor can be assigned to access its portion of the index in a large shared memory area. All processors can be involved in every index lookup, and the higher interconnect bandwidth can cause SMPs to outperform MPPs even in the case where data is also evenly partitioned across the MPP architecture.

2.1.4 Summary

The lesson here is clear: for the basic building blocks of database queries (table scans, joins, and index lookups), the scalability of an MPP architecture depends on the partitioning of data across the processor complex. Any one choice of partition key may allow some queries to run as fast as on an SMP, while causing other queries to fall far short of the performance available from an SMP. This puts database administrators in a bind that has no easy solution. The choice of partition key is critical, and even if the best choice is made, it will still be the wrong choice for some queries demanded of a decision support system.

An SMP gives consistent performance because all processors have equal access to disk resources, and when communication between processes in the DBMS is required, it is accomplished at memory speeds that are two orders of magnitude faster than MPP message-based interconnects. With the consequences of disk layout being so minor compared to database partitioning on an MPP, the choice of an SMP architecture brings consistency in performance as well as scalability.

A skeptical database administrator, however, will not accept these qualitative arguments without real performance data to back them up, and fortunately the TPC-D benchmarks provide a means for all vendors to put forth their best performance measurements.