An MIMD (Multiple Instruction stream, Multiple Data stream) computer system has a number of independent processors that operate upon separate data concurrently. Hence each processor has its own program memory, or has access to program memory, and similarly has its own data memory or access to data memory. There must be a mechanism to load the program and data memories, and a mechanism for passing information between processors as they work on a problem. MIMD has clearly emerged as the architecture of choice for general-purpose multiprocessors. MIMD machines offer flexibility: with the correct hardware and software support, they can function as single-user machines focusing on high performance for one application, as multiprogrammed machines running many tasks simultaneously, or as some combination of these. There are two types of MIMD architecture: distributed memory MIMD architecture and shared memory MIMD architecture.
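As a minimal sketch of the MIMD model, the Python fragment below runs several independent instruction streams (processes), each on its own data, and uses a queue as the mechanism for passing information between them. All names and the choice of workload are illustrative, not part of any real machine's interface.

```python
# Each process models one MIMD "processor": its own instruction stream,
# its own private data. A queue models inter-processor communication.
from multiprocessing import Process, Queue

def worker(pid, data, results):
    # Each processor runs its own program on its own data concurrently.
    results.put((pid, sum(x * x for x in data)))

def run():
    results = Queue()
    # Separate data streams for separate processors.
    datasets = [[1, 2], [3, 4], [5, 6]]
    procs = [Process(target=worker, args=(i, d, results))
             for i, d in enumerate(datasets)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return dict(results.get() for _ in procs)

if __name__ == "__main__":
    print(sorted(run().items()))  # [(0, 5), (1, 25), (2, 61)]
```

Processes (rather than threads) are used here because they give each worker a genuinely separate address space, mirroring the per-processor data memory described above.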
Shared Memory MIMD Architecture
The distinguishing feature of shared memory systems is that, no matter how many memory blocks they contain and however those blocks are connected to the processors, the address spaces of the memory blocks are unified into a global address space that is completely visible to all processors of the system. Issuing a given memory address from any processor will therefore access the same memory block location. However, according to the physical organization of the logically shared memory, two main types of shared memory system can be distinguished:
Physically shared memory systems
Virtual (or distributed) shared memory systems
In physically shared memory systems all memory blocks can be accessed uniformly by all processors. In virtual (distributed) shared memory systems the memory blocks are physically distributed among the processors as local memory units.
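As a minimal sketch of a unified global address space, the toy decoder below maps a global address to a (block, offset) pair. The block count, block size, and interleaving scheme are hypothetical values chosen for illustration; the point is only that every processor applies the same mapping, so the same global address always reaches the same block location.

```python
# Toy decoder for a global address space spanning four 1 KiB memory blocks.
BLOCK_SIZE = 1024
NUM_BLOCKS = 4

def decode(addr):
    # High-order interleaving: consecutive addresses stay within one block.
    block, offset = divmod(addr, BLOCK_SIZE)
    assert block < NUM_BLOCKS, "address outside global address space"
    return block, offset

print(decode(0))     # (0, 0)
print(decode(2500))  # (2, 452): 2500 = 2 * 1024 + 452
```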
The three main design issues in increasing the scalability of shared memory systems are:
Organization of memory
Design of interconnection networks
Design of cache coherent protocols
Shared memory systems are basically classified according to their memory organization since this is the most fundamental design issue. Accordingly, shared memory systems can be divided into four main classes:
Uniform memory access (UMA) machines
Non-uniform memory access (NUMA) machines
Cache-coherent non-uniform memory access (CC_NUMA) machines
Cache-only memory architecture (COMA) machines
UMA machines belong to the physically shared memory architecture class, while NUMA, CC_NUMA, and COMA machines form the class of distributed shared memory architectures. The four classes cover the three generations of shared memory systems. The first generation contains the UMA machines, where the interconnection network was based either on the concept of a shared bus, in order to construct low-price multiprocessors, or on multistage networks, in order to build massively parallel shared memory systems.
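The UMA/NUMA distinction can be illustrated with a toy access-time model. The latency numbers below are invented purely for illustration; real machines differ widely.

```python
# Hypothetical latencies (cycles). In UMA, every (processor, block) pair
# sees the same cost; in NUMA, a processor's local block is much cheaper
# to reach than a remote one.
UMA_LATENCY = 100
LOCAL, REMOTE = 50, 400

def uma_access(proc, block):
    return UMA_LATENCY  # uniform: independent of which block is touched

def numa_access(proc, block):
    # Assume memory block i is the local memory of processor i.
    return LOCAL if proc == block else REMOTE

print(uma_access(0, 3), uma_access(2, 2))    # 100 100
print(numa_access(0, 3), numa_access(2, 2))  # 400 50
```

The model makes the scalability trade-off visible: NUMA rewards placing data in the local block, while UMA charges every access the same price.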
Dynamic networks have some drawbacks compared with the static networks applied in multicomputers. Dynamic networks are either too expensive (switching networks) or they can support only a limited number of processors.
Uniprocessors have successfully demonstrated the benefits of cache memory for increasing effective memory bandwidth, and accordingly most shared memory systems employ cache memories too. However, the use of caches in a multiprocessor environment gives rise to the so-called cache consistency problem: when several caches hold copies of the same memory location, a write by one processor can leave stale copies in the others. To maintain data consistency in the caches, a cache coherence protocol must be added to the traffic on the network. The extra traffic deriving from the protocol reduces the benefit of the caches, and hence careful design is necessary to introduce a protocol of minimal complexity.
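One widely used family of solutions is write-invalidate snooping: every cache watches the bus, and a write by one processor invalidates copies held by the others. The sketch below is a deliberately simplified toy model (write-through policy, single-word lines, no states beyond valid/invalid), not any particular machine's protocol.

```python
# Toy write-invalidate snooping protocol over a shared bus.
class Cache:
    def __init__(self):
        self.lines = {}  # address -> cached value

    def snoop_invalidate(self, addr):
        # Another cache wrote this address: drop our (now stale) copy.
        self.lines.pop(addr, None)

class Bus:
    def __init__(self):
        self.caches = []
        self.memory = {}

    def read(self, cache, addr):
        if addr not in cache.lines:              # miss: fetch from memory
            cache.lines[addr] = self.memory.get(addr, 0)
        return cache.lines[addr]

    def write(self, cache, addr, value):
        for other in self.caches:                # broadcast invalidation
            if other is not cache:
                other.snoop_invalidate(addr)
        cache.lines[addr] = value
        self.memory[addr] = value                # write-through for simplicity

bus = Bus()
c0, c1 = Cache(), Cache()
bus.caches = [c0, c1]

bus.write(c0, 100, 7)
print(bus.read(c1, 100))  # c1 misses and fetches the up-to-date value: 7
bus.write(c1, 100, 9)     # invalidates c0's stale copy
print(bus.read(c0, 100))  # c0 re-fetches: 9, not the stale 7
```

The invalidation broadcasts are exactly the "extra traffic" referred to above: every write consumes bus bandwidth beyond the data transfer itself.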
The Design Space and Classification of Shared Memory Architectures
Shared Path Networks
Single shared bus
One of the most popular interconnection networks is the single shared bus, which has several advantageous features. Firstly, its organization is simply a generalization and extension of the buses employed in uniprocessor systems: it contains the same bus lines (address, data, control, interrupt) as a uniprocessor bus, plus some additional lines to resolve contention when several processors want to use the shared bus simultaneously. These are called arbitration lines and play a crucial role in the implementation of shared buses. Secondly, the shared bus is a very cost-effective interconnection scheme: increasing the number of processors does not increase the price of the shared bus. However, contention on the shared bus places a strong limit on the number of processors that can usefully be attached. As the number of processors on the bus increases, the probability of contention increases proportionally, eventually reaching a point where the whole bandwidth of the bus is exhausted by the processors and adding a new processor yields no further speed-up. One of the main design issues in shared bus multiprocessors is therefore raising the number of applicable processors; the most important techniques include the use of per-processor caches (discussed above) and the introduction of multiple or hierarchical buses (discussed below).
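The arbitration described above can be sketched with a lock standing in for the bus grant logic: only the thread that wins the lock may drive the bus. The thread count and loop sizes are arbitrary illustrative values; the point is that as more "processors" are added, they increasingly serialize on the one shared resource, which is exactly the contention limit described above.

```python
# A single lock models the arbitration lines: one bus master at a time.
import threading

bus_lock = threading.Lock()
shared_memory = {"counter": 0}

def processor(n_ops):
    for _ in range(n_ops):
        with bus_lock:  # win arbitration before using the shared bus
            shared_memory["counter"] += 1

threads = [threading.Thread(target=processor, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_memory["counter"])  # 4000: arbitration keeps the count exact
```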
Multiple shared bus
The limited bandwidth of the single shared bus represents a major limitation in building scalable multiprocessors. There are several ways to increase the bandwidth of the interconnection network; a natural idea is to multiply the number of buses, just as the processors and memory units are multiplied. Four different ways have been proposed for connecting buses to the processors, memory units and other buses:
1-dimension multiple bus system
2- or 3-dimension bus systems
cluster bus system
hierarchical bus system
The simplest generalization of the single bus system towards a multiple bus system is the 1-dimension multiple bus system. This approach leads to a typical UMA machine where any processor can access any memory unit through any of the buses. A further generalization of the 1-dimension multiple buses is the introduction of the second and third dimensions.
Two-dimension multiple buses are employed in the Aquarius Multi-Multi architecture, while the use of a 3-dimension multiple bus system was proposed in the Wisconsin Multicube machine. In these systems, multiple buses compose a grid interconnection network. Each processor node is connected to a row bus and to a column bus. Processors along a row or column constitute a conventional single bus multiprocessor. The memory can be distributed in several ways. The most traditional approach is to attach memory units to each bus, but the main problem is the maintenance of cache coherency.
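The row/column organization can be sketched as a toy routing function. The 4x4 grid size is a hypothetical choice; the property illustrated, that any two nodes can communicate in at most two bus transfers (one on a row bus, one on a column bus, turning at an intermediate node), follows directly from each node being attached to both a row bus and a column bus.

```python
# Node (r, c) is attached to row bus r and column bus c.
def route(src, dst):
    (r1, c1), (r2, c2) = src, dst
    if r1 == r2:
        return [("row", r1)]            # same row bus suffices
    if c1 == c2:
        return [("col", c1)]            # same column bus suffices
    return [("row", r1), ("col", c2)]   # turn at intermediate node (r1, c2)

print(route((0, 0), (0, 3)))  # [('row', 0)]
print(route((1, 2), (3, 2)))  # [('col', 2)]
print(route((0, 0), (2, 3)))  # [('row', 0), ('col', 3)]
```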
The third alternative to introduce several buses into the multiprocessor is the cluster architecture which represents a NUMA machine concept. The main idea of cluster architectures is that single bus multiprocessors, called clusters, are connected by a higher-level bus. Each cluster has its own local memory. The access time of a local cluster memory is much less than the access time of a remote cluster memory. Keeping the code and stacks in the cluster memory can significantly reduce the need to access remote cluster memory. However, this structure cannot avoid traffic jams on higher-level buses without cache support.
Another natural generalization of the single bus system is the hierarchical bus system where single bus 'supernodes' are connected to a higher-level bus via a higher-level cache. By recursively applying these construction techniques, arbitrarily large networks can be built. The main advantage of this approach is that each bus level can work as a single bus system. However, it raises the problem of maintaining cache consistency in a hierarchical system. The hierarchical bus system can also be used without main memory, employing only caches.
Multistage networks represent a compromise between the single bus and the crossbar switch interconnections from the point of view of implementation complexity, cost, connectivity and bandwidth. A multistage network consists of alternating stages of links and switches. Many kinds of multistage networks have been proposed and built so far. They can be categorized according to the number of stages, the number of switches per stage, the topology of the links connecting subsequent stages, the type of switches employed at the stages and the possible operation modes. The complete design space of multistage networks is shown below.
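To make the idea of routing through stages of switches concrete, the sketch below models destination-tag routing in an omega network, one well-known multistage topology, chosen here purely as an illustration. With N = 8 endpoints there are log2(N) = 3 stages of 2x2 switches; between stages the wiring performs a perfect shuffle (modeled as a left rotation of the position's bits), and at stage i each switch sets its output according to bit i of the destination address.

```python
# Destination-tag routing through a 3-stage (8-endpoint) omega network.
N_BITS = 3
MASK = (1 << N_BITS) - 1

def omega_route(src, dst):
    pos = src
    path = [pos]
    for i in range(N_BITS):
        # Perfect-shuffle wiring: rotate the position's bits left by one.
        pos = ((pos << 1) | (pos >> (N_BITS - 1))) & MASK
        # The 2x2 switch replaces the low bit with destination bit i
        # (most significant bit first): 0 = upper output, 1 = lower.
        bit = (dst >> (N_BITS - 1 - i)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path  # the final position is always the destination

print(omega_route(2, 5))  # ends at endpoint 5 regardless of the source
```

After the last stage every destination bit has been shifted into place, so the message arrives at `dst` from any source using only the destination address as the routing tag.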
The chief drawback of these machines is that their individual processors are quite slow, so algorithms that are less than completely parallel perform poorly. The SIMD restriction also leads to an inevitable loss of performance: while a few processors are working on enforcing boundary conditions, for example, the others have to wait. Finally, these machines (or rather, efficient programs for them) tend to require very large working storage, making them relatively inefficient in their use of memory. This may be a significant drawback in that it limits the largest problems that can be solved.