Systolic Architecture: History and Applications
Disclaimer: This work has been submitted by a student. This is not an example of the work written by our professional academic writers. You can view samples of our professional work here.
Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UK Essays.
Published: Wed, 13 Dec 2017
A network of PEs that rhythmically produce and pass data through the system is called systolic architecture. It is used as a co processor in combination with a host computer and the behavior is analogous to the flow of blood through heart; thus named SYSTOLIC.
· A systolic architecture has the following characteristics :
- A massive and non-centralised parallelism
- Local communications
- Synchronous evaluation
· Example of systolic network
1. Linear network
2. Bi-dimensional network
3. Hexagonal network
The systolic architecture paradigm, data-stream-driven by data counters, is the counterpart of thevon Neumann paradigm, instruction-stream-driven by a program counter. Because a systolic architecture usually sends and receives multiple data streams, and multiple data counters are needed to generate these data streams, it supportsdata parallelism.The namederives from analogy with the regular pumping of blood by the heart.
H. T. KungandCharles E. Leierson published the first paper describing systolic arrays in 1978; however, the first machine known to have used a similar technique was theColossus Mark IIin 1944.
NEED FOR SYSTOLIC ARCHITECHTURE:-
We need a high-performance, special-purpose computer system to meet specific application. I/O and computation imbalance is a notable problem .The concept of Systolic architecture can map high-level computation into hardware structures .Systolic system is easy to implement because of its regularity and easy to recon .Systolic architecture can result in cost-effective , high- performance special-purpose systems for a wide range of problems.
An efficient approach to design very large scale integration (VLSI) architectures and a scheme for the implementation of the discrete sine transform (DST), based on an appropriate decomposition method that uses circular correlations, is presented. The proposed design uses an efficient restructuring of the computation of the DST into two circular correlations, having similar structures and only one half of the length of the original transform; these can be concurrently computed and mapped on to the same systolic array. Significant improvement in the computational speed can be obtained at a reduced input-output (I/O) cost and low hardware complexity, retaining all the other benefits of the VLSI implementations of the discrete transforms, which use circular correlation or cyclic convolution structures. These features are demonstrated by comparing the proposed design with some of the previously reported schemes.
A more computationally efficient and scalable systolic architecture is provided for computing the discrete Fourier transform. The systolic architecture also provides a method for reducing the array area by limiting the number of complex multi pliers. In one embodiment, the design improvement is achieved by taking advantage of a more efficient computation scheme based on symmetries in the Fourier transform coefficient matrix and the radix-4 butterfly. The resulting design provides an array comprised of a plurality of smaller base-4 matrices that can simply be added or removed to provide scalability of the design for applications involving different transform lengths to be calculated. In this embodiment, the systolic array size provides greater flexibility because it can be applied for use with any transform length which is an integer multiple of sixteen. A systolic network is often use with a host station responsible for the communication with the outside world.As a result of the local-communication scheme, a systolic network is easily extended without to add any burden to the I/O.
CHARACHTERSTICS OF SYSTOLIC ARCHITECHTURE:-
- A massive and non-centralised parallelism
- Local communications
- Synchronous evaluation
- Data coming from the memory are used several time before to come back to it.
- These architectures are well suited for a VLSI or FPGA network implementation.
· Other characteristics :
- A systolic network is often use with a host station responsible for the communication with the outside world.
- As a result of the local-communication scheme, a systolic network is easily extended without to add any burden to the I/O.
PRINCIPLE OF SYSTOLIC ARCHITECHTURE:-
Systolic system consists of a set interconnected cells, each capable of performing some simple operation. Systolic approach can speed up a compute-bound computation in a relatively simple and inexpensive manner. Through systolic array we achieve higher computation throughput without increasing memory bandwidth.
Which means that by using systolic architechture we can speed up our system. For eg. When we are using our simple aechitecture than we can perform atmost five million operatoins per second whereas by using systolic arrays in systolic architecture we can operate the system at a speed of 30 million operations per second.
Systolic architecture consists of simple cells arranged in some regular pattern (linear, bi-directional, triangular, hexagonal, etc.) where each cell usually performs one operation. Each processing cell is connected to its neighbour or to a neighbour hood of processing elements by short signal paths. Both parallel and pipelined execution is implemented. A function that is to be performed can be represented by a set of Functional Primitives.
The systolic structure has advantages of regularity and modularity over implementations of the block-state-variable form, as it is regular and an nth order filter is simply formed by cascading second order filters. Therefore it is more suitable for the VLSI implementation. The idea is to exploit VLSI efficiently by laying out algorithms (and hence architectures) in 2-D (not all systolic machines are 2-D, but most are). The architectures thus produced are not general but tied to specific algorithms. This is good for computation-intensive tasks (e.g. signal processing).
TOOLS FOR SYSTOLIC ARCHITECTURE:-
Incomputer architecture, asystolic arrayis a pipe network arrangement of processing units called cells. It is a specialized form ofparallel computing, where cells (i.e. processors), compute data and store it independently of each other.
We need a high-performance, special-purpose computer system to meet specific application. I/O and computation imbalance is a notable problem.
The concept of Systolic architecture can map high-level computation into hardware structures. Systolic system works like an automobile assembly line. Systolic system is easy to implement because of its regularity and easy to recon. Systolic architecture can result in cost-effective, high-performance special-purpose systems for a wide range of problem.
Systolic Array Example:
3×3 Systolic Array Matrix Multiplication:-
DESCRIPTION OF SYSTOLIC ARRAYS:-
· Description :
- It is a network of interconnected processing units.
- Only the processors at the border of the architecture can communicate outside.
- The task of one cell can be summarised as : receive-compute-transmit
A systolic array is composed of matrix-like rows of data processing units called cells. Data processing unitsDPUsare similar tocentral processing units(CPU)s,(except for aprogram counter, since operation istransport-triggered, i.e., by the arrival of a data object). Each cell shares the information with its neighbors immediately after processing. The systolic array is often rectangular where data flows across the array between neighbour DPUs, often with different data flowing in different directions. The data streams entering and leaving the ports of the array are generated byauto-sequencing memoryunits, ASMs. Each ASM includes adata counter. Inembedded systemsa data stream may also be input from and/or output to an external source.
An example of a systolicalgorithmmight be designed formatrix multiplication. Onematrixis fed in a row at a time from the top of the array and is passed down the array, the other matrix is fed in a column at a time from the left hand side of the array and passes from left to right. Dummy values are then passed in until each processor has seen one whole row and one whole column. At this point, the result of the multiplication is stored in the array and can now be output a row or a column at a time, flowing down or across the array.
Systolic arrays are arrays of DPUs which are connected to a small number of nearest neighbour DPUs in a mesh-like topology. DPUs perform a sequence of operations on data that flows between them. Because the traditional systolic array synthesis methods have been practiced by algebraic algorithms, only uniform arrays with only linear pipes can be obtained, so that the architectures are the same in all DPUs. The consequence is that only applications with regular data dependencies can be implemented on classical systolic arrays. LikeSIMDmachines, clocked systolic arrays compute in “lock-step” with each processor undertaking alternate compute | communicate phases. But systolic arrays with asynchronous handshake between DPUs are calledwave front arrays. One well-known systolic array is CMU’s I Warp processor, which has been manufactured by Intel. An I Warp system has a linear array processor connected by data buses going in both directions.
· Super Systolic Array :
The super systolic array is a generalization of the systolic array. Because the classical synthesis methods (algebraic, i. e. projection-based synthesis), yielding only uniformDPUarrays permitting only linear pipes, systolic arrays could be used only to implement applications with regular data dependencies. By using simulated annealinginstead,Rainer Kresshas introduced the generalized systolic array: the super systolic array. Its application is not restricted to applications with regular data dependencies.
An application Example – Polynomial Evaluation
Horner’s rule for evaluating a polynomial is:
y= (…(((an*x+an− 1) *x+an− 2) *x+an− 3) *x+ … +a1) *x+a0
A linear systolic array in which the processors are arranged in pairs: one multiplies its input byxand passes the result to the right, the next addsajand passes the result to the right.
Advantages and Disadvantages:-
- Highly specialized for particular applications
- Difficult to build
More about systolic architectures :-
Systolic architectures are designed by using linear mapping techniques on regular dependence graphs (DG).
- Regular Dependence Graph : The presence of an edge in a certain direction at any node in the DG represents presence of an edge in the same direction at all nodes in the DG.
- DG corresponds to space representation no time instance is assigned to any computation t=0. • Systolic architectures have a space-time representation where each node is mapped to a certain processing element(PE) and is scheduled at a particular time instance.
- Systolic design methodology maps an N-dimensional DG to a lower dimensional systolic architecture.
- Mapping of N-dimensional DG to (N-1) dimensional
o systolic array is considered.
- A massively parallel processing with limited input output communication with host computer.
- Suitable for many interactive operations.
- Replace single processor with an array of regular processing elements
- Orchestrate data flow for high throughput with less memory access a. Different from pipelining
- Nonlinear array structure, multidirection data flow, each PE
- may have (small) local instruction and data memory
- Different from SIMD: each PE may do something different
- Initial motivation: VLSI enables inexpensive special-purpose chips
- Represent algorithms directly by chips connected in regular pattern
* Text Book
Cite This Work
To export a reference to this article please select a referencing stye below: