A bus is a subsystem that transfers data between computer components, either inside a computer or outside it. In early computers, buses were parallel electrical buses with multiple connections, but in modern computers buses can use both parallel and serial connections.
1.1.2 - History of Bus Architecture
The history of bus architecture can be divided into three generations.
The earliest buses were bundles of wires that connected the processing unit, memory and peripherals. There was one bus for memory and another for peripherals, and they were accessed by separate instructions, with completely different timings and protocols.
An early example of a bus architecture was the S-100 bus in the Altair; the IBM PC employed a physically similar architecture to feed instructions and to access peripherals and memory.
These simple bus architectures had a serious weakness when used in general-purpose computers: all the peripherals on the bus had to communicate at the same speed, because they shared a single clock.
Increasing the speed of the CPU becomes harder, because the speed of all the devices must increase to keep up with it. It is not practical or economical to make every device as fast as the CPU, so the CPU must either enter a wait state or temporarily work at a slower clock speed in order to talk to the other components in the computer. While this was acceptable in embedded systems, it was not tolerated for long in general-purpose computers.
Bus systems like these are also difficult to configure when built from off-the-shelf equipment. Typically each added expansion card requires many jumpers to set memory addresses, I/O addresses, interrupt priorities, and interrupt numbers.
Second-generation bus systems like NuBus addressed some of the problems of the first generation. They separated the computer into two parts, with the CPU and memory on one side and the other devices on the other. A bus controller accepted data from the CPU side to be moved to the peripheral side, shifting the communications protocol load away from the CPU itself. This allowed the CPU and memory side to evolve separately from the device bus, or just "bus". Devices on the bus could talk to each other with no CPU intervention. This led to much better performance, but also required the adaptor cards to be much more complex. These buses also often addressed speed issues by widening the data path, moving from 8-bit parallel buses in the first generation to 16-bit or 32-bit in the second, and they added software setup (Plug-n-play) to replace the jumpers.
These newer systems still shared one quality with their earlier relatives: every device on the bus had to work at the same speed. While the CPU was now isolated and could increase in speed without any problem, CPUs and memory continued to increase in speed much faster than the buses they worked with. The result was that bus speeds were now very much slower than a modern system needed, and machines were left starved for data. A common example of this problem was that video cards rapidly outran even newer bus systems like PCI, and computers began to include AGP just to drive the video card. By 2004, AGP had in turn been outgrown by high-end video cards and other peripherals, and was being replaced by the new PCI Express bus.
A growing number of external devices started to use their own bus systems as well. When disk drives were first introduced, they were attached to the computer by a card plugged into the bus, which is the reason computers have so many slots on the bus. During the 1980s and 1990s, however, new systems like SCSI and IDE were introduced to cope with this need, leaving most slots in modern systems empty. Today there are about five different buses in the typical computer, supporting a range of devices.
Third-generation buses have been entering the market since about 2001, with HyperTransport and InfiniBand as examples. They are designed to be flexible, so that they can be used both as internal buses and to connect different systems together. This can lead to complex problems when trying to service different requests, so much of the work on these systems concerns software design as opposed to the hardware itself. These buses tend to look more like a network than the original concept of a bus, and they allow multiple devices in the system to use the bus at once. Bus architectures such as Wishbone were developed by the open source hardware movement as its contribution to this area.
Internal computer buses:
Extended ISA or EISA
Industry Standard Architecture or ISA
Low Pin Count or LPC
MicroChannel or MCA
Multibus for industrial systems
NuBus or IEEE 1196
OPTi local bus
S-100 bus or IEEE 696
SBus or IEEE 1496
PCI Express or PCIe
Serial ATA (SATA)
External computer buses:
Universal Serial Bus (USB)
Controller area network ("CAN bus")
IEEE 1394 (FireWire)
Internal/external computer buses:
Scalable Coherent Interface (SCI)
Small Computer System Interface (SCSI), a disk/tape peripheral attachment bus
Serial Attached SCSI (SAS) and other serial SCSI buses
Table 1.1 - System buses
1.2 - System Bus
The system bus model is a refinement of the von Neumann computer architecture. It divides the computer into three sub-components: the CPU, memory and input/output. It is derived from the von Neumann architecture by combining the arithmetic logic unit (ALU) and the control unit into a single unit, the central processing unit (CPU).
The system bus contains three main parts:
Control lines - Allow the CPU to control and operate the attached devices
Address lines - Allow the CPU to reference memory locations
Data lines - Carry the data that is to be sent to or retrieved from a device
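As a rough illustration of how the three groups of lines cooperate, the following Python sketch models a single bus transaction. The class names (SystemBus, Memory) and the "READ"/"WRITE" control values are hypothetical, chosen only for this example; a real bus has many more control signals and timing rules.

```python
class Memory:
    """A device attached to the bus: a small array of cells."""
    def __init__(self, size):
        self.cells = [0] * size

    def respond(self, bus):
        # The control lines say what to do; the address lines say where.
        if bus.control == "READ":
            bus.data = self.cells[bus.address]
        elif bus.control == "WRITE":
            self.cells[bus.address] = bus.data


class SystemBus:
    """Three groups of lines shared by the CPU and one attached device."""
    def __init__(self, device):
        self.control = None   # control lines
        self.address = 0      # address lines
        self.data = 0         # data lines
        self.device = device

    def write(self, address, value):
        self.address, self.data, self.control = address, value, "WRITE"
        self.device.respond(self)

    def read(self, address):
        self.address, self.control = address, "READ"
        self.device.respond(self)
        return self.data


bus = SystemBus(Memory(16))
bus.write(3, 42)
print(bus.read(3))
```

The CPU side only ever sets the address, data and control values; the device decides what to do by looking at them, which is the division of labour the three line groups describe.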
Figure 1.1 - Von Neumann architecture System bus
2 - MEMORY MANAGEMENT
2.1 - Introduction
Memory management is the act of managing computer memory. It involves providing ways to allocate portions of memory to programs at their request, and freeing that memory for reuse when it is no longer needed. Managing the main memory of a computer is critical to every computer system. The memory management unit (MMU) is the hardware, usually part of the CPU, that is responsible for handling memory accesses.
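The allocate-and-free cycle described above can be sketched in software with a toy first-fit allocator. This is only an illustration under simplifying assumptions: the class name FirstFitAllocator, its methods and the 100-unit memory are hypothetical, and real memory managers are considerably more involved.

```python
class FirstFitAllocator:
    """Toy allocator over a memory of `size` units (hypothetical design)."""
    def __init__(self, size):
        self.free = [(0, size)]   # sorted list of (start, length) free blocks
        self.used = {}            # start -> length of each allocated block

    def allocate(self, length):
        for i, (start, free_len) in enumerate(self.free):
            if free_len >= length:
                # Carve the request out of the front of this free block.
                rest = (start + length, free_len - length)
                self.free[i:i + 1] = [rest] if rest[1] else []
                self.used[start] = length
                return start
        return None  # no free block is large enough

    def release(self, start):
        length = self.used.pop(start)
        self.free.append((start, length))
        self.free.sort()
        # Merge neighbouring free blocks so the space can be reused as one.
        merged = [self.free[0]]
        for s, l in self.free[1:]:
            last_s, last_l = merged[-1]
            if last_s + last_l == s:
                merged[-1] = (last_s, last_l + l)
            else:
                merged.append((s, l))
        self.free = merged


heap = FirstFitAllocator(100)
a = heap.allocate(30)
b = heap.allocate(50)
heap.release(a)          # the first 30 units become reusable
c = heap.allocate(20)    # served from the space that `a` gave back
```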
2.2 - Components of the Von Neumann Model
Memory: Storage of information (data/program)
Processing Unit: Computation/Processing of Information
Input: Means of getting information into the computer. e.g. keyboard, mouse
Output: Means of getting information out of the computer. e.g. printer, monitor
Control Unit: Makes sure that all the other parts perform their tasks correctly and at the correct time.
Figure 2.1 - Von Neumann architecture system
2.2 - Communication between Memory and Processing Unit
Communication between memory and the processing unit goes through two registers:
Memory Address Register (MAR).
Memory Data Register (MDR).
To read a location:
The address of the location is put in MAR.
The memory is enabled for a read.
The value is put in MDR by the memory.
To write a value to a location:
The address of the location is put in MAR.
The data is put in MDR.
The Write Enable signal is asserted.
The value in MDR is written to the location specified.
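The two sequences of steps above (the first for a read, the second for a write) can be sketched as follows. The class name MemoryInterface and the eight-cell memory are assumptions made for the example; the enable signals are modelled simply as the order of the statements.

```python
class MemoryInterface:
    """Memory reached only through MAR and MDR (illustrative model)."""
    def __init__(self, size):
        self.memory = [0] * size
        self.mar = 0   # Memory Address Register
        self.mdr = 0   # Memory Data Register

    def read(self, address):
        self.mar = address                 # address of the location goes into MAR
        self.mdr = self.memory[self.mar]   # read enabled: memory puts the value in MDR
        return self.mdr

    def write(self, address, value):
        self.mar = address                 # address of the location goes into MAR
        self.mdr = value                   # data goes into MDR
        self.memory[self.mar] = self.mdr   # write enable: MDR is stored at that location


m = MemoryInterface(8)
m.write(5, 99)
print(m.read(5))
```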
Figure 2.2 - Communication between systems
2.3 - CPU data-path
Hardware units such as ALUs, registers and memory are linked together into a data-path.
The flow of bits around the data-path is controlled by the "gates" which allow the bits to flow (on) or not flow (off) through the data-path.
The binary instructions (1 = on; 0 = off) that control the flow are called micro-instructions.
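A minimal sketch of this idea, assuming a made-up data-path with two source registers, an ALU that can only add, and one destination register. The gate names and the three-bit micro-instruction layout are hypothetical.

```python
# Each bit of the micro-instruction opens (1) or closes (0) one gate.
GATES = ["a_to_alu", "b_to_alu", "alu_to_c"]   # hypothetical gate names


def step(registers, micro_instruction):
    """Apply one micro-instruction (a string of bits) to the registers."""
    open_gates = {name for name, bit in zip(GATES, micro_instruction) if bit == "1"}
    # A closed gate lets nothing through, modelled here as the value 0.
    left = registers["a"] if "a_to_alu" in open_gates else 0
    right = registers["b"] if "b_to_alu" in open_gates else 0
    result = left + right              # the ALU in this sketch only adds
    if "alu_to_c" in open_gates:
        registers["c"] = result        # result flows into c only if its gate is open
    return registers


regs = {"a": 2, "b": 3, "c": 0}
step(regs, "111")    # all gates open: c receives a + b
print(regs["c"])
```

With the micro-instruction "110" the ALU still computes a sum, but the closed output gate keeps it from reaching register c.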
Figure 2.3 - simplified data path
Figure 2.4 - simplified data path in x86
2.4 - Memory Operations
There are two key operations on memory:
Fetch (address) returns the value stored at that address without changing it.
Store (address, value) writes a new value into the cell at the given address.
This type of memory is random-access, meaning that the CPU can access any cell of the array at any time (as opposed to sequential access, as on a tape).
Such memories are called RAM (random-access memory).
Some memory is non-volatile or read-only, such as ROM (read-only memory).
Figure 2.4 - Memory Operations
2.5 - MAR and the MDR
MAR stands for memory address register:
MAR is connected to the address bus.
MAR is the only way for the CPU to communicate with the address bus.
A tri-state buffer between MAR and the address bus prevents MAR from continuously driving its output onto the address bus.
MAR can hold either an instruction address or a data address.
MDR stands for memory data register:
MDR is connected to the data bus.
Data can go in both directions, to and from memory; therefore, MDR can load its data from either:
The data bus (for reading data)
One of the CPU registers (for storing data)
A 2-to-1 MUX circuit selects the input from one of the two.
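The two small circuits mentioned in this section, the tri-state buffer on MAR and the 2-to-1 MUX in front of MDR, can be sketched as plain functions. The function names, the use of None for a disconnected (high-impedance) output and the meaning of the select values are assumptions made for this illustration.

```python
def tri_state(enable, value):
    """Tri-state buffer: drive `value` onto the bus when enabled,
    otherwise stay disconnected (modelled here as None)."""
    return value if enable else None


def mux2(select, from_data_bus, from_cpu_register):
    """2-to-1 multiplexer feeding MDR: select 0 takes the data bus
    (reading), select 1 takes a CPU register (storing)."""
    return from_cpu_register if select else from_data_bus


mar = 0x2A
address_bus = tri_state(True, mar)    # MAR drives the address bus
idle_bus = tri_state(False, mar)      # MAR is disconnected from the bus

mdr_on_read = mux2(0, from_data_bus=7, from_cpu_register=9)
mdr_on_store = mux2(1, from_data_bus=7, from_cpu_register=9)
```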
3 - PIPELINE ARCHITECTURE
3.1 - Introduction
In computing, a pipeline is a set of data processing elements connected in series, so that the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion; in that case, some amount of buffer storage is often inserted between elements. Computer-related pipelines include:
Instruction pipelines, such as the classic RISC pipeline, which are used in processors to allow overlapping execution of multiple instructions with the same circuitry. The circuitry is usually divided up into stages, including instruction decoding, arithmetic, and register fetching stages, wherein each stage processes one instruction at a time.
Graphics pipelines, found in most graphics cards, which consist of multiple arithmetic units, or complete CPUs, that implement the various stages of common rendering operations (perspective projection, window clipping, color and light calculation, rendering, etc.).
Software pipelines, where commands can be written so that the output of one operation is automatically used as the input to the next operation. The Unix pipe is a classic example of this concept, although other operating systems support pipes as well.
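A software pipeline of this kind can be sketched with Python generators, where each stage consumes the output of the previous one. The stage names and the sample log text are made up for the example; the chain is loosely comparable to piping a file through a filter and a line counter on Unix.

```python
def read_lines(text):
    """First stage: produce the lines of the input one at a time."""
    for line in text.splitlines():
        yield line


def grep(pattern, lines):
    """Middle stage: pass on only the lines that contain `pattern`."""
    for line in lines:
        if pattern in line:
            yield line


def count(lines):
    """Last stage: count whatever reaches the end of the pipeline."""
    return sum(1 for _ in lines)


log = "ok\nerror: disk\nok\nerror: net\n"
n = count(grep("error", read_lines(log)))   # output of each stage feeds the next
print(n)
```

Because generators are lazy, each line flows through all stages before the next line is read, so no stage has to hold the whole input.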
3.1 - Instruction pipeline
An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput (the number of instructions that can be executed in a unit of time).
The fundamental idea is to split the processing of a computer instruction into a series of independent steps, with storage at the end of each step. This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all steps at once. The term pipeline refers to the fact that each step is carrying data at once (like water), and each step is connected to the next (like the links of a pipe.)
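To make the comparison concrete, the sketch below computes how long a run of instructions takes with and without pipelining. The five stage delays are invented numbers, and the model assumes an ideal pipeline with no stalls, in which the clock period equals the delay of the slowest step.

```python
stage_delays_ns = [2, 3, 2, 4, 1]   # hypothetical delay of each step


def unpipelined_time(n, delays):
    """Every instruction runs through all steps before the next one starts."""
    return n * sum(delays)


def pipelined_time(n, delays):
    """One instruction is issued per cycle, and the cycle is as long as the
    slowest step. The first instruction needs len(delays) cycles to finish,
    and each later one completes one cycle after its predecessor."""
    cycle = max(delays)
    return (len(delays) + n - 1) * cycle


print(unpipelined_time(100, stage_delays_ns))   # 100 * 12 ns
print(pipelined_time(100, stage_delays_ns))     # (5 + 99) * 4 ns
```

With these numbers the pipelined run takes 416 ns against 1200 ns. A single instruction, however, takes 20 ns through the pipeline rather than 12 ns, because every step is stretched to the length of the slowest one.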
The origin of pipelining is thought to be either the ILLIAC II project or the IBM Stretch project, though a simple version was used earlier in the Z1 in 1939 and the Z3 in 1941.
The IBM Stretch project proposed the terms "Fetch", "Decode" and "Execute", which came into common usage.
Most modern CPUs are driven by a clock. The CPU consists internally of logic and memory (flip flops). When the clock signal arrives, the flip flops take their new value and the logic then requires a period of time to decode the new values. Then the next clock pulse arrives and the flip flops again take their new values, and so on. By breaking the logic into smaller pieces and inserting flip flops between the pieces of logic, the delay before the logic gives valid outputs is reduced. In this way the clock period can be reduced. For example, the classic RISC pipeline is broken into five stages with a set of flip flops between each stage.
1. Instruction fetch
2. Instruction decode and register fetch
3. Execute
4. Memory access
5. Register write back
When a programmer (or compiler) writes assembly code, they assume that each instruction is executed before execution of the subsequent instruction begins. This assumption is invalidated by pipelining. When this causes a program to behave incorrectly, the situation is known as a hazard. Various techniques for resolving hazards exist, such as forwarding and stalling.
A non-pipelined architecture is inefficient because some CPU components (modules) are idle while another module is active during the instruction cycle. Pipelining does not completely eliminate idle time in a CPU, but making those modules work in parallel improves program execution significantly.
Processors with pipelining are organized internally into stages which can work semi-independently on separate jobs. The stages are linked into a 'chain', so each stage's output is fed to the next stage until the job is done. This organization of the processor allows overall processing time to be significantly reduced.
A deeper pipeline means that there are more stages in the pipeline, and therefore, fewer logic gates in each stage. This generally means that the processor's frequency can be increased as the cycle time is lowered. This happens because there are fewer components in each stage of the pipeline, so the propagation delay is decreased for the overall stage.
Unfortunately, not all instructions are independent. In a simple pipeline, completing an instruction may require 5 stages. To operate at full performance, this pipeline will need to run 4 subsequent independent instructions while the first is completing. If 4 instructions that do not depend on the output of the first instruction are not available, the pipeline control logic must insert a stall or wasted clock cycle into the pipeline until the dependency is resolved. Fortunately, techniques such as forwarding can significantly reduce the cases where stalling is required. While pipelining can in theory increase performance over an unpipelined core by a factor of the number of stages (assuming the clock frequency also scales with the number of stages), in reality, most code does not allow for ideal execution.
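The effect of dependencies on a five-stage pipeline can be sketched with a small timing model. It is a simplification under stated assumptions: instructions issue in order, at most one per cycle; without forwarding, a dependent instruction is assumed to issue at least 4 cycles after the instruction that produces its input (so that its decode stage follows the producer's write back); with forwarding the assumed minimum distance is 1 cycle, which ignores cases such as loads. The function name and the program encoding are hypothetical.

```python
STAGES = 5


def total_cycles(program, gap):
    """program: list of (destination, [sources]) register names.
    gap: minimum issue distance between an instruction and a later
    instruction that reads its result (an assumption of this model)."""
    issue = []          # cycle in which each instruction enters the pipeline
    last_writer = {}    # register -> index of the instruction that last wrote it
    for i, (dest, sources) in enumerate(program):
        earliest = issue[-1] + 1 if issue else 0   # in order, one per cycle
        for reg in sources:
            if reg in last_writer:
                # Stall until the producer's result is available.
                earliest = max(earliest, issue[last_writer[reg]] + gap)
        issue.append(earliest)
        last_writer[dest] = i
    return issue[-1] + STAGES if issue else 0


independent = [("r1", []), ("r2", []), ("r3", []), ("r4", []), ("r5", [])]
chain = [("r1", []), ("r2", ["r1"]), ("r3", ["r2"])]

print(total_cycles(independent, gap=4))   # no stalls are needed
print(total_cycles(chain, gap=4))         # stalls without forwarding
print(total_cycles(chain, gap=1))         # forwarding removes the stalls
```

Under these assumptions, five independent instructions finish in 9 cycles, while a chain of three dependent instructions needs 13 cycles without forwarding and 7 with it.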
Figure 3.1 - Basic five-stage pipeline in a RISC machine
3.2 - Advantages and Disadvantages
Pipelining does not help in all cases. There are several possible disadvantages. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.
Advantages of Pipelining:
The cycle time of the processor is reduced, thus increasing instruction issue-rate in most cases.
Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry vs. a more complex combinational circuit.
Disadvantages of Pipelining:
A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture.
The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent. This is because extra flip flops must be added to the data path of a pipelined processor.
A non-pipelined processor will have a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.
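The trade-off between cycle time and instruction latency can be illustrated with a short calculation. The numbers are invented: 10 ns of logic delay in total, split evenly across the stages, plus 0.5 ns of flip-flop overhead per stage.

```python
LOGIC_DELAY_NS = 10.0     # hypothetical total logic delay of one instruction
FLIP_FLOP_NS = 0.5        # hypothetical overhead added by each set of flip flops


def cycle_time(stages):
    """Clock period: one stage's share of the logic plus the flip-flop overhead."""
    return LOGIC_DELAY_NS / stages + FLIP_FLOP_NS


def instruction_latency(stages):
    """Time for one instruction to pass through every stage."""
    return stages * cycle_time(stages)


for stages in (1, 5):
    print(stages, cycle_time(stages), instruction_latency(stages))
```

With one stage the cycle time and the latency are both 10.5 ns. With five stages the cycle time falls to 2.5 ns, so instructions can be issued more often, but each instruction now takes 12.5 ns from start to finish because it passes through five sets of flip flops.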