Scalable Switch And Router Architecture Computer Science Essay



Switches and routers are the backbone of data communication. Switches establish temporary connections between input and output ports for data communication and tear down these temporary links after the data has been transmitted. Without temporary connections, user nodes would have to be directly connected to the destination links, which is not a scalable approach.

Switching nodes switch data over temporary links, and network links carry multiple traffic flows to different destinations at the same time. If the operational path fails, a redundant path is available, and more than one path can be used for transmission.

Scalable Switch Architecture:

There are many switch architectures; some are scalable designs, whereas others do not scale well. Different architectures provide different functionality. Here we discuss only three functional areas of a switch: the input port queues, the output port queues and the interconnection network.

The input port queue is the area that receives data units from the input line, the output port queue is the area that delivers data units to the output line, and the interconnection network is the functional area that provides full connectivity between the input and output ports.

Existing Switching Architectures

There are many switching architectures, which differ in speed, scalability, fabric and quality of service (QoS). Here we describe only the most common ones: (a) output-queued switches, (b) input-queued switches and (c) combined input-output-queued switches.

(A) Output-Queued Switches (OQ):

In output-queued switches, buffering takes place at the output ports, to which arriving packets are forwarded immediately. The OQ architecture is not suitable for large switches because, in a switch with N inputs, each output buffer (served in first-in-first-out, FIFO, order) must be able to write up to N arriving packets, one from each input, and read one departing packet during a single packet transmission cycle [1].

OQ switches are ideal in terms of performance, but they do not scale well. OQ switches are most commonly implemented with a shared memory: a common memory that all input and output ports can access at the same time. Its bandwidth is very limited; to increase it, "bit slicing" is used, in which each data unit is divided into slices that are stored in separate memory areas, and these memory areas can be placed on different chips. A separate queue is kept for each output, so that the packet flows for different outputs stay separate and cannot interfere with each other. By scheduling the time at which each packet is transferred to the output line, a router or switch can control packet latency and provide quality-of-service (QoS) guarantees.
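The idea behind bit slicing can be sketched in Python. This is a simplified model that assumes byte-granularity slices and a Python list standing in for the separate memory banks; the function names `slice_cell` and `reassemble` are hypothetical. Real hardware slices at bit level across parallel memory chips.

```python
from typing import List


def slice_cell(cell: bytes, k: int) -> List[bytes]:
    """Split one data unit into k slices, one per memory bank (byte slices for simplicity)."""
    width = -(-len(cell) // k)  # ceiling division so every byte lands in some bank
    return [cell[b * width:(b + 1) * width] for b in range(k)]


def reassemble(slices: List[bytes]) -> bytes:
    """Read the slices back from all banks and concatenate them in bank order."""
    return b"".join(slices)


cell = bytes(range(64))      # a hypothetical 64-byte cell
banks = slice_cell(cell, 4)  # each of the 4 banks stores 16 bytes of this cell
restored = reassemble(banks)
```

Because each bank holds only 1/k of every data unit, the bandwidth demanded from each individual memory drops by a factor of k, while all k banks are accessed in parallel.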

To further increase switching capacity, the common memory can be split into N separate buffers, each corresponding to one output. Each buffer then has to support at most N writes and one read per timeslot, which lowers the number of memory accesses per timeslot from 2N to N + 1 [1].
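The access counts above translate directly into memory speed requirements. The Python sketch below computes the time available for each memory access; the port count, line rate and cell size are hypothetical values chosen only for illustration.

```python
def memory_accesses_per_slot(n_ports: int, shared: bool) -> int:
    """Memory accesses per timeslot that one buffer must sustain."""
    # Shared memory: up to N writes (one per input) plus N reads (one per output).
    # One buffer per output: up to N writes plus a single read for its own line.
    return 2 * n_ports if shared else n_ports + 1


def access_time_ns(n_ports: int, line_rate_gbps: float, cell_bytes: int, shared: bool) -> float:
    """Time available for each memory access, in nanoseconds."""
    slot_ns = cell_bytes * 8 / line_rate_gbps  # bits / (Gbit/s) gives nanoseconds
    return slot_ns / memory_accesses_per_slot(n_ports, shared)


# Hypothetical example: 16 ports, 10 Gbit/s lines, 64-byte cells (51.2 ns per timeslot).
shared_ns = access_time_ns(16, 10.0, 64, shared=True)       # 51.2 / 32 = 1.6 ns
per_output_ns = access_time_ns(16, 10.0, 64, shared=False)  # 51.2 / 17, about 3.01 ns
```

Splitting the memory nearly doubles the time each buffer has per access, but the requirement still grows linearly with N, which is why OQ remains hard to scale.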

However, output queuing is infeasible for switches with a large number of ports or with high line rates. It is a classical switching architecture [6] and does not meet the requirements of today's switching needs.

(B) Input-Queued Switches (IQ):

In IQ switches, output contention is completely resolved before cells are transferred through the fabric to the output ports. Because several inputs may hold cells for the same output, cells wait in queues at the input ports. With a single FIFO per input, a cell whose output is busy blocks the cells behind it, even if their outputs are idle; this is head-of-line (HOL) blocking. To avoid HOL blocking, cells are placed in different virtual output queues (VOQs) according to their output [8].
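A virtual output queue can be sketched as one FIFO per output at each input. The class below (`VOQInput` is a hypothetical name) is a minimal Python illustration, not the data structure of any particular switch.

```python
from collections import deque


class VOQInput:
    """An input port that keeps one FIFO per output (virtual output queues)."""

    def __init__(self, n_outputs: int):
        self.voq = [deque() for _ in range(n_outputs)]

    def enqueue(self, cell, output: int) -> None:
        self.voq[output].append(cell)

    def requests(self):
        """Outputs for which this input has at least one waiting cell."""
        return [j for j, q in enumerate(self.voq) if q]

    def dequeue(self, output: int):
        return self.voq[output].popleft()


# Cell "a" waits for output 0, which is busy; cell "b" arrived later for idle output 1.
port = VOQInput(n_outputs=2)
port.enqueue("a", 0)
port.enqueue("b", 1)
# With a single FIFO, "b" would sit behind "a". With VOQs the input can request
# both outputs, so "b" is sent to output 1 while "a" keeps waiting.
sent = port.dequeue(1)
```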

In IQ switches, the memory in the interconnection network is usually replaced by a crossbar. A crossbar arbitration algorithm computes a matching matrix, which determines which input ports transfer data units to which output ports in each timeslot. To establish connectivity through the center stage, the crossbar arbiter exchanges state information with the input and output ports to finalize the matching matrix. These control messages travel over a control path between the ports and the arbiter, which is either time-shared with the data path or fully dedicated to control traffic. The information stored in the control memory of the crossbar provides additional input to the arbitration process.
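To make the notion of a matching matrix concrete, here is a hypothetical single-pass greedy arbiter in Python. Practical crossbar arbiters typically use iterative request-grant-accept exchanges; this sketch is not the algorithm of any particular switch.

```python
def greedy_maximal_matching(requests):
    """Build a matching matrix from a request matrix.

    requests[i][j] is True when input i has a cell for output j. The result is an
    N x N 0/1 matrix with at most one 1 in each row (input) and column (output).
    """
    n = len(requests)
    match = [[0] * n for _ in range(n)]
    output_taken = [False] * n
    for i in range(n):                  # visit inputs in a fixed order
        for j in range(n):
            if requests[i][j] and not output_taken[j]:
                match[i][j] = 1         # connect input i to output j this timeslot
                output_taken[j] = True
                break
    return match


requests = [
    [True, True, False],   # input 0 has cells for outputs 0 and 1
    [True, False, False],  # input 1 has cells only for output 0
    [False, False, True],  # input 2 has cells only for output 2
]
match = greedy_maximal_matching(requests)
```

The result is maximal (no further request can be added without a conflict) but not maximum: pairing input 0 with output 1 would have let input 1 use output 0 in the same timeslot.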

The performance of IQ switches with a crossbar and centralized arbitration has been studied extensively. It has been shown that 100% throughput can be achieved without speedup by using a maximum weight matching (MWM) algorithm. However, MWM is too complex to implement at high line rates; with a fabric speedup of 2, 100% throughput can instead be achieved with any maximal matching algorithm [10].
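The following Python sketch shows what maximum weight matching computes, using VOQ lengths as edge weights (one common choice, assumed here). It tries all N! input-to-output permutations, which is feasible only for tiny N; polynomial-time MWM algorithms exist (on the order of N³ operations), but even those are considered too slow for high-speed switches.

```python
from itertools import permutations


def max_weight_matching(voq_len):
    """Brute-force maximum weight matching.

    voq_len[i][j] is the number of cells at input i destined for output j.
    Returns (perm, weight) where perm[i] is the output assigned to input i.
    """
    n = len(voq_len)
    best_perm, best_weight = None, -1
    for perm in permutations(range(n)):  # every possible full input->output pairing
        weight = sum(voq_len[i][perm[i]] for i in range(n))
        if weight > best_weight:
            best_perm, best_weight = perm, weight
    return best_perm, best_weight


voq_len = [
    [3, 0, 1],
    [2, 5, 0],
    [0, 1, 4],
]
best = max_weight_matching(voq_len)  # serves the three longest queues: weight 3 + 5 + 4
```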

The arbiter complexity A results from two components: the processing complexity P and the memory complexity M. The processing complexity P, expressed in bit/s, is the average amount of information the arbiter receives from and transmits to the input and output ports per unit of time. It depends on the following factors.

  • Message size: the number of bits in each control message.
  • Message redundancy: the number of messages exchanged to finalize each matching matrix.
  • Fabric speedup S: the ratio of the internal capacity of the fabric to the capacity of the input and output lines. A speedup greater than 1 is used to avoid blocking under heavy traffic.
  • Message parallelism: the number of messages that travel simultaneously through the control path.
  • Timeslot frequency: the number of timeslots per unit of time.
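A rough estimate of P can be worked out from these factors. The Python sketch below rests on our own assumptions, not on a formula from [1]: the aggregate control rate is taken as the product of message size, messages per matching, speedup and timeslot frequency, and message parallelism is taken to spread that rate over parallel control channels. All numbers are hypothetical.

```python
def processing_complexity_bps(msg_bits, msgs_per_matching, speedup, timeslots_per_s):
    """Aggregate control information exchanged by the arbiter, in bit/s (assumed model)."""
    # Assumption: with speedup S the arbiter finalizes S matching matrices per timeslot.
    return msg_bits * msgs_per_matching * speedup * timeslots_per_s


def per_channel_rate_bps(total_bps, parallelism):
    """Rate each control channel carries if messages travel over `parallelism` channels."""
    return total_bps / parallelism


# Hypothetical switch: 64 ports, 10 Gbit/s lines, 64-byte cells, 16-bit messages,
# 3 messages per port per matching (for example request, grant, accept), speedup 2.
timeslots_per_s = 10e9 / (64 * 8)  # 19,531,250 timeslots per second
p_total = processing_complexity_bps(16, 3 * 64, 2, timeslots_per_s)  # 120 Gbit/s
p_channel = per_channel_rate_bps(p_total, 64)                        # 1.875 Gbit/s
```

Even under these modest assumptions the arbiter handles over a hundred Gbit/s of control traffic, which illustrates why centralized arbitration limits scalability.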

The design and complexity of the memory M depend on the amount of information stored to support the arbitration process. Typically, the more supporting information the arbitration uses, the more complex the memory design becomes.

Switches that use a crossbar with centralized arbitration are well known for their compactness and switching efficiency. However, at large port counts and high line rates this architecture has scalability limitations.

Load-balanced Birkhoff-von Neumann switches [12] try to minimize the arbitration processing and, consequently, the memory complexity. In this approach, an additional crossbar is placed between the input ports and the line cards, as shown in the following figure.

The first crossbar spreads all incoming traffic uniformly over the line cards, and a second crossbar routes that traffic to the desired output ports. Both crossbars cycle through a fixed, periodic sequence of connection patterns, so the crossbar's memory controller is less complicated and no complex matching algorithm is required to achieve maximum throughput, which also keeps the processing simple.
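The periodic connection pattern can be illustrated with a round-robin rule, one common choice that we assume here: at timeslot t, port i of an N-port crossbar connects to port (i + t) mod N. The function names are hypothetical.

```python
def periodic_connection(t: int, n: int):
    """Connection pattern of an n x n crossbar at timeslot t: port i -> (i + t) mod n.

    Every pattern is a permutation, so no arbitration is needed to avoid conflicts.
    """
    return [(i + t) % n for i in range(n)]


def spread_of_input(src: int, n: int, slots: int):
    """Count how many timeslots input `src` is connected to each intermediate port."""
    counts = [0] * n
    for t in range(slots):
        counts[periodic_connection(t, n)[src]] += 1
    return counts


# Over 12 slots of a 4-port first stage, input 2 visits every line card 3 times,
# so its traffic is spread evenly regardless of where the cells are headed.
counts = spread_of_input(2, 4, 12)
```

The second crossbar can follow the same rule to connect intermediate port j to output (j + t) mod N, which is what removes the need for a matching algorithm.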

Although this approach improves scalability, it introduces additional issues: hard QoS guarantees depend on the traffic composition; output ports may receive cells out of sequence, which requires an additional mechanism to reorder them; and multicasting also requires additional support.
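One generic way to restore order at an output is a resequencing buffer that holds cells until all earlier sequence numbers of the same flow have arrived. The Python class below (`Resequencer` is a hypothetical name) is a minimal sketch of that idea; it is not the specific mechanism proposed in [12], and it assumes each flow numbers its cells 0, 1, 2, ... without duplicates. An output would keep one such buffer per flow.

```python
import heapq


class Resequencer:
    """Per-flow reorder buffer: releases cells strictly in sequence-number order."""

    def __init__(self):
        self.next_seq = 0   # sequence number the output is waiting for
        self.pending = []   # min-heap of (seq, cell) that arrived too early

    def receive(self, seq: int, cell):
        """Accept one cell and return the list of cells that can now leave in order."""
        heapq.heappush(self.pending, (seq, cell))
        released = []
        while self.pending and self.pending[0][0] == self.next_seq:
            released.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return released


flow = Resequencer()
first = flow.receive(1, "b")   # cell 1 overtook cell 0 through another line card
second = flow.receive(0, "a")  # the missing cell arrives, releasing both
third = flow.receive(2, "c")
```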

One way to overcome the limitations of centralized arbitration is to distribute it over multiple crossbars using the Concurrent Dispatching Algorithm (CDA) [9]. In this approach, the arbitration task is spread over several crossbars, which reduces the control transactions between the fabric and the input ports.

The fabric outputs do not have to choose which virtual output queue (VOQ) to serve at each input port; the algorithm leaves that first decision to the ports, and the fabric outputs only resolve contention among competing requests. CDA provides spatial speedup through the concurrent crossbars without affecting the scalability of the individual fabric components [11]. In CDA, the QoS delivered through the fabric channels depends directly on how VOQs are selected at the input line cards. Under highly bursty traffic, CDA may lose some throughput, even when it adopts the more flexible scheme of a precomputed sequence of matching matrices [7]; however, distributing the arbitration over multiple crossbars makes CDA very simple to implement.
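The division of labour described above can be illustrated with a toy Python model. It is a simplified sketch inspired by this description, not the published CDA: we assume each input picks one distinct non-empty VOQ per crossbar plane (longest first), and each plane's output grants one requester (lowest input index) independently of the other planes.

```python
def concurrent_dispatch(voq_len, n_planes):
    """One timeslot of a toy distributed dispatcher.

    voq_len[i][j]: cells waiting at input i for output j.
    Returns a sorted list of granted transfers as (plane, input, output).
    """
    n = len(voq_len)
    # Step 1, at the input ports: each input chooses which VOQs to serve,
    # one distinct VOQ per plane, longest queues first. The fabric never sees this choice.
    requests = {}  # (plane, output) -> inputs requesting that output on that plane
    for i in range(n):
        ranked = sorted((j for j in range(n) if voq_len[i][j] > 0),
                        key=lambda j: -voq_len[i][j])
        for plane, j in enumerate(ranked[:n_planes]):
            requests.setdefault((plane, j), []).append(i)
    # Step 2, at each fabric output on each plane: only resolve contention by
    # granting one requester (lowest input index in this sketch).
    return [(plane, min(inputs), j) for (plane, j), inputs in sorted(requests.items())]


grants = concurrent_dispatch([[2, 1], [3, 0]], n_planes=2)
```

Here input 0 sends two cells in one timeslot through two planes (spatial speedup), while input 1 loses the contention for output 0 on plane 0 and would retry in the next timeslot.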

(C) Combined Input-Output-Queued Switches (CIOQ):

One way to lessen HOL blocking is to increase the speedup of the switch. Speedup is the ratio between the speed of the buffers and the speed of the lines. A switch with speedup S can deliver up to S packets to each output, and remove up to S packets from each input, within one time unit, where a time unit is the interval between packet arrivals at the input ports. Therefore, an IQ switch has a speedup of 1, while an OQ switch has a speedup of N. For values of S between 1 and N, packets must be buffered both at the inputs before switching and at the outputs after switching. This architecture is known as the combined input-output-queued (CIOQ) switch.
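The effect of speedup on HOL blocking can be explored with a small Monte Carlo sketch in Python. It assumes a single FIFO per input, saturated inputs, uniformly random destinations, random selection among contending inputs and unbounded output buffers; this is a different traffic model from the finite-load studies in [3][4], so its numbers are not directly comparable with theirs.

```python
import random


def cioq_fifo_throughput(n_ports: int, speedup: int, n_slots: int, seed: int = 1) -> float:
    """Fraction of output-line capacity used when every input FIFO is always backlogged."""
    rng = random.Random(seed)
    # Destination of the head-of-line (HOL) cell at each input FIFO.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    out_queue = [0] * n_ports  # cells waiting in each output buffer
    departures = 0
    for _ in range(n_slots):
        # Up to `speedup` transfer rounds per slot: in each round an input sends at
        # most one cell and an output accepts at most one cell.
        for _ in range(speedup):
            contenders = {}
            for i, dest in enumerate(hol):
                contenders.setdefault(dest, []).append(i)
            for dest, inputs in contenders.items():
                winner = rng.choice(inputs)  # losers stay blocked at the head of line
                out_queue[dest] += 1
                hol[winner] = rng.randrange(n_ports)  # next cell, uniform destination
        # Each output line transmits at most one cell per slot.
        for j in range(n_ports):
            if out_queue[j]:
                out_queue[j] -= 1
                departures += 1
    return departures / (n_ports * n_slots)


iq = cioq_fifo_throughput(32, speedup=1, n_slots=5000)    # plain IQ switch
cioq = cioq_fifo_throughput(32, speedup=2, n_slots=5000)  # CIOQ with S = 2
```

With S = 1 the result should come out close to the well-known saturation limit of about 58.6% for large N reported in [2]; with S = 2 the outputs are kept busy almost all the time under this saturated model.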

Simulation and analytical studies of CIOQ switches that maintain a single FIFO at each input have been carried out for various speedup values [3][4]. These studies conclude that with S = 4 or 5, 99% throughput can be achieved when arrivals at each input are independent and identically distributed and packet destinations are uniformly distributed across the outputs.

However, it is known that 100% throughput can be achieved with a speedup of 1 if the input queues are organized differently: HOL blocking can be eliminated entirely by "virtual output queuing", in which every input maintains a separate queue for every output. It has been shown that, with this scheme, the throughput of an IQ switch can be increased to 100% for independent arrivals [5]. We may therefore conclude that speedup is not necessary to eliminate the effect of HOL blocking.


In this paper we have reviewed common switch architectures, focusing on the three most common switching techniques: output-queued switches, input-queued switches and combined input-output-queued switches.

First, we described the architecture of output-queued switches. We looked at how output queuing works and how it can be enhanced, and at how its bandwidth problem is addressed by "bit slicing" and by "splitting the common memory".

For input-queued switches, we showed that performance and QoS can be improved by using multiple crossbars and by providing fabric speedup.

Finally, we discussed how combined input-output queuing lessens HOL blocking through speedup, and how virtual output queuing eliminates it even without speedup.


Output Queue Switch and Conclusion by Muhammad Younas; Input Queue Switch and Introduction by Jehan Badshah; Summary and Combined Input-Output Queue Switch by Muhammad Kamran.


[1] F. M. Chiussi and A. Francini, “Scalable Electronic Packet Switches,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 4, May 2003.

[2] M. J. Karol, M. G. Hluchyj, and S. P. Morgan, “Input versus output queueing on a space-division packet switch,” IEEE Trans. Commun., pp. 1347-1356.

[3] I. Iliadis and W. E. Denzel, “Performance of packet switches with input and output queueing,” in Proc. ICC '90, Atlanta, GA, Apr. 1990, pp. 747-753.

[4] A. L. Gupta and N. D. Georganas, “Analysis of a packet switch with input and output buffers and speed constraints,” in Proc. IEEE INFOCOM '91, Bal Harbour, FL, Apr. 1991, pp. 694-700.

[5] N. McKeown, V. Anantharam, and J. Walrand, “Achieving 100% throughput in an input-queued switch,” in Proc. IEEE INFOCOM '96.

[6] W. Bux, E. Denzel, T. Engbersen, A. Herkersdorf, and P. Luijten, “Technologies and building blocks for fast packet forwarding,” IBM Research.

[7] C. S. Chang, W. J. Chen, and H. Y. Huang, “Birkhoff-von Neumann input-buffered crossbar switches,” in Proc. IEEE INFOCOM 2000, Tel Aviv, Israel, Mar. 2000, pp. 1614-1623.

[8] F. M. Chiussi and A. Francini, “Scalable Electronic Packet Switches,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 4, May 2003. [Online]

[9] F. M. Chiussi, J. G. Kneuer, and V. P. Kumar, “Low-cost scalable switching solutions for broadband networking: the ATLANTA architecture and chipset,” IEEE Commun. Mag., vol. 35, pp. 44-53, 1997.

[10] J. G. Dai and B. Prabhakar, “The throughput of data switches with and without speedup,” in Proc. IEEE INFOCOM, Trans. Networking, vol. 1, pp. 397-413, Aug. 1993.

[11] A. Hung, G. Kesidis, and N. McKeown, “ATM input-buffered switches with the guaranteed-rate property,” in Proc. IEEE ISCC, Athens, Greece, June 1998, pp. 331-335.

[12] I. Keslassy and N. McKeown, “Maintaining packet order in two-stage switches,” in Proc. IEEE INFOCOM 2002, New York, June 2002, pp. 1032-1041.