Design Using Redundant Binary Number Systems Computer Science Essay


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Conventional number systems are weighted, fixed positive-radix number systems, in which a signed number is written as a sign symbol followed by the number part in either magnitude or r's-complement form. Addition in a conventional number system requires carry propagation (serial signal propagation) from the least significant digit (LSD) to the most significant digit (MSD), so the addition time depends on the word length; this is the main limitation on VLSI performance.

Redundant number systems (RNS), in contrast, allow two numbers to be added without any serial signal propagation along the adder; that is, the duration of the operation is independent of the operand length and equals the time required to add two digits. This is the advantage of RNS over conventional number systems.

Because of this advantage, this thesis proposes an FIR filter based on RNS. Implementing an FIR filter requires an adder, a multiplier and a D flip-flop (D-FF); the structural blocks to be designed are therefore the PPM adder, the MMP subtractor, the D-FF and the digit-serial multiplier.

In this thesis, a 368.18 MHz 3-tap FIR filter and an 80 MHz box-car FIR filter are designed with a bottom-up design flow in CADENCE 5.1.41, the Cadence IC design environment. The design is based on a 90 nm CMOS process, and the bottom-level transistors are taken from the gpdk090 library. The advantages of full-custom design are maximum circuit performance, minimum design size and minimum high-volume production cost.


Chapter 1 Introduction

Introduction

Motivation and Goals

VLSI Design Flow

Introduction on Bottom-up and Top-down Design Flow

Bottom-up Design Flow

Thesis Organization

Chapter 2 Computer Binary Number Systems

Binary Number Systems

Signed Digit Number Systems

Redundant Number Systems (RNS)

Arithmetic Operations of RNS

Carry-Free Radix-2 Addition (PPM Adder)

Radix-2 Subtraction (MMP Subtractor)

Digit-serial SBD Redundant Adder

Radix-2 Redundant Binary Multiplier

Redundant Binary to Binary Conversion

Chapter 3 FIR Filter

FIR Filter Theory

Architecture of FIR Filter

Box-car FIR Filter

Chapter 4 Architecture, Design and Implementation of Different Digital Cells

Architecture and Design of Different Digital Cells

CMOS Inverter

Design

Simulation

Layout

Parametric Extraction and Post-layout Simulation

NAND-2

NAND-3

D-FF (Delay)

PPM Adder

MMP Subtractor

Digit-serial SBD Redundant Adder

Box-car FIR Filter

Digit-serial Multiplier

3-tap FIR Filter

Chapter 5 Conclusions and Future Works

Conclusion

Future Work

References

Appendix I

Appendix II

List of Figures

Fig 1.3.1 Bottom-up Design Flow

Fig 2.5.1 PPM Adder

Fig 2.5.2 LSD PPM Adder

Fig 2.5.3 4-bit digit PPM Adder

Fig 2.7.1 Digit serial SBD Adder

Fig 4.2.1 CMOS Inverter schematic

Fig 4.2.3 Layout of inverter

Fig 4.3.1 NAND2 schematic

Fig 4.3.2 layout of NAND2

Fig 4.3.3 extraction of NAND2

Fig 4.4.1 NAND3 schematic

Fig 4.4.2 NAND3 layout

Fig 4.5.1 D-FF schematic

Fig 4.5.2 D-FF layout

Fig 4.5.3 extraction of D-FF

Fig 4.6.1 PPM Adder schematic

Fig 4.6.2 PPM Adder layout

Fig 4.6.3 extraction of PPM Adder

Fig 4.7.1 MMP Subtractor schematic

Fig 4.7.2 layout MMP Subtractor

Fig 4.8.1 SBD adder

Fig 4.8.2 simulation waveform of SBD adder

Fig 4.9.1 4-tap Box-car FIR filter (1-bit input) schematic

Fig 4.9.2 4-tap Box-car FIR filter (4-bit input) schematic

Fig 4.9.3 Simulation waveforms of Box-car FIR filter

Fig 4.10.3 Digit serial multiplier schematic

Fig 4.11.1 9-FA schematic

Fig 4.11.2 9-DFF schematic

Fig 4.11.3 FIR filter schematic

Fig 4.11.4 Test-bench of FIR filter

Fig 4.11.5 Simulation waveforms of FIR filter

List of Tables

Table 2.5.1 Digit sets in addition

Table 2.6.1 Digit sets in subtraction

Table 2.8.1 Recoding of bj

Chapter 1


1.1 Introduction

Since the theory of digital signal processing (DSP) was developed and applied in electrical engineering, digital filtering has played a very important role. Digital filtering techniques are used to suppress noise, enhance signals in selected frequency ranges, limit bandwidth, remove or attenuate specific frequencies, and perform other special operations. Digital filters are classified into finite impulse response (FIR) and infinite impulse response (IIR) filters. Compared with IIR digital filters, FIR digital filters can have an exactly linear phase response and a very regular architecture, and they suffer less from finite-word-length effects. This thesis presents the design and implementation of such a filter based on redundant binary number systems. The main components of an FIR filter are the adder, the multiplier and the delay element. Carry-propagation delay is the limiting factor of the adder and the multiplier; by building these from redundant-number arithmetic, the propagation delay of the FIR filter is reduced.

In this thesis the FIR filter is designed with a bottom-up, or full-custom, design flow in CDS 5.1.41. The advantages of full-custom design are maximum circuit performance, minimum design size and minimum high-volume production cost. Finally, the box-car FIR filter and the 3-tap FIR filter (4-bit multiplier coefficients) can serve IC design students as a basis and as a tool for understanding digital design. They are also a stepping-stone for students designing other CMOS chips in 90 nm CMOS technology, and are intended to encourage them to improve on the design.

1.2 Motivation and goals

Area, delay (performance) and power are the three important design constraints for embedded real-time digital signal processing systems. The area constraint is imposed primarily by cost considerations. An area-efficient implementation results in a smaller die size and is hence more cost-effective; it also enables more functionality to be integrated on a single chip. The performance requirements of a system are driven by its data processing needs. For DSP systems, throughput is the primary performance criterion, so the performance constraint depends on the rate at which the input signals are sampled and on the complexity of the processing to be performed. Low power dissipation is a key requirement for portable, battery-operated systems because it extends battery life. Low power dissipation also reduces packaging cost (plastic instead of ceramic), eliminates or reduces cooling overhead (heat sinks) and increases the reliability of the device.

To meet the requirements of high-speed, low-power applications, high-speed FIR digital filters need both increased parallelism and reduced complexity, so that both the sampling-rate and the power-dissipation goals are met. In this thesis, the FIR filter is designed based on RNS to achieve high-speed operation, and a bottom-up design flow is used for maximum circuit performance, minimum design size and minimum high-volume production cost.

1.3 VLSI Design Flow

1.3.1 Introduction on Bottom-up and Top-down Design Flow

The designer usually follows a sequence of design phases. First, the functionality of the system is specified: the basic hardware blocks are identified and their interfaces, composed of data and control signals, are fixed. Today there are two principal ways to design a VLSI circuit with the traditional tools developed in recent years, a bottom-up flow and a top-down flow. The designer can usually choose either approach, but sometimes the choice is forced by particular design requirements or by the circuit structure. Top-down design is a process of iterative refinement: the designer starts with a top-level view of the system and decomposes it into smaller blocks. Bottom-up design starts with low-level building blocks and interconnects them into larger ones. In practice the two techniques are not mutually exclusive; for instance, within a top-down approach the designer may use particular self-made cells and leave their internal structure untouched [10]. The bottom-up approach is preferable in digital design when the designer wants to create a particular cell that achieves a specific performance with full-custom transistors and then replicate this structure throughout the project.

1.3.2 Bottom-Up Design Flow

The bottom-up design flow is shown in Fig 1.3.1. It starts with a set of design specifications. The specifications typically describe the expected functionality of the designed circuit as well as other properties such as delay times and area. Meeting the various specifications requires certain design trade-offs (area versus delay) [10].

Design Specifications → Schematic Capture → Create Symbol → Pre-layout Simulation → Layout → Design Rule Check → Circuit Extraction → Layout Versus Schematic Check → Post-Layout Simulation

Fig 1.3.1 Bottom-Up Design Flow

A. Schematic Capture

A Schematic Editor is used for capturing (i.e. describing) the transistor-level design. The Schematic Editors provide simple, intuitive means to draw, to place and to connect individual components that make up the design. The resulting schematic drawing must accurately describe the main electrical properties of all components and their interconnections. Also included in the schematic are the supply connections (VDD and gnd), as well as all pins for the input and output signals of the circuit. From the schematic, a netlist is generated, which is used in later stages of the design. The generation of a complete circuit schematic is therefore the first important step of the transistor-level design.

B. Symbol Creation

A symbol view of the circuit is also required for some of the subsequent simulation steps and for documentation purposes. Thus, schematic capture of the circuit topology is usually followed by the creation of a symbol that represents the entire circuit. The shape of the symbol icon may suggest the function of the module (logic gates such as AND, OR, etc.), but the default symbol icon is a simple rectangular box with input and output pins. Symbol creation also helps the circuit designer build a system-level design with multiple hierarchy levels.

C. Layout

Creating the mask layout is one of the most important steps in the full-custom design flow: using a Layout Editor, the designer describes the detailed geometries and the relative positioning of each mask layer to be used in actual fabrication. Physical layout design is very tightly linked to overall circuit performance, since the physical structures determine the transconductances of the transistors, the parasitic capacitances and resistances, and obviously the silicon area used to realize a given function. However, layout is a very intensive and time-consuming design effort. It is also extremely important that the layout does not violate any of the layout design rules, to ensure defect-free fabrication of the design. Layout can be done manually, design by design, or automatically with a CAD tool, but the quality of automatically produced layouts is still far from that of hand-optimized layouts.

D. Design Rule Check (DRC)

The mask layout must conform to a complex set of design rules to ensure a low probability of fabrication defects. A tool built into the layout editor, called the Design Rule Checker, is used to detect any design rule violations during and after the mask layout design. Any errors detected should be removed from the mask layout before the final design is saved.

E. Circuit Extraction

After the mask layout has been made free of design rule errors, circuit extraction is performed to create a detailed netlist for simulating the circuit. The circuit extractor identifies the individual transistors and their connections, as well as the parasitic capacitances and resistances that are inevitably present. The extracted netlist gives a very accurate estimate of the device dimensions and device parasitics that ultimately determine the circuit performance. The extracted netlist is used in transistor-level simulations and in the Layout Versus Schematic comparison.

F. Layout Versus Schematic Check

After the mask layout of the circuit is complete, the design should be checked against the schematic created earlier. The Layout Versus Schematic (LVS) check compares the original network with the one extracted from the mask layout. The LVS step provides an additional level of confidence in the integrity of the design and ensures that the mask layout is a correct realization of the intended circuit topology. Note, however, that a successful LVS check does not guarantee that the extracted circuit will actually satisfy the performance requirements, since LVS guarantees only a topological match. Any errors that show up during LVS should be corrected before proceeding to post-layout simulation.

G. Post-Layout Simulation

The electrical performance of a full-custom design is best analyzed by performing a post-layout simulation on the extracted circuit netlist. This detailed simulation gives a clear assessment of the circuit speed and of the influence of circuit parasitics. If the results of the post-layout simulation are not satisfactory, the designer should modify the transistor dimensions or the circuit topology to achieve the desired performance. Thus multiple design iterations may be required until the post-layout simulation results satisfy the original design requirements.

Finally, a satisfactory post-layout simulation result is still no guarantee of a completely successful product, since the actual performance of the chip can only be verified by testing the fabricated prototype.

1.4 Thesis Organization

The organization of this thesis is as follows. Chapter 2 reviews computer binary number systems, redundant number systems and their arithmetic operations. Chapter 3 describes FIR filter theory, the box-car FIR filter and the components of the filter. Chapter 4 gives the architecture, design and implementation of the different digital cells and, finally, of the implemented FIR filter. Conclusions and future work are given in Chapter 5.

Chapter 2

Computer Binary Number Systems

2.1 Binary Number Systems

A number system is defined by the set of values that each digit can assume and by an interpretation rule that defines the mapping between sequences of digits and their numerical values. There are two types of number systems, namely conventional (e.g. binary, decimal) and unconventional (e.g. signed-digit numbers). In conventional number systems every number has a unique representation, i.e. no two digit sequences have the same numerical value, and hence these are called non-redundant number systems [2]. In conventional digital computers, integers are represented as binary numbers of fixed length n; a binary number $x_{n-1} x_{n-2} \dots x_1 x_0$ with $x_i \in \{0, 1\}$ represents the integer value

$$X = \sum_{i=0}^{n-1} x_i\, 2^i$$

The weight of the digit $x_i$ is the ith power of 2, where 2 is the radix of the binary number system; e.g. the integer 5 is represented as 101, since $5 = 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0$. Because of the trade-offs among word length, hardware size and propagation delay, various number representations have been proposed. In non-redundant number systems, carry propagation limits VLSI implementations of high-speed multiplication and addition: when two conventional binary numbers are added, the carry may propagate all the way from the least significant digit to the most significant, so the addition time depends on the word length. To reduce the addition time, i.e. the propagation delay, another kind of number system is needed, called unconventional or signed-digit number systems [2].
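The word-length dependence can be seen in a small behavioural model. The Python sketch below is our own illustration, not a circuit from this design, and the function name `ripple_add` is hypothetical. It adds two unsigned W-bit numbers one bit position at a time and also reports the longest run of consecutive positions that each produce a carry-out. For $2^W - 1$ plus 1 the carry ripples through every position, so the worst-case delay grows linearly with W.

```python
def ripple_add(a, b, width):
    """Add two unsigned `width`-bit integers bit by bit, LSD first.

    Returns (sum, longest_run), where longest_run is the largest number of
    consecutive bit positions that each produce a carry-out of 1.
    """
    carry = 0
    total = 0
    run = longest_run = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i             # sum bit of position i
        carry = (ai & bi) | (carry & (ai ^ bi))     # carry into position i + 1
        run = run + 1 if carry else 0
        longest_run = max(longest_run, run)
    total |= carry << width                         # final carry-out becomes the MSB
    return total, longest_run


print(ripple_add(255, 1, 8))   # (256, 8): the carry crosses all 8 positions
print(ripple_add(5, 2, 8))     # (7, 0): no position produces a carry
```

Doubling the word length doubles the worst-case run, which is exactly the dependence that signed-digit representations are meant to remove.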

2.2 Signed Digit Number Systems

In a conventional radix-r number system a digit can take the values {0, 1, 2, ..., r-1}, whereas in a signed-digit radix-r number system the digit set is S = {-(r-1), -(r-2), ..., -1, 0, 1, ..., r-1}. For example, the digit set {-1, 0, 1} is used for the radix-2 (r = 2) number system. A signed-digit number represented by the n digits $z_i \in S$ has the algebraic value

$$Z = \sum_{i=0}^{n-1} z_i\, r^i$$

For example, the number 3 can be represented as 0011 or as 010(-1), since 4 - 1 = 3. Hence a number generally has multiple representations in signed-digit format, and such number systems are called redundant number systems. Signed-digit representations limit carry propagation to one position to the left during addition and subtraction in digital computers [1-2], so carry-propagation chains are eliminated by using redundant representations for the operands.
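The redundancy can be checked by brute force. This Python sketch (the helper names are ours) evaluates a radix-2 signed-digit string written most significant digit first and enumerates every 4-digit string over {-1, 0, 1} with a given value. It confirms that 3 has several representations, including 0011 and 010(-1), while zero has only the all-zero string, which is the uniqueness requirement for Z = 0 listed in Section 2.3.

```python
from itertools import product


def sd_value(digits, radix=2):
    """Algebraic value of a signed-digit string given MSD first, e.g. (0, 1, 0, -1)."""
    value = 0
    for d in digits:
        value = value * radix + d
    return value


def sd_representations(n, width, radix=2):
    """All `width`-digit strings over {-(r-1), ..., r-1} whose value is n."""
    digit_set = range(-(radix - 1), radix)
    return [d for d in product(digit_set, repeat=width) if sd_value(d, radix) == n]


reps_of_3 = sd_representations(3, 4)
print(len(reps_of_3), (0, 0, 1, 1) in reps_of_3, (0, 1, 0, -1) in reps_of_3)
print(sd_representations(0, 4))   # zero has a single representation
```

The uniqueness of zero follows from the lowest nonzero digit: it contributes $\pm 2^k$, which the higher digits (all multiples of $2^{k+1}$) can never cancel.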

2.3 Redundant Number Systems (RNS)

The class of signed-digit number or redundant number representations is derived according to four requirements which are postulated as necessary for number representations in fast parallel arithmetic.

The purpose of redundant number representations is to allow addition and subtraction of two numbers with no serial signal propagation along the adder; that is, the duration of the operation is independent of the operand length and equals the time required to add or subtract two digits. The signed-digit representation must have a unique representation of the algebraic value zero [3]. A redundant number represented by the n+m+1 digits $z_i$ ($i = -n, \dots, -1, 0, 1, \dots, m$) has the algebraic value

$$Z = \sum_{i=-n}^{m} z_i\, r^{-i}$$

where the values of r and $z_i$ satisfy the following requirements:

The radix r is a positive integer.

The algebraic value Z=0 has a unique representation.

There exist transformations between the conventional representation and the signed-digit representation for every algebraic value Z within a specified range.

Totally parallel addition and subtraction are possible for all digits in corresponding positions of two representations.

2.4 Arithmetic Operations of RNS

The arithmetic operations of totally parallel addition and subtraction of two digits zi and yi from the corresponding positions of the representations of numbers Z and Y are defined as follows [3]:

Definition 1: Addition of digits zi and yi is totally parallel if the following two conditions are satisfied:

The sum digit $s_i$ (the ith digit of the sum $S = Z + Y$) is a function only of the augend digit $z_i$, the addend digit $y_i$ and the transfer digit $t_i$ from the (i+1)th position on the right: $s_i = f(z_i, y_i, t_i)$. The term "transfer digit" is used here instead of the common terms "carry" or "borrow" for two reasons. First, the transfer digit may assume both positive and negative values in either addition or subtraction; second, unlike the carry or borrow of conventional addition or subtraction, the transfer digit is never propagated past the first adder position on the left.

The transfer digit ti-1 to the (i-1)th position on the left is a function only of the augend digit zi and the addend digit yi : ti-1=f (zi, yi ).

Definition 2 : Totally parallel subtraction of the subtrahend digit yi from the minuend digit zi is performed as the totally parallel addition of the additive inverse of yi, i.e., zi-yi=zi+(-yi).

The addition of two digits is performed in two successive steps. First, an outgoing transfer digit ti-1 and an interim sum digit wi are formed:

$$z_i + y_i = r\,t_{i-1} + w_i \qquad (2.4.1)$$

Then the sum digit si is formed:

$$s_i = w_i + t_i \qquad (2.4.2)$$

Definition 1 will be satisfied if the range of values that the sum digit $s_i$ may assume in (2.4.2) does not exceed the allowed range of values for the digits $z_i$ and $y_i$. Definition 2 will be satisfied if every allowed nonzero digit value has an allowed additive inverse: for every $y_i = a$ there exists $y_i = -a$ such that $a + (-a) = 0$.

The requirement for a unique representation of the zero value of a number is satisfied by the condition:

$$|z_i| \le r - 1 \qquad (2.4.3)$$


Redundant number representations limit the carry propagation to a few bit positions, which is usually independent of the word length W. This carry propagation-free feature enables fast addition.
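Definitions 1 and 2 and Eqs. (2.4.1) and (2.4.2) can be exercised in a short sketch. Radix 2 with the digit set {-1, 0, 1} cannot use this simple two-step rule directly: for $s_i = w_i + t_i$ to stay in {-1, 0, 1} for every incoming transfer, the interim sum $w_i$ would always have to be 0, which is impossible when $z_i + y_i$ is odd. The Python code below therefore assumes, purely for illustration, radix r = 10 with digits restricted to {-6, ..., 6} (a choice that satisfies $|z_i| \le r - 1$); these parameters are ours, not the thesis's. Digits are stored least significant first, the reverse of the left-to-right numbering used in this section.

```python
RADIX = 10   # illustrative assumption, not a parameter of the thesis design
A = 6        # digits are restricted to {-A, ..., A}; |z_i| <= r - 1 holds


def transfer_and_interim(z, y):
    """Step 1, Eq. (2.4.1): z + y = RADIX * t_out + w, with |w| <= A - 1."""
    p = z + y
    if p >= A:
        return 1, p - RADIX
    if p <= -A:
        return -1, p + RADIX
    return 0, p


def parallel_add(zs, ys):
    """Totally parallel signed-digit addition; digit lists are LSD first."""
    steps = [transfer_and_interim(z, y) for z, y in zip(zs, ys)]  # all positions at once
    ts = [t for t, _ in steps]
    ws = [w for _, w in steps]
    # Step 2, Eq. (2.4.2): each sum digit needs only its own w and the transfer
    # from the next less significant position (index i - 1 in this LSD-first list).
    sums = [ws[i] + (ts[i - 1] if i > 0 else 0) for i in range(len(ws))]
    sums.append(ts[-1])          # the final transfer becomes a new top digit
    return sums


def value(digits):
    """Algebraic value of an LSD-first digit list."""
    return sum(d * RADIX**i for i, d in enumerate(digits))


print(parallel_add([6, 6, 6], [6, 6, 6]))   # [2, 3, 3, 1] = 1332 = 666 + 666
```

Every transfer moves exactly one position, so the two steps take the same time whatever the operand length; with $|w_i| \le 5$ and $|t_i| \le 1$ each sum digit stays inside {-6, ..., 6}.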

2.5 Carry-Free Radix-2 Addition (PPM Adder)

A radix-2 signed-digit number is coded using two unsigned binary numbers, one positive and the other negative, as $X = X^+ - X^-$. Hence each signed digit is represented by 2 bits as $x_i = x_i^+ - x_i^-$, where $x_i^+, x_i^- \in \{0, 1\}$ and $x_i \in \{-1, 0, 1\}$. In the adder shown in Fig 2.5.1, a signed digit $x_i$ is added to an unsigned digit $y_i$. The addition is carried out in two steps. The first step is carried out in parallel for all digit positions $i$ ($0 \le i \le w-1$): an intermediate sum $p_i = x_i + y_i$ is computed, which lies in the range {-1, 0, 1, 2} [1], [5-7]. This addition is expressed as

$$p_i = x_i + y_i = 2t_i + u_i \qquad (2.5.1)$$

Fig. 2.5.1 PPM adder

where $t_i$ is the transfer digit, which takes the value 0 or 1 and is denoted $t_i^+$, and $u_i$ is the interim sum, which takes the value -1 or 0 and is denoted $-u_i^-$. The least significant transfer digit $t_{-1}$ is assigned the value zero, as is the most significant interim sum digit $u_w$. In the second step, the sum digit $s_i$ is formed by combining $t_{i-1}^+$ and $u_i^-$ into one digit, as shown in Fig 2.5.2:

$$s_i = t_{i-1} + u_i = t_{i-1}^+ - u_i^- \qquad (2.5.2)$$

Table 2.5.1 summarizes the digit sets involved in the addition.

Then the addition operation, performed by the adder, is

$$x_i^+ - x_i^- + y_i^+ = 2t_i^+ - u_i^- \qquad (2.5.3)$$

This arithmetic operation can be performed by the adder known as the plus-plus-minus (PPM) adder. The PPM adder is also called the Redundant Binary Full Adder (RBFA).
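Section 2.8 notes that a PPM adder can be built from a full adder and two inverters. Below is a minimal Python sketch of that idea. The placement of the inverters (on the minus input $x_i^-$ and on the sum output) is our assumption: it follows algebraically from Eq. (2.5.3) but is not taken from the thesis schematic, and the function names are ours.

```python
def full_adder(a, b, c):
    """Ordinary 1-bit full adder: a + b + c = 2*carry + sum."""
    s = a ^ b ^ c
    carry = (a & b) | (c & (a ^ b))
    return carry, s


def ppm_cell(x_pos, x_neg, y_pos):
    """PPM adder = full adder + 2 inverters (assumed placement).

    Implements x+ - x- + y+ = 2*t+ - u-, i.e. Eq. (2.5.3).
    """
    carry, s = full_adder(x_pos, 1 - x_neg, y_pos)   # inverter on the minus input x-
    return carry, 1 - s                              # inverter gives the minus output u-


# Exhaustive check of all eight input combinations.
for x_pos in (0, 1):
    for x_neg in (0, 1):
        for y_pos in (0, 1):
            t_pos, u_neg = ppm_cell(x_pos, x_neg, y_pos)
            assert 2 * t_pos - u_neg == x_pos - x_neg + y_pos
print("PPM cell matches Eq. (2.5.3) for all inputs")
```

Inverting $x_i^-$ turns it into $1 - x_i^-$, so the full adder computes $x_i^+ + (1 - x_i^-) + y_i^+ = 2c + s$; rearranging gives $x_i^+ - x_i^- + y_i^+ = 2c - (1 - s)$, which is Eq. (2.5.3) with $t_i^+ = c$ and $u_i^- = 1 - s$.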

Fig 2.5.2 LSD PPM Adder

Table 2.5.1 Digit sets in addition.

Digit    Digit set         Binary code
x_i      {-1, 0, 1}        x_i^+, x_i^-
p_i      {-1, 0, 1, 2}
u_i      {-1, 0}           -u_i^-
t_i      {0, 1}            t_i^+
s_i      {-1, 0, 1}        s_i^+, s_i^-

Fig 2.5.3 shows the structure of a 4-digit parallel PPM adder. The sum has 5 digits, i.e. one more digit than the addends [1].
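The 4-digit structure can be modelled behaviourally. In the Python sketch below (function names are ours), every PPM cell works on its own digit position with no carry chain; the only interaction is that each sum digit takes the transfer from the position directly below it. Bit lists are least significant digit first.

```python
def ppm_cell(x_pos, x_neg, y_pos):
    """One PPM cell: x+ - x- + y+ = 2*t+ - u- (Eq. 2.5.3)."""
    p = x_pos - x_neg + y_pos          # intermediate sum p_i in {-1, 0, 1, 2}
    t_pos = 1 if p >= 1 else 0         # transfer digit t_i in {0, 1}
    u_neg = 2 * t_pos - p              # interim sum u_i = -u_neg, u_neg in {0, 1}
    return t_pos, u_neg


def ppm_array(x_pos, x_neg, y_pos):
    """Add a w-digit RB number X = X+ - X- to an unsigned w-bit Y.

    Bit lists are LSD first. Returns (s_pos, s_neg) with w + 1 digits,
    where s_i = t_{i-1}+ - u_i-, t_{-1} = 0 and u_w = 0.
    """
    cells = [ppm_cell(xp, xn, yp) for xp, xn, yp in zip(x_pos, x_neg, y_pos)]
    s_pos = [0] + [t for t, _ in cells]      # transfer from the position below
    s_neg = [u for _, u in cells] + [0]      # interim sum of the same position
    return s_pos, s_neg


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def rb_value(pos, neg):
    """Value of a redundant binary number given as LSD-first bit lists."""
    return sum((p - n) << i for i, (p, n) in enumerate(zip(pos, neg)))


# X+ = 0101, X- = 0010, Y = 0011 (MSD first): 5 - 2 + 3 = 6
print(rb_value(*ppm_array(to_bits(5, 4), to_bits(2, 4), to_bits(3, 4))))
```

Because no cell waits for another, the delay of `ppm_array` in hardware would be one cell delay for any word length; only the result grows by one digit.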

Fig 2.5.3 four-bit digit PPM adder

2.6 Radix-2 subtraction (MMP Subtractor)

The subtractor shown in Fig 2.6.1 subtracts an unsigned number from a signed-digit number. As before, a radix-2 signed-digit number is coded using two unsigned binary numbers, one positive and the other negative, as $X = X^+ - X^-$, so each digit is $x_i = x_i^+ - x_i^-$ with $x_i^+, x_i^- \in \{0, 1\}$ and $x_i \in \{-1, 0, 1\}$. Here an unsigned digit $y_i$ is subtracted from the signed digit $x_i$. The subtraction is carried out in two steps. In the first step, an intermediate difference $p_i = x_i - y_i$ is computed independently for each digit; it lies in the range {-2, -1, 0, 1} (Table 2.6.1) and is expressed as [1]:

$$p_i = x_i - y_i = 2t_i + u_i \qquad (2.6.1)$$

where the transfer digit $t_i$ takes the value -1 or 0 and is denoted $-t_i^-$, and the interim difference $u_i$ takes the value 0 or 1 and is denoted $u_i^+$. In the second step, the difference digit $s_i$ is formed by combining $t_{i-1}^-$ and $u_i^+$ into one digit:

$$s_i = t_{i-1} + u_i = u_i^+ - t_{i-1}^- \qquad (2.6.2)$$

Then the subtraction operation, performed by the subtractor, is

$$x_i^+ - x_i^- - y_i^- = -2t_i^- + u_i^+ \qquad (2.6.3)$$

Fig 2.6.1 MMP adder

This arithmetic operation is performed by the subtractor known as the minus-minus-plus (MMP) subtractor, or type-2 full adder. Fig 2.6.2 shows the structure of a 4-digit parallel radix-2 subtractor.
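A behavioural Python sketch of the MMP cell and of the 4-digit parallel subtractor (function names are ours). As in the adder, each cell works only on its own position, and each difference digit combines the cell's interim difference with the (negative) transfer from the position below. Bit lists are least significant digit first.

```python
def mmp_cell(x_pos, x_neg, y_neg):
    """One MMP cell: x+ - x- - y- = -2*t- + u+ (Eq. 2.6.3)."""
    p = x_pos - x_neg - y_neg          # intermediate difference p_i in {-2, -1, 0, 1}
    t_neg = 1 if p < 0 else 0          # transfer digit t_i = -t_neg in {-1, 0}
    u_pos = p + 2 * t_neg              # interim difference u_i in {0, 1}
    return t_neg, u_pos


def mmp_array(x_pos, x_neg, y_neg):
    """Subtract an unsigned w-bit Y from a w-digit RB number X (lists LSD first).

    Returns (s_pos, s_neg) with w + 1 digits, s_i = u_i+ - t_{i-1}-.
    """
    cells = [mmp_cell(xp, xn, yn) for xp, xn, yn in zip(x_pos, x_neg, y_neg)]
    s_pos = [u for _, u in cells] + [0]      # u_w = 0
    s_neg = [0] + [t for t, _ in cells]      # t_{-1} = 0
    return s_pos, s_neg


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def rb_value(pos, neg):
    """Value of a redundant binary number given as LSD-first bit lists."""
    return sum((p - n) << i for i, (p, n) in enumerate(zip(pos, neg)))


# X+ = 0101, X- = 0010, Y = 0011 (MSD first): 5 - 2 - 3 = 0
print(rb_value(*mmp_array(to_bits(5, 4), to_bits(2, 4), to_bits(3, 4))))
```

The MMP cell is the mirror image of the PPM cell: two inputs and the transfer carry negative weight, and the interim difference is positive.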


Fig 2.6.2 four-bit digit MMP subtractor.


Digit    Radix-2 digit set    Binary code
x_i      {-1, 0, 1}           x_i^+, x_i^-
y_i      {0, 1}               y_i^-
p_i      {-2, -1, 0, 1}
t_i      {-1, 0}              -t_i^-
s_i      {-1, 0, 1}           s_i^+, s_i^-

Table 2.6.1 Digit sets in subtraction

2.7 Digit-serial SBD redundant adder

In the digit-serial signed binary digit (SBD) adder shown in Fig 2.7, two redundant binary numbers with digits $x_i = x_i^+ - x_i^-$ and $y_i = y_i^+ - y_i^-$ are added, giving a redundant binary sum digit $s_i = s_i^+ - s_i^-$. This adder consists of a PPM adder, an MMP subtractor and a D-FF (delay). It has a pipelined architecture, which shortens the critical path and hence reduces the propagation delay [1].
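The exact register placement in Fig 2.7 cannot be recovered from the text, so the following Python model is an assumption, with names of our own choosing. It uses one common arrangement for adding two redundant binary numbers digit-serially, LSD first: a PPM cell absorbs $x_i^+$, $x_i^-$ and $y_i^+$; its transfer is delayed by one D-FF; an MMP cell combines that delayed transfer with the PPM's negative output and $y_i^-$; and a second D-FF delays the MMP transfer, which becomes the negative half of the next output digit. The model is behavioural (one loop iteration per clock) and says nothing about timing.

```python
def ppm(x_pos, x_neg, y_pos):
    """PPM cell: x+ - x- + y+ = 2*t+ - u-."""
    p = x_pos - x_neg + y_pos
    t = 1 if p >= 1 else 0
    return t, 2 * t - p


def mmp(x_pos, x_neg, y_neg):
    """MMP cell: x+ - x- - y- = -2*t- + u+."""
    p = x_pos - x_neg - y_neg
    t = 1 if p < 0 else 0
    return t, p + 2 * t


def sbd_serial_add(x_digits, y_digits):
    """Digit-serial RB addition, LSD first; each digit is a (pos, neg) bit pair.

    Assumed structure: PPM -> D-FF -> MMP -> D-FF. Returns len(x) + 1 digits.
    """
    h_reg = 0              # D-FF holding the PPM transfer of the previous digit
    t_reg = 0              # D-FF holding the MMP transfer of the previous digit
    out = []
    flush = [(0, 0)]       # one extra clock cycle empties both registers
    for (xp, xn), (yp, yn) in zip(x_digits + flush, y_digits + flush):
        h, g = ppm(xp, xn, yp)         # x+ - x- + y+ = 2h - g
        t, q = mmp(h_reg, g, yn)       # h_prev - g - y- = -2t + q
        out.append((q, t_reg))         # s_i = q_i - t_{i-1}
        h_reg, t_reg = h, t            # clock edge: update both D-FFs
    return out


def rb_value(digits):
    """Value of an LSD-first list of (pos, neg) digit pairs."""
    return sum((p - n) << i for i, (p, n) in enumerate(digits))


RB_DIGITS = ((0, 0), (1, 0), (0, 1), (1, 1))   # encodings of 0, +1, -1, 0

fifteen = [(1, 0)] * 4
print(rb_value(sbd_serial_add(fifteen, fifteen)))   # 15 + 15 = 30
```

Each cycle touches only the current digit and two one-bit registers, so the cycle time does not depend on the word length.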

Fig 2.7 Digit-serial SBD redundant adder.

2.8 Radix-2 Redundant Binary Multiplier

Consider the bit-serial multiplication of two W-bit numbers a and b to yield a product p as described by the algorithm below [9]:


INPUT: a, b
INITIALIZE: a_i = b_i = 0 for i > W-1;  c_{i,j} = s_{i,j} = 0 for all i, j
for i = 0 to W-1
    for j = 0 to W
        a_i * b_j + c_{i,j-1} + s_{i-1,j+1} = 2c_{i,j} + s_{i,j}        (2.8.1)

Applying the systolic design method to Eq. (2.8.1) yields the bit-serial multiplier architecture shown in Fig 2.8.1 [1, 13, 14].
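Recurrence (2.8.1) can be checked in software before it is mapped onto the systolic array. The Python sketch below (names are ours) evaluates it row by row for unsigned W-bit operands: row i adds $a_i b_j$, the carry from column j-1 and the previous row's sum shifted down by one column. The least significant sum bit of each row, $s_{i,0}$, is one product bit, and the remaining sum bits of the last row give the upper half of the product. It models only the arithmetic, not the bit-serial timing of Fig 2.8.1.

```python
def bit_serial_multiply(a, b, width):
    """Evaluate recurrence (2.8.1) row by row for unsigned width-bit a, b."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> j) & 1 for j in range(width)] + [0]   # b_W = 0
    s_prev = [0] * (width + 2)       # s_{-1, j} = 0; extra zero for j + 1 = W + 1
    product = 0
    for i in range(width):
        s_row = [0] * (width + 2)
        carry = 0                    # c_{i, -1} = 0
        for j in range(width + 1):
            total = a_bits[i] * b_bits[j] + carry + s_prev[j + 1]
            carry, s_row[j] = divmod(total, 2)    # total = 2*c_{i,j} + s_{i,j}
        # carry now holds c_{i,W}, which is always 0 because b_W = 0
        product |= s_row[0] << i     # s_{i,0} is product bit i
        s_prev = s_row
    # the remaining sum bits of the last row form the upper product bits
    for j in range(1, width + 1):
        product |= s_prev[j] << (width - 1 + j)
    return product


print(bit_serial_multiply(13, 11, 4) == 13 * 11)
```

Row i effectively computes $a_i B$ plus half of the previous row's partial sum, which is why one product bit retires per row, matching the one-bit-per-cycle output of a bit-serial multiplier.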

Fig 2.8.1 redundant multiplier architecture.

Consider the multiplicand B (b3b2b1b0) to be a radix-2 redundant number and A (a3a2a1a0) to be in two's-complement representation, since its most significant bit a3 carries negative weight (see Eq. (2.8.2) below). Each digit bj = bj+ - bj- of the radix-2 redundant number B is recoded using a sign bit and a magnitude bit, as shown in Table 2.8.1:

Table 2.8.1 Recoding of bj













If the input digit bj is positive, the adder cell corresponding to coefficient ai, 0 ≤ i ≤ 2, in Fig 2.8.1 can be implemented as a full adder; the last adder cell, which involves the most significant bit a3 of A with negative weight, carries out the following computation [1]:

$$-a_3 \cdot b_j + \text{carry}_{in} + \text{sum}_{in} = 2\,\text{carry}_{out} - \text{sum}_{out} \qquad (2.8.2)$$

This cell can be implemented as a PPM adder consisting of a full adder and two inverters. If the input digit bj is negative, the two computations above are combined; the detailed multiplier circuit is shown in Fig 2.8.2.
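The contents of Table 2.8.1 did not survive in this copy, so the recoding below is our assumption, not the thesis's table: magnitude = bj+ XOR bj-, and sign = 1 only when bj = -1 (bj+ = 0, bj- = 1). The Python sketch forms each partial-product row as A·|bj|, applies the sign, and checks that the recoded digits reproduce A·B for every 4-bit two's-complement A and every 4-digit redundant B. It models the arithmetic only, not the full-adder and PPM cells of Fig 2.8.2; all names are ours.

```python
def recode(b_pos, b_neg):
    """ASSUMED recoding of b_j = b_j+ - b_j- into (sign, magnitude) bits."""
    magnitude = b_pos ^ b_neg            # 1 when b_j is +1 or -1
    sign = b_neg & (1 - b_pos)           # 1 only when b_j = -1
    return sign, magnitude


def rb_value(digits):
    """Value of an LSD-first list of (b+, b-) digit pairs."""
    return sum((p - n) << j for j, (p, n) in enumerate(digits))


def multiply_rb(a, b_digits):
    """A (two's-complement integer) times a radix-2 redundant B, LSD first."""
    product = 0
    for j, (b_pos, b_neg) in enumerate(b_digits):
        sign, magnitude = recode(b_pos, b_neg)
        row = a * magnitude              # partial-product row a_i * |b_j|
        product += (-row if sign else row) << j
    return product


RB_DIGITS = ((0, 0), (1, 0), (0, 1), (1, 1))   # encodings of 0, +1, -1, 0
print(multiply_rb(-5, [(0, 1), (1, 0), (0, 0), (1, 1)]))   # B = -1 + 2 = 1
```

With this choice the redundant zero code (1, 1) gets magnitude 0, so its sign bit never matters.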

Fig 2.8.2 redundant multiplier with PPM

Fig 2.8.3 Recoding of bj

2.9 Redundant Binary to Binary Conversion

The conversion from redundant binary to binary format in LSD-first mode can be carried out by treating x+ and x- as two independent unsigned numbers and subtracting x- from x+:

$$x_i^+ - x_i^- - c_i = -2c_{i+1} + s_i \qquad (2.9.1)$$

where an MMP adder is used at each bit position. The LSD-first redundant-binary-to-binary conversion circuit for a word length of 4 bits is shown in Fig 2.9. In this circuit the carry-out at any stage is either 0 or -1 [1, 14].
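Eq. (2.9.1) translates directly into a serial loop with a single borrow register. This Python sketch (names are ours) processes the digit pairs LSD first; here $c_i$ is stored as the magnitude of the carry, so the actual carry-out is $-c_{i+1}$, which is 0 or -1. The final borrow $c_w$ acts as a sign bit of weight $-2^w$, so (bits, $c_w$) is a (w+1)-bit two's-complement result.

```python
def rb_to_binary(x_pos, x_neg):
    """LSD-first RB-to-binary conversion, one MMP step per position (Eq. 2.9.1).

    Returns (bits, c_w) with value = sum(bits[i] * 2**i) - c_w * 2**w.
    """
    bits = []
    c = 0                              # no borrow into the least significant digit
    for xp, xn in zip(x_pos, x_neg):
        d = xp - xn - c                # x_i+ - x_i- - c_i, in {-2, -1, 0, 1}
        c = 1 if d < 0 else 0          # carry-out is -c: either 0 or -1
        bits.append(d + 2 * c)         # s_i in {0, 1}
    return bits, c


def binary_value(bits, c_w):
    """Interpret (bits, c_w) as a (w+1)-bit two's-complement number."""
    return sum(bit << i for i, bit in enumerate(bits)) - (c_w << len(bits))


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


# RB digits (LSD first) 1, -1, 0, 1  ->  1 - 2 + 8 = 7
print(rb_to_binary([1, 0, 0, 1], [0, 1, 0, 0]))   # ([1, 1, 1, 0], 0)
```

Unlike the parallel PPM and MMP arrays, this conversion does have a serial borrow chain; in LSD-first digit-serial operation it costs only one register and one cell per clock.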

Fig 2.9 RB-to-Binary conversion.

Chapter 3

3.1 FIR Filter Theory

A filter is used to remove some component of a signal or to modify some of its characteristics, although the two purposes are often treated interchangeably. A digital filter is simply a discrete-time, discrete-amplitude convolver. Basic Fourier transform theory states that the linear convolution of two sequences in the time domain is equivalent to the multiplication of the two corresponding spectral sequences in the frequency domain. Filtering is, in essence, the multiplication of the signal spectrum by the frequency-domain impulse response of the filter [1].

A finite impulse response (FIR) filter computes a weighted average of a finite number of samples of the input sequence. The basic input-output structure of the FIR filter is a time-domain computation based on a feed-forward difference equation. Fig 3.1 shows a flow diagram of a standard 3-tap FIR filter, which has two data registers. The FIR filter is often termed a transversal filter because the input data traverses the data registers in shift-register fashion. The output of each register (D1 and D2) is called a tap and is termed x[n], where n is the tap number. Each tap is multiplied by a coefficient ck and the resulting products are summed. A general expression for the FIR filter's output can be derived in terms of the impulse response. Since the filter coefficients are identical to the impulse response values, the general form of a standard N-tap FIR filter can be represented as Equation 3.1:

$$y[n] = \sum_{k=0}^{N-1} c_k\, x[n-k] = \sum_{k=0}^{N-1} h[k]\, x[n-k] \qquad (3.1)$$

When the relation between the input and the output of the FIR filter is expressed in terms of the input and the impulse response, it is called a finite convolution sum: the output is obtained by convolving the sequences x[n] and h[n]. A simple interpretation of this sum leads to a better algorithm for computing the convolution, which can be implemented with a tableau that tracks the relative positions of the signal values. The example in Figure 2.3 shows how to convolve x[n] with h[n]. The choice of filter coefficients determines the characteristics of the FIR filter.
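Equation 3.1 corresponds to the shift-register picture of Fig 3.1. The minimal Python sketch below (our own, integer samples assumed) keeps the delay line as a list whose k-th entry is x[n-k]: entry 0 is the current input and the remaining entries play the role of the registers D1 and D2. Feeding an impulse returns the coefficients themselves, which is the sense in which the filter coefficients equal the impulse response.

```python
def fir_direct(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], with x[n] = 0 for n < 0."""
    taps = [0] * len(h)                  # taps[k] holds x[n - k]
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]      # shift the register chain by one position
        y.append(sum(c * t for c, t in zip(h, taps)))
    return y


print(fir_direct([1, 0, 0, 0], [1, 2, 3]))   # [1, 2, 3, 0]: impulse response = coefficients
print(fir_direct([1, 1, 1, 1], [1, 2, 3]))   # [1, 3, 6, 6]
```

Each output needs N multiplications followed by a chain of N - 1 additions, which is the critical path analyzed next.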

Fig 3.1 FIR filter.

3.2 Architecture of FIR filter

The speed of the filter is defined as the rate at which input samples can be processed. To increase the speed, the critical path between input and output must be reduced. The critical path is defined as the path with the longest computation time among all paths that contain zero delays. For the direct-form chain of Fig 3.1, with multiplication time Tm, addition time Ta and N taps, the estimated delay is

$$T_{chain} = T_m + (N-1)T_a \qquad (3.2)$$

The sample period Tsample is therefore bounded by

$$T_{sample} \ge T_m + (N-1)T_a \qquad (3.3)$$

and the sampling frequency fsample by

$$f_{sample} \le \frac{1}{T_m + (N-1)T_a} \qquad (3.4)$$

For the 3-tap FIR filter, the critical path delay is Tm + 2Ta. Pipelining reduces the effective critical path by introducing pipelining latches along the data path; the critical path is then reduced from Tm + 2Ta to Tm + Ta, as shown in Figs 3.2a and 3.2b. In this arrangement, while the left adder initiates the computation of the current iteration, the right adder completes the computation of the previous iteration's result [1].
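Equations (3.2) to (3.4) are simple enough to evaluate directly. The Python sketch below uses hypothetical delays Tm = 1.5 ns and Ta = 0.5 ns (illustrative values, not measurements from this design) to compare the 3-tap direct form with the Tm + Ta critical path of the 2-level pipelined structure or the transposed structure.

```python
def critical_paths(t_m, t_a, taps):
    """Critical-path delays in seconds for the structures of Section 3.2."""
    return {
        "direct": t_m + (taps - 1) * t_a,    # Eq. (3.2)
        "pipelined/transposed": t_m + t_a,   # one multiplier plus one adder (3-tap case)
    }


# Hypothetical gate delays, chosen only to illustrate the formulas.
T_M, T_A, TAPS = 1.5e-9, 0.5e-9, 3

paths = critical_paths(T_M, T_A, TAPS)
f_max = {name: 1.0 / delay for name, delay in paths.items()}   # Eq. (3.4)
for name, delay in paths.items():
    print(f"{name}: {delay * 1e9:.2f} ns -> f_sample <= {f_max[name] / 1e6:.0f} MHz")
```

With these numbers the bound rises from 400 MHz to 500 MHz; the gain grows with the number of taps because the direct form adds one Ta per extra tap.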

Fig 3.2a datapath

Fig 3.2b 2-level pipelined structure

Another FIR structure, known as the transposed or data-broadcast structure, is shown in Fig 3.3. Transposing the structure reduces the critical path of the filter of Fig 3.1 without introducing any pipelining latches; the propagation delay becomes Tm + Ta.
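A behavioural Python sketch of the transposed structure (names are ours): each new sample is broadcast to all multipliers, and the delays sit between the adders, so every output needs only one multiplication and one addition after the input arrives. It produces the same outputs as Equation 3.1.

```python
def fir_transposed(x, h):
    """Transposed (data-broadcast) FIR: the input reaches every multiplier at once."""
    n_taps = len(h)
    regs = [0] * (n_taps - 1)            # registers between adjacent adders
    y = []
    for sample in x:
        products = [c * sample for c in h]                # broadcast the input
        y.append(products[0] + (regs[0] if regs else 0))  # one multiply + one add
        # On the clock edge, register k takes h[k+1]*x[n] plus register k+1.
        regs = [products[k + 1] + (regs[k + 1] if k + 1 < n_taps - 1 else 0)
                for k in range(n_taps - 1)]
    return y


print(fir_transposed([1, 0, 0, 0], [1, 2, 3]))   # [1, 2, 3, 0]
print(fir_transposed([1, 1, 1, 1], [1, 2, 3]))   # [1, 3, 6, 6]
```

Unrolling the register updates gives register k = sum over m > k of h[m]·x[n - (m - k - 1)], so the output h[0]·x[n] plus register 0 from the previous cycle equals the convolution sum.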

Fig 3.3 transposed FIR filter.

3.3 BOX-CAR FIR filter

If all multiplier coefficients of the filter are 1, the filter is called a box-car FIR filter. Its critical path then depends only on the time needed for the addition operations.
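With all coefficients equal to 1, the filter reduces to a running sum over the last N inputs. A minimal Python sketch of an N-tap box-car filter (function name ours):

```python
def boxcar(x, taps):
    """Box-car FIR: every coefficient is 1, so y[n] is the sum of the last `taps` inputs."""
    window = [0] * taps
    y = []
    for sample in x:
        window = [sample] + window[:-1]   # current input plus taps - 1 delayed samples
        y.append(sum(window))             # adders only, no multipliers
    return y


print(boxcar([1, 2, 3, 4, 5], 4))   # [1, 3, 6, 10, 14]
```

Removing the multipliers is what lets the box-car filter's speed be set by the adders alone.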

The FIR filter consists of three main components:

A D-FF to implement a simple delay.

A Multiplier to implement the coefficients.

An Adder to sum the nodes at the end of each tap.

Chapter 4

4.1 Architecture, Design and Implementation of Different Digital Cells

The 3-tap FIR filter is implemented in 90 nm CMOS technology, and all design entry and simulation are done in the Cadence Design Environment 5.1.41. The FIR filter IC consists of D-FFs, the multiplier and the adder. All individual digital logic cells are designed with a functional-description approach, because functional description relies on modular design, which makes the design easier to understand. Since each block is designed individually, the designer has intimate knowledge of how the circuit works. The downside of this method is that the circuit may be larger than optimal. For the transistor-level and gate-level design of the digital cells, such as the D-FF, adder and multiplier, the CMOS inverter is taken first as the reference cell.

4.2 CMOS Inverter

In this 90 nm CMOS technology, the power supply VDD is 1.8 V. The schematic of the inverter is shown in Fig 4.2.1.

Using the gpdk090 technology library (see Appendix I):

µnCox = 300 µA/V², µpCox = 170 µA/V²

Width of PMOS = Wp

Width of NMOS = Wn

Length of PMOS and NMOS = Lp = Ln =100nm.

For better noise margins the inverter is designed to be symmetrical. The inverter gate threshold voltage VI is defined by the point where the voltage transfer curve intersects the unity-gain line Vout = Vin [3].

The device transconductance parameters are βn = kn(W/L)n for the NMOS and βp = kp(W/L)p for the PMOS, where kn = µnCox and kp = µpCox.

The required ratio is βn/βp = 1.083. With equal channel lengths, βn/βp = (kn/kp)(Wn/Wp) = (300/170)(Wn/Wp) ≈ 1.76(Wn/Wp), so

Wp/Wn = 1.76/1.083 ≈ 1.63.


Fig 4.2.1 CMOS Inverter schematic.

Specification of the CMOS inverter: maximum switching frequency fmax = 10 GHz, with rise time tr equal to fall time tf.

4.2.1 Design

The sum of the transition times (tr + tf) represents the minimum time needed for a gate to undergo a complete switching cycle, i.e., for the output to change from a logic 1 to a logic 0 voltage and then back to a logic 1 value. This defines the maximum switching frequency

$$f_{max} = \frac{1}{t_r + t_f} \qquad (4.2.1)$$

The switching performance of CMOS digital circuits is characterized by the time intervals required to charge and discharge the capacitances at the output nodes. A CMOS inverter uses its transistors to provide current paths from the output to the power supply (through Mp) and to ground (through Mn), so all switching times are set by the current levels and the value of Cout. The output high-to-low time tHL is the time needed for the output capacitor to discharge through the n-channel MOSFET Mn while Mp is in cutoff; tHL is also called the fall time tf of the circuit, since it is the time needed for the output to decay from a well-defined logic 1 state to a well-defined logic 0 state. The low-to-high time tLH, also known as the rise time tr, is the time needed for the output capacitor to charge through the p-channel MOSFET Mp; during this interval Mn is in cutoff while Mp conducts from the power supply [3].

From the design specifications:

tr = tf = 1/(2fmax) = 1/(20 GHz) = 50 ps



Where VTn : threshold voltage of NMOS.

Rn : resistance of NMOS.

Cout : capacitive load at the output of the inverter = 50 fF.

V0 : 0.1VDD.

V1 : 0.9VDD.

Putting all the values together, we have

(W/L)n =5.725

Wn = 5.725 × 100 nm = 572.5 nm

Since Wp/Wn = 1.63,

Wp = 1.63 × Wn = 933.175 nm.
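The sizing arithmetic can be reproduced as below. The widths finally used (575 nm and 935 nm) look like the computed values rounded up to a 5 nm step; that rounding rule is an assumption inferred from the numbers, not something stated in the text, and `snap_up` is a hypothetical helper.

```python
import math

L_NM = 100.0        # channel length Ln = Lp from the text, in nm
WL_N = 5.725        # (W/L)n obtained from the fall-time design
WP_OVER_WN = 1.63   # width ratio from the symmetry condition
GRID_NM = 5.0       # ASSUMPTION: final widths rounded up to a 5 nm step


def snap_up(width_nm: float, grid_nm: float = GRID_NM) -> float:
    """Round a width up to the next multiple of grid_nm (hypothetical helper).

    The small epsilon keeps values already on the grid (up to
    floating-point noise) from being pushed to the next step.
    """
    return math.ceil(width_nm / grid_nm - 1e-9) * grid_nm


wn = WL_N * L_NM            # about 572.5 nm
wp = WP_OVER_WN * wn        # about 933.175 nm
print(f"Wn = {wn:.1f} nm -> {snap_up(wn):.0f} nm")  # Wn = 572.5 nm -> 575 nm
print(f"Wp = {wp:.3f} nm -> {snap_up(wp):.0f} nm")  # Wp = 933.175 nm -> 935 nm
```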

In the gpdk090 CMOS technology in Cadence IC 5.1.41, the widths are set to Wp = 935 nm and Wn = 575 nm. The schematic of the inverter is shown in fig 4.2.1 [11].

4.2.2 Simulation of CMOS inverter

The transistor-level circuits are simulated with the SPECTRE simulator in the Analog Design Environment. Both DC and transient analyses were performed, as shown in fig 4.2.2. An Affirm Analog test bench was then created to test the schematics [11]. The analog simulations show the effects of the capacitances associated with transistor sizing, so clock skew, signal delays, and setup-and-hold violations become evident.

The propagation delay time tP is the logic delay through a gate. Physically, it is interpreted as the average time needed for the output to respond to a change in the input logic state. By definition,

$$t_P = \frac{1}{2}\left(t_{PHL} + t_{PLH}\right)$$

where tPHL and tPLH are the propagation delays for a high-to-low and a low-to-high output transition, respectively. Let us define the 50% voltage points as V50% = VDD/2. Then tPHL and tPLH are defined by the time intervals between the 50% points of the input and output voltages.
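A one-function sketch of the definition, applied to the post-layout PPM adder delays reported in section 4.6 (tPHL = 83.06 ps, tPLH = 81.44 ps); the function name is illustrative.

```python
def propagation_delay(t_phl_ps: float, t_plh_ps: float) -> float:
    """Average propagation delay tP = (tPHL + tPLH) / 2."""
    return 0.5 * (t_phl_ps + t_plh_ps)


# Post-layout PPM adder values from section 4.6.
tp = propagation_delay(83.06, 81.44)
print(f"tP = {tp:.2f} ps")  # tP = 82.25 ps
```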

From the simulation


Fig 4.2.2 transient and DC analysis waveforms.

4.2.3 Layout


Fig 4.2.3 CMOS inverter layout

4.2.4 Parametric Extraction and Post-layout simulation

Fig 4.2.4 shows the parametric extraction of the CMOS inverter. After post-layout simulation, the propagation delay is 90.2 ps.


Fig 4.2.4 avs extraction of CMOS Inverter.

4.3 NAND-2

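In the NAND2, two PMOS transistors are connected in parallel and two NMOS transistors in series, as shown in fig 4.3.1 [3]. The series NMOS devices are widened by a factor of two so that the pull-down strength matches that of the inverter, while the parallel PMOS devices keep the inverter width: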

(Wn)NAND2 = 2(Wn)INV = 2 × 575 nm = 1150 nm

(Wp)NAND2 = (Wp)INV = 935 nm.
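The sizing rule generalizes to an n-input NAND: n series NMOS devices are each widened n times, while the parallel PMOS devices keep the inverter width because the worst case is a single PMOS conducting. A minimal sketch with a hypothetical helper, reproducing the NAND2 and NAND3 widths used in this chapter:

```python
WN_INV_NM = 575   # inverter NMOS width (section 4.2)
WP_INV_NM = 935   # inverter PMOS width (section 4.2)


def nand_widths(n_inputs: int, wn_inv: int = WN_INV_NM, wp_inv: int = WP_INV_NM) -> tuple[int, int]:
    """Return (Wn, Wp) in nm for an n-input NAND sized to match the inverter.

    Series NMOS stack: each device is n times wider so the stack's
    worst-case pull-down current equals the inverter's.
    Parallel PMOS: worst case is one device on, so it keeps the inverter width.
    """
    if n_inputs < 1:
        raise ValueError("n_inputs must be >= 1")
    return n_inputs * wn_inv, wp_inv


for n in (2, 3):
    wn, wp = nand_widths(n)
    print(f"NAND{n}: Wn = {wn} nm, Wp = {wp} nm")
# NAND2: Wn = 1150 nm, Wp = 935 nm
# NAND3: Wn = 1725 nm, Wp = 935 nm
```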


Fig 4.3.1 NAND2 schematics.

The layout and parametric extraction are shown in fig 4.3.2 and fig 4.3.3. The post-layout propagation delay is 71.41 ps.


Fig 4.3.2 NAND2 Layout


Fig 4.3.3 Extraction of NAND2

4.4 NAND3

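In the NAND3, three PMOS transistors are connected in parallel and three NMOS transistors in series, as shown in fig 4.4.1 [3]. The series NMOS devices are widened by a factor of three, and the parallel PMOS devices keep the inverter width: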

(Wn)NAND3 = 3(Wn)INV = 3 × 575 nm = 1725 nm.

(Wp)NAND3 = (Wp)INV = 935 nm.


Fig 4.4.1 NAND3 schematics.

The layout and parametric extraction are shown in fig 4.4.2 and fig 4.4.3. The propagation delay measured in both pre-layout and post-layout simulation is 56 ps.


Fig 4.4.2 layout of NAND3.


Fig 4.4.3 Extraction of NAND3.

4.5 D-FF (Delay)

A D flip-flop was built from NAND3s, NAND2s, and an inverter, as shown in fig 4.5.1.

D-FF with Set

With S = 1.8V, Q = D

With S = 0V, Q = 0V


Fig 4.5.1 D-FF schematic.
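A behavioral sketch of the set control described above may help when reading the simulation waveforms. It assumes, beyond what the text states, that the flip-flop captures D on the rising clock edge and that S = 0 clears Q immediately, independent of the clock; logic 1 stands for 1.8 V and logic 0 for 0 V. The class and method names are hypothetical.

```python
class DFlipFlop:
    """Behavioral sketch of a D-FF with the S control (hypothetical model).

    ASSUMPTIONS not stated in the text: Q is updated on the rising clock
    edge, and S = 0 forces Q = 0 at once, independent of the clock.
    """

    def __init__(self) -> None:
        self.q = 0
        self._prev_clk = 0

    def step(self, clk: int, d: int, s: int) -> int:
        """Apply one sample of (clk, d, s) and return the new Q."""
        if s == 0:
            self.q = 0                           # S low: Q held at 0
        elif self._prev_clk == 0 and clk == 1:
            self.q = d                           # rising edge with S high: Q = D
        self._prev_clk = clk
        return self.q


ff = DFlipFlop()
stimulus = [(0, 1, 1), (1, 1, 1), (0, 0, 1), (1, 0, 1), (0, 1, 0), (1, 1, 0)]
trace = [ff.step(clk, d, s) for clk, d, s in stimulus]
print(trace)  # [0, 1, 1, 0, 0, 0]
```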

After simulation of D-FF, the result gives


Fig 4.5.2 layout of D-FF.


Fig 4.5.3 extraction of D-FF.

4.6 PPM Adder

The PPM Adder performs the following operation:


Using the above equation, t+ and u- are expressed as



Based on equations (4.6.2) and (4.6.3), u- is generated by two XNOR gates, and t+ is generated using the XNOR and XOR signals together with pass transistors, as shown in fig 4.6.1. The adder consists of 10 transistors: two 4-transistor XNOR gates generate u-, and two pass gates generate t+. Because these gates are based on pass-transistor logic, they suffer threshold-voltage losses for specific input combinations: an n-MOS pass transistor transmits a logic 1 as VDD - VTn instead of VDD, while a p-MOS pass transistor transmits a logic 0 as |VTp| instead of 0 V [3], [5-7]. As a result, the logic 1 output voltage of the 10-transistor PPM adder is degraded below VDD, and the logic 0 output rises to 2VT instead of 0 V, as shown in fig 4.6.2.
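Equations (4.6.1) to (4.6.3) are not reproduced in this copy. The sketch below assumes the common PPM convention for signed-binary (SBD) arithmetic, with all signals as 0/1 bits: x+ and y+ carry positive weight, the third input z- carries negative weight, and the outputs satisfy x + y - z = 2t - u. Under that assumption u- = x XOR y XOR z, which is exactly what two cascaded XNOR gates produce, and t+ can be formed by a two-way pass-gate selector steered by the XOR/XNOR of x and y. The function names and the selector structure are illustrative, not read from fig 4.6.1.

```python
from itertools import product


def xnor(a: int, b: int) -> int:
    """Single-bit XNOR on 0/1 values."""
    return 1 - (a ^ b)


def ppm_cell(x: int, y: int, z: int) -> tuple[int, int]:
    """Behavioral PPM cell under the assumed convention.

    x, y: positively weighted input bits; z: negatively weighted input bit.
    Returns (t, u): t is positively weighted at the next higher position,
    u is negatively weighted, with x + y - z == 2*t - u.
    """
    u = xnor(xnor(x, y), z)   # two cascaded XNORs, i.e. x ^ y ^ z
    if x ^ y:                 # assumed selector: XOR = 1 passes NOT z
        t = 1 - z
    else:                     # XNOR = 1 passes x (x == y here)
        t = x
    return t, u


# Exhaustive check of the arithmetic identity over all 8 input patterns.
for x, y, z in product((0, 1), repeat=3):
    t, u = ppm_cell(x, y, z)
    assert x + y - z == 2 * t - u, (x, y, z)
print("x + y - z == 2t - u for all inputs")
```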


Fig 4.6.1 PPM Adder schematics.


Fig 4.6.2 waveform of PPM Adder.

The layout and extraction are shown in fig 4.6.3 and fig 4.6.4. The pre-layout and post-layout simulation results are compared below:

Parameter      Pre-layout     Post-layout
tPHL           75.91 ps       83.06 ps
tPLH           79.85 ps       81.44 ps
tP             77.74 ps       82.25 ps
Avg. power     1.968 µW       3.305 µW


Fig 4.6.3 PPM Adder layout.


Fig 4.6.4 Extraction PPM Adder.

4.7 MMP Subtractor

The MMP subtractor performs the following operation:


Using the above equation, t- and u+ are expressed as



Based on equations (4.7.2) and (4.7.3), u+ is generated by two XNOR gates, and t- is generated using the XNOR and XOR signals together with pass transistors, as shown in fig 4.7.1. The subtractor consists of 10 transistors: two 4-transistor XNOR gates generate u+, and two pass gates generate t-. Because these gates are based on pass-transistor logic, they suffer threshold-voltage losses for specific input combinations: an n-MOS pass transistor transmits a logic 1 as VDD - VTn instead of VDD, while a p-MOS pass transistor transmits a logic 0 as |VTp| instead of 0 V. As a result, the logic 1 output voltage of the 10-transistor MMP subtractor is degraded below VDD, and the logic 0 output rises to 2VT instead of 0 V, as shown in fig 4.7.2 [3], [6].
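Equations (4.7.1) to (4.7.3) are not reproduced in this copy. A minimal sketch, assuming the usual MMP convention with 0/1 bits: x- and y- carry negative weight, z+ carries positive weight, and the outputs satisfy -x - y + z = -2t + u, with t- at the next higher position. The function names and the selector structure are illustrative assumptions.

```python
from itertools import product


def xnor(a: int, b: int) -> int:
    """Single-bit XNOR on 0/1 values."""
    return 1 - (a ^ b)


def mmp_cell(x: int, y: int, z: int) -> tuple[int, int]:
    """Behavioral MMP cell under the assumed convention.

    x, y: negatively weighted input bits; z: positively weighted input bit.
    Returns (t, u): t is negatively weighted at the next higher position,
    u is positively weighted, with -x - y + z == -2*t + u.
    """
    u = xnor(xnor(x, y), z)          # two cascaded XNORs, i.e. x ^ y ^ z
    t = (1 - z) if (x ^ y) else x    # assumed pass-gate selector steered by x XOR y
    return t, u


# Exhaustive check of the arithmetic identity over all 8 input patterns.
for x, y, z in product((0, 1), repeat=3):
    t, u = mmp_cell(x, y, z)
    assert -x - y + z == -2 * t + u, (x, y, z)
print("-x - y + z == -2t + u for all inputs")
```

Multiplying both sides of the MMP relation by -1 gives x + y - z = 2t - u, so under this convention the subtractor uses the same logic as a PPM cell with every signal's polarity reinterpreted; this is consistent with u+ being formed by two XNOR gates.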


Fig 4.7.1 MMP Subtractor schematics.


Fig 4.7.2 waveform of MMP Subtractor.


Fig 4.7.3 layout of MMP Subtractor.

The layout and extraction of the MMP subtractor are shown in fig 4.7.3 and fig 4.7.4, and the pre-layout and post-layout simulation results are compared below:

Parameter      Pre-layout     Post-layout
tPHL           75.91 ps       84.39 ps
tPLH           79.53 ps       81.42 ps
tP             77.22 ps       82.91 ps
Avg. power     3.49 µW        3.299 µW


Fig 4.7.4 Extraction of MMP Subtractor.

4.8 Digit-serial SBD redundant adder

The digit-serial SBD redundant adder consists of three components: a PPM adder, an MMP subtractor, and delay elements, as shown in fig 4.8.1. The simulation waveform is shown in fig 4.8.2, from which the propagation delay and average power dissipation are measured.
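How the PPM adder, the MMP subtractor, and the delays combine can be sketched behaviorally. The wiring below is an assumption (a common carry-free arrangement), not read from fig 4.8.1. Each SBD digit is a pair (x+, x-) with value x+ - x-, and digits arrive least significant first. The PPM cell adds x+, y+ and x-, and the MMP cell combines the PPM's negative output, y-, and the PPM's positive output from the previous clock. Two single-bit delays (hypothetical names `a_prev`, `c_prev`) hold the transfer bits between clocks, so no carry ripples along the word. `cell()` is the shared logic x + y - z = 2t - u, assumed for both cells.

```python
def cell(x: int, y: int, z: int) -> tuple[int, int]:
    """Assumed PPM/MMP logic on 0/1 bits: returns (t, u) with x + y - z == 2t - u."""
    t = (1 - z) if (x ^ y) else x
    u = x ^ y ^ z
    return t, u


def sbd_value(pos: list[int], neg: list[int]) -> int:
    """Integer value of an SBD word from LSD-first positive/negative bit lists."""
    return sum((p - n) << i for i, (p, n) in enumerate(zip(pos, neg)))


def digit_serial_sbd_add(xp, xn, yp, yn):
    """Add two n-digit SBD words digit-serially, LSD first; returns n+1 digits."""
    n = len(xp)
    a_prev = c_prev = 0                  # the two delay elements, initially cleared
    sp, sn = [], []
    for i in range(n + 1):               # one extra clock flushes the transfer bits
        x_p, x_n, y_p, y_n = (xp[i], xn[i], yp[i], yn[i]) if i < n else (0, 0, 0, 0)
        a, b = cell(x_p, y_p, x_n)       # PPM: 2a - b == x+ + y+ - x-
        c, d = cell(b, y_n, a_prev)      # MMP: -2c + d == -b - y- + a_prev
        sp.append(d)                     # sum digit = d - (c from the previous clock)
        sn.append(c_prev)
        a_prev, c_prev = a, c            # clock the delays
    return sp, sn


# Example: X = 5 (positive bits 0101) and Y = -3 (negative bits 0011), LSD first.
sp, sn = digit_serial_sbd_add([1, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0])
print(sbd_value(sp, sn))  # 2
```

Each output digit depends only on the current input digits and the two delayed bits, which is why the addition time per digit is independent of the word length.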


Fig 4.8.1 SBD Adder schematic.


Fig 4.8.2 Simulation waveform SBD Adder.

4.9 Box-car FIR filter

The 4-tap Box-car FIR filters with 1-bit and 4-bit inputs are shown in fig 4.9.1 and fig 4.9.2. The simulation waveforms are shown in fig 4.9.3, and the measured propagation delay is 11.48 ns. The average power is 663.4 µW for the 1-bit input and 2.653 mW for the 4-bit input.
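A behavioral sketch of the 4-tap Box-car filter, assuming unit coefficients (a plain moving sum, the usual meaning of the term), so no multipliers are needed. The delay line is modeled as a `collections.deque`; the function name is illustrative.

```python
from collections import deque


def boxcar_fir(samples, taps: int = 4):
    """Box-car FIR: y[n] = x[n] + x[n-1] + ... + x[n-taps+1], all coefficients 1."""
    line = deque([0] * taps, maxlen=taps)   # delay line, cleared at start
    out = []
    for x in samples:
        line.appendleft(x)                  # shift in the new sample; oldest drops out
        out.append(sum(line))
    return out


print(boxcar_fir([1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0]))
# [1, 2, 2, 3, 3, 3, 4, 3, 2, 1, 0]
```

With 1-bit inputs the output never exceeds 4, so 3 output bits suffice; with 4-bit inputs it can reach 4 × 15 = 60, which needs 6 bits.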


Fig 4.9.1 4-tap Box-car FIR filter (1-bit input).


Fig 4.9.2 4-tap Box-car FIR filter (4-bit input).


Fig 4.9.3 Simulation waveforms of Box-car FIR filter.

4.10 Digit-serial Multiplier

Multiplication is a series of shifted, repeated additions. Consider the signed binary multiplication of two 4-bit integers. The multiplicand B (b3b2b1b0) is in signed binary representation, and the multiplier A (a3a2a1a0) is in normal binary representation. Depending on its value, each digit bj is recoded (using Table 2.8); the gate implementation is shown in fig 4.10.1. In the digit-serial multiplier, each bit of the multiplier, starting with the most significant bit (MSB) and ending with the least significant bit (LSB), is multiplied with the multiplicand. Each partial-product bit is a combination of XOR and AND operations [1,9,13], as shown in fig 4.10.2. For an N × N multiplication, the product is 2N bits wide, so a 4 × 4 multiplication would ordinarily give an 8-bit product. Here the product is 9 bits wide because the PPM and MMP adders are used. At the final stage, the 9-bit multiplier output is in normal binary form, as shown in fig 4.10.3.
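A behavioral sketch of the MSB-first scheme, assuming each SBD digit of B is held as a pair (b+, b-) with value b+ - b-. With that encoding, a multiplier bit ai multiplies a digit by simply AND-gating both halves. The recoding of Table 2.8 and the XOR used in fig 4.10.2 are not reproduced here, and the accumulator is an ordinary integer rather than the PPM/MMP adder array, so this only checks the arithmetic, not the circuit. All names are illustrative.

```python
def sbd_value(pos: list[int], neg: list[int]) -> int:
    """Integer value of an SBD word from LSD-first positive/negative bit lists."""
    return sum((p - n) << j for j, (p, n) in enumerate(zip(pos, neg)))


def partial_product(a_bit: int, b_pos: list[int], b_neg: list[int]):
    """Multiply every SBD digit of B by one binary multiplier bit (AND gating)."""
    return [a_bit & p for p in b_pos], [a_bit & n for n in b_neg]


def msb_first_multiply(a_bits_msb_first: list[int], b_pos: list[int], b_neg: list[int]) -> int:
    """Digit-serial product, MSB of A first: acc = 2*acc + a_i * B at each step.

    The accumulator is a plain integer here; in the circuit it would be an
    SBD word updated by the PPM/MMP adders.
    """
    acc = 0
    for a_bit in a_bits_msb_first:
        pp_pos, pp_neg = partial_product(a_bit, b_pos, b_neg)
        acc = 2 * acc + sbd_value(pp_pos, pp_neg)
    return acc


# B = +1 - 2 - 4 = -5 (LSD-first digits +1, -1, -1, 0); A = 1011 = 11.
print(msb_first_multiply([1, 0, 1, 1], [1, 0, 0, 0], [0, 1, 1, 0]))  # -55
```

As a side check on the output width: with 0 ≤ A ≤ 15 and |B| ≤ 15, the product lies in [-225, 225], which needs 9 bits as a signed binary number.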


Fig 4.10.1 Recoding of bj(schematics).


Fig 4.10.2 multiplier (schematics).

The simulation results of the digit-serial multiplier are shown below:


Fig 4.10.3 Digit-serial multiplier.


Fig 4.10.4 symbol of multiplier

4.11 3-tap FIR filter

Here we have designed the data-broadcast (transposed) 3-tap FIR filter. It consists of three multipliers and two adders with D-FFs. The input X(n) is a 4-bit impulse, and the 4-bit multiplier coefficients have values less than 1; we choose the coefficient A = 0.125 (see APPENDIX II). Each multiplier produces a 9-bit output, which is connected to a 9-bit full adder (9-FA) and 9 D-FFs, as shown in fig 4.11.1 and fig 4.11.2. Finally, a 10-bit result (S9…S0) is produced at the output of the filter.
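A behavioral sketch of the transposed (data-broadcast) structure: each input sample is broadcast to all multipliers at once, and the products are summed through a chain of registers standing in for the D-FF stages between the adders. It ignores the fixed 9-bit and 10-bit word widths, and the coefficient values in the example (all three taps set to 0.125) are chosen for illustration only.

```python
def transposed_fir(samples, coeffs):
    """Transposed-form FIR: y[n] = h0*x[n] + h1*x[n-1] + ... + h_{K-1}*x[n-K+1].

    regs[i] models the register after adder i; the output adder reads
    regs[0], and the last register is fed by the last product alone.
    """
    k = len(coeffs)
    regs = [0.0] * (k - 1)                     # D-FF stages, cleared at start
    out = []
    for x in samples:
        products = [h * x for h in coeffs]     # x broadcast to every multiplier
        y = products[0] + (regs[0] if regs else 0.0)
        for i in range(k - 1):                 # update toward the far end, using old values
            nxt = regs[i + 1] if i + 1 < k - 1 else 0.0
            regs[i] = products[i + 1] + nxt
        out.append(y)
    return out


# Impulse of amplitude 8 through three taps of 0.125 (illustrative values).
print(transposed_fir([8, 0, 0, 0], [0.125, 0.125, 0.125]))  # [1.0, 1.0, 1.0, 0.0]
```

The transposed form keeps the critical path to one multiplier plus one adder regardless of the number of taps, because each adder only combines a product with a registered partial sum.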


Fig 4.11.1 9-FA schematic.


Fig 4.11.2 9-DFF schematic.

The schematic and test bench of the 3-tap FIR filter are shown in fig 4.11.3 and fig 4.11.4, and the simulation waveforms in fig 4.11.5.


Fig 4.11.3 FIR filter schematic.


Fig 4.11.4 test-bench of FIR filter.




Fig 4.11.5 simulation waveforms of FIR filter.

The simulation results are summarized below:

Since the propagation delay of the designed filter is 2.716 ns, its frequency of operation is 368.18 MHz, which is also the sampling frequency of the filter.

Chapter 5

5.1 Conclusion

1) A 3-tap FIR filter was designed using the bottom-up flow with a sampling frequency of 368.18 MHz, i.e., the design can run at 368.18 MHz, and it dissipates 7.825 mW.

2) The design complexity is high.

3) A high-performance Box-car FIR filter was designed.

4) Because FIR filters are such an important element of DSP design, a project like this is beneficial for strengthening understanding of the concept.

5) This project is a good starting point for students learning the IC design flow with CDS tools.

5.2 Future Work

It is impractical to design an N-tap FIR filter with the bottom-up flow. A top-down ASIC design flow with optimization algorithms could be used to design an N-tap filter.