Increasing performance of CMOS


General introduction

Today, with the ever-increasing performance of CMOS, the majority of manufactured integrated circuits are CMOS circuits. This is due to three characteristics of CMOS devices: high noise immunity, low static power, and high density. It is MOS technology that enabled the realization of the first high-density semiconductor memories. For instance, the first 4Kbit MOS memory was introduced in 1970. Since then, memory chips have evolved exponentially in terms of size, power consumption, performance, density and complexity.

This evolution, especially in size, has allowed us to integrate large semiconductor memory blocks on a single chip. We must nevertheless ensure the performance of such memories, since it can deeply affect the speed and performance of the overall system. The main limiting factor is the high bit line capacitance, which increases the time needed to develop the bit line differential voltage and therefore slows the read operation.

So, for fast and power-efficient memory design, we need to minimize both the time and the signal swing on the bit lines. To do so, a sense amplifier is used to generate a full-rail output voltage from a minimal bit line differential voltage, which makes fast read operations possible.

However, as we continue to move deeper into the nanometer regime, the major barrier that CMOS devices face is increasing process parameter variation. Due to limitations of the fabrication process and variation in the number of dopants in the device channel, device parameters such as length (L), width (W), oxide thickness (Tox) and threshold voltage (VT) suffer from large variations. It is therefore important to accept that process variation is a reality and that circuit designers must keep it in mind. For all these reasons, designing a high-performance, reliable and robust sense amplifier has become extremely important for embedded memories.

In this context, and for my final graduation project, I had the chance to join the eSRAM team of STMicroelectronics Tunis. The purpose of this internship was, on the one hand, to design a sense amplifier in 32nm technology in order to improve the read frequency of a Ternary Content Addressable Memory and, on the other hand, to verify the reliability and robustness of the designed sense amplifier under process parameter variations.

In this report we explain, in a clear and simple manner, the design steps of the sense amplifier. To this end, the work is divided into four chapters:

The first chapter provides an overview of the work environment, starting with a presentation of STMicroelectronics and its eSRAM team and ending with an outline of the design flow for analog circuits.

The second chapter presents an overview of SRAM, TCAM and the different types of sense amplifiers. At the end of this chapter, the sense amplifier to be implemented is chosen.

Finally, the third and fourth chapters are devoted to the sense amplifier design, the simulation results, the mask layout design, the simulations done to validate the sense amplifier, and the Monte Carlo simulations used to evaluate the offset.

Chapter I

Work environment


Semiconductor memories are integrated circuits (ICs) characterized by their technologies and specifications. Both the speed and the complexity of these chips have been increasing exponentially over time, as a result of advances in IC design and manufacturing.

In this chapter, we present STMicroelectronics, one of the largest semiconductor companies that design and manufacture these chips, as well as IC design technologies based on CAD (Computer Aided Design) frameworks.

STMicroelectronics profile:

Key points:

The ST group was formed in June 1987 as a result of the merger between SGS Microelettronica of Italy and Thomson Semiconducteurs of France. In May 1998, the company changed its name from SGS-THOMSON Microelectronics to STMicroelectronics.

STMicroelectronics is a global independent semiconductor company and is a leader in developing and delivering semiconductor solutions across the spectrum of microelectronics applications.

The group totals approximately 51 000 employees, 16 advanced research and development units, 39 design and application centers, 15 main manufacturing sites and 78 direct sales offices in 36 countries [1].

ST's sales are balanced among the semiconductor industry's six major high-growth sectors (approximate percentage of ST's sales in 2010): Communications (35%), Consumer (12%), Computer (12%), Automotive (14%), Industrial (8%) and Distribution (19%). Each group is composed of several divisions or business units. Each division is responsible for the design, industrialization, manufacturing and marketing for its own product portfolio. Operations are assisted by a central R&D (Research & Development) organization and the local sales offices [1].

The Company currently offers over 3,000 main types of products to more than 1,500 customers, including Alcatel, Bosch, DaimlerChrysler, Ford, Hewlett-Packard, IBM, Motorola, Nokia, Nortel Networks, Philips, Seagate Technology, Siemens, Sony, Thomson and Western Digital [1].

Company assets and awards:

Over the past 15 years, the Company's sites have received more than 100 awards and accolades worldwide for excellence in all areas of Corporate Responsibility, from quality to corporate governance, social issues and environmental protection. Indeed, STMicroelectronics was one of the first global industrial companies to recognize the importance of environmental responsibility, its initial efforts beginning in the early 1990s. Since then ST has made outstanding progress in reducing energy and water consumption and CO2 emissions per product unit.

In addition, STMicroelectronics is ranked as the world's leading manufacturer of chips for set-top boxes and hard drives, the number two maker of chips for DVDs and smart cards, the number three maker of ICs for the automotive industry, and the number four chip supplier for mobile telephony and other telecommunications applications [1].

STMicroelectronics Tunis Center:

STMicroelectronics Tunis Center is qualified as a Design and Application Center since it works in collaboration with HPC (Home Personal Communication), MPA (Micro, Power, Analog) and FTM (Front-End Technology & Manufacturing) groups for many projects belonging to several divisions: Micro-Controller Division, Software Tools and Support, Home Entertaining Group, HPC Design Division, as well as Front-End Manufacturing and Development activities.

The Tunis center's mission is to develop a broad range of semiconductor integrated circuits and discrete devices for the digital consumer market, and to develop software applications related to those circuits as well as design tools for nanometer technologies.

STMicroelectronics Tunis is located in the City of Communication Technology El Gazella, Tunis (Cité Technologique des Communications). The center opened in December 2001 with 9 engineers and has shown impressive growth, reaching 227 engineers by April 2008 [1].

The Mediterranean Development Team:

This final graduation project took place within the Mediterranean Development Team, also called the eSRAM (embedded SRAM) team, of ST Tunis. The team is spread over three sites: Crolles, Tunis and Agrate. It belongs to the Central CAD (Computer Aided Design) and Design Solutions organization. Figure 1.4 gives the organization chart of Central CAD & Design Solutions.

The Mediterranean Development Team's job is to provide embedded SRAM memory blocks to internal and external customers. It creates SRAM generators and is in charge of developing SRAM libraries in the most recent technologies (90nm, 65nm, 32nm…) [1].

SRAMs are semiconductor devices with very complex behavior. Current ICs integrate millions of transistors, and no circuit designer will ever seriously consider the solid-state physics equations governing the behavior of each device when designing a digital gate. Instead, the designer uses a simplified model that adequately describes the input-output behavior of the transistor. This is made possible by the rapid evolution of design technologies.

IC design technology:

Early designs were truly hand-crafted. Every transistor was laid out and optimized individually and carefully fitted into its environment. Since then, designers have increasingly adhered to rigid design methodologies and strategies that are more amenable to design automation. Thus, instead of the individualized approach of the earlier designs, a circuit is constructed in a hierarchical way: a memory chip is a collection of modules, and each module in turn consists of a number of cells. Cells are reused as much as possible to reduce the design effort and to improve the chances of a first-time-right implementation.

This design philosophy has enabled the emergence of elaborate Computer Aided Design (CAD) frameworks for digital integrated circuits; without it, the current design complexity would not have been achievable. Design tools include simulation at various levels of complexity, design verification, layout generation, and design synthesis.

The following chart describes the general IC design process based on CAD.

Layout basics:

The creation of the mask layout is one of the most important steps in the full-custom design flow, where the designer describes the detailed geometry and the relative positioning of each mask layer to be used in actual fabrication, using a Layout Editor [2].

Layout creation converts a schematic view into a physical plan. It is a three-dimensional design of the circuit elements and of the wires that connect them, as shown in figure 1.6.

Physical layout design is very tightly linked to overall circuit performance (area, speed and power dissipation) since the physical structure determines the transconductances of the transistors, the parasitic capacitances and resistances, and obviously, the silicon area that is used to realize a certain function.

As for the schematic, the layout of a complex circuit is usually built hierarchically. First, basic components are created. Then they are instantiated in another layout at a higher hierarchical level. Such components are instantiated in the same way as pcells; the only difference is that they are now selected from the user libraries.

Layout creation involves several tasks, which are shown in the next chart:

The top-level schematic is the first graphic representation of the IC's function. It is a conceptual view formed by modules that enclose gates, essentially based on NMOS and PMOS transistors. The schematic view is then converted to a layout view.

Layout drawing:

The creation of the mask layout is one of the most important steps in the full-custom design flow, where the designer describes the detailed geometry and the relative positioning of each mask layer, using a layout editor.

The mask layout is basically formed by active zones (highly doped N or P), polysilicon, which plays the role of the gate, and different metal levels that provide the connections and the power supplies.

Design rule check (DRC):

The created layout must conform to a complex set of design rules, in order to ensure a lower probability of fabrication defects. A tool built into the Layout Editor, called Design Rule Checker, is used to detect any design rule violations during and after the layout design. The detected errors are displayed on the layout editor window as error markers, and the corresponding rule is also displayed in a separate window. The designer must perform DRC and make sure that all layout errors are eventually removed from the mask layout, before the final design is saved.

Layout Versus Schematic (LVS):

Once the layout satisfies all the design rules, the next verification step follows: the netlist underlying the layout view (file.GDS) is extracted and compared with that of the schematic view (file.CDL). This is the Layout Versus Schematic (LVS) check, which ensures that the layout implements the required functionality.

Netlist extraction:

It is important to extract the netlist underlying the layout view. This step is performed after the mask layout design is completed, in order to create a detailed netlist for the post-layout simulation. It extracts all the elements present in the layout view together with their parasitics (such as parasitic capacitances and resistances).

Post layout simulation:

The electrical performance of a full-custom design is best analyzed by performing a post-layout simulation on the extracted circuit netlist. The designer can then compare the results with those of the schematic netlist simulation. At this point, the designer should have a complete mask layout of the intended circuit/system, and should have passed the DRC and LVS steps with no violations. The detailed transistor-level simulation performed using the extracted netlist provides a clear assessment of the circuit speed, the influence of circuit parasitics, and any glitches that may occur due to signal delay mismatches. If the results of the extracted netlist simulation are not satisfactory, that is, if they do not meet the schematic netlist simulation results, the designer should modify some of the transistor dimensions and/or the circuit topology in order to achieve the desired circuit performance under "realistic" conditions, i.e., taking into account all of the circuit parasitics (parasitic resistances and capacitances).

When a design is done for nominal operating conditions (25°C and VDD equal to 1V) and typical device parameters (typical doping level), the designer should always be aware that the actual operating temperature may vary over a large range (between -40°C and 125°C), that the supply voltage may vary as well (from 0.9V to 1.1V), and that the device parameters after fabrication will probably deviate from the nominal values used in the design optimization process. During manufacturing, the doping levels of the NMOS and PMOS devices may vary from a worst case to a best case. Therefore, the designer should simulate the extracted netlist in five process corners:

  • typNtypP corner. That means that NMOS and PMOS transistors are considered as nominal devices during simulations.
  • maxNmaxP corner. That means that we consider the best case for both NMOS and PMOS devices during simulation.
  • minNminP corner. That means that we consider the worst case for both NMOS and PMOS devices during simulation.
  • maxNminP corner. That means that we consider the best case for NMOS devices and the worst case for PMOS devices during simulation.
  • minNmaxP corner. That means that we consider the worst case for NMOS devices and the best case for PMOS devices during simulation.

For each process corner, the designer should consider temperature and voltage variations in simulations. These options and parameters as well as the extracted netlist and the stimuli applied on inputs are included in a single file with the extension ".cir" on which simulation will be performed.
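As a rough illustration of how such a simulation matrix can be built, the sketch below enumerates every combination of process corner, temperature and supply voltage in Python. The five corner names come from the list above. The choice of three temperature points and three supply points (the two extremes plus the nominal value) is our assumption, as are the names pvt_matrix, temp_c and vdd. In the actual flow these values are carried by the ".cir" file, not by a script.

```python
from itertools import product

# Process corners as named in the text.
PROCESS_CORNERS = ["typNtypP", "maxNmaxP", "minNminP", "maxNminP", "minNmaxP"]
# Assumed sampling of the quoted ranges: extremes plus the nominal value.
TEMPERATURES_C = [-40, 25, 125]
SUPPLIES_V = [0.9, 1.0, 1.1]


def pvt_matrix(corners=PROCESS_CORNERS, temps=TEMPERATURES_C, supplies=SUPPLIES_V):
    """Return one dict per (process corner, temperature, supply) combination."""
    return [
        {"corner": corner, "temp_c": temp, "vdd": vdd}
        for corner, temp, vdd in product(corners, temps, supplies)
    ]


if __name__ == "__main__":
    runs = pvt_matrix()
    print(len(runs), "simulation conditions")
    for run in runs[:3]:
        print(run)
```

With these assumed sampling points the matrix contains 5 x 3 x 3 = 45 conditions. Any additional swept parameter multiplies this count again.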

Around a thousand simulations are needed to reach a conclusion about the operation of the circuit. This may require multiple iterations on the design, until the post-layout simulation results satisfy the original design requirements.

Design environment:

The software tools used during the internship are:

  • Cadence[1] Composer Schematic Editor to create schematic and symbol views,
  • Cadence Virtuoso to create layout views,
  • Calibre to perform DRC and LVS tasks,
  • PLS for extraction,
  • Eldo to perform simulations.

Cadence tools run only on UNIX workstations or on PCs running Linux with an X Window server. In our case, we use the Sun Solaris platform with the CDE (Common Desktop Environment) graphical environment.

Project specification:

In order to improve the read operation frequency of the 32nm TCAM, a sense amplifier will be implemented on its global bitlines. My project therefore consists, first, in designing this sense amplifier. Second, after the creation of its mask layout, we have to characterize, using Monte Carlo simulations, the minimal differential voltage that ensures correct operation of the sense amplifier under worst-case conditions. This final differential voltage must be <50mV. The width of the sense amplifier cell is fixed at 2.404µm, which is the width of two memory cells.


This chapter presented an overview of the work environment and described the design flow used later to build the sense amplifier. The next chapter presents SRAMs, TCAMs and the different sense amplifier architectures.

Chapter II

State of the art


Semiconductor memory devices are used in a wide variety of contexts. One type of memory is the Static Random Access Memory (SRAM), which is called static because it retains its state without needing to be refreshed. Another type is the TCAM, which is based on SRAM memory cells and has a specific functionality.

In this chapter, Static Random Access Memory and Ternary Content Addressable Memory are presented, and then we deal with one of the most important memory peripherals, the Sense Amplifier (SA).

Static Random Access Memory (SRAM):

Unlike Dynamic RAM (DRAM), SRAM retains its state without needing to be refreshed. SRAM is a little more expensive, but faster and significantly less power hungry than DRAM. It is therefore used where speed, low power, or both are principal considerations. SRAM is also easier to control and generally more truly random access than modern types of DRAM. Due to its more complex internal structure (four to six transistors per cell), SRAM is less dense than DRAM and is therefore not used for high-capacity, low-cost applications such as the main memory of personal computers.

SRAM is also used in workstations, routers and peripheral equipment, for example as external burst-mode SRAM caches, hard disk buffers, internal CPU caches and router buffers. Printers and LCD screens also employ static RAM to hold the image displayed (or to be printed). Small SRAM buffers are also found in CD-ROM and CD-RW drives [3].

The memory Cell Structure:

A memory allows us to store information coded in binary. It consists of cells, each holding the value 0 or 1. To give access to the data, the memory cells are organized in rows and columns, and every memory cell is associated with its own address.

For instance, in the structure of figure 2.1, there are 2^n rows, also called word lines, and m columns, called bit lines.

Three blocks can be distinguished:

  • The row address decoder, which activates one word line among 2^n
  • The column address decoder, which selects one bit line among m
  • The addressable memory cells, which constitute the heart of the memory
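To make the role of the two decoders concrete, here is a minimal Python sketch that splits a flat cell address into a word line index and a bit line index. The row-major mapping (consecutive addresses fill one row before moving to the next) and the function name decode_address are assumptions made for illustration; the text does not specify how addresses are mapped onto the array.

```python
def decode_address(address, n_row_bits, m_columns):
    """Split a flat cell address into (word_line, bit_line).

    Assumes a row-major layout: 2**n_row_bits word lines, m_columns bit lines.
    """
    rows = 2 ** n_row_bits
    if not 0 <= address < rows * m_columns:
        raise ValueError("address outside the memory array")
    # The quotient selects the word line, the remainder selects the bit line.
    word_line, bit_line = divmod(address, m_columns)
    return word_line, bit_line


if __name__ == "__main__":
    # Toy array: 2**3 = 8 word lines and 4 bit lines.
    print(decode_address(13, n_row_bits=3, m_columns=4))
```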

The memory cell is thus the most important and basic component of a memory system. Each cell stores one bit of binary information, and the cells are arranged in arrays to form the core of the memory system.

An important objective in memory design is to minimize the size of the memory cell as much as possible, which decreases the cost per bit, the access time and the power dissipation of the system. High-speed memories require a very fast access time while remaining easy to implement. The 6T memory cell, known as the SRAM cell, meets both requirements.

This conventional six-transistor (6T) SRAM cell is made (as shown in figure 2.2) of two cross-coupled inverters and two access transistors, which connect the cell to the bitlines true (BLT) and false (BLF).

The inverters make up the storage element and the access transistors are used to communicate with the outside. The cell is symmetrical and has a relatively large area. No special process steps are needed and it is fully compatible with standard CMOS processes.

Two kinds of operations can be implemented on SRAM cells:

  • The read operation, which consists in extracting the value from the selected memory cell.
  • The write operation, which consists in forcing a data into the selected memory cell.

Read operation

The 6T SRAM cell has a differential read operation. This means that both the stored value and its inverse are used in the evaluation to determine the stored value. Before a read operation, the wordline is held low (grounded) and the two bitlines connected to the cell through transistors N2 and N1 are precharged high (to VDD).

Since the gates of N1 and N2 are held low, these access transistors are off and the cross coupled latch is isolated from the bitlines.

As illustrated in figure 2.3, a '0' is stored on the left node of the memory cell. When the wordline is pulled up to VDD, both access transistors turn on, and nodes A and B are coupled to the bitlines BLT and BLF respectively.

The node where the data '0' is stored (the left node) puts N1 in saturation, and a current Iread flows through N1 and N3. This discharges the capacitance Cbl of the Bitline True through N1 and N3.

This charge transfer results in a decrease of the Bitline True voltage (VBLT).

The right storage node (the inverse node), where the data '1' is stored, has the same potential as BLF, so no charge transfer takes place on this side. As a consequence, the Bitline False voltage (VBLF) stays at VDD.

The advantage of this technique is that the bitlines are shared by many memory cells, which saves area. The drawback is that the capacitance on the bitlines increases, which can slow the read operation. Indeed, the bitline discharge is governed by the time constant T = RC, as illustrated in figure 2.4. Increasing the capacitance on the bitlines therefore increases the time constant and slows the read operation.
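The scaling of T = RC with the number of cells on a bitline can be sketched numerically. The Python snippet below is only an illustration: the discharge resistance, the per-cell capacitance and the wire capacitance are hypothetical values chosen for readability, not figures from the 32nm technology, and the name bitline_time_constant is ours.

```python
def bitline_time_constant(r_ohm, c_cell_f, n_cells, c_wire_f=0.0):
    """Return T = R * C for a bitline shared by n_cells memory cells.

    The bitline capacitance is modelled as n_cells * c_cell_f + c_wire_f.
    """
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    c_bitline = n_cells * c_cell_f + c_wire_f
    return r_ohm * c_bitline


if __name__ == "__main__":
    R_DISCHARGE = 10e3  # ohms, hypothetical discharge path resistance
    C_PER_CELL = 1e-15  # farads, hypothetical capacitance added per cell
    for n_cells in (64, 128, 256, 512):
        tau = bitline_time_constant(R_DISCHARGE, C_PER_CELL, n_cells)
        print(f"{n_cells:4d} cells per bitline: T = {tau * 1e9:.2f} ns")
```

In this simple model, doubling the number of cells per bitline doubles the time constant when the wire capacitance is neglected.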

Write operation:

For a standard 6T SRAM cell, the writing is done by lowering one of the bitlines to ground while asserting the wordline. To write a '0', BLT is lowered, while writing a '1' requires BLF to be lowered. Why is this?

As in the previous read example, the cell has a '0' stored. The main difference is that the bitlines are no longer left floating: they are held at VDD and gnd respectively. The left side of the memory cell behaves virtually as in the read operation (figure 2.5). Since both bitlines are now held at their respective values, the bitline capacitances have been omitted.

Focusing on the right side of the cell, we have the P6-N2 pair. In this case BLF is held at gnd. When the wordline is raised, N2 turns on and current is drawn from the inverse storage node to BLF. At the same time, N1 is turned on and, as soon as the potential at the inverse storage node starts to decrease, current flows from VDD to the node, making the stored data equal to '1'.

Ternary Content Addressable Memory (TCAM):

CAMs (Content Addressable Memories) are a special type of memory that combines the classical memory functions (read and write) with a search function. They implement the lookup-table function in a single clock cycle using dedicated comparison circuitry. With a normal memory, we provide an address and either receive the data stored at this address or write to it. With a CAM, besides these two functions, we can also supply a data word, and the CAM returns the list of addresses where this word is stored, if any. Furthermore, a CAM searches the entire memory in one operation, so it is considerably faster than searching a conventional memory address by address.

There are two main types of CAMs: binary CAMs and ternary CAMs. Binary CAMs search only for ones and zeros. TCAMs search not only for ones or zeros but also for a third state 'X'. The X state is a 'mask', meaning its value can be anything. In the following paragraph, we will present only the TCAM cell and its specific functionality which is the search operation.

Memory cell of the ternary CAM:

The TCAM memory cell shown in figure 2.6 is a 16-transistor CAM cell based on two 6-transistor SRAM bitcells. The two SRAM bitcells are responsible for the read and write operations, while transistors N1, N2, N3 and N4 are responsible for the search operation. As stated above, the read and write operations are similar to those of the SRAM cell. The next paragraph explains the TCAM search function.

Search operation:

Search operations are carried out when WL is low. The search data is broadcast on the search lines (SLs) and compared with the bit stored in each TCAM cell. Before any search operation, the match lines (MLs) are precharged to VDD and the SLs are precharged to GND. If the word matches, its ML remains high, indicating that the data is present in the CAM; otherwise the ML discharges to GND. A word matches when all the cells in the word match.
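The search behaviour described above can be mimicked with a small behavioural model in Python. This is a logical sketch only, not a description of the circuit: each stored word is written as a string over '0', '1' and 'X', the match line is modelled as a boolean that starts high (precharged) and goes low on the first mismatching cell, and the names cell_matches and search are ours.

```python
def cell_matches(stored_bit, search_bit):
    """A stored 'X' matches anything; otherwise the bits must be equal."""
    return stored_bit == "X" or stored_bit == search_bit


def search(tcam_words, key):
    """Return the addresses of all stored words that match the search key."""
    hits = []
    for address, word in enumerate(tcam_words):
        if len(word) != len(key):
            raise ValueError("stored word and key must have the same width")
        match_line = True  # ML precharged to VDD before the search
        for stored_bit, search_bit in zip(word, key):
            if not cell_matches(stored_bit, search_bit):
                match_line = False  # a single mismatch discharges ML to GND
                break
        if match_line:
            hits.append(address)
    return hits


if __name__ == "__main__":
    memory = ["10X1", "1101", "0XX1"]
    print(search(memory, "1011"))
    print(search(memory, "0111"))
```

Note that the software loop visits the words one after the other, whereas the hardware compares all words in parallel. Only the logical result is the same.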

eSRAM team TCAM:

The 32nm TCAM of the eSRAM team is fully synchronous: all operations are triggered on the rising edge of the clock. Its CAM cell is a 16-T cell based on two 6-T SRAM bitcells.

As in all ternary content addressable memories, the user can read, write and search. The read and write operations are done as in SRAMs, and the search function is implemented as follows: the data presented at the inputs D of the macro is compared with the data stored at every address of the memory, and an output called HIT is generated with the search status for each address.

Data bits D can be masked for write or search operations with mask bits M.

The chip enable (CSN) must be low during the rising edge of the clock, with proper setup & hold time. If CSN is set high at the rising edge of CK, it inhibits the access to the memory for the current clock cycle, and the CAM is held in stand-by mode.

The memory's floor plan (see figure 2.7 (a)) is composed of:

  • 2 pages which are composed of Macros (the right page contains the hit column): this block contains the memory cells where data are stored.
  • Row decoder: this block selects a word (row) from 1024 rows
  • Control: responsible for the generation of the necessary control signals for the TCAM
  • Input buffers in the bottom to allow the inputs to reach the IOs and control.

For this TCAM, both the search and the write operations are done in one clock cycle. However, the read operation is slower and takes two clock cycles. To cope with this problem, the memory was divided (as shown in figure 2.7(b)) into 4 pages and the IO blocks were placed in the middle of the memory, in order to decrease the total capacitance on the global bitlines, which determines the time the global bitlines take to discharge.

However, this does not improve the read operation enough, and it must be improved further to meet the frequency imposed by the specification. An interesting solution is to implement a sense amplifier on the global bitlines. This was the subject of my final graduation project. The operation of the sense amplifier and the different sense amplifier architectures are presented in the next section.

Sense amplifier:

Sense amplifiers are considered the most critical peripheral circuits of CMOS memories. Their main function is to sense the data stored in the selected memory cell during a read operation and to translate a small voltage signal into a full logic signal.

Sense amplifier performance strongly affects not only the memory access time but also the memory power consumption. CMOS memories are expected to be faster, to have high capacity and to consume little power. However, increased memory capacity, decreased supply voltage and high speed have negative effects on the memory. Increasing the memory capacity increases the number of memory cells per bitline, which increases the bitline capacitance. Moreover, increasing the length of the bitlines increases their parasitic resistance. To cope with these problems, we may use a sense amplifier circuit, as shown in figure 2.9, to improve the frequency of the read operation and to limit the impact of the increased bitline parasitic capacitance and resistance [4].

In fact, when both bitlines are connected to a sense amplifier, which detects the difference between the potentials of BLT and BLF, we do not need to wait until BLT or BLF discharges completely: once a small voltage difference has developed between the two bitlines, the sense amplifier is turned on and delivers the resulting output, as represented in the next figure.
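To give a feel for the time saved, the Python sketch below compares the time a bitline needs to fall by a small differential voltage with the time it needs to lose most of its swing. It assumes an idealized exponential RC discharge from VDD, V(t) = VDD * exp(-t / tau), which is our simplification (a real cell behaves more like a current source). The 50mV and 1V values echo figures quoted earlier in the report; the 90% swing and the function name time_to_drop are assumptions.

```python
import math


def time_to_drop(delta_v, vdd, tau):
    """Time for an RC-discharging bitline, starting at vdd, to fall by delta_v.

    Solves vdd - vdd * exp(-t / tau) = delta_v for t.
    """
    if not 0.0 < delta_v < vdd:
        raise ValueError("delta_v must lie strictly between 0 and vdd")
    return -tau * math.log(1.0 - delta_v / vdd)


if __name__ == "__main__":
    VDD = 1.0  # volts, nominal supply quoted in the report
    TAU = 1.0  # normalized time constant, so results are in units of tau
    t_sense = time_to_drop(0.05, VDD, TAU)  # 50 mV differential
    t_swing = time_to_drop(0.90, VDD, TAU)  # assumed 90 % swing without sensing
    print(f"50 mV drop: {t_sense:.3f} tau")
    print(f"90 % swing: {t_swing:.3f} tau")
    print(f"ratio:      {t_swing / t_sense:.1f}")
```

Under these assumptions, waiting for a 50mV difference takes about 0.05 time constants against about 2.3 for a 90% swing, that is, roughly 45 times less.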

In the next two sections two types of sense amplifiers will be presented. First, there are the voltage sense amplifiers that sense a small voltage difference at their inputs and amplify it to the correct voltage at the output. The performance of the voltage sense amplifier is limited by the increasing bitline capacitance. Second, there are the current sense amplifiers that sense a small current difference between the bitlines during the discharge phase and amplify that to the correct output voltage.

Current mode Sense Amplifier:

Current mode sense amplifiers are designed to sense a small current difference between the bitlines and amplify it to the correct output voltage. The architecture shown in figure 2.11 is an example of a current mode sense amplifier. This type of sense amplifier is used in advanced memories, where the bitline capacitance is increasing due to technology scaling and to the increasing number of cells that share the bitlines. Current sense amplifiers are thus faster and can provide low delays [4].

Though current sense amplifiers have the advantage of being faster, their limitations are high power consumption and increased layout area.

To cope with these limitations, more complex sense amplifiers are required, namely the voltage mode sense amplifiers.

Voltage mode sense amplifiers:

Voltage mode sense amplifiers are widely used in most memories since they present a high input impedance to the bitline.

There are many types of voltage mode sense amplifiers but we will present only the conventional latch type sense amplifier and the improved latch type sense amplifier [4] [5].

They operate in two phases. The first is the precharge phase, in which the inputs (the bitlines) and the outputs of the sense amplifier are precharged high. Then, once a sufficient differential voltage has developed, the sense amplifier is enabled by the SAEN signal [4] [5].

The sense amplifier reacts only after a certain amount of differential voltage has developed.

A drawback of the conventional latch type sense amplifier (see figure 2.11) is that the inputs and outputs of the circuit are the same. For this type of sense, the bitlines are directly connected to the cross coupled inverter pair so that the complete bitlines are discharged or charged, which increases the delay and power consumption during the read operation.

The improved latch type sense amplifier (see figure 2.12) is also based on a cross-coupled inverter pair, but with a high-impedance input differential stage (i.e. transistors M5 and M6) with input terminals blt and blf.

The most important point about latch type sense amplifiers is that once the sensing process has started, the latch cannot recover unless the circuit is reset to its metastable point (i.e. V(SO)=V(SON)). In the presence of variations and noise, a wrong output signal can be developed for small input voltage differences. On the other hand, for high speed it is crucial that the latch type sense amplifier is activated at the smallest possible input voltage difference [6].

Differential sense amplifiers are designed to be electrically balanced, symmetric circuits. However, due to process variations, the devices in the circuit do not all have the same characteristics, which leads to variation in the device parameters of the circuit.

As we can see, the improved latch type sense amplifier is faster, consumes low power and occupies small layout area comparing with two other types (current mode sense amplifier and latch type sense amplifier).

Taking into account the different advantages and disadvantages of each type, we will restrict our study on the improved latch type sense amplifier. In the next section, the functioning of the improved latch type sense amplifier is explained.

Functioning of the improved latch type sense amplifier:

The core of this sense amplifier (see figure 2.12) is composed of seven transistors: two sensing transistors (M6 and M5) with their sources coupled to a common pull down node, a pull down transistor (connected to the SAEN signal) for drawing current from the pull down node during sensing operations, and a four transistor latch (M1, M2, M3 and M4) coupled to the drains of the two sensing transistors.

The four transistor latch is composed of two cross coupled CMOS inverters. When the pull down transistor is activated, the four transistor latch automatically amplifies the voltage difference on the gates of the two sensing transistors.

To initialize the two nodes of the cross coupled latch, two precharging transistors (M7 and M8) are used.

First, we have to initialize the nodes of the latch to VDD. Then, once a voltage difference is built up between the bitline true (blt) and the bitline false (blf), the sense is enabled by raising the signal SAEN. Depending on the values of bitline true (blt) and bitline false (blf), and provided that a minimal voltage difference has developed, the cross coupled latch switches to one of its stable operating points [6].

During a memory read cycle, a differential signal is generated by the memory cell on the blt and the blf and depending on the value stored in the memory cell, we can distinguish two cases.

First case: to read a "one" stored in the bitcell, a small voltage difference between the bitline true and the bitline false such that Vblt < Vblf is applied, and then the sense is turned on.

In this case, during the precharge phase, the output nodes SO and SOB are set to VDD by the precharge transistors. The sense is then enabled by the signal SAEN, and the nodes SO and SOB are discharged by the currents I2 and I1 respectively. Since Vblt < Vblf, I2 > I1, hence SO discharges faster than SOB. As both nodes begin to discharge, SO reaches (VDD - VT) sooner than SOB. This turns on PMOS M2 earlier than M4. Turning on M2 charges SOB and counters the effect of I1. This further increases the voltage difference [6].

Second case: to read a "zero" stored in the bitcell, a small differential voltage between the bitline true (blt) and the bitline false (blf) such that Vblf < Vblt is applied, and then the sense is turned on.

In this case, during the precharge phase, the output nodes SO and SOB are set to VDD by the precharge circuit. The sense is then enabled by the signal SAEN, and the nodes SO and SOB are discharged by the currents I2 and I1 respectively. However, since Vblf < Vblt, I1 > I2, hence SOB discharges faster than SO. As both nodes begin to discharge, SOB reaches (VDD - VT) sooner than SO. This turns on PMOS M4 earlier than M2. Turning on M4 charges SO and counters the effect of I2. This further increases the voltage difference.
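The two cases above can be summarized in a small behavioural sketch. It is not a circuit simulation: it only encodes which value is read for a given sign of the bitline difference. The function name and the v_offset parameter are illustrative assumptions, not part of the original design.

```python
def sensed_value(v_blt: float, v_blf: float, v_offset: float = 0.0) -> str:
    """Behavioural model of the read decision of the latch type sense.

    v_offset is a hypothetical minimum |Vblf - Vblt| (in volts) below which
    the outcome is treated as unreliable.
    """
    v_diff = v_blf - v_blt
    if abs(v_diff) <= v_offset:
        return "undetermined"
    # Vblt < Vblf: SO discharges first, M2 turns on and restores SOB -> read "one".
    # Vblf < Vblt: SOB discharges first, M4 turns on and restores SO -> read "zero".
    return "one" if v_diff > 0 else "zero"
```

For example, sensed_value(0.85, 0.90) corresponds to the first case and sensed_value(0.90, 0.85) to the second.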


In this chapter, we have presented an overview of SRAMs and TCAMs on the one hand, and the different sense amplifier architectures on the other. The improved latch type sense amplifier is the best choice in terms of area and speed. Consequently, we choose this architecture for our project, which is presented later. In the next chapter, we will present all the steps we have followed to design our sense as well as the simulations done to validate its functioning.

Chapter III

Design of the 32nm Sense Amplifier


In order to improve the read operation frequency of the TCAM 32nm, a sense amplifier will be implemented on its global bitlines. In this chapter, we present the different steps that we have followed in the design of this sense. The first section presents the schematic view of the sense and the simulations done on the schematic's netlist. The second section presents the layout view and the post-layout simulations.

Design of the sense amplifier:

The objective of this internship is to design a sense amplifier which will be implemented on the global bitlines of the TCAM 32nm. The main characteristic of a sense amplifier is its offset. An ideal sense amplifier would have a negligible offset, but a real one can have a large offset. So, we must reduce this value as much as possible in order to reduce the delay of the sensing operation and thus improve the frequency of the read operation in the TCAM.

Offset definition: The offset of a sense amplifier is the minimum differential voltage needed at its input nodes for the latch to resolve to the correct state.

As we have said, the offset must be minimized when designing the sense, since this decreases the delay of the read operation, which can be modeled as:

Iread: the current developed in read operation
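The expression itself is not reproduced in the text above. As a rough illustration only, a common first-order approximation treats the bitline as a capacitance discharged by a constant read current, so the time needed to develop a given differential voltage is C × ΔV / Iread. The sketch below uses that approximation; the function name and the example values (50 fF, 20 µA) are hypothetical and are not taken from the design.

```python
def bitline_development_time(c_bitline_f: float, v_diff_v: float, i_read_a: float) -> float:
    """Time (s) for a constant read current to develop v_diff_v on the bitline.

    First-order assumption: t = C * dV / I (constant-current discharge).
    """
    if i_read_a <= 0:
        raise ValueError("read current must be positive")
    return c_bitline_f * v_diff_v / i_read_a


# Hypothetical values: 50 fF bitline, 20 uA read current.
t_small_offset = bitline_development_time(50e-15, 0.030, 20e-6)  # 30 mV needed
t_large_offset = bitline_development_time(50e-15, 0.100, 20e-6)  # 100 mV needed
```

Under this approximation the waiting time before enabling the sense grows linearly with the required differential voltage, which is why a smaller offset allows a faster read.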

Devices dimensions:

The chosen type (improved latch type sense amplifier) was also implemented in the TCAM90nm (90nm technology). So, all the dimensions of our sense are initially derived from those developed in the 90nm technology.

To move from one technology to another, a scale factor is applied to the device dimensions. So, to design our sense in the 32nm technology, we have to determine the scale factor that takes us from the 90nm to the 32nm technology.

Using this scale factor, we can determine each transistor dimensions in the new technology (32nm).
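As an illustration of this step, the sketch below assumes the simplest possible scale factor, the ratio of the two technology nodes (32/90). The actual factor used in the project is not stated, and the device names and widths in the example are hypothetical.

```python
def scale_dimensions(dims_nm: dict, old_node_nm: float = 90.0, new_node_nm: float = 32.0) -> dict:
    """Scale every device dimension by new_node / old_node (assumed linear shrink)."""
    factor = new_node_nm / old_node_nm
    return {name: value * factor for name, value in dims_nm.items()}


# Hypothetical 90nm widths in nanometres, for illustration only.
widths_90nm = {"sense_pair": 900.0, "enable": 450.0}
widths_32nm = scale_dimensions(widths_90nm)
```

In practice the scaled values would only be a starting point, since they still have to respect the design rules of the new technology and are then re-optimized for offset.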

Later, as the main characteristic of a sense amplifier is its offset value, the transistors must be sized to minimize the offset as far as possible within the minimum layout area.

Thus, many simulations are run to verify the impact of each transistor on the offset. Through simulations with different transistor sizes, we have deduced that, to reduce the offset:

The transistor pair (N2 and N6) must be large compared with the other devices. In fact, the impact of these sense transistors (N6 and N2) is the most dominant. They convert the small voltage difference at their inputs to a current difference, which is required to flip the inverter pair. It is important to make these transistors as large as possible to obtain good matching and therefore minimize the offset. Another important pair of transistors for minimal offset is the NMOS transistors (N1 and N0) of the cross coupled inverter pair.

The PMOS transistors (P5 and P6) of the cross coupled inverter pair have the opposite effect (if we increase their width, the offset increases). So, this pair of transistors can be made smaller without penalty.

The transistors of the two precharge circuits are also important for the reduction of the offset value because they are responsible for charging the output nodes to the same voltage before the sense amplifier is activated. When the voltage at the output nodes is not the same, the cross-coupled inverter pair will start unbalanced, which could cause the sense amplifier to flip in the wrong direction.

The enabling transistor (N5) has no significant impact on the offset, since it has equal influence on the left and right branch of the sense amplifier.

So, the size of this transistor can be reduced without penalty.

Doing so, and after many simulations, we have found a combination of transistor sizes that is optimized for small offset within the fixed area.

Functional simulation:

Test scenario:

Two simulations will be performed: the first is done to verify the functioning and the shape of the curves, and the second is done to characterize the offset value.

In the first simulation, we will try to simulate a realistic functioning of the sense by setting the inputs as schematized in the figures 3.4 for read0 operation and 3.5 for read1 operation.

In the second, the differential voltage (Vdiff) between the two Sense Amplifier bitlines is kept at zero and the direction of sense resolution is checked. Then, the difference between the two inputs Vdiff is given in opposite direction and is gradually increased (as shown in the figure..). So, the Vdiff at which the sense resolves in the opposite direction is found. The results of this simulation are summarized in the table………….
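The second simulation can be sketched as a simple search loop. In the sketch below, the circuit simulator is replaced by a stand-in callable that returns the direction (+1 or -1) in which the sense resolves for a given signed Vdiff in millivolts. The names find_offset_mv and toy_sense, and the 7.5mV imbalance, are assumptions made for illustration.

```python
from typing import Callable, Optional


def find_offset_mv(sense: Callable[[float], int],
                   max_mv: float = 100.0,
                   step_mv: float = 1.0) -> Optional[float]:
    """Return the smallest opposing Vdiff (mV) that flips the sense, or None."""
    natural = sense(0.0)  # direction taken with no input difference
    steps = int(max_mv / step_mv)
    for k in range(1, steps + 1):
        v_diff_mv = k * step_mv
        # Apply the difference against the natural direction and check for a flip.
        if sense(-natural * v_diff_mv) == -natural:
            return v_diff_mv
    return None


def toy_sense(v_diff_mv: float, imbalance_mv: float = 7.5) -> int:
    """Stand-in for the simulator: a latch with a fixed built-in imbalance."""
    return 1 if v_diff_mv + imbalance_mv > 0 else -1
```

With the toy model, find_offset_mv(toy_sense) returns the first 1mV step that exceeds the built-in imbalance; a real flow would call the circuit simulator in place of toy_sense.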


When a read zero operation is performed (see figure 3.4), the global bitline false (gblf) keeps its state and stays precharged at VDD, while the global bitline true (gblt) discharges, thus developing the differential voltage necessary for the activation of the sense. Once a small differential voltage has developed, the sense is enabled. Then, this voltage difference is amplified and one of the outputs (doutlocalb<0> in read0) is charged to VDD.

The same scenario is obtained when a read1 operation is performed but, in this case, doutlocal<1> (see figure 3.5) is charged to VDD.

The different simulations confirm that our design works, with a small offset value (<10mV). However, the simulation of the schematic (CDL) netlist is idealized, since it does not include layout parasitics, so this 10mV value is optimistic.

Symbol view:

In the TCAM 32nm, the cell which will be replaced by the sense amplifier (cam32lp_IO_read_xyx2) contains two bit cells, so we have to create a new schematic view in which we instantiate two sense amplifiers, as shown in the next figure. To do that, the symbol view of the sense must be created.

Doing so facilitates the creation of the layout view of this cell, since only the layout view of a single sense has to be instantiated.

Layout view creation:

Once our design has been verified in all the operating conditions, the next step is the creation of the layout view of the sense.

Layout Design:

At the beginning, there are some criteria we have to take into account while creating our design:

First, the differential sense amplifier is designed as an electrically balanced, symmetric circuit. Consequently, we have to preserve this symmetry while creating the layout view of the sense. In fact, all the transistor parameters on the right-hand side must be equal to those on the left-hand side, as mismatch between the devices leads to a rise in the offset value. Moreover, different metal connections on the two sides cause a mismatch in the parasitic capacitances, which also affects the offset.

Second, when a transistor has a large width compared to its length, its parasitic resistance increases. The solution to this problem is the use of multifinger transistors: the transistor is broken up into smaller ones that have the same length as the original one, but whose width is the original width divided by the number of fingers. After that, we connect drain to drain, source to source and gate to gate. Figure 3.7 shows a two-finger transistor.

  • Third, the input pins and the power supply pins VDD, GND, VDDS and GNDS are in metal 2.
  • Fourth, the output and the input pins are in metal 3.
  • All the transistors must be oriented in the same direction.
  • In order to ease the assembly of the different blocks in the memory and to avoid crossing interconnections, some precautions must be taken. In fact, each metal has to be used in one direction only, so we have decided to use even-numbered metals (e.g. metal 2) in the horizontal direction and odd-numbered metals (e.g. metal 3) in the vertical direction.
  • The cell width is fixed at 2.404µm.
  • The length of the sense cell is obtained with the minimum pitch, to reduce the area.

After the creation of the sense's mask layout, two checks must be performed: DRC and LVS. First, we must make sure that all the layout violations of the fabrication design rules are removed (DRC). We also have to check that the created layout is a correct realization of the circuit topology defined by the schematic (LVS). The latter compares the netlist obtained from the schematic (.cdl) with the one obtained from the layout (.gds). The result of the LVS test is presented in figure 3.10.

Once these two tests are checked and passed without violations, we must ensure that our sense is still functioning by performing post layout simulations.

Post layout simulation:

In this part, we simulate the extracted netlist, which contains a description of the layout view as well as the different parasitic capacitances and resistances created at each node. This simulation is very important, as it allows us to determine the performance of the circuit under 'realistic' conditions. The difference between the schematic netlist and that generated by extraction is shown in figure …….

Clearly, the parasitic capacitance added in the extracted netlist induces a shift in the curves, as shown in figure 2.17, and leads to a slower response, thus affecting the performance of the circuit. That is why the post layout simulation is very important to verify the circuit functionality under realistic conditions.

The post layout simulations are done under the same conditions as those performed on the schematic netlist. The results are presented in figure 3.13, figure 3.14 and in the table……….

The obtained curves show that, after the design of the layout, the sense is still functioning normally. In fact, the same operation described for the schematic view is ensured. However, we note that the value of the offset has increased to 30mV.

In conclusion, we still ensure the sense functionality, but with an increased offset value. The rise of the offset value can be justified. In fact, the parasitic capacitances and resistances that are taken into account in the extracted netlist may cause a shift in the offset value. Moreover, while drawing the layout view of the sense, we cannot ensure total symmetry as in the schematic.


In this chapter, we have presented the architecture we have designed during the project (schematic, symbol and layout views). Then, we have verified through simulations the success of our sense. However, there are still a lot of simulations to do in order to ensure its robustness and reliability. This is the subject of the next chapter.

Chapter IV

Validation of the 32nm Sense Amplifier


With each new CMOS technology generation, the functional correctness of the design and its performance parameters become more sensitive to circuit parametric variations, which results in designs with poor yield. Thus, a good estimation method for the circuit performance parameters, robust to process variations, has emerged as a critical need in nanometer IC designs. In this chapter, variation in the process parameters will be applied, using Monte Carlo simulations, to determine the offset that gives a secure functioning of the sense (i.e. maximal yield). Before that, we have to determine the worst operating conditions (process corner, voltage, temperature) of the sense, under which the Monte Carlo (MC) simulations will be run.

PVT simulations:

In this paragraph, PVT simulations are run only on the extracted netlist, and we present only the simulations obtained in read1 operation. For ease of comparison, we have plotted only one signal curve.

To ensure that the sense functionality tolerates corner variations, voltage variations and temperature variations, we have proceeded in three steps. In a first step, we have simulated the extracted netlist in nominal operating conditions (temperature and voltage equal to 25°C and 1V respectively) in four extreme corners, by replacing the nominal devices by their worst-case or best-case incarnations (maxNmaxP, minNminP, maxNminP, minNmaxP corners). In a second step, the effect of temperature variation (-40°C, 25°C and 125°C) was checked using the typical model and a power supply voltage VDD equal to 0.9V. Finally, we have studied the tolerance of our sense to voltage variation by simulating it at different voltages (0.9V, 1V, 1.2V and 1.8V) using the typical model and at nominal temperature (25°C).
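The three sweeps described above vary one factor at a time. A minimal sketch of the resulting list of simulation conditions is given below; the dictionary keys and the function name are illustrative assumptions, while the corner names, voltages and temperatures are those quoted in the text.

```python
def pvt_conditions() -> list:
    """Build the one-factor-at-a-time PVT sweep described in the text."""
    conditions = []
    # Step 1: four extreme corners at nominal voltage and temperature.
    for corner in ("maxNmaxP", "minNminP", "maxNminP", "minNmaxP"):
        conditions.append({"corner": corner, "vdd": 1.0, "temp_c": 25})
    # Step 2: temperature sweep with the typical model at VDD = 0.9V.
    for temp_c in (-40, 25, 125):
        conditions.append({"corner": "typical", "vdd": 0.9, "temp_c": temp_c})
    # Step 3: voltage sweep with the typical model at nominal temperature.
    for vdd in (0.9, 1.0, 1.2, 1.8):
        conditions.append({"corner": "typical", "vdd": vdd, "temp_c": 25})
    return conditions
```

Each entry would correspond to one simulation of the extracted netlist. Note that the typical, 0.9V, 25°C point appears in both the temperature and the voltage sweep.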

Corner variations:

In conclusion, the sense is still functioning normally, with a slight shift in the curves. In fact, we notice that the sense is fastest in the minNmaxP corner, while the slowest response is obtained in the minNminP corner. Thus, the worst corner for the sense functioning is minNminP.

Temperature variations:

As we see, the sense turns out to be rather insensitive to temperature variations. The worst temperature for the sense functioning is 125°C.

Voltage variations:

Comparing the resulting curves, we notice that for each value of VDD, the desired output (in this example V(DOUTLOCAL<1>)) always reaches the full rail level, which represents the high logic level. We also note that the response of the sense is faster when VDD is greater. So, the worst voltage for the sense functioning is 0.9V.

In all the PVT simulations, and under all the types of variations, in spite of the slight shift of the curves, we still ensure the sense functionality since there is no signal distortion.

In conclusion, the worst conditions for the functioning of the sense are the minNminP corner, 0.9V and 125°C.

Once the worst case is found, Monte Carlo simulations are carried out in these operating conditions in order to find the worst offset that ensures secure and correct sense functionality. In fact, since the 32nm technology is not mature enough, and as the fabrication process is not perfectly controlled in the nanometer regime, many fluctuations due to process variations can occur. As a result, the devices in the circuit do not all have the same characteristics, and this leads to variation in the design parameters of the circuit. Such dissimilarity in the device parameters deeply affects the offset value. That is why we must verify that our sense works correctly under these fluctuations. To do that, many Monte Carlo simulations will be performed in order to evaluate the offset and to characterize the worst offset, which allows us to ensure that our sense will work correctly in all conditions.

Offset evaluation:

To avoid incorrect read operations, the sense amplifier must be activated only once the bitline differential voltage is greater than the sense amplifier's offset. Since the output of the SRAM is critical, we want the Sense Amplifier Enable signal to occur as early as possible, to decrease the overall sensing delay, while still ensuring that the sense has enough differential input to overcome any noise or mismatch in the circuit. Therefore, we have to minimize the offset value and to characterize it accurately, as we need it later for the activation of the sense. However, activating the sense at too small a differential voltage reduces its reliability, so determining the worst case variation is highly significant in the design of a sense amplifier. To do so, Monte Carlo simulations are run in different cases. The methodology is presented in the next section, together with its implementation and the obtained results.


The instability of the offset in the sense amplifier may be due to the following reasons:

  • Device Variation: Sense amplifier transistors which are fabricated under identical conditions may still show variation in their device parameters. This variation is called device variation. It may exist because of mask misalignments, stress effects or statistical differences in doping concentration [7].
  • Capacitance Variation: The total metal capacitance (load) on the internal nodes of the sense amplifier (SA) may not be the same. This capacitance mismatch can occur due to layout placement or due to the fabrication process [7].

To characterize the offset value, we need to calculate all these individual contributions. We assume that all the contributions are independent of each other.

Device Offset: The SPICE models have equations that represent the spread in mobility (µ) and threshold voltage (Vt) of a transistor within one process corner. Monte Carlo simulation uses these equations and applies random variation. So, depending on the mismatch, the sense amplifier requires a voltage difference (Vdiff) between its internal nodes above a certain threshold for the circuit to operate correctly (pass) [7].

For each run of Monte Carlo, the differential voltage (Vdiff) between the SA internal nodes is first kept at zero and the direction of sense resolution is checked. Then Vdiff is applied in the opposite direction and is gradually increased. The Vdiff at which the sense resolves in the opposite direction is found. This value of Vdiff is the offset value for this particular set of mismatch. The offset is found in the same way for each Monte Carlo run. Applied to a certain number of samples (the number of MC runs), the fraction of runs that resolve correctly is referred to as the parametric yield. Yield is an important characteristic of sense amplifiers, as a single failing amplifier compromises the whole memory.
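To make the notion of parametric yield concrete, the sketch below replaces the circuit simulator with a toy model in which the offset of each Monte Carlo run is the magnitude of a Gaussian input-referred mismatch. The Gaussian model, the 15mV sigma and the function names are assumptions for illustration; in the real flow each offset comes from a simulation of the extracted netlist.

```python
import random


def monte_carlo_offsets(n_runs: int = 1000, sigma_mv: float = 15.0, seed: int = 0) -> list:
    """Toy stand-in for the simulator: one offset value (mV) per Monte Carlo run."""
    rng = random.Random(seed)
    return [abs(rng.gauss(0.0, sigma_mv)) for _ in range(n_runs)]


def parametric_yield(offsets_mv: list, v_diff_mv: float) -> float:
    """Fraction of runs whose offset is below the applied differential voltage."""
    passed = sum(1 for offset in offsets_mv if offset < v_diff_mv)
    return passed / len(offsets_mv)


offsets = monte_carlo_offsets()
yield_at_50mv = parametric_yield(offsets, 50.0)
```

The yield can only increase as the applied Vdiff increases, which is the trade-off between read speed and reliability discussed above.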

Capacitance Variation Offset: The nominal offset requirement of the SA is the Vdiff between its internal nodes required for the SA to work correctly when it is not under any mismatch. For capacitance mismatch, we apply a 10% metal capacitance mismatch between the matched internal nodes of the sense amplifier. We then perform the corresponding read (0 or 1) operation, which makes the SA work in its worst case environment [7]. Doing so allows us to:

  • favor the read one, if a read zero operation is performed.
  • favor the read zero, if a read one operation is performed.

Monte Carlo simulation:

Originally, Computer Aided Design (CAD) tools were used to study the nominal design of an integrated circuit (IC). Due to the disturbances of the IC manufacturing process, the effective performance of mass produced chips differs from that of the nominal design. Process related performance variations may lead to low manufacturing yield and unacceptable product quality. For these reasons, statistical circuit design techniques are required to design the circuit parameters. Monte Carlo simulation is one of the most important of these techniques [8].

The purpose of the Monte Carlo (MC) analysis is to determine the uncertainty in estimates for dependent variables of interest. Thus MC analysis focuses on data, and on how uncertainty in data propagates through computations. This definition of uncertainty involves the model inputs and the model outputs. In our context, the models of interest are provided by circuit simulations. For instance, Eldo takes a netlist describing the circuit (transistors, resistors, capacitors and so on, and their connections) and translates this description into mathematical equations. The inputs are therefore the various design parameters, the process parameters and the environmental conditions. On the other side, the output space is characterized by the circuit performance of interest [8].

Monte Carlo-based uncertainty analysis performs multiple model evaluations with randomly selected model input variables, and then uses the results of these evaluations to determine the uncertainty in model predictions and the input variables that gave rise to this uncertainty [8]. In general, a Monte Carlo analysis involves four steps:

  • A range and distribution are selected for each input factor. These selections will be used in the next step in the generation of a sample from the input factors.
  • A sample of points is generated from the distribution of the inputs specified in the first step. The result of this step is a sequence of sample elements.
  • The circuit simulator is fed with the sample elements and a set of extracted measures is produced. In essence, these evaluations create a mapping from the space of the inputs to the space of the results. This mapping is the basis for subsequent uncertainty analysis [8].
  • The results of model evaluations are used as the basis for the uncertainty analysis. For example, one way to characterize the uncertainty is with a mean value and a variance.
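The four steps above can be sketched generically as follows. The circuit simulator is replaced by an arbitrary Python callable, and every input is assumed to follow a Gaussian distribution given as a (mean, sigma) pair. Both choices, as well as the function name run_monte_carlo and the threshold voltage example, are assumptions made for illustration and do not describe how Eldo is invoked.

```python
import random
import statistics


def run_monte_carlo(model, input_specs: dict, n_samples: int = 1000, seed: int = 0):
    """Return (mean, variance) of model outputs over randomly sampled inputs."""
    rng = random.Random(seed)
    # Steps 1 and 2: each input has a (mean, sigma) range; draw the samples.
    samples = [
        {name: rng.gauss(mean, sigma) for name, (mean, sigma) in input_specs.items()}
        for _ in range(n_samples)
    ]
    # Step 3: evaluate the model (here a plain function) on every sample.
    results = [model(sample) for sample in samples]
    # Step 4: characterize the uncertainty with a mean value and a variance.
    return statistics.mean(results), statistics.variance(results)


# Hypothetical example: mismatch between two threshold voltages (volts).
specs = {"vt_left": (0.40, 0.02), "vt_right": (0.40, 0.02)}
mean_dvt, var_dvt = run_monte_carlo(lambda p: p["vt_left"] - p["vt_right"], specs)
```

Fixing the seed makes the sampling reproducible, which is convenient when comparing two design variants on the same set of samples.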

Implementation and results:

The methodology that we have just described will be applied to our sense in order to determine the worst offset. In the first part, we evaluate the impact of device variation; in the second part, we determine the effect of capacitance mismatch on the internal nodes of the sense.

Device Offset mismatch:

  • The netlist for this setup is the extracted one. This derived netlist contains the devices and capacitances that contribute to the sense amplifier operation, including active devices and load devices.
  • We run 1000 Monte Carlo simulations on this extracted netlist and find the read0 and read1 offsets in each process corner.
  • As usual, Vdiff is increased progressively; the offset we are looking for is the differential voltage Vdiff that gives 1000 successful runs.
  • For all the simulations, as the nominal offset obtained when the sense amplifier is not under any mismatch is 30mV, we vary Vdiff from 50mV to 100mV with a 10mV step.
  • All the simulations are done in worst conditions (corner = minNminP, VDD = 0.9V, T = 125°C).
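The selection rule in the list above (the smallest Vdiff in the 50mV to 100mV sweep for which every run passes) can be sketched as below. The list of per-run offsets stands in for the simulation results, and the values in the example are hypothetical, not measured data.

```python
def minimum_safe_vdiff(offsets_mv, candidates_mv=range(50, 101, 10)):
    """Return the first candidate Vdiff (mV) above every per-run offset, or None."""
    for v_diff_mv in candidates_mv:
        # A run is successful when the applied Vdiff exceeds its offset.
        if all(offset < v_diff_mv for offset in offsets_mv):
            return v_diff_mv
    return None


# Hypothetical per-run offsets in mV, for illustration only.
example_offsets = [12.0, 47.0, 63.5]
required_vdiff = minimum_safe_vdiff(example_offsets)
```

If no candidate in the sweep passes all runs, the function returns None, which would indicate that the sweep range has to be extended.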

Capacitance Mismatch:

  • The netlist for this setup is the extracted one. This derived netlist contains the devices and capacitances that contribute to the sense amplifier operation, including active devices and load devices.
  • We run 1000 Monte Carlo simulations on this extracted netlist and find the read0 and read1 offsets in each process corner.
  • As usual, Vdiff is increased progressively; the offset we are looking for is the differential voltage Vdiff that gives 1000 successful runs.
  • For all the simulations, as the nominal offset obtained when the sense amplifier is not under any mismatch is 30mV, we vary Vdiff from 50mV to 100mV with a 10mV step.
  • For this mismatch setup, we take the capacitance of one internal node at a time and increase it by 10%, which creates a mismatch within the pairs (sa_data_h/sa_data_l) and (read1_l/read0_l), and then run the Monte Carlo simulation. So, we have to verify the sense functionality in four cases:
  • First case : C (sa_data_l) = 1.1 C (sa_data_l)
  • Second case: C (sa_data_h) = 1.1 C (sa_data_h)
  • Third case : C (read0_l) = 1.1 C (read0_l)
  • Fourth case : C (read1_l) = 1.1 C (read1_l)
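The four cases can be generated mechanically from a table of nominal node capacitances, as in the sketch below. The node names are those listed above; the function name and the nominal values used in the example are hypothetical.

```python
def capacitance_mismatch_cases(nominal_caps_f: dict, mismatch: float = 0.10) -> list:
    """Return one (node, capacitances) pair per case, with that node increased."""
    cases = []
    for node in ("sa_data_l", "sa_data_h", "read0_l", "read1_l"):
        caps = dict(nominal_caps_f)  # copy so that each case starts from nominal
        caps[node] = nominal_caps_f[node] * (1.0 + mismatch)
        cases.append((node, caps))
    return cases


# Hypothetical nominal capacitances in farads, for illustration only.
nominal = {"sa_data_l": 1e-15, "sa_data_h": 1e-15, "read0_l": 2e-15, "read1_l": 2e-15}
cases = capacitance_mismatch_cases(nominal)
```

Each case changes a single node and leaves the others at their nominal value, so the Monte Carlo runs for the four cases can be compared directly.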


In this chapter, the impact of process variation on the sense amplifier was studied using statistical approaches based on Monte Carlo simulations. Through this study, we notice that the value of the offset which gives a secure functioning and makes the sense more robust and more reliable is 100mV, and not 30mV, the value found by simple simulation.

General Conclusion

The purpose of my project was to design a sense amplifier to be implemented on the global bitlines of a TCAM using the 32nm technology in order to improve its read operation frequency.

To begin with, we have presented briefly the working environment related to STMicroelectronics and to the integrated circuit design.

The second step was presenting the characteristics of SRAM and TCAM as well as the different types of sense amplifiers, the different architecture used and the one we will use for the TCAM 32nm.

After that, we have presented our design and its three created views (schematic, symbol and layout), the different simulations done to validate our sense functionality and finally we have explained the utility, the methodology as well as an example of statistical simulations implementation (Monte Carlo simulations).


  1. Intranet site: visited 10/05/2010
  2. Cadence manual
  3. visited 15/05/2010
  4. Hwang Cherng Chow and Shu-Hsien Chang, "High Performance Sense Amplifier Circuit for Low Power SRAM Applications", IEEE, Circuits and Systems, 2004.
  5. Steevan Rodrigues and M.S.Bhat, NITK Suratkal, "Impact of Process Variation Induced Transistor Mismatch on Sense Amplifier Performance", IEEE Advanced Computing and Communications, 2006.
  6. Bernhard Wicht, Thomas Nirschl, Doris Schmitt-Landsiedel, "A Yield-Optimized Latch-Type SRAM Sense Amplifier"
  7. STMicroelectronics: "Memory Verification Methodology Book"
  8. Mentor Graphics Eldo User's Manual Release AMS 2009.2a

Cadence is an Electronic Design Automation (EDA) environment that integrates different applications and tools in a single framework, supporting all the stages of IC design and verification.