With today's technological advancements in the VLSI industry, the limits on ASIC and FPGA chips in terms of area, power and speed are constantly being tightened. End-user requirements also influence these limits and push them further. The effects of nanometer technologies on congestion, signal integrity, crosstalk and similar issues become more significant as the feature sizes of semiconductor devices continue to decrease. All of these factors affect the methodologies used throughout the design flow and force EDA tools to be updated constantly to cope with these issues.
Thus, there is a constant need to learn and gain exposure to advanced EDA tools such as Synopsys Design Compiler, IC Compiler, PrimeTime and TetraMax. The aim of this project is to complete the ASIC design flow from RTL to GDS-II using advanced industry-level tools. The project provides a solid base and practical hands-on experience with these tools. It also provides an overview of the types of ASICs, the detailed standard ASIC design flow (front end and back end), and the Synopsys Design Compiler and IC Compiler flows. In addition, it analyzes the design factors that affect the performance of the final chip, such as power, area and timing.
CHAPTER 1: Understanding VLSI, ASIC and its Flow
The term 'ASIC' stands for 'application-specific integrated circuit'. An ASIC is an integrated circuit designed for a particular purpose or application. Because ASICs can be customized for size, power, speed and reliability, their applications are nearly endless: they are used in automotive emission control, gaming devices, iPods and much more. An example of an ASIC is an IC designed for a specific line of cellular phones of a company, so that no products other than the phones in that line can use it. The opposite of an ASIC is a standard product or general-purpose IC, such as a logic gate or a general-purpose microcontroller, both of which can be used in any electronic application by anybody.
1.1 ASIC Design Architectures
Application-specific integrated circuits are categorized according to the technology used to manufacture them. The two main types are full-custom ASICs and semi-custom ASICs; semi-custom ASICs can be further classified as standard-cell-based ICs (CBICs) and gate-array (GA) types.
1.1.1 Full-Custom ASICs
Full-custom ASICs are entirely tailored to a particular application from the very start. Since the final design and functionality are pre-specified by the user, a full-custom ASIC is manufactured with all the photolithographic layers of the device already fully defined, just like most off-the-shelf general-purpose ICs. These designs are referred to as "handcrafted" designs. The use of predefined masks for manufacturing leaves no option for circuit modification during fabrication, except perhaps for some minor fine-tuning or calibration. This means that a full-custom ASIC cannot be modified to suit different applications, and is generally produced as a single, specific product for a particular application only.
1.1.2 Semi-Custom ASICs
Semi-custom ASICs can be partly customized to serve different functions within their general area of application. Unlike full-custom ASICs, semi-custom ASICs are designed to allow a certain degree of modification during the manufacturing process. A semi-custom ASIC is manufactured with the masks for the diffused layers already fully defined, so the transistors and other active components of the circuit are already fixed for that semi-custom ASIC design. The final ASIC product is customized to the intended application by varying the masks of the interconnection layers, e.g., the metallization layers.
1.1.2.1 Standard Cell-based ASICs
Cell-based ASICs (CBICs) make use of pre-designed, pre-tested, and pre-characterized logic cells such as AND gates, OR gates, multiplexers, and flip-flops for the design. These predefined cells are also referred to as standard cells. The standard-cell areas (also called flexible blocks) in a CBIC are built of rows of standard cells similar to a wall built of bricks. The standard-cell areas may be used in combination with larger pre-designed cells, known as megacells. Megacells are also known as mega-functions, full-custom blocks, system-level macros (SLMs), fixed blocks, cores, or Functional Standard Blocks (FSBs). Examples of such megacells are microcontrollers and microprocessors. The ASIC designer needs to define only the placement of the standard cells and interconnect in a CBIC. The CBIC approach for ASIC fabrication saves time and money, reduces risk, and allows each standard cell to be optimized individually. For example, during the cell-library design, the designer can choose each transistor in a standard cell to maximize speed or minimize area. The drawbacks of this approach are the time and expense of designing or buying the standard-cell library and the time needed to fabricate all layers of the ASIC for each new design. 
Figure 1.1: A Cell-Based ASIC (CBIC) die with one flexible block and a few macros
1.1.2.2 Array-Based ASICs
The gate array (GA), or gate-array-based ASIC, is another type of ASIC, which uses a predefined pattern of transistors on a silicon wafer. The predefined pattern of transistors on a gate array is the base array, and the smallest element that is replicated to make the base array (like tiles on a floor) is the base cell (sometimes called a primitive cell). Only the top few layers of metal, which define the interconnect between transistors, are defined by the designer using custom masks. To distinguish this type of gate array from other types of gate array, it is often called a masked gate array (MGA). The designer chooses from a gate-array library of pre-designed and pre-characterized logic cells. The logic cells in a gate-array library are often called macros. Array-based methodology represents the largest ASIC market. Gate arrays are preprocessed up to the interconnection layers. The interconnection layers customize the array and connect the macro cells. The array slices are fabricated in large quantities, so the base-layer mask costs are incurred only once, which reduces NRE and gives faster turnaround times for both prototype and production. If there is a design modification or an error to be fixed, another prototype iteration is needed, including mask generation, fabrication, assembly and test. The impact is smaller with array-based ASICs because fewer layers are customized.
The following are the types of gate arrays:
a. Channeled Gate Array
b. Channelless Gate Array
c. Structured or Platform
Channeled Gate arrays contain empty channels of silicon separated into rows of unwired transistor pairs, which can be configured into gates, flip-flops or large functions. The routing between the elements is performed by using the dedicated routing channels. In channeled gate arrays, the interconnect is customized. The interconnect uses predefined spaces between rows of base cells.
Channelless architecture is used for designs beyond the limits of channeled arrays because it offers more efficient routing with the sea-of-gates approach. Channeled arrays are reaching three and four layers, with increasing test and integration effort. In channelless arrays there are no predefined areas set aside for routing between cells; transistor sites that are not used for the intended design function serve as routing resources. In addition, channelless gate arrays have higher logic density than channeled gate arrays. RAM is more difficult to implement in array-based methodologies than in cell-based ones. In an array-based design, an optimized block of RAM of a predetermined size is positioned in an allocated area of the array. [1,3]
Figure 1.2 Channeled Gate Array    Figure 1.3 Channelless Gate Array
Structured gate arrays combine some of the features of standard-cell-based ASICs and MGAs. They are designed and produced from a tightly defined set of 1) design methodologies, 2) intellectual property (IP) blocks and 3) well-characterized silicon, with the aim of shortening the design cycle and minimizing the development costs of the ASIC. A platform ASIC is built from a group of 'platform slices', where a 'platform slice' is a pre-manufactured device, system, or logic for that platform. Each slice used by the ASIC may be customized by varying its metal layers. The re-use of pre-manufactured and pre-characterized platform slices means that platform ASICs are not built from scratch, which minimizes design cycle time and costs.
Figure 1.4 Structured Gate Array ASIC
Another category of ASIC is the programmable logic device (PLD). PLDs are standard ICs that are available in standard configurations from a catalog of parts and are sold in very high volume to many different customers. However, a user can configure a PLD to meet the needs of a specific application, and hence PLDs also belong to the family of ASICs. PLDs use different technologies to allow programming of the device. They generally have a single large block of programmable interconnect and a matrix of logic macrocells, each consisting of programmable array logic followed by a flip-flop or latch. A PLD uses no customized mask layers or logic cells, and has fast design turnaround. A particular category of ASIC that includes large building blocks such as ROM, EPROM or a 32-bit processor is referred to as a System-on-a-Chip (SoC).
1.2 VLSI flow: Analogy of VLSI and Building Architecture - a thought process
The VLSI flow evolved along lines similar to the flow of building construction. To understand how the VLSI chip design flow developed, it helps to look at the construction flow.
Whenever construction of a building begins, an architecture is drawn up to show how the building should look from the outside, which is quite similar to designing the architecture of a chip. Based on the requirements of the product, what the product is intended for and whose needs it serves, specifications are needed to produce an outline such as a block diagram (black box) with the modules inside it. After the architecture of the chip is decided, the implementation phase follows, which comprises a series of coding, verification and validation stages. Once this design phase, known as the front end, is completed, the physical part, known as the back end, is implemented. The back end of the VLSI chip design flow also includes a series of stages; most of them are easy to understand when related to the construction process of a building.
Building a Floor-plan: The floorplan of a building is similar to the floorplan of a chip. Rooms are placed according to connectivity and accessibility, just as blocks are placed according to constraints. Likewise, the building blocks of a building are bricks; in chip design, libraries serve as the building blocks, which are like pre-designed bricks, each for a specific functionality.
Power Grid: Initially an electrical plan is made, which takes into account that all electrical gadgets inside the building need power to operate. Similarly, a chip requires a power source. The required power is supplied through the power pads over a ring-like topology, so that it is distributed uniformly across all corners of the chip, and the supply has to reach all the standard cells (the bricks of chip design). This is called the power-grid topology. The challenge is to design the power grid well enough to reduce the IR drop and ensure that the standard cells get proper power for their operation.
Clock-tree: In digital design there are two design styles: synchronous and asynchronous (the latter being difficult to verify). The majority of chips follow the synchronous style, for which static timing analysis is possible. For the flip-flops to work together correctly, the clock from the crystal should reach all of them at the same time, within a given skew target across the chip. To achieve this, a step called clock-tree synthesis is performed after the power grid is created.
Place and Route: To understand this concept, imagine a society in which people speak different languages, and picture those who speak the same language living in the same community, where communication is much easier. Similarly, in chip design, standard cells that have similar design relationships are placed closer together in the placement flow; this concept is called regioning. Within each region of standard cells, the cells that actually share data have to be placed close together so that their timing is met and well optimized; this step is called placement. Connecting the standard cells is called routing; the challenge is to achieve optimized, reduced wire lengths.
Signal-Integrity: As the process shrinks day by day and silicon real estate gets costlier, engineers try to accommodate more and more standard cells in a limited area. The cells are placed in very close proximity, so the switching of one can affect the behavior of another, which can make a path faster or slower; this issue is called signal integrity. In construction, boundaries such as fences between buildings are created to maintain integrity within the house (a neighbor-free zone). Similarly, in chip design there is a concept called shielding, in which a high-frequency signal net is routed with power nets running alongside it. Furthermore, just as there is a minimum spacing between buildings, there is a minimum spacing between nets that are in close proximity.
Design for Test (DFT): One of the DFT techniques is the scan chain. To understand the concept, picture the process of entering and exiting a building. A person enters through the front door and exits through the back door; this confirms that nothing is blocked within the rooms of the building and that the person does not get stuck inside. In the same way, the flip-flops are connected together to create a scan chain, test input values are shifted in at the scan-chain input of the chip, and the expected data is observed at the scan-chain output of the chip. This checks that the chip is free from manufacturing defects such as stuck-at faults (stuck-at-one or stuck-at-zero).
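The scan-chain idea can be illustrated with a small behavioural model. The sketch below is written in Python purely for illustration and is not part of the project; the function name `shift_through_chain` and its `stuck_at` parameter are hypothetical. It shifts a test pattern through a chain of flip-flops and shows how a stuck-at fault changes what arrives at scan-out.

```python
def shift_through_chain(pattern, chain_length, stuck_at=None):
    """Shift `pattern` through a chain of flip-flops and return the bits seen at scan-out.

    stuck_at: optional (position, value) pair that forces one flop to a fixed value,
              modelling a stuck-at-0 or stuck-at-1 manufacturing fault.
    """
    chain = [0] * chain_length
    observed = []
    # Feed the pattern, then flush with zeros so that every bit reaches scan-out.
    for bit in list(pattern) + [0] * chain_length:
        observed.append(chain[-1])      # scan-out is the last flop's current value
        chain = [bit] + chain[:-1]      # each flop captures its predecessor
        if stuck_at is not None:
            position, value = stuck_at
            chain[position] = value     # the faulty flop ignores what it captured
    # The first chain_length samples are the initial (reset) contents of the chain.
    return observed[chain_length:]


pattern = [1, 0, 1, 1]
good = shift_through_chain(pattern, chain_length=4)
faulty = shift_through_chain(pattern, chain_length=4, stuck_at=(2, 0))
print("fault-free chain returns pattern:", good == pattern)
print("stuck-at-0 chain returns pattern:", faulty == pattern)
```

In a fault-free chain the pattern comes out unchanged after `chain_length` shifts, so any mismatch at scan-out points to a defect somewhere along the chain.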
1.3 ASIC Design Flow
The traditional ASIC design flow contains the steps outlined below:
Prepare the requirement specification and create a micro-architecture document.
RTL design and development of IPs.
After the previous step, DFT memory BIST insertion can also be implemented if the design contains any memory elements.
Functional verification of all the IPs. Check whether the RTL is free from linting errors and analyze whether the RTL is synthesis-friendly.
Perform cycle-based (functional) verification to verify the protocol behavior of the RTL.
Perform property checking to verify that the RTL implementation matches the understanding of the specification.
Set up the design environment. This includes the technology file to be used along with other environmental attributes.
Prepare the design constraints file for synthesis, usually called an SDC, synopsys_constraints or dc_synopsys_setup file, specific to the synthesis tool (Design Compiler).
Once the constraints file is ready, perform synthesis. The inputs to DC are the library file (the target library for synthesis, which holds the functional and timing information for the standard-cell library and the wire load models for the wires, based on the fan-out and length of the connectivity), the RTL files and the design constraints file. With these, the synthesis tool synthesizes the RTL, then maps and optimizes it to meet the design constraints. After synthesis, scan insertion and JTAG scan-chain insertion are implemented, and then synthesis is repeated.
Check whether the design meets the requirements after synthesis. Perform block-level static timing analysis using Design Compiler's built-in static timing analysis engine.
Perform Formal verification between RTL and the synthesized netlist to confirm that the synthesis tool has not altered the functionality.
Perform pre-layout STA (static timing analysis) using PrimeTime with the SDF (standard delay format) file and the synthesized netlist file to check whether the design meets the timing requirements.
Once synthesis is complete, the synthesized netlist file (VHDL/Verilog format) and the SDC (constraints) file are passed as inputs to the placement and routing tool to perform the back-end activities. The tool used is IC Compiler.
Initialize the floorplanning with timing driven placement of cells, clock tree insertion and global routing.
Transfer of clock tree to the original design (netlist) residing in Design Compiler.
In-place optimization of the design in Design Compiler.
Formal verification between the synthesized netlist and clock tree inserted netlist, using Formality.
Extraction of estimated timing delays from the layout after the global routing step.
Back annotation of estimated timing data from the global routed design, to PrimeTime.
Static timing analysis in PrimeTime, using the estimated delays extracted after performing global route.
Detailed routing of the design.
Extraction of real timing delays from the detailed routed design.
Back annotation of the real extracted timing data to PrimeTime.
Post-layout static timing analysis using PrimeTime.
Functional gate-level simulation of the design with post-layout timing (if desired).
Tape out after LVS and DRC verification.
CAD tools are involved in all stages of the VLSI design flow. Different tools can be used at different stages because EDA tools share common data formats. CAD tools provide several advantages:
Ability to evaluate complex conditions in which solving one problem creates other problems
Analytical methods to assess the cost of a decision
Synthesis methods to help provide a solution
Ability to propose and analyze solutions at the same time
Figure 1.5 Traditional ASIC Design Flow 
Figure 1.5 graphically illustrates the typical ASIC design flow discussed above. The acronyms STA and CT represent static timing analysis and clock tree respectively. DC represents Design Compiler. The Synopsys CAD tool for physical design is called Integrated Circuit Compiler (ICC).
CHAPTER 2: Project Design Overview
2.1 Introduction: Asynchronous Interface
An asynchronous interface is circuitry in which the set of signals connecting the devices of a computer system transfers information through an exchange of signals that is not synchronized to any controlling clock. A request signal from an initiating device indicates the requirement to make a transfer; an acknowledging signal from the responding device indicates that the transfer is complete. This asynchronous interchange is also widely known as handshaking.
Asynchronous designs are most often understood as designs with no clocks, but the asynchronous FIFO interface circuit in this project uses multiple clocks for transmitting and receiving data values. The design is described below, along with the top module diagram.
An asynchronous FIFO refers to a FIFO design where data values are written to a FIFO buffer (RAM) from one clock domain and the data values are read from the same FIFO buffer from another clock domain, where the two clock domains are asynchronous to each other. Asynchronous FIFOs are used to safely pass data from one clock domain to another clock domain. 
There are many ways to design an asynchronous FIFO interface. The method used in this project is "FIFO partitioning with synchronized pointer comparison". The design works on two clocks, one for transmitting and one for receiving, and uses Gray code counters to compare the pointers and generate the full and empty flags of the RAM that serves as the FIFO buffer for writing and reading data values.
Figure 2.1: Internal Block Diagram of FIFO partitioning with synchronized pointer comparison 
Data words are placed into a FIFO buffer memory array by control signals in one clock domain, and the data words are removed from another port of the same FIFO buffer memory array by control signals from a second clock domain. The difficulty associated with doing FIFO design is related to generating the FIFO pointers and finding a reliable way to determine full and empty status on the FIFO. 
Generally, FIFOs are used where the write operation is faster than the read operation. Even with different speeds and access types, however, the average rate of data transfer remains constant. FIFO pointers keep track of the FIFO memory locations read and written, and the corresponding control logic prevents the FIFO from underflowing or overflowing. FIFO architectures inherently face the challenge of synchronizing with the pointer logic of the other clock domain and of controlling the read and write operations on FIFO memory locations safely.
2.2 Issues in Designing Asynchronous FIFO
Although the design is asynchronous and works in a multi-clock environment, signals that cross between the two clock domains must be synchronized, since data can otherwise be lost due to setup and hold violations. It is very important to understand signal stability across clock domains, because to a signal arriving from another domain the new clock appears asynchronous. If the signal is not synchronized to the new clock, the first storage element of the new clock domain may go metastable, and in the worst case the resolution time cannot be predicted. The metastable value can propagate through the new clock domain and cause functional failure. To prevent such failures, the setup and hold time specifications have to be obeyed in the design. Manufacturers characterize the probability of flip-flop failure due to metastability in terms of MTBF (mean time between failures). Synchronizers are used to prevent the downstream logic from entering a metastable state when multi-bit data values cross clock domains.
Thus, the design of the FIFO pointers is the key issue for an efficiently working FIFO architecture. At this point, a deep understanding of the FIFO read and write pointers becomes necessary. On reset, both the read and write pointers point to the starting location of the FIFO. This is the first location to be written and also the first location to be read. Therefore, in general, the read pointer always points to the word to be read, and the write pointer always points to the next location to which data is to be written.
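The effect of a synchronizer on MTBF can be made concrete with the widely quoted exponential model, MTBF = exp(t_resolve / tau) / (t_window * f_clk * f_data). The sketch below is an illustration only: the function name and every numeric value are hypothetical and do not come from this project or from any vendor datasheet. It compares the MTBF when one flip-flop has 1 ns to resolve against the case where a second flip-flop adds a full clock period of resolution time.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Estimate the mean time between failures, in seconds, of a synchronizing flip-flop.

    t_resolve: time available for a metastable output to settle (s)
    tau:       metastability resolution time constant of the flop (s)
    t_window:  width of the window in which a data change causes metastability (s)
    f_clk:     sampling clock frequency (Hz)
    f_data:    toggle rate of the asynchronous input (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


# Hypothetical device and clock figures, chosen only to show the trend.
TAU = 50e-12
T_WINDOW = 100e-12
F_CLK = 200e6
F_DATA = 50e6

one_stage = synchronizer_mtbf(1e-9, TAU, T_WINDOW, F_CLK, F_DATA)
# A second flop gives the first one an extra clock period to resolve.
two_stage = synchronizer_mtbf(1e-9 + 1 / F_CLK, TAU, T_WINDOW, F_CLK, F_DATA)

print(f"one stage : {one_stage:.3e} s")
print(f"two stages: {two_stage:.3e} s")
```

Because the resolution time sits in the exponent, each extra clock period of settling time multiplies the MTBF by exp(T_clk / tau), which is why a pair of synchronizer registers is usually considered sufficient.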
2.3 Operation of the Design
2.3.1 Data write operation:
When both the read and write pointers point to the first location of the FIFO, the empty flag is asserted, indicating that the FIFO is empty. Data can now be written. Data is written to the location the write pointer points to, and after the write the write pointer is incremented to point to the next location to be written. At the same time, the empty flag is de-asserted, indicating that the FIFO is no longer empty and some data is available. One notable point about the read pointer is that while the empty flag is active, the data it points to is always invalid. When the first data word is written and the empty flag is cleared, the read pointer logic immediately drives the data from the location it points to onto the read port of the dual-port RAM, ready to be read by the read logic. The biggest advantage of this implementation is that only one clock pulse is required to read from the read port, since the previous clock cycle has already incremented the read pointer and driven the data to the read port. This helps reduce the latency in detecting the empty and full flag status. The empty flag can be asserted in one more condition: if n write operations are followed by the same number n of read operations, both pointers are equal again. Hence, if the read pointer catches up with the write pointer, the empty flag is asserted.
2.3.2 FIFO full status:
When the write pointer reaches the top of the FIFO, it points to the last location that can be written. Suppose no read operation has been performed yet, so the read pointer still points to the first location. One method of generating the FIFO full condition is to assert the full flag at this point. However, if the full flag is asserted when the write pointer reaches the top of the FIFO, this is not the actual full condition; it is only 'almost full', since one location can still be written. An almost-empty condition can exist in a similar way. A further write operation now fills that location and increments the write pointer. Since the location was the last one, the write pointer wraps around to the first location. Now the read and write pointers are equal, and hence the empty flag is asserted instead of the full flag, which is a fatal mistake. Hence a wrap-around of the write pointer may indicate a FIFO full condition.
Now suppose that after data has been written to the FIFO (with the write pointer at the top of the FIFO), some data has been read and the read pointer is somewhere in the middle of the FIFO. One more write operation causes the write pointer to wrap. Note that even though the write pointer now points to the first location of the FIFO, this is NOT a FIFO full condition, since the read pointer has moved up from the first location. Further writes push the write pointer up. Imagine that the read pointer also wraps around after some more read operations. Both pointers have now wrapped around, but there is neither a FIFO full nor a FIFO empty condition, and data can still be written to or read from the FIFO. The disadvantage of a FIFO of this kind is that the status signals cannot be fully synchronized with the read and write clocks.
2.3.3 Asynchronous FIFO pointers:
The FIFO also appears full when the pointers are equal, that is, when the write pointer has wrapped around and caught up with the read pointer. This is a problem: when the pointers are equal, it is difficult to decide whether the FIFO is empty or full.
One design technique used to distinguish between full and empty is to add an extra bit to each pointer. Whenever the write pointer increments past the final FIFO address, it increments the otherwise unused MSB and sets the rest of the bits back to zero, as shown in Figure 2.2 (the FIFO has wrapped and toggled the pointer MSB). The same is done with the read pointer. If the MSBs of the two pointers are different, the write pointer has wrapped one more time than the read pointer. If the MSBs of the two pointers are the same, both pointers have wrapped the same number of times.
Figure2.2: FIFO full and empty conditions 
Using n-bit pointers, where (n-1) is the number of address bits required to access the entire FIFO memory buffer, the FIFO is empty when both pointers, including the MSBs, are equal, and the FIFO is full when both pointers, except the MSBs, are equal. The FIFO design uses n-bit pointers for a FIFO with 2^(n-1) writable locations to help handle full and empty conditions.
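The extra-MSB rule can be checked with a few lines of Python. This is a minimal single-clock sketch using plain binary pointers, assuming a hypothetical 8-location FIFO (`ADDR_BITS = 3`); the names are illustrative and not taken from the project's Verilog.

```python
ADDR_BITS = 3                            # (n-1) address bits
DEPTH = 1 << ADDR_BITS                   # 2^(n-1) writable locations
PTR_MASK = (1 << (ADDR_BITS + 1)) - 1    # n-bit pointer, including the extra MSB


def advance(ptr):
    """Increment an n-bit pointer; the extra MSB toggles each time the address wraps."""
    return (ptr + 1) & PTR_MASK


def is_empty(wptr, rptr):
    """Empty: all n bits match, so both pointers have wrapped the same number of times."""
    return wptr == rptr


def is_full(wptr, rptr):
    """Full: the address bits match but the MSBs differ (write is one wrap ahead)."""
    return (wptr ^ rptr) == DEPTH


wptr = rptr = 0
print("after reset     empty:", is_empty(wptr, rptr), " full:", is_full(wptr, rptr))

for _ in range(DEPTH):                   # fill every location
    wptr = advance(wptr)
print("after 8 writes  empty:", is_empty(wptr, rptr), " full:", is_full(wptr, rptr))

for _ in range(DEPTH):                   # drain every location
    rptr = advance(rptr)
print("after 8 reads   empty:", is_empty(wptr, rptr), " full:", is_full(wptr, rptr))
```

Without the extra bit, the first and second lines of output would be indistinguishable, because both states have identical address bits.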
The counters designed to synchronize the signals are Gray code counters. A Gray code counter is chosen over a binary counter because synchronizing a binary count value from one clock domain to another is problematic: every bit of an n-bit counter can change simultaneously (for example, 7->8 in binary is 0111->1000, where all bits change). Gray codes allow only one bit to change for each clock transition, which eliminates the problem of synchronizing multiple changing signals on the same clock edge. It is desirable to create both an n-bit Gray code counter and an (n-1)-bit Gray code counter. It would certainly be easy to create the two counters separately, but it is also easy and efficient to create a common n-bit Gray code counter and then modify the 2nd MSB to form an (n-1)-bit Gray code counter with shared LSBs. This will be called a "dual n-bit Gray code counter."
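The one-bit-per-step property is easy to confirm numerically. The following Python sketch uses the standard binary-to-Gray conversion (the value XORed with itself shifted right by one); the helper names are illustrative only.

```python
def bin2gray(value):
    """Convert a binary count to its reflected Gray code."""
    return value ^ (value >> 1)


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


N = 4  # pointer width in bits
for count in range(1 << N):
    nxt = (count + 1) % (1 << N)
    print(f"{count:2d} -> {nxt:2d}   "
          f"binary bits changed: {bits_changed(count, nxt)}   "
          f"Gray bits changed: {bits_changed(bin2gray(count), bin2gray(nxt))}")
```

The binary column peaks at four changed bits on the 7 -> 8 and 15 -> 0 steps, while the Gray column stays at one for every step, including the wrap from 15 back to 0.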
Figure 2.3: n-bit Gray code converted to an (n-1)-bit Gray code 
It is obvious that inverting the second MSB of the second half of the 4-bit Gray code will produce the desired 3-bit Gray code sequence in the three LSBs of the 4-bit sequence. The only other problem is that the 3-bit Gray code with extra MSB is no longer a true Gray code because when the sequence changes from 7 (Gray 0100) to 8 (~Gray 1000) and again from 15 (~Gray 1100) to 0 (Gray 0000), two bits are changing instead of just one bit. A true Gray code only changes one bit between counts.
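The relationship between the two sequences can be verified by enumeration. The sketch below, in Python with hypothetical helper names, derives the (n-1)-bit code from the n-bit code by inverting the second MSB whenever the MSB is set, then counts how many bits change at the two problem transitions.

```python
def bin2gray(value):
    """Convert a binary count to its reflected Gray code."""
    return value ^ (value >> 1)


def dual_gray(count, n):
    """Return (n-bit Gray code, (n-1)-bit Gray code) for an n-bit binary count."""
    gray_n = bin2gray(count)
    msb = 1 << (n - 1)
    second_msb = 1 << (n - 2)
    # In the second half of the sequence (MSB set), invert the second MSB.
    modified = gray_n ^ second_msb if gray_n & msb else gray_n
    # The low (n-1) bits of the modified code form the (n-1)-bit Gray code.
    return gray_n, modified & (msb - 1)


def with_msb(count, n):
    """The (n-1)-bit Gray code with the n-bit code's MSB put back on top."""
    gray_n, gray_short = dual_gray(count, n)
    return (gray_n & (1 << (n - 1))) | gray_short


N = 4
for count in range(1 << N):
    gray_n, gray_short = dual_gray(count, N)
    print(f"{count:2d}  {gray_n:04b}  {gray_short:03b}")

for a, b in ((7, 8), (15, 0)):
    changed = bin(with_msb(a, N) ^ with_msb(b, N)).count("1")
    print(f"{a} -> {b}: {changed} bits change in the (n-1)-bit code with extra MSB")
```

The three-bit column repeats the same eight-entry Gray sequence twice, which is what lets one counter serve both purposes.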
2.4 Handling full and empty conditions
Exactly how FIFO full and FIFO empty are implemented is design-dependent. The FIFO design in this project assumes that the empty flag is generated in the read-clock domain, to ensure that the empty flag is detected immediately when the FIFO buffer is empty, that is, the instant the read pointer catches up with the write pointer (including the pointer MSBs). Likewise, the design assumes that the full flag is generated in the write-clock domain, to ensure that the full flag is detected immediately when the FIFO buffer is full, that is, the instant the write pointer catches up with the read pointer (except for different pointer MSBs).
2.4.1 Generating empty flag
The FIFO is empty when the read pointer and the synchronized write pointer are equal. The empty comparison is simple to do. Pointers that are one bit larger than needed to address the FIFO memory buffer are used. If the extra bits of both pointers (the MSBs of the pointers) are equal, the pointers have wrapped the same number of times, and if the rest of the read pointer equals the synchronized write pointer, the FIFO is empty. The Gray code write pointer must be synchronized into the read-clock domain through a pair of synchronizer registers found in the sync_w2r module. Since only one bit changes at a time using a Gray code pointer, there is no problem synchronizing multi-bit transitions between clock domains. In order to efficiently register the rempty output, the synchronized write pointer is actually compared against rgraynext (the next Gray code that will be registered into rptr). The empty value testing and the accompanying sequential always block have been extracted from rptr_empty.v.
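The comparison against the next Gray value can be modelled behaviourally. The sketch below is a Python approximation and not the project's rptr_empty.v: the names `rbin`, `rinc` and `synced_wptr_gray` are assumptions introduced for illustration, while `rgraynext` and `rempty` follow the text. Each call represents one read-clock edge.

```python
ADDR_BITS = 3
PTR_MASK = (1 << (ADDR_BITS + 1)) - 1    # n-bit pointer, including the extra MSB


def bin2gray(value):
    """Convert a binary count to its reflected Gray code."""
    return value ^ (value >> 1)


def read_pointer_step(rbin, rinc, rempty, synced_wptr_gray):
    """One read-clock edge: return (next binary pointer, next Gray pointer, next rempty)."""
    # The pointer only advances on a read request while the FIFO is not empty.
    increment = 1 if (rinc and not rempty) else 0
    rbinnext = (rbin + increment) & PTR_MASK
    rgraynext = bin2gray(rbinnext)
    # Compare the *next* Gray value so the registered flag is valid on this same edge.
    rempty_next = (rgraynext == synced_wptr_gray)
    return rbinnext, rgraynext, rempty_next


# Two words have been written; the read side sees the synchronized Gray write pointer.
synced_wptr_gray = bin2gray(2)
rbin, rempty = 0, True                   # state after reset
for edge in range(4):
    rbin, rptr, rempty = read_pointer_step(rbin, True, rempty, synced_wptr_gray)
    print(f"edge {edge}: rbin={rbin} rptr={rptr:04b} rempty={rempty}")
```

Comparing `rgraynext` rather than the already registered pointer means the flag is raised on the same edge that consumes the last word, so no further read can slip through.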
2.4.2 Generating full flag
Since the full flag is generated in the write-clock domain by running a comparison between the write and read pointers, one safe technique for doing FIFO design requires that the read pointer be synchronized into the write clock domain before doing pointer comparison. The full comparison is not as simple to do as the empty comparison. Pointers that are one bit larger than needed to address the FIFO memory buffer are still used for the comparison, but simply using Gray code counters with an extra bit to do the comparison is not valid to determine the full condition. 
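Why the extra-bit comparison fails for Gray pointers can be shown by brute force. The Python sketch below compares a naive test (MSB differs, remaining bits equal) with a condition in which the two MSBs differ and the remaining bits are equal. That second condition is offered here as an assumption about how the problem is commonly handled; the text above does not state which condition this project uses, and the function names are illustrative.

```python
ADDR_BITS = 3
N = ADDR_BITS + 1                        # pointer width including the extra MSB
DEPTH = 1 << ADDR_BITS


def bin2gray(value):
    """Convert a binary count to its reflected Gray code."""
    return value ^ (value >> 1)


def full_binary(wbin, rbin):
    """Reference: binary pointers are full when only the MSB differs."""
    return (wbin ^ rbin) == DEPTH


def full_gray_naive(wgray, rgray):
    """The binary rule applied unchanged to Gray pointers (not valid)."""
    return (wgray ^ rgray) == DEPTH


def full_gray(wgray, rgray):
    """Assumed Gray rule: both of the top two bits differ, all lower bits match."""
    top_two = (1 << (N - 1)) | (1 << (N - 2))
    return (wgray ^ rgray) == top_two


pairs = [(w, r) for w in range(1 << N) for r in range(1 << N)]
naive_errors = sum(
    full_gray_naive(bin2gray(w), bin2gray(r)) != full_binary(w, r) for w, r in pairs
)
fixed_errors = sum(
    full_gray(bin2gray(w), bin2gray(r)) != full_binary(w, r) for w, r in pairs
)
print("pointer pairs checked:", len(pairs))
print("naive Gray comparison mismatches:", naive_errors)
print("two-MSB Gray comparison mismatches:", fixed_errors)
```

The two-MSB rule follows from the conversion itself: toggling the binary MSB toggles both the MSB and the second MSB of the corresponding Gray code.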
2.5 Modules of the design
fifo1.v - This is the top-level module that includes all clock domains. The top module is only used as a wrapper to instantiate all of the other FIFO modules used in the design. An asynchronous FIFO can be designed in a number of different ways. As shown above, the key factor in this kind of design is the FIFO pointer logic, which determines whether the FIFO works correctly and efficiently. The internal pointer logic may therefore differ from one design to another, depending on the difficulty and efficiency targets of the design, but the top module remains the same, as shown below:
Figure 2.4: Top Module of asynchronous FIFO
fifomem.v - This is the FIFO memory buffer that is accessed by both the write and read clock domains. This buffer is most likely an instantiated, synchronous dual-port RAM.
sync_r2w.v - This is a synchronizer module that is used to synchronize the read pointer into the write-clock domain. The synchronized read pointer will be used by the wptr_full module to generate the FIFO full condition. This module only contains flip-flops that are synchronized to the write clock.
sync_w2r.v - This is a synchronizer module that is used to synchronize the write pointer into the read-clock domain. The synchronized write pointer will be used by the rptr_empty module to generate the FIFO empty condition. This module only contains flip-flops that are synchronized to the read clock.
rptr_empty.v - This module is completely synchronous to the read-clock domain and contains the FIFO read pointer and empty-flag logic.
wptr_full.v - This module is completely synchronous to the write-clock domain and contains the FIFO write pointer and full-flag logic.
In order to perform FIFO full and FIFO empty tests using this FIFO style, the read and write pointers must be passed to the opposite clock domain for pointer comparison.
CHAPTER 3: Functional verification
Once the coding is complete, the design must be checked to confirm that it performs its expected function. This is called functional verification. One can develop a test bench in Verilog or VHDL to apply all possible stimuli at the inputs and check the outputs generated by the code. For the above design, checking the possible input combinations is not a great deal of work, because there are only four possible combinations to test. For a large chip, however, this can be a very involved job. Once the designer is sure that the code functions as expected, the code is taken through the synthesis process to convert it into gates.
3.1 Challenges in Testing Asynchronous Designs
Attempting to synchronize multiple changing signals from one clock domain into a new clock domain, and ensuring that all changing signals are synchronized to the same clock cycle in the new clock domain, is a difficult task. FIFOs are used in designs to safely pass multi-bit data words from one clock domain to another.
One of the biggest and most serious problems associated with multiple-clock designs arises when two stages of logic are combined using asynchronous clocks. Asynchronous logic can create metastable states that can seriously degrade the performance of the design or completely destroy its functionality. A metastable state is created when a flip-flop's timing requirements (setup and hold times) are violated. The resulting output of the flip-flop is unknown and can make the entire design nondeterministic. If one stage of logic asynchronously feeds data to another, it is difficult, if not impossible, to meet the setup and hold-time requirements of the flip-flop, as shown in the figure below.
Figure 3.1: Occurrence of metastability in multiple clock design 
3.2 Asynchronicity and Its Solutions
Before creating any logic with asynchronous clocking, one should exhaustively consider the alternatives. Combining logic stages with asynchronous clocks is a dominant source of problems. Again, when a flip-flop's setup and hold time constraints are violated, the output becomes unpredictable for a short amount of time and eventually settles to a "1" or "0". Which state it settles in is impossible to predict. Fortunately, there are some solutions to the problem of metastability, described below:
3.2.1 The Double-Registering Technique:
Data coming into the first flip-flop is asynchronous with the clock, so the first flip-flop will almost certainly go metastable at some point. However, the second flip-flop will not go metastable as long as the metastable state of the first resolves in less than one clock period. If the clock is not too fast to meet normal timing constraints, the circuit is unlikely to propagate metastable states.
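The structure of the double-registering technique can be sketched as a two-stage shift register. The Python model below is cycle-based and idealized: it cannot reproduce metastability itself and only shows that the asynchronous input reaches the output after passing through two flip-flops. The class and attribute names are hypothetical.

```python
class TwoFlopSynchronizer:
    """Idealized model of two back-to-back flip-flops in the receiving clock domain."""

    def __init__(self) -> None:
        self.q1 = 0  # first flip-flop: the one that may go metastable in hardware
        self.q2 = 0  # second flip-flop: the output used by downstream logic

    def clock(self, async_in: int) -> int:
        """Apply one rising clock edge and return the synchronized output."""
        # Both flip-flops update together, so q2 takes the OLD value of q1.
        self.q2, self.q1 = self.q1, async_in
        return self.q2


sync = TwoFlopSynchronizer()
print([sync.clock(bit) for bit in [1, 1, 1, 0, 0]])  # [0, 1, 1, 1, 0]
```

The extra flip-flop gives the first stage a full clock period to settle before its value is used, at the cost of one additional cycle of latency.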
3.2.2 Sampling Data at Faster clock:
Another way to avoid problems with asynchronous clocks is to ignore the slower clock and sample the data with the faster clock. This requires that the data have special framing characters (a preamble, for example) to define the data boundary. This is a common practice and can be found in nearly every embedded system in the form of a UART. A very fast clock, say 16 times the data symbol rate, samples the line until 15 consecutive samples of the start character are found. The design then declares that the next 16 (or so) samples correspond to the first bit sent, the 16 (or so) samples after that correspond to the next bit, and so on.
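A minimal Python sketch of this oversampling scheme follows. It assumes a UART-style frame that the text does not spell out: the line idles high, the start bit is low, eight data bits follow least significant bit first, and each data bit is read once at the centre of its 16-sample window. All function names are hypothetical.

```python
OVERSAMPLE = 16  # samples per data bit
START_RUN = 15   # consecutive low samples that identify the start bit


def encode_byte(byte: int, idle_samples: int = 5) -> list:
    """Build an oversampled line: idle high, low start bit, 8 data bits, high stop bit."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    line = [1] * idle_samples
    for bit in bits:
        line += [bit] * OVERSAMPLE
    return line


def find_start(samples: list):
    """Return the index where the start bit begins, or None if none is found."""
    run = 0
    for idx, sample in enumerate(samples):
        run = run + 1 if sample == 0 else 0
        if run == START_RUN:
            return idx - (START_RUN - 1)
    return None


def decode_byte(samples: list, data_bits: int = 8):
    """Recover one byte by sampling the centre of each 16-sample bit window."""
    start = find_start(samples)
    if start is None:
        return None
    value = 0
    for i in range(data_bits):
        centre = start + (i + 1) * OVERSAMPLE + OVERSAMPLE // 2
        if centre >= len(samples):
            return None  # the frame was cut short
        value |= samples[centre] << i
    return value


print(hex(decode_byte(encode_byte(0x5A))))  # 0x5a
```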
3.2.3 Gray Counters:
If the data being read is a counter, such as the read or write addresses of an asynchronous FIFO, as in this project, one should use Gray counters instead of traditional binary counters. A traditional 3-bit counter can have one, two, or three bits changing between states. For example, if the read occurs at the instant when the counter is changing from "011" to "100," then the state of all three bits is unknown, and the read value can be any of the eight possible states. If the counter is built using a Gray code, as explained in the previous chapter, then only one bit can change from one state to the next. If the read occurs at the instant that the counter is changing, then only one bit will be in question, and there are only two possible outcomes of the read operation. Furthermore, the two possible values are the value of the counter just before the read and the value of the counter just after the read. Since the read occurred at a time when the counter was in transition, it is impossible to say with certainty that one value is correct while the other is not. In other words, either value should be considered valid.
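The difference between the two counter styles can be checked numerically. The short Python sketch below counts how many bits change on each step of a 3-bit binary counter and of the equivalent Gray counter; the helper names are hypothetical.

```python
def bin2gray(value: int) -> int:
    """Convert a binary count to its Gray code equivalent."""
    return value ^ (value >> 1)


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


STATES = 8  # 3-bit counter, including the wrap from 7 back to 0

binary_changes = [bits_changed(i, (i + 1) % STATES) for i in range(STATES)]
gray_changes = [bits_changed(bin2gray(i), bin2gray((i + 1) % STATES))
                for i in range(STATES)]

print(binary_changes)  # [1, 2, 1, 3, 1, 2, 1, 3]
print(gray_changes)    # [1, 1, 1, 1, 1, 1, 1, 1]
```

The step from 3 to 4 ("011" to "100") changes all three binary bits, whereas every Gray step, including the wrap-around, changes exactly one.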
3.3 Testing Strategy
Testing a FIFO design for subtle design problems is nearly impossible to do. The problem is rooted in the fact that FIFO pointers in an RTL simulation behave ideally, even though, if incorrectly implemented, they can cause catastrophic failures in a real design. We must therefore write a test plan (stimulus block) to check that the asynchronous FIFO functions correctly, including the internal logic that compares the pointer values to generate the full and empty flags.
First, the FIFO read and write pointers are reset to their initial value, i.e. zero. Data is then written into the FIFO memory at a frequency of 200 MHz while it is simultaneously read out at a frequency of 40 MHz. Data arrives in bursts of 32 words, each 16 bits wide.
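These figures allow a rough estimate of how much buffering one burst needs. The Python sketch below is a first-order calculation under assumptions the text does not state: one word is written on every write clock during the burst, one word is read on every read clock, and synchronizer latency is ignored. The function name is hypothetical.

```python
def min_fifo_depth(burst_len: int, f_write_mhz: int, f_read_mhz: int) -> int:
    """Words that must be buffered while one burst is being written."""
    # Whole words the reader can remove during the time the burst takes to write.
    words_read = (burst_len * f_read_mhz) // f_write_mhz
    return burst_len - words_read


print(min_fifo_depth(32, 200, 40))  # 26
```

Writing 32 words at 200 MHz takes 160 ns, during which the 40 MHz reader removes 6 whole words, leaving 26 to be stored. A Gray-pointer FIFO would typically round this up to a power of two, that is, 32 locations.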
Once the top-level design was finished and ready to be tested, the various sub-modules, such as the synchronization modules and Gray counters, were first tested and verified separately. The test vectors written into the FIFO buffer were chosen randomly. The technique for this design is nevertheless primarily exhaustive, in that every address is exercised: random inputs were first written to their respective addresses and then read back from the same addresses.
The test bench is written so that it writes data into every location of the FIFO buffer. This ensures that every location in the RAM is writable and stores data correctly. It also verifies that the pointers generate the full flag when the top location is reached while writing. Once data has been written to every memory location, the full flag is asserted to ensure that no overflow occurs.
As soon as data has been written up to the top location of the FIFO RAM, the read logic is checked to confirm that data is read from the buffer accurately. Reading likewise continued to the last location in order to check the empty-flag generation logic. The generation of the full and empty flags as expected confirmed the proper functioning of the internal synchronization logic.
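The fill-then-drain procedure described above can be sketched as a small Python model with a scoreboard. This is not the project's Verilog test bench: it uses a single-threaded, idealized FIFO with no clock domains or synchronizers, and the class name, function name and default sizes are assumptions.

```python
import random


class FifoModel:
    """Idealized FIFO using pointers one bit wider than the address."""

    def __init__(self, addrsize: int) -> None:
        self.depth = 1 << addrsize
        self.mask = (1 << (addrsize + 1)) - 1
        self.mem = [0] * self.depth
        self.wptr = 0
        self.rptr = 0

    @property
    def full(self) -> bool:
        return ((self.wptr - self.rptr) & self.mask) == self.depth

    @property
    def empty(self) -> bool:
        return self.wptr == self.rptr

    def write(self, data: int) -> bool:
        if self.full:
            return False  # write refused: no overflow
        self.mem[self.wptr % self.depth] = data
        self.wptr = (self.wptr + 1) & self.mask
        return True

    def read(self):
        if self.empty:
            return None  # read refused: no underflow
        data = self.mem[self.rptr % self.depth]
        self.rptr = (self.rptr + 1) & self.mask
        return data


def run_fill_and_drain(addrsize: int = 4, width: int = 16, seed: int = 0) -> bool:
    """Write random words to every location, then read them all back."""
    rng = random.Random(seed)
    fifo = FifoModel(addrsize)
    expected = [rng.randrange(1 << width) for _ in range(fifo.depth)]
    for word in expected:
        assert fifo.write(word)
    assert fifo.full and not fifo.write(0)      # full flag blocks further writes
    readback = [fifo.read() for _ in range(fifo.depth)]
    assert fifo.empty and fifo.read() is None   # empty flag blocks further reads
    return readback == expected


print(run_fill_and_drain())  # True
```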
Testing asynchronous designs such as asynchronous FIFOs is not an easy task. In an RTL simulation with no backannotated delays, there is only a slight chance of observing a problem if the gate transitions are different for rising and falling edge signals, and even then, one would have to get lucky and have the correct sequence of bits changing just prior to and just after a rising clock edge. For higher speed designs, the delay differences between rising and falling edge signals diminish, and the probability of detecting problems diminishes with them. The chance of finding actual FIFO design problems is greatest for gate-level designs with backannotated delays, but even with this type of simulation, finding problems is difficult, and the odds of observing them decrease as signal propagation delays diminish. Most almost-correct FIFO designs function properly more than 99% of the time. Unfortunately, FIFOs that work properly 99% of the time still have design flaws, and these are usually the most difficult to detect and debug and the most costly to diagnose and recall.