A Framework for Modeling the Impact of Intrinsic Parameter Fluctuations at the Architectural Level
The International Technology Roadmap for Semiconductors (ITRS) predicts that the microelectronics industry will benefit enormously from MOSFET miniaturization to the nanometer regime over the next decades. However, device scaling is approaching fundamental physical limits. One of the most challenging by-products of feature scaling, and one that is proving extremely difficult to manage, is the increasing variation of transistor characteristics due to intrinsic parameter fluctuations (IPF). This problem is associated with the fundamental discreteness of charge and matter and cannot be removed by better processing steps or improved equipment. It has been experimentally demonstrated at the device and circuit levels that, with the continued scaling of conventional MOSFETs, IPF will adversely affect circuit performance. Because the processes and technology needed to build next-generation devices and ICs are very complex or still unavailable, several simulation methodologies have been introduced to investigate the impact of IPF at the circuit level. However, the application of circuit-level simulation is limited because it is suitable only for investigating individual circuit blocks. From an architecture-level point of view, investigating intrinsic transistor variability is important because architectural techniques can control large groups of circuits (e.g. control circuitry, cache lines or the entire cache) at once. Computer architectures and micro-architectures are usually evaluated with simulators based on the instruction-set or register-transfer level.
Typical simulation studies compare the performance of hardware-enhanced models with a baseline model by running benchmark programs. These tools also provide the opportunity to investigate static or dynamic power, battery dissipation, compiler development, chip area and system reliability. Most of these studies require analytical MOSFET models or compact-model parameters that do not fully account for IPF effects, which are critical for nanometer-regime devices.
In this paper, a framework to bridge architecture-level and device-level simulation is presented. This framework allows computer architectures to be modeled using realistic, physically based device emulation. The rest of the paper is organized as follows. The nature of intrinsic parameter fluctuation in 6T-SRAM cells based on UTB-SOI MOSFETs is presented in Section II; the baseline faults observed there are used as the input to the fault injection framework. The methodology for the framework is described in Section III, which elaborates the fault injection strategy, the cache memory setup and the benchmark programs used in this study. Preliminary results from cache memory simulation injected with intrinsic parameter fluctuation are presented in Section IV. The conclusions are given in Section V.
II. INTRINSIC PARAMETER FLUCTUATION IN 6T-SRAM
In this work, the impact of the scaling limit on fluctuation-sensitive microprocessor cache memory due to discrete random dopants (RDD) in the source/drain regions, line edge roughness (LER) and body-thickness variations (BTV) of Ultra-Thin-Body (UTB) SOI MOSFETs is studied. The impact of these intrinsic parameter fluctuations on individual 6T SRAM cells based on UTB SOI MOSFETs with physical channel lengths of 7.5 nm and 5 nm has been presented elsewhere. SRAM cells are classified as faulty if the variation of the Static Noise Margin (SNM) is larger than the 6-sigma manufacturing requirement; such a variation causes failure of the SRAM cell through destructive reads and unsuccessful writes. An increase in the access time of the cell would also cause failure. Increasing the SRAM cell ratio could improve the resistance of the cell to IPF, as shown in Fig. 1; however, the overall cell size would increase. These data sets have been statistically analyzed, and a Gaussian probability distribution function is used to inject faulty cells into the cache memory of a microprocessor. Each simulation is repeated ten times and the average is calculated. Inter-die variation has been ignored to clearly illustrate the impact of IPF on cache memory.

2009 International Conference on Signal Processing Systems 978-0-7695-3654-5/09 $25.00 © 2009 IEEE DOI 10.1109/ICSPS.2009.89 574
Authorized licensed use limited to: UNIVERSITY PUTRA MALAYSIA. Downloaded on July 29, 2009 at 10:37 from IEEE Xplore. Restrictions apply.

Fig. 1. μ-6σ of Static Noise Margin (SNM) as a function of cell ratio for UTB SOI MOSFETs and a 35 nm bulk MOSFET. Lines: μ-6σ > 32 mV for 10 nm, > 28 mV for 7.5 nm and > 28 mV for 5 nm.

TABLE I
VIRTUAL MACHINE SPECIFICATIONS
System architecture    arm-sa1110
Microprocessor         ARMv5TE, 30 MHz
Physical memory        32 MB
Data bus               32 bit
Operating system       Linux 2.4
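The injection step described above can be sketched as follows. This is a minimal illustration, not the framework's actual code: it assumes each cell's SNM follows a Gaussian distribution with a hypothetical mean and standard deviation, treats a cell as faulty when its SNM falls below a hypothetical threshold, marks cells faulty independently, and averages the faulty-cell count over ten runs as the text describes. The function names and the SNM numbers in the usage lines are illustrative only.

```python
import math
import random

# 8 KB data cache, one SRAM cell per bit -> 65536 cells (as stated in Section IV).
CACHE_CELLS = 8 * 1024 * 8


def cell_failure_probability(mean_snm_mv: float, sigma_snm_mv: float,
                             threshold_mv: float = 0.0) -> float:
    """P(SNM < threshold) for a Gaussian SNM ~ N(mean, sigma^2).

    The threshold is a hypothetical failure criterion chosen for illustration.
    """
    z = (threshold_mv - mean_snm_mv) / sigma_snm_mv
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inject_faults(p_fail: float, n_cells: int, rng: random.Random) -> set[int]:
    """Mark each cell faulty independently with probability p_fail."""
    return {cell for cell in range(n_cells) if rng.random() < p_fail}


def average_faulty_cells(p_fail: float, n_cells: int = CACHE_CELLS,
                         runs: int = 10, seed: int = 1) -> float:
    """Repeat the injection `runs` times and return the mean faulty-cell count."""
    rng = random.Random(seed)
    total = sum(len(inject_faults(p_fail, n_cells, rng)) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    # Hypothetical SNM statistics, not taken from the paper.
    p = cell_failure_probability(mean_snm_mv=40.0, sigma_snm_mv=15.0)
    print(f"p_fail = {p:.2e}, average faulty cells = {average_faulty_cells(p):.1f}")
```

Seeding a private `random.Random` keeps each set of ten runs reproducible, which makes it easier to compare cache configurations against the same injected fault pattern.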
A. Fault Injection Framework and Cache Memory Setup
The fault injection framework is built on top of Simics, a system-level instruction-set simulator that allows execution of machine code and monitoring of various components of the computer hardware with cycle accuracy. The operation of the computer hardware can be traced, which is especially beneficial for evaluating and debugging architectural and software implementations.
TABLE II
DATA CACHE MEMORY CONFIGURATION
Size                  8 KB
Bytes / block         32
Cache mapping         Direct
Replacement policy    LRU
Write policy          Write-through
TABLE III
MECHANISM FOR READ FAULT
Qt    Qt+1    Status
1     1       No fault
0     1       Faulty
TABLE IV
MECHANISM FOR WRITE FAULT
Qt    D    Qt+1    Status
0     0    0       No fault
0     1    1       No fault
1     1    1       No fault
1     0    1       Faulty
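The fault conditions in Tables III and IV can be expressed as bitwise operations on a 4-byte sub-block. The sketch below is an illustration under stated assumptions, not the framework's handler code: it assumes a hypothetical mask marking which bits of the word are faulty cells, applies the table rows to those bits only, and treats all other bits as ideal. The function names are hypothetical.

```python
WORD_MASK = 0xFFFFFFFF  # one 4-byte sub-block / 32-bit data bus


def faulty_read(stored: int, faulty_mask: int) -> tuple[int, bool]:
    """Table III applied per bit: a faulty cell holding 0 is flipped to 1 by a read.

    Returns the cell contents after the read and whether a read fault occurred.
    """
    flipped = faulty_mask & ~stored & WORD_MASK
    return (stored | flipped) & WORD_MASK, flipped != 0


def faulty_write(stored: int, data: int, faulty_mask: int) -> tuple[int, bool]:
    """Table IV applied per bit: a faulty cell holding 1 cannot be written to 0.

    Returns the cell contents after the write and whether a write fault occurred.
    """
    stuck = faulty_mask & stored & ~data & WORD_MASK
    return (data | stuck) & WORD_MASK, stuck != 0


if __name__ == "__main__":
    # Bit 0 is a (hypothetical) faulty cell that currently stores 0.
    print(faulty_read(0x000000F0, 0x00000001))        # (241, True)
    # Writing 0 over a faulty cell that stores 1 leaves it at 1.
    print(faulty_write(0x00000001, 0x00000000, 0x1))  # (1, True)
```

Working on whole words lets one transaction be checked against every faulty cell it touches at once; a fault is reported only when the stored or written values actually trigger a faulty row of the tables.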
TABLE V
AVERAGE FAULTY CELLS DUE TO DIFFERENT INDIVIDUAL AND COMBINED SOURCES OF IPF. CELL RATIO IS ONE
Gate length    10 nm    7.5 nm    5 nm
RDD            0        773       2090
LER            0        0         440
BTV            0        0         1
Combined       0        557       1247

TABLE VI
AVERAGE FAULTY CELLS DUE TO DIFFERENT INDIVIDUAL AND COMBINED SOURCES OF IPF. CELL RATIO IS TWO
Gate length    10 nm    7.5 nm    5 nm
RDD            0        0         36
LER            0        0         0
BTV            0        0         0
Combined       0        0         125

A virtual computer platform, whose specifications are listed in Table I, is utilized. The targeted architecture for this study is a generic ARMv5 processor that models an Intel StrongARM-1110 microprocessor. The choice was guided by its simple architecture, including the instruction set, and by the widely available literature documenting the processor. A minimal Linux 2.4 kernel is used to boot the virtual machine in order to run the benchmark program. Modern processors have large, specialized, multi-level cache memories; however, the L1 data cache is the architectural component most susceptible to the impact of process variation. This also led to the selection of the ARMv5 microprocessor architecture for this study. By default, Simics does not model any cache system, so a data cache model for the microprocessor was developed, while the instruction cache was not included. The configuration of the data cache is summarized in Table II. The small data cache size simplifies the analysis of IPF impact on the cache memory system. The data cache is organized as 256 blocks of 32 bytes of data with direct-mapped cache fill and physical address translation. Each line is associated with a cache tag, and the lines are divided into eight 4-byte sub-blocks without error correction. The cache supports a least-recently-used (LRU) replacement policy and a write-through policy. Faulty cells in the cache memory do not directly translate into faulty cache read and write transactions. The read and write fault status is determined from the state of the memory cell and the specific value involved in each read or write transaction, as summarized in Tables III and IV. These mechanisms are used by the framework's cache memory read and write fault handlers to simulate faulty-cell behavior when an error occurs. Two simulation modes are available in the framework to analyze the behavior of IPF in cache memory. The normal mode performs fault injection in the cache memory system and invokes the handlers for the read and write fault mechanisms. The fault emulation mode only emulates the existence of faulty cells; the read and write fault mechanisms are not enabled. This mode allows analysis of a targeted architecture that does not implement a cache memory fault tolerance policy.
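For the cache organization described above (8 KB, 32-byte lines, direct mapping, eight 4-byte sub-blocks), an address splits into a 5-bit byte offset, an 8-bit line index and the remaining tag bits. The sketch below shows that decoding plus a mapping from SRAM cell number to cache line. It assumes a 32-bit physical address, and the linear cell-to-line numbering is an assumption made for illustration, since the paper does not specify the physical layout; the function names are hypothetical.

```python
# Cache geometry from Table II and Section III-A: 8 KB, 32-byte lines,
# direct-mapped, eight 4-byte sub-blocks per line. A 32-bit physical
# address is assumed.
CACHE_BYTES = 8 * 1024
LINE_BYTES = 32
SUBBLOCK_BYTES = 4
NUM_LINES = CACHE_BYTES // LINE_BYTES          # 256 lines
OFFSET_BITS = (LINE_BYTES - 1).bit_length()    # 5
INDEX_BITS = (NUM_LINES - 1).bit_length()      # 8
TAG_BITS = 32 - OFFSET_BITS - INDEX_BITS       # 19


def decode(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit physical address into (tag, line index, sub-block)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = (addr >> (OFFSET_BITS + INDEX_BITS)) & ((1 << TAG_BITS) - 1)
    return tag, index, offset // SUBBLOCK_BYTES


def line_of_cell(cell: int) -> int:
    """Cache line holding SRAM cell `cell` (hypothetical linear numbering)."""
    return cell // (LINE_BYTES * 8)


if __name__ == "__main__":
    print(decode(0x12345678))   # (37282, 179, 6)
    print(line_of_cell(65535))  # 255, the last line
```

Because the cache is direct-mapped, the line index alone decides which 256 SRAM cells a transaction touches, so a faulty cell affects every address that maps to its line.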
B. Benchmark Selection
The Dhrystone benchmark program has been used to reflect microprocessor activity in studying the impact of IPF on cache memory. Dhrystone code is dominated by integer arithmetic, string operations, logic decisions and memory accesses frequently found in most general-purpose computing applications. Because Dhrystone is very compact, memory accesses beyond the cache are not exercised. This characteristic of the benchmark facilitates the study of IPF impact on cache memory.
IV. RESULTS AND DISCUSSIONS
The framework has been used to inject faulty cells due to individual and combined sources of IPF into a processor cache memory. The cache model for the processor architecture used in this study consists of 65536 individual SRAM cells (8 KB). The cell ratio of each cell has been varied to control the adverse effects of IPF. Tables V and VI summarize the fault injection results obtained from the framework. Using the baseline fault data sets with a 6σ manufacturing tolerance as the input to the fault injection framework does not produce any faulty cells for a cache memory based on 10 nm UTB-SOI MOSFETs with a cell ratio of one, as shown in Table V. A cache memory based on 7.5 nm UTB-SOI MOSFETs has faulty cells only due to RDD or combined sources of IPF. However, increasing the cell ratio to two prevents memory cells from becoming faulty for the 7.5 nm UTB-SOI MOSFETs, as shown in Table VI. For the cache memory based on 5 nm UTB-SOI MOSFETs, the number of faulty cells is reduced significantly for RDD and combined sources of IPF, by roughly 98% and 90% respectively (from 2090 to 36 and from 1247 to 125 cells). With a cell ratio of two, there are no faulty cells in the 5 nm cache memory injected with LER or BTV alone. Note that the individual and combined sources of IPF are statistically independent, which provides some evidence that these sources of fluctuation are uncorrelated. ARM's Dhrystone version 2.1 benchmark program has been executed on the framework for cache memories built with 7.5 nm and 5 nm physical gate length UTB-SOI MOSFETs. Since Dhrystone has virtually no write transactions, the write fault handler in the framework was never invoked. A single benchmark program cannot be fully representative, given the diversity of processor cores, memory architectures, memory management and compiler optimizations; another benchmark program of a different nature might produce a different outcome.
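The relative reductions for the 5 nm cache can be recomputed directly from Tables V and VI. The short script below does only that arithmetic, using the table values; the dictionary names are illustrative.

```python
# Average faulty cells in the 5 nm cache (values from Tables V and VI).
CELL_RATIO_ONE = {"RDD": 2090, "LER": 440, "BTV": 1, "Combined": 1247}
CELL_RATIO_TWO = {"RDD": 36, "LER": 0, "BTV": 0, "Combined": 125}


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction in faulty cells, in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for source, before in CELL_RATIO_ONE.items():
        after = CELL_RATIO_TWO[source]
        reduction = percent_reduction(before, after)
        print(f"{source:9s} {before:5d} -> {after:4d}  ({reduction:.1f}% fewer)")
```

Run on the table values, this gives about 98.3% for RDD and 90.0% for combined sources, and a complete (100%) reduction for LER and BTV.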
Dhrystone caused a total of 171675 cache memory transactions in the framework. In normal mode, whenever the cache memory contains faulty cells, the benchmark program exits prematurely, because a fault-tolerance mechanism is purposely not built into the framework's cache memory system. The fault emulation mode of the framework is therefore used to obtain the following results. Fig. 2 illustrates the impact of various individual and combined sources of IPF on a cache memory system built using 5 nm UTB-SOI MOSFETs with a cell ratio of one. For the targeted architecture injected with RDD, cache memory transactions that access cache lines with faulty cells (FT) account for almost 75% of all transactions; one or more cells within each such line may be faulty due to IPF. This observation could be explained by the non-uniform access pattern of the cache memory, the cache architecture design, and the selection of cache sizes and replacement policy. Line edge roughness is the second most dominant intrinsic parameter fluctuation, with 42% of transactions in FT and 28% fault occurrence in read transactions. Careful selection of cache architecture and sizes, replacement policy, and the inclusion of fault-tolerant schemes might be able to control the adverse effects of IPF.
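The FT metric used above, the share of transactions that touch a cache line containing at least one faulty cell, can be computed in fault emulation mode roughly as follows. This is a hypothetical sketch: it assumes a trace of physical addresses and cells numbered linearly at 256 cells (32 bytes × 8 bits) per line, and it does not model the data-dependent read faults of Table III. The function names are illustrative.

```python
LINE_BYTES = 32
NUM_LINES = 256                   # 8 KB / 32-byte lines
CELLS_PER_LINE = LINE_BYTES * 8   # one SRAM cell per bit -> 256 cells per line


def line_index(addr: int) -> int:
    """Direct-mapped line selected by a physical address."""
    return (addr // LINE_BYTES) % NUM_LINES


def ft_fraction(trace: list[int], faulty_cells: set[int]) -> float:
    """Fraction of transactions that access a line holding at least one faulty cell."""
    if not trace:
        return 0.0
    faulty_lines = {cell // CELLS_PER_LINE for cell in faulty_cells}
    hits = sum(1 for addr in trace if line_index(addr) in faulty_lines)
    return hits / len(trace)


if __name__ == "__main__":
    # Toy trace: addresses 0x0000 and 0x2000 both map to line 0 (faulty).
    trace = [0x0000, 0x0020, 0x0040, 0x2000]
    print(ft_fraction(trace, faulty_cells={0}))  # 0.5
```

Because many addresses alias onto the same direct-mapped line, a handful of faulty cells in frequently used lines can push the FT share far above the fraction of cells that are actually faulty, which is consistent with the non-uniform access pattern discussed above.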
V. CONCLUSION AND FUTURE WORK
A framework to bridge architecture-level and device-level simulation has been presented. The framework has been used to analyze the impact of various individual and combined sources of intrinsic parameter fluctuation from UTB-SOI transistors within the 25 nm and 13 nm technology nodes on a simple cache memory system. Without careful cache memory design, IPF could adversely affect the performance and yield of the corresponding system. The next step of this work is to identify IPF-tolerant cache architectures, cache management policies and fault tolerance policies that could overcome the impact of IPF without sacrificing the efficiency of future microprocessors.
H.-S. P. Wong, "Beyond the conventional transistor," IBM Journal of Research and Development, vol. 46, pp. 133-168, 2002.
T. Mizuno, M. Iwase, H. Niiyama, T. Shibata, K. Fujisaki, T. Nakasugi, A. Toriumi, and Y. Ushiku, "Performance fluctuations of 0.10 μm MOSFETs-limitation of 0.1 μm ULSIs," in Symposium on VLSI Technology, pp. 13-14, 1994.
T. Mizuno, J. Okamura, and A. Toriumi, "Experimental study of threshold voltage fluctuation due to statistical variation of channel dopant number in MOSFET's," IEEE Transactions on Electron Devices, vol. 41, no. 11, pp. 2216-2221, 1994.
G. Roy, F. Adamu-Lema, A. R. Brown, S. Roy, and A. Asenov, "Intrinsic parameter fluctuations in conventional MOSFETs until the end of the ITRS: A statistical simulation study," in 7th International Conference on New Phenomena in Mesoscopic Systems and 5th International Conference on Surfaces and Interfaces in Mesoscopic Devices (NPMS/SIMD), pp. 35-36, 2005.
A. Bhavnagarwala, S. Kosonocky, C. Radens, K. Stawiasz, R. Mann, Q. Ye, and K. Chin, "Fluctuation limits and scaling opportunities for CMOS SRAM cells," in International Electron Devices Meeting (IEDM) Technical Digest, 2005.
R. Venkatraman, R. Castagnetti, and S. Ramesh, "The statistics of device variations and its impact on SRAM bitcell performance, leakage and stability," in International Symposium on Quality Electronic Design (ISQED), 2006.
K. A. Bowman, X. Tang, J. C. Eble, and J. D. Meindl, "Impact of extrinsic and intrinsic parameter fluctuations on CMOS circuit performance," IEEE Journal of Solid-State Circuits, vol. 35, no. 8, pp. 1186-1193, 2000.
A. J. Bhavnagarwala, A. Kapoor, and J. D. Meindl, "Dynamic-threshold CMOS SRAM cells for fast, portable applications," in 13th Annual IEEE International ASIC/SOC Conference, pp. 359-363, 2000.
B. Cheng, S. Roy, G. Roy, and A. Asenov, "Integrating 'atomistic', intrinsic parameter fluctuations into compact model circuit analysis," in European Solid-State Device Research Conference (ESSDERC), pp. 437-440, 2003.
V. S. Pai, P. Ranganathan, and S. V. Adve, "RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors," in Proceedings of the Third Workshop on Computer Architecture Education, February 1997. Also appears in IEEE TCCA Newsletter, October 1997.
M. Oskin, F. Chong, and M. Farrens, "HLS: Combining statistical and symbolic simulation to guide microprocessor designs," in Proceedings of the 27th International Symposium on Computer Architecture, pp. 71-82, 2000.
T. Austin, E. Larson, and D. Ernst, "SimpleScalar: An infrastructure for computer system modeling," Computer, vol. 35, no. 2, pp. 59-67, 2002.
K. Samsudin, F. Adamu-Lema, A. Brown, S. Roy, and A. Asenov, "Combined sources of intrinsic parameter fluctuations in sub-25 nm generation UTB-SOI MOSFETs: A statistical simulation study," Solid-State Electronics, vol. 51, pp. 611-616, Apr. 2007.
E. Seevinck, F. List, and J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," IEEE Journal of Solid-State Circuits, vol. 22, no. 5, pp. 748-754, 1987.
P. Fieler and N. Loverro, Jr., "Defects tail off with six-sigma manufacturing," IEEE Circuits and Devices Magazine, vol. 7, no. 5, pp. 18-20, 48, 1991.
A. Agarwal, B. Paul, H. Mahmoodi, A. Datta, and K. Roy, "A process-tolerant cache architecture for improved yield in nanoscale technologies," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, pp. 27-38, Jan. 2005.
Virtutech Inc., "Simics." http://www.virtutech.com/.
X. Liang, R. Canal, G.-Y. Wei, and D. Brooks, "Process variation tolerant 3T1D-based cache architectures," in 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007), pp. 15-26, 2007.
A. Agarwal, B. Paul, S. Mukhopadhyay, and K. Roy, "Process variation in embedded memories: failure analysis and variation aware architecture," IEEE Journal of Solid-State Circuits, vol. 40, no. 9, pp. 1804-1814, 2005.
R. P. Weicker, "Dhrystone: a synthetic systems programming benchmark," Communications of the ACM, vol. 27, pp. 1013-1030, Oct. 1984.
C. Krintz, Y. Wen, and R. Wolski, "Application-level prediction of battery dissipation," in Proceedings of the 2004 International Symposium on Low Power Electronics and Design (ISLPED '04), pp. 224-229, 2004.
R. Rao, J. Wenck, D. Franklin, R. Amirtharajah, and V. Akella, "Segmented Bitline Cache: Exploiting non-uniform memory access patterns," Lecture Notes in Computer Science, vol. 4297, p. 123, 2006.
M. Brorsson and P. Stenstrom, "Modelling accesses to migratory and producer-consumer characterised data in a shared memory multiprocessor," in Sixth IEEE Symposium on Parallel and Distributed Processing, pp. 612-619, 1994.