Wednesday, 26 December 2012

Design for Testability


http://larc.ee.nthu.edu.tw/~cww/n/625/6251/05DFT0603.pdf



http://www.eng.auburn.edu/~strouce/class/elec6970/BISTc3.pdf


http://tiger.ee.nctu.edu.tw/course/Testing2011/notes/ch8.DFT.pdf

Chapter 2
CMOS FABRICATION TECHNOLOGY AND DESIGN RULES





Notice: This chapter is largely based on Chapter 2 (Fabrication of MOSFETs) of the book CMOS Digital Integrated Circuit Design - Analysis and Design by S.M. Kang and Y. Leblebici.

2.1 Introduction

In this chapter, the fundamentals of MOS chip fabrication will be discussed and the major steps of the process flow will be examined. It is not the aim of this chapter to present a detailed discussion of silicon fabrication technology, which deserves separate treatment in a dedicated course. Rather, the emphasis will be on the general outline of the process flow and on the interaction of various processing steps, which ultimately determine the device and the circuit performance characteristics. The following chapters show that there are very strong links between the fabrication process, the circuit design process and the performance of the resulting chip. Hence, circuit designers must have a working knowledge of chip fabrication in order to create effective designs and to optimize their circuits with respect to various manufacturing parameters. Also, the circuit designer must have a clear understanding of the roles of the various masks used in the fabrication process, and of how the masks are used to define the features of the devices on-chip.
The following discussion will concentrate on the well-established CMOS fabrication technology, which requires that both n-channel (nMOS) and p-channel (pMOS) transistors be built on the same chip substrate. To accommodate both nMOS and pMOS devices, special regions must be created in which the semiconductor type is opposite to the substrate type. These regions are called wells or tubs. A p-well is created in an n-type substrate or, alternatively, an n-well is created in a p-type substrate. In the simple n-well CMOS fabrication technology presented here, the nMOS transistor is created in the p-type substrate, and the pMOS transistor is created in the n-well, which is built into the p-type substrate. In the twin-tub CMOS technology, additional tubs of the same type as the substrate can also be created for device optimization.
The simplified process sequence for the fabrication of CMOS integrated circuits on a p-type silicon substrate is shown in Fig. 2.1. The process starts with the creation of the n-well regions for pMOS transistors, by impurity implantation into the substrate. Then, a thick oxide is grown in the regions surrounding the nMOS and pMOS active regions. The thin gate oxide is subsequently grown on the surface through thermal oxidation. These steps are followed by the creation of n+ and p+ regions (source, drain and channel-stop implants) and by final metallization (creation of metal interconnects).


Figure-2.1: Simplified process sequence for fabrication of the n-well CMOS integrated circuit with a single polysilicon layer, showing only major fabrication steps.
The process flow sequence pictured in Fig. 2.1 may at first seem to be too abstract, since detailed fabrication steps are not shown. To obtain a better understanding of the issues involved in the semiconductor fabrication process, we first have to consider some of the basic steps in more detail.

2.2 Fabrication Process Flow - Basic Steps

Note that each processing step requires that certain areas are defined on chip by appropriate masks. Consequently, the integrated circuit may be viewed as a set of patterned layers of doped silicon, polysilicon, metal and insulating silicon dioxide. In general, a layer must be patterned before the next layer of material is applied on chip. The process used to transfer a pattern to a layer on the chip is called lithography. Since each layer has its own distinct patterning requirements, the lithographic sequence must be repeated for every layer, using a different mask.
To illustrate the fabrication steps involved in patterning silicon dioxide through optical lithography, let us first examine the process flow shown in Fig. 2.2. The sequence starts with the thermal oxidation of the silicon surface, by which an oxide layer of about 1 micrometer thickness, for example, is created on the substrate (Fig. 2.2(b)). The entire oxide surface is then covered with a layer of photoresist, which is essentially a light-sensitive, acid-resistant organic polymer, initially insoluble in the developing solution (Fig. 2.2(c)). If the photoresist material is exposed to ultraviolet (UV) light, the exposed areas become soluble so that they are no longer resistant to etching solvents. To selectively expose the photoresist, we have to cover some of the areas on the surface with a mask during exposure. Thus, when the structure with the mask on top is exposed to UV light, areas which are covered by the opaque features on the mask are shielded. In the areas where the UV light can pass through, on the other hand, the photoresist is exposed and becomes soluble (Fig. 2.2(d)).


Figure-2.2: Process steps required for patterning of silicon dioxide.
The type of photoresist which is initially insoluble and becomes soluble after exposure to UV light is called positive photoresist. The process sequence shown in Fig. 2.2 uses positive photoresist. There is another type of photoresist which is initially soluble and becomes insoluble (hardened) after exposure to UV light, called negative photoresist. If negative photoresist is used in the photolithography process, the areas which are not shielded from the UV light by the opaque mask features become insoluble, whereas the shielded areas can subsequently be etched away by a developing solution. Negative photoresists are more sensitive to light, but their photolithographic resolution is not as high as that of the positive photoresists. Therefore, negative photoresists are used less commonly in the manufacturing of high-density integrated circuits.
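The exposure logic above can be summarized in a few lines of code. The following Python sketch is purely illustrative (it is not part of the original text): it models a one-dimensional mask and computes, for each resist polarity, where hardened resist remains after development and therefore where the underlying oxide will later be etched.

```python
# Hypothetical sketch: which resist regions survive development for
# positive vs. negative photoresist.
# mask[i] is True where the mask is opaque (UV blocked), False where clear.

def resist_after_develop(mask, resist_type):
    """Return a list: True where hardened resist remains after develop."""
    remaining = []
    for opaque in mask:
        exposed = not opaque            # UV reaches the resist only under clear areas
        if resist_type == "positive":
            remaining.append(not exposed)   # exposed positive resist dissolves
        elif resist_type == "negative":
            remaining.append(exposed)       # exposed negative resist hardens
        else:
            raise ValueError("resist_type must be 'positive' or 'negative'")
    return remaining

mask = [True, True, False, False, True]        # a simple 1-D mask pattern
print(resist_after_develop(mask, "positive"))  # [True, True, False, False, True]
print(resist_after_develop(mask, "negative"))  # [False, False, True, True, False]
# Oxide is etched wherever no resist remains, so the two resist types
# produce complementary oxide windows from the same mask.
```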
Following the UV exposure step, the unexposed portions of the photoresist can be removed by a solvent. Now, the silicon dioxide regions which are not covered by hardened photoresist can be etched away either by using a chemical solvent (HF acid) or by using a dry etch (plasma etch) process (Fig. 2.2(e)). Note that at the end of this step, we obtain an oxide window that reaches down to the silicon surface (Fig. 2.2(f)). The remaining photoresist can now be stripped from the silicon dioxide surface by using another solvent, leaving the patterned silicon dioxide feature on the surface as shown in Fig. 2.2(g).
The sequence of process steps illustrated in detail in Fig. 2.2 actually accomplishes a single pattern transfer onto the silicon dioxide surface, as shown in Fig. 2.3. The fabrication of semiconductor devices requires several such pattern transfers to be performed on silicon dioxide, polysilicon, and metal. The basic patterning process used in all fabrication steps, however, is quite similar to the one shown in Fig. 2.2. Also note that for accurate generation of high-density patterns required in sub-micron devices, electron beam (E-beam) lithography is used instead of optical lithography. In the following, the main processing steps involved in the fabrication of an n-channel MOS transistor on p-type silicon substrate will be examined.


Figure-2.3: The result of a single lithographic patterning sequence on silicon dioxide, without showing the intermediate steps. Compare the unpatterned structure (top) and the patterned structure (bottom) with Fig. 2.2(b) and Fig. 2.2(g), respectively.
The process starts with the oxidation of the silicon substrate (Fig. 2.4(a)), in which a relatively thick silicon dioxide layer, also called field oxide, is created on the surface (Fig. 2.4(b)). Then, the field oxide is selectively etched to expose the silicon surface on which the MOS transistor will be created (Fig. 2.4(c)). Following this step, the surface is covered with a thin, high-quality oxide layer, which will eventually form the gate oxide of the MOS transistor (Fig. 2.4(d)). On top of the thin oxide, a layer of polysilicon (polycrystalline silicon) is deposited (Fig. 2.4(e)). Polysilicon is used both as gate electrode material for MOS transistors and also as an interconnect medium in silicon integrated circuits. Undoped polysilicon has relatively high resistivity. The resistivity of polysilicon can be reduced, however, by doping it with impurity atoms.
After deposition, the polysilicon layer is patterned and etched to form the interconnects and the MOS transistor gates (Fig. 2.4(f)). The thin gate oxide not covered by polysilicon is also etched away, which exposes the bare silicon surface on which the source and drain junctions are to be formed (Fig. 2.4(g)). The entire silicon surface is then doped with a high concentration of impurities, either through diffusion or ion implantation (in this case with donor atoms to produce n-type doping). Figure 2.4(h) shows that the doping penetrates the exposed areas on the silicon surface, ultimately creating two n-type regions (source and drain junctions) in the p-type substrate. The impurity doping also penetrates the polysilicon on the surface, reducing its resistivity. Note that the polysilicon gate, which is patterned before doping, actually defines the precise location of the channel region and, hence, the locations of the source and the drain regions. Since this procedure allows very precise positioning of the two regions relative to the gate, it is also called the self-aligned process.


Figure-2.4: Process flow for the fabrication of an n-type MOSFET on p-type silicon.
Once the source and drain regions are completed, the entire surface is again covered with an insulating layer of silicon dioxide (Fig. 2.4(i)). The insulating oxide layer is then patterned in order to provide contact windows for the drain and source junctions (Fig. 2.4(j)). The surface is covered with evaporated aluminum which will form the interconnects (Fig. 2.4(k)). Finally, the metal layer is patterned and etched, completing the interconnection of the MOS transistors on the surface (Fig. 2.4(l)). Usually, a second (and third) layer of metallic interconnect can also be added on top of this structure by creating another insulating oxide layer, cutting contact (via) holes, depositing, and patterning the metal.

2.3 The CMOS n-Well Process

Having examined the basic process steps for pattern transfer through lithography, and having gone through the fabrication procedure of a single n-type MOS transistor, we can now return to the generalized fabrication sequence of n-well CMOS integrated circuits, as shown in Fig. 2.1. In the following figures, some of the important process steps involved in the fabrication of a CMOS inverter will be shown by a top view of the lithographic masks and a cross-sectional view of the relevant areas.
The n-well CMOS process starts with a moderately doped (with impurity concentration typically less than 10^15 cm^-3) p-type silicon substrate. Then, an initial oxide layer is grown on the entire surface. The first lithographic mask defines the n-well region. Donor atoms, usually phosphorus, are implanted through this window in the oxide. Once the n-well is created, the active areas of the nMOS and pMOS transistors can be defined. Figures 2.5 through 2.10 illustrate the significant milestones that occur during the fabrication process of a CMOS inverter.


Figure-2.5: Following the creation of the n-well region, a thick field oxide is grown in the areas surrounding the transistor active regions, and a thin gate oxide is grown on top of the active regions. The thickness and the quality of the gate oxide are two of the most critical fabrication parameters, since they strongly affect the operational characteristics of the MOS transistor, as well as its long-term reliability.


Figure-2.6: The polysilicon layer is deposited using chemical vapor deposition (CVD) and patterned by dry (plasma) etching. The created polysilicon lines will function as the gate electrodes of the nMOS and the pMOS transistors and their interconnects. Also, the polysilicon gates act as self-aligned masks for the source and drain implantations that follow this step.


Figure-2.7: Using a set of two masks, the n+ and p+ regions are implanted into the substrate and into the n-well, respectively. Also, the ohmic contacts to the substrate and to the n-well are implanted in this process step.


Figure-2.8: An insulating silicon dioxide layer is deposited over the entire wafer using CVD. Then, the contacts are defined and etched away to expose the silicon or polysilicon contact windows. These contact windows are necessary to complete the circuit interconnections using the metal layer, which is patterned in the next step.


Figure-2.9: Metal (aluminum) is deposited over the entire chip surface using metal evaporation, and the metal lines are patterned through etching. Since the wafer surface is non-planar, the quality and the integrity of the metal lines created in this step are very critical and are ultimately essential for circuit reliability.


Figure-2.10: The composite layout and the resulting cross-sectional view of the chip, showing one nMOS and one pMOS transistor (built in the n-well), and the polysilicon and metal interconnections. The final step is to deposit the passivation layer (for protection) over the chip, except over the wire-bonding pad areas.
The patterning process by the use of a succession of masks and process steps is conceptually summarized in Fig. 2.11. It is seen that a series of masking steps must be sequentially performed for the desired patterns to be created on the wafer surface. An example of the end result of this sequence is shown as a cross-section on the right.


Figure-2.11: Conceptual illustration of the mask sequence applied to create desired structures.

2.4 Advanced CMOS Fabrication Technologies

In this section, two examples will be given for advanced CMOS processes which offer additional benefits in terms of device performance and integration density. These processes, namely the twin-tub CMOS process and the silicon-on-insulator (SOI) process, are becoming increasingly popular for sub-micron geometries where device performance and density must be pushed beyond the limits of the conventional n-well CMOS process.
Twin-Tub (Twin-Well) CMOS Process
This technology provides the basis for separate optimization of the nMOS and pMOS transistors, thus making it possible for the threshold voltage, body effect and channel transconductance of both types of transistors to be tuned independently. Generally, the starting material is an n+ or p+ substrate with a lightly doped epitaxial layer on top. This epitaxial layer provides the actual substrate on which the n-well and the p-well are formed. Since two independent doping steps are performed for the creation of the well regions, the dopant concentrations can be carefully optimized to produce the desired device characteristics.
In the conventional n-well CMOS process, the doping density of the well region is typically about one order of magnitude higher than that of the substrate, which, among other effects, results in unbalanced drain parasitics. The twin-tub process (Fig. 2.12) avoids this problem.


Figure-2.12: Cross-section of nMOS and pMOS transistors in twin-tub CMOS process.
Silicon-on-Insulator (SOI) CMOS Process
Rather than using silicon as the substrate material, technologists have sought to use an insulating substrate to improve process characteristics such as speed and latch-up susceptibility. The SOI CMOS technology allows the creation of independent, completely isolated nMOS and pMOS transistors virtually side-by-side on an insulating substrate (for example, sapphire). The main advantages of this technology are the higher integration density (because of the absence of well regions), complete avoidance of the latch-up problem, and lower parasitic capacitances compared to the conventional n-well or twin-tub CMOS processes. A cross-section of nMOS and pMOS devices created using the SOI process is shown in Fig. 2.13.
The SOI CMOS process is considerably more costly than the standard n-well CMOS process. Yet the improvement in device performance and the absence of latch-up problems can justify its use, especially for deep-sub-micron devices.


Figure-2.13: Cross-section of nMOS and pMOS transistors in SOI CMOS process.

2.5 Layout Design Rules

The physical mask layout of any circuit to be manufactured using a particular process must conform to a set of geometric constraints or rules, which are generally called layout design rules. These rules usually specify the minimum allowable line widths for physical objects on-chip such as metal and polysilicon interconnects or diffusion areas, minimum feature dimensions, and minimum allowable separations between two such features. If a metal line width is made too small, for example, it is possible for the line to break during the fabrication process or afterwards, resulting in an open circuit. If two lines are placed too close to each other in the layout, they may form an unwanted short circuit by merging during or after the fabrication process. The main objective of design rules is to achieve a high overall yield and reliability while using the smallest possible silicon area, for any circuit to be manufactured with a particular process.
Note that there is usually a trade-off between higher yield, which is obtained through conservative geometries, and better area efficiency, which is obtained through aggressive, high-density placement of various features on the chip. The layout design rules which are specified for a particular fabrication process normally represent a reasonable optimum point in terms of yield and density. It must be emphasized, however, that the design rules do not represent strict boundaries which separate "correct" designs from "incorrect" ones. A layout which violates some of the specified design rules may still result in an operational circuit with reasonable yield, whereas another layout observing all specified design rules may result in a circuit which is not functional and/or has very low yield. To summarize, we can say, in general, that observing the layout design rules significantly increases the probability of fabricating a successful product with high yield.
The design rules are usually described in two ways:
  • Micron rules, in which the layout constraints, such as minimum feature sizes and minimum allowable feature separations, are stated in terms of absolute dimensions in micrometers, or
  • Lambda rules, which specify the layout constraints in terms of a single parameter (λ) and, thus, allow linear, proportional scaling of all geometrical constraints.
Lambda-based layout design rules were originally devised to simplify the industry-standard micron-based design rules and to allow scaling capability for various processes. It must be emphasized, however, that most of the submicron CMOS process design rules do not lend themselves to straightforward linear scaling. The use of lambda-based design rules must therefore be handled with caution in sub-micron geometries. In the following, we present a sample set of the lambda-based layout design rules devised for the MOSIS CMOS process and illustrate the implications of these rules on a section of a simple layout which includes two transistors (Fig. 2.14).
   MOSIS Layout Design Rules (sample set)

 Rule   Description                                                   Lambda rule
 R1     Minimum active area width                                     3λ
 R2     Minimum active area spacing                                   3λ
 R3     Minimum poly width                                            2λ
 R4     Minimum poly spacing                                          2λ
 R5     Minimum gate extension of poly over active                    2λ
 R6     Minimum poly-active edge spacing (poly outside active area)   1λ
 R7     Minimum poly-active edge spacing (poly inside active area)    3λ
 R8     Minimum metal width                                           3λ
 R9     Minimum metal spacing                                         3λ
 R10    Poly contact size                                             2λ
 R11    Minimum poly contact spacing                                  2λ
 R12    Minimum poly contact to poly edge spacing                     1λ
 R13    Minimum poly contact to metal edge spacing                    1λ
 R14    Minimum poly contact to active edge spacing                   3λ
 R15    Active contact size                                           2λ
 R16    Minimum active contact spacing (on the same active region)    2λ
 R17    Minimum active contact to active edge spacing                 1λ
 R18    Minimum active contact to metal edge spacing                  1λ
 R19    Minimum active contact to poly edge spacing                   3λ
 R20    Minimum active contact spacing (on different active regions)  6λ
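To make the lambda concept concrete, the short Python sketch below converts a few of the rules above into absolute dimensions for an assumed value of λ and performs a trivial width check. The λ value and the checked dimensions are purely illustrative assumptions, not MOSIS data.

```python
# Minimal sketch (assumed values): converting lambda-based rules to microns
# and checking drawn features against them.  The rule values mirror the
# sample table above; lambda itself is process-dependent.

LAMBDA_UM = 0.5  # assumed: lambda for a hypothetical process, in micrometers

LAMBDA_RULES = {
    "R1_min_active_width": 3,
    "R3_min_poly_width": 2,
    "R8_min_metal_width": 3,
    "R9_min_metal_spacing": 3,
}

def rule_in_microns(name, lam=LAMBDA_UM):
    """Scale a lambda rule to an absolute dimension for the chosen process."""
    return LAMBDA_RULES[name] * lam

def check_width(name, drawn_um, lam=LAMBDA_UM):
    """Return True if the drawn dimension satisfies the rule."""
    required = rule_in_microns(name, lam)
    ok = drawn_um >= required
    print(f"{name}: drawn {drawn_um} um, needs >= {required} um -> "
          f"{'OK' if ok else 'VIOLATION'}")
    return ok

check_width("R8_min_metal_width", 1.5)   # 3 lambda = 1.5 um -> OK
check_width("R3_min_poly_width", 0.8)    # 2 lambda = 1.0 um -> VIOLATION
```

Changing LAMBDA_UM rescales every constraint proportionally, which is exactly the appeal (and, for deep-submicron processes, the limitation) of lambda rules.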


Figure-2.14: Illustration of some of the typical MOSIS layout design rules listed above.

Chapter 1
INTRODUCTION TO VLSI SYSTEMS






1.1 Historical Perspective

The electronics industry has achieved phenomenal growth over the last two decades, mainly due to the rapid advances in integration technologies and large-scale systems design - in short, due to the advent of VLSI. The number of applications of integrated circuits in high-performance computing, telecommunications, and consumer electronics has been rising steadily, and at a very fast pace. Typically, the required computational power (or, in other words, the intelligence) of these applications is the driving force for the fast development of this field. Figure 1.1 gives an overview of the prominent trends in information technologies over the next few decades. The current leading-edge technologies (such as low bit-rate video and cellular communications) already provide the end-users with a certain amount of processing power and portability. This trend is expected to continue, with very important implications for VLSI and systems design. One of the most important characteristics of information services is their increasing need for very high processing power and bandwidth (in order to handle real-time video, for example). The other important characteristic is that the information services tend to become more and more personalized (as opposed to collective services such as broadcasting), which means that the devices must be more intelligent to answer individual demands, and at the same time they must be portable to allow more flexibility/mobility.


Figure-1.1: Prominent trends in information service technologies.
As more and more complex functions are required in various data processing and telecommunications devices, the need to integrate these functions in a small system/package is also increasing. The level of integration, as measured by the number of logic gates in a monolithic chip, has been steadily rising for almost three decades, mainly due to the rapid progress in processing technology and interconnect technology. Table 1.1 shows the evolution of logic complexity in integrated circuits over the last three decades, and marks the milestones of each era. Here, the numbers for circuit complexity should be interpreted only as representative examples to show the order of magnitude. A logic block can contain anywhere from 10 to 100 transistors, depending on the function. State-of-the-art examples of ULSI chips, such as the DEC Alpha or the INTEL Pentium, contain 3 to 6 million transistors.
ERA                                     DATE    COMPLEXITY (logic blocks per chip)

Single transistor                       1959    less than 1
Unit logic (one gate)                   1960    1
Multi-function                          1962    2 - 4
Complex function                        1964    5 - 20
Medium Scale Integration (MSI)          1967    20 - 200
Large Scale Integration (LSI)           1972    200 - 2,000
Very Large Scale Integration (VLSI)     1978    2,000 - 20,000
Ultra Large Scale Integration (ULSI)    1989    20,000 - ?

Table-1.1: Evolution of logic complexity in integrated circuits.
The most important message here is that the logic complexity per chip has been (and still is) increasing exponentially. The monolithic integration of a large number of functions on a single chip usually provides:
  • Less area/volume and therefore, compactness
  • Less power consumption
  • Less testing requirements at system level
  • Higher reliability, mainly due to improved on-chip interconnects
  • Higher speed, due to significantly reduced interconnection length
  • Significant cost savings


Figure-1.2: Evolution of integration density and minimum feature size, as seen in the early 1980s.
Therefore, the current trend of integration will also continue in the foreseeable future. Advances in device manufacturing technology, and especially the steady reduction of minimum feature size (minimum length of a transistor or an interconnect realizable on chip) support this trend. Figure 1.2 shows the history and forecast of chip complexity - and minimum feature size - over time, as seen in the early 1980s. At that time, a minimum feature size of 0.3 microns was expected around the year 2000. The actual development of the technology, however, has far exceeded these expectations. A minimum size of 0.25 microns was readily achievable by the year 1995. As a direct result of this, the integration density has also exceeded previous expectations - the first 64 Mbit DRAM, and the INTEL Pentium microprocessor chip containing more than 3 million transistors were already available by 1994, pushing the envelope of integration density.
When comparing the integration density of integrated circuits, a clear distinction must be made between the memory chips and logic chips. Figure 1.3 shows the level of integration over time for memory and logic chips, starting in 1970. It can be observed that in terms of transistor count, logic chips contain significantly fewer transistors in any given year mainly due to large consumption of chip area for complex interconnects. Memory circuits are highly regular and thus more cells can be integrated with much less area for interconnects.


Figure-1.3: Level of integration over time, for memory chips and logic chips.
Generally speaking, logic chips such as microprocessor chips and digital signal processing (DSP) chips contain not only large arrays of memory (SRAM) cells, but also many different functional units. As a result, their design complexity is considered much higher than that of memory chips, although advanced memory chips contain some sophisticated logic functions. The design complexity of logic chips increases almost exponentially with the number of transistors to be integrated. This is translated into the increase in the design cycle time, which is the time period from the start of the chip development until the mask-tape delivery time. However, in order to make the best use of the current technology, the chip development time has to be short enough to allow the maturing of chip manufacturing and timely delivery to customers. As a result, the level of actual logic integration tends to fall short of the integration level achievable with the current processing technology. Sophisticated computer-aided design (CAD) tools and methodologies are developed and applied in order to manage the rapidly increasing design complexity.

1.2 VLSI Design Flow

The design process, at various levels, is usually evolutionary in nature. It starts with a given set of requirements. An initial design is developed and tested against the requirements. When the requirements are not met, the design has to be improved. If such improvement is either not possible or too costly, then a revision of the requirements and an analysis of its impact must be considered. The Y-chart (first introduced by D. Gajski) shown in Fig. 1.4 illustrates a design flow for most logic chips, using design activities on three different axes (domains) which resemble the letter Y.


Figure-1.4: Typical VLSI design flow in three domains (Y-chart representation).
The Y-chart consists of three major domains, namely:
  • behavioral domain,
  • structural domain,
  • geometrical layout domain.
The design flow starts from the algorithm that describes the behavior of the target chip. The corresponding architecture of the processor is first defined. It is mapped onto the chip surface by floorplanning. The next design evolution in the behavioral domain defines finite state machines (FSMs) which are structurally implemented with functional modules such as registers and arithmetic logic units (ALUs). These modules are then geometrically placed onto the chip surface using CAD tools for automatic module placement followed by routing, with a goal of minimizing the interconnects area and signal delays. The third evolution starts with a behavioral module description. Individual modules are then implemented with leaf cells. At this stage the chip is described in terms of logic gates (leaf cells), which can be placed and interconnected by using a cell placement & routing program. The last evolution involves a detailed Boolean description of leaf cells followed by a transistor level implementation of leaf cells and mask generation. In standard-cell based design, leaf cells are already pre-designed and stored in a library for logic design use.


Figure-1.5: A more simplified view of VLSI design flow.
Figure 1.5 provides a more simplified view of the VLSI design flow, taking into account the various representations, or abstractions of design - behavioral, logic, circuit and mask layout. Note that the verification of design plays a very important role in every step during this process. The failure to properly verify a design in its early phases typically causes significant and expensive re-design at a later stage, which ultimately increases the time-to-market.
Although the design process has been described in a linear fashion for simplicity, in reality there are many iterations back and forth, especially between any two neighboring steps, and occasionally even between remotely separated pairs. Although a top-down design flow provides excellent design process control, in reality there is no truly unidirectional top-down design flow. Both top-down and bottom-up approaches have to be combined. For instance, if a chip designer defines an architecture without a close estimation of the corresponding chip area, then it is very likely that the resulting chip layout will exceed the area limit of the available technology. In such a case, in order to fit the architecture into the allowable chip area, some functions may have to be removed and the design process must be repeated. Such changes may require significant modification of the original requirements. Thus, it is very important to feed low-level information forward to higher levels (bottom up) as early as possible.
In the following, we will examine design methodologies and structured approaches which have been developed over the years to deal with both complex hardware and software projects. Regardless of the actual size of the project, the basic principles of structured design will improve the prospects of success. Some of the classical techniques for reducing the complexity of IC design are hierarchy, regularity, modularity and locality.

1.3 Design Hierarchy

The use of hierarchy, or the “divide and conquer” technique, involves dividing a module into sub-modules and then repeating this operation on the sub-modules until the complexity of the smaller parts becomes manageable. This approach is very similar to the software case, where large programs are split into smaller and smaller sections until simple subroutines, with well-defined functions and interfaces, can be written. In Section 1.2, we have seen that the design of a VLSI chip can be represented in three domains. Correspondingly, a hierarchy structure can be described in each domain separately. However, it is important for the simplicity of design that the hierarchies in the different domains can be mapped into each other easily.
As an example of structural hierarchy, Fig. 1.6 shows the structural decomposition of a CMOS four-bit adder into its components. The adder can be decomposed progressively into one-bit adders, separate carry and sum circuits, and finally, into individual logic gates. At this lower level of the hierarchy, the design of a simple circuit realizing a well-defined Boolean function is much easier to handle than at the higher levels of the hierarchy.
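The same decomposition can be expressed in code. The following Python sketch is purely illustrative (a real design would use a hardware description language): it mirrors the hierarchy of Fig. 1.6, in which primitive gates form the sum and carry circuits, those form a one-bit full adder, and four full adders form a ripple-carry four-bit adder.

```python
# Sketch of the structural hierarchy of Fig. 1.6 in plain Python:
# gates -> one-bit full adder (sum and carry circuits) -> four-bit adder.

def sum_circuit(a, b, cin):            # sum = a XOR b XOR cin
    return a ^ b ^ cin

def carry_circuit(a, b, cin):          # carry = majority(a, b, cin)
    return (a & b) | (a & cin) | (b & cin)

def full_adder(a, b, cin):             # one-bit adder built from the two sub-circuits
    return sum_circuit(a, b, cin), carry_circuit(a, b, cin)

def four_bit_adder(a_bits, b_bits, cin=0):
    """Ripple-carry adder: a_bits and b_bits are lists of bits, LSB first."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry

# 7 + 5 = 12: 0b0111 + 0b0101 = 0b1100
print(four_bit_adder([1, 1, 1, 0], [1, 0, 1, 0]))   # ([0, 0, 1, 1], 0)
```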
In the physical domain, partitioning a complex system into its various functional blocks will provide valuable guidance for the actual realization of these blocks on chip. Obviously, the approximate shape and size (area) of each sub-module should be estimated in order to provide a useful floorplan. Figure 1.7 shows the hierarchical decomposition of a four-bit adder in the physical description (geometrical layout) domain, resulting in a simple floorplan. This physical view describes the external geometry of the adder, the locations of input and output pins, and how pin locations allow some signals (in this case the carry signals) to be transferred from one sub-block to the other without external routing. At lower levels of the physical hierarchy, the internal mask


Figure-1.6: Structural decomposition of a four-bit adder circuit, showing the hierarchy down to gate level.


Figure-1.7: Hierarchical decomposition of a four-bit adder in physical (geometrical) description domain.


Figure-1.8: Layout of a 16-bit adder, and the components (sub-blocks) of its physical hierarchy.


Figure-1.9: The structural hierarchy of a triangle generator chip.


Figure-1.10: Physical layout of the triangle generator chip.
layout of each adder cell defines the locations and the connections of each transistor and wire. Figure 1.8 shows the full-custom layout of a 16-bit dynamic CMOS adder, and the sub-modules that describe the lower levels of its physical hierarchy. Here, the 16-bit adder consists of a cascade connection of four 4-bit adders, and each 4-bit adder can again be decomposed into its functional blocks such as the Manchester chain, carry/propagate circuits and the output buffers. Finally, Fig. 1.9 and Fig. 1.10 show the structural hierarchy and the physical layout of a simple triangle generator chip, respectively. Note that there is a corresponding physical description for every module in the structural hierarchy, i.e., the components of the physical view closely match this structural view.

1.4 Concepts of Regularity, Modularity and Locality

The hierarchical design approach reduces the design complexity by dividing the large system into several sub-modules. Usually, other design concepts and design approaches are also needed to simplify the process. Regularity means that the hierarchical decomposition of a large system should result in not only simple, but also similar blocks, as much as possible. A good example of regularity is the design of array structures consisting of identical cells - such as a parallel multiplication array. Regularity can exist at all levels of abstraction: at the transistor level, uniformly sized transistors simplify the design; at the logic level, identical gate structures can be used; and so on. Figure 1.11 shows regular circuit-level designs of a 2-1 MUX (multiplexer), a D-type edge-triggered flip-flop, and a one-bit full adder. Note that all of these circuits were designed by using inverters and tri-state buffers only. If the designer has a small library of well-defined and well-characterized basic building blocks, a number of different functions can be constructed by using this principle. Regularity usually reduces the number of different modules that need to be designed and verified, at all levels of abstraction.


Figure-1.11: Regular design of a 2-1 MUX, a DFF and an adder, using inverters and tri-state buffers.
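As a logic-level illustration of this principle, the Python sketch below (an illustrative assumption, not taken from the book) models the 2-1 MUX of Fig. 1.11 using only the two building blocks mentioned in the text: an inverter and a tri-state buffer, where a disabled buffer presents a high-impedance output.

```python
# Sketch of the 2-1 MUX at the logic level, using only two primitives:
# an inverter and a tri-state buffer (None models the high-impedance 'Z'
# output of a disabled buffer).

def inverter(a):
    return 1 - a

def tristate_buffer(a, enable):
    return a if enable else None        # None = output disconnected (high-Z)

def mux2to1(d0, d1, sel):
    """Two tri-state buffers drive a shared node; exactly one is enabled."""
    node = [tristate_buffer(d0, inverter(sel)),   # passes d0 when sel = 0
            tristate_buffer(d1, sel)]             # passes d1 when sel = 1
    driven = [v for v in node if v is not None]
    assert len(driven) == 1, "exactly one buffer must drive the output"
    return driven[0]

print(mux2to1(d0=0, d1=1, sel=0))   # 0 (selects d0)
print(mux2to1(d0=0, d1=1, sel=1))   # 1 (selects d1)
```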
Modularity in design means that the various functional blocks which make up the larger system must have well-defined functions and interfaces. Modularity allows each block or module to be designed relatively independently of the others, since there is no ambiguity about the function and the signal interface of these blocks. All of the blocks can be combined with ease at the end of the design process, to form the large system. The concept of modularity enables the parallelisation of the design process. It also allows the use of generic modules in various designs - the well-defined functionality and signal interface allow plug-and-play design.
By defining well-characterized interfaces for each module in the system, we effectively ensure that the internals of each module become unimportant to the exterior modules. Internal details remain at the local level. The concept of locality also ensures that connections are mostly between neighboring modules, avoiding long-distance connections as much as possible. This last point is extremely important for avoiding excessive interconnect delays. Time-critical operations should be performed locally, without the need to access distant modules or signals. If necessary, the replication of some logic may solve this problem in large system architectures.

1.5 VLSI Design Styles

Several design styles can be considered for chip implementation of specified algorithms or logic functions. Each design style has its own merits and shortcomings, and thus a proper choice has to be made by designers in order to provide the functionality at low cost.

1.5.1 Field Programmable Gate Array (FPGA)

Fully fabricated FPGA chips containing thousands of logic gates or even more, with programmable interconnects, are available to users for custom hardware programming to realize the desired functionality. This design style provides a means for fast prototyping and also for cost-effective chip design, especially for low-volume applications. A typical field programmable gate array (FPGA) chip consists of I/O buffers, an array of configurable logic blocks (CLBs), and programmable interconnect structures. The programming of the interconnects is implemented by programming RAM cells whose output terminals are connected to the gates of MOS pass transistors. A general architecture of a Xilinx FPGA is shown in Fig. 1.12. A more detailed view showing the locations of the switch matrices used for interconnect routing is given in Fig. 1.13.
A simple CLB (model XC2000 from XILINX) is shown in Fig. 1.14. It consists of four signal input terminals (A, B, C, D), a clock signal terminal, user-programmable multiplexers, an SR-latch, and a look-up table (LUT). The LUT is a digital memory that stores the truth table of the Boolean function. Thus, it can generate any function of up to four variables or any two functions of three variables. The control terminals of multiplexers are not shown explicitly in Fig. 1.14.
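The look-up table is the key idea here: a 4-input LUT is simply a 16-bit memory addressed by the input pattern, so loading different contents programs a different Boolean function. The Python sketch below is an illustrative model of this behavior; the example function is an arbitrary choice, not one from the text.

```python
# Sketch of the look-up-table idea: a 4-input LUT is a 16-bit memory
# addressed by the input pattern, so it can realize any function of up to
# four variables.  Here the table is built for f = (A AND B) OR (C XOR D).

def build_lut(func, n_inputs=4):
    """Enumerate all input patterns and store the function's truth table."""
    table = []
    for address in range(2 ** n_inputs):
        bits = [(address >> i) & 1 for i in range(n_inputs)]   # A = bit 0, ...
        table.append(func(*bits))
    return table

def lut_lookup(table, *inputs):
    """Form the address from the input bits and read the stored output."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return table[address]

f = lambda a, b, c, d: (a & b) | (c ^ d)
table = build_lut(f)                  # 16 stored bits fully define the function
print(lut_lookup(table, 1, 1, 0, 0))  # 1: A AND B is true
print(lut_lookup(table, 0, 0, 1, 0))  # 1: C XOR D is true
print(lut_lookup(table, 0, 0, 1, 1))  # 0: neither term is true
```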
The CLB is configured such that many different logic functions can be realized by programming its array. More sophisticated CLBs have also been introduced to map complex functions. The typical design flow of an FPGA chip starts with the behavioral description of its functionality, using a hardware description language such as VHDL. The synthesized architecture is then technology-mapped (or partitioned) into circuits or logic cells. At this stage, the chip design is completely described in terms of available logic cells. Next, the placement and routing step assigns individual logic cells to FPGA sites (CLBs) and determines the routing patterns among the cells in accordance with the netlist. After routing is completed, the on-chip


Figure-1.12: General architecture of Xilinx FPGAs.


Figure-1.13: Detailed view of switch matrices and interconnection routing between CLBs.


Figure-1.14: XC2000 CLB of the Xilinx FPGA.
performance of the design can be simulated and verified before downloading the design for programming of the FPGA chip. The programming of the chip remains valid as long as the chip is powered on, or until new programming is done. In most cases, full utilization of the FPGA chip area is not possible - many cell sites may remain unused.
The largest advantage of FPGA-based design is the very short turn-around time, i.e., the time required from the start of the design process until a functional chip is available. Since no physical manufacturing step is necessary for customizing the FPGA chip, a functional sample can be obtained almost as soon as the design is mapped into a specific technology. The typical price of FPGA chips is usually higher than that of other realization alternatives (such as gate arrays or standard cells) of the same design, but for small-volume production of ASIC chips and for fast prototyping, the FPGA offers a very valuable option.

1.5.2 Gate Array Design

In terms of fast prototyping capability, the gate array (GA) ranks second after the FPGA. While the design implementation of the FPGA chip is done with user programming, that of the gate array is done with metal mask design and processing. Gate array implementation requires a two-step manufacturing process: the first phase, which is based on generic (standard) masks, results in an array of uncommitted transistors on each GA chip. These uncommitted chips can be stored for later customization, which is completed by defining the metal interconnects between the transistors of the array (Fig. 1.15). Since the patterning of metallic interconnects is done at the end of the chip fabrication, the turn-around time can still be short, a few days to a few weeks. Figure 1.16 shows a corner of a gate array chip which contains bonding pads on its left and bottom edges, diodes for I/O protection, nMOS and pMOS transistors for chip output driver circuits in the areas neighboring the bonding pads, arrays of nMOS transistors and pMOS transistors, underpass wire segments, and power and ground buses along with contact windows.


Figure-1.15: Basic processing steps required for gate array implementation.


Figure-1.16: A corner of a typical gate array chip.
Figure 1.17 shows a magnified portion of the internal array with metal mask design (metal lines highlighted in dark) to realize a complex logic function. Typical gate array platforms allow dedicated areas, called channels, for intercell routing as shown in Figs. 1.16 and 1.17 between rows or columns of MOS transistors. The availability of these routing channels simplifies the interconnections, even using one metal layer only. The interconnection patterns to realize basic logic gates can be stored in a library, which can then be used to customize rows of uncommitted transistors according to the netlist. While most gate array platforms only contain rows of uncommitted transistors separated by routing channels, some other platforms also offer dedicated memory (RAM) arrays to allow a higher density where memory functions are required. Figure 1.18 shows the layout views of a conventional gate array and a gate array platform with two dedicated memory banks.
With the use of multiple interconnect layers, the routing can be achieved over the active cell areas; thus, the routing channels can be removed as in Sea-of-Gates (SOG) chips. Here, the entire chip surface is covered with uncommitted nMOS and pMOS transistors. As in the gate array case, neighboring transistors can be customized using a metal mask to form basic logic gates. For intercell routing, however, some of the uncommitted transistors must be sacrificed. This approach results in more flexibility for interconnections, and usually in a higher density. The basic platform of a SOG chip is shown in Fig. 1.19. Figure 1.20 offers a brief comparison between the channeled (GA) vs. the channelless (SOG) approaches.


Figure-1.17: Metal mask design to realize a complex logic function on a channeled GA platform.


Figure-1.18: Layout views of a conventional GA chip and a gate array with two memory banks.


Figure-1.19: The platform of a Sea-of-Gates (SOG) chip.
In general, the GA chip utilization factor, as measured by the used chip area divided by the total chip area, is higher than that of the FPGA and so is the chip speed, since more customized design can be achieved with metal mask designs. The current gate array chips can implement as many as hundreds of thousands of logic gates.


Figure-1.20: Comparison between the channeled (GA) vs. the channelless (SOG) approaches.

1.5.3 Standard-Cells Based Design

The standard-cells based design is one of the most prevalent full custom design styles which require the development of a full custom mask set. The standard cell is also called the polycell. In this design style, all of the commonly used logic cells are developed, characterized, and stored in a standard cell library. A typical library may contain a few hundred cells including inverters, NAND gates, NOR gates, complex AOI and OAI gates, D-latches, and flip-flops. Each gate type can have multiple implementations to provide adequate driving capability for different fanouts. For instance, the inverter gate can have standard size transistors, double size transistors, and quadruple size transistors so that the chip designer can choose the proper size to achieve high circuit speed and layout density. The characterization of each cell is done for several different categories. It consists of
  • delay time vs. load capacitance (see the sketch after this list)
  • circuit simulation model
  • timing simulation model
  • fault simulation model
  • cell data for place-and-route
  • mask data
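The first item, delay time vs. load capacitance, is commonly captured with a simple linear model: the cell delay is approximately an intrinsic delay plus a drive-strength-dependent slope times the load capacitance. The Python sketch below uses made-up, purely illustrative numbers (not library data) to show why the library offers several drive strengths of the same gate.

```python
# Sketch with assumed, illustrative numbers: the "delay vs. load capacitance"
# characterization of a cell approximated as
#     delay = intrinsic_delay + k_drive * C_load
# where a larger (2x, 4x) cell has a smaller k_drive but a larger intrinsic delay.

INVERTER_LIBRARY = {
    # name       intrinsic delay t0 (ps)   slope k (ps per fF)
    "INV_1X": {"t0": 20.0, "k": 4.0},
    "INV_2X": {"t0": 35.0, "k": 2.0},
    "INV_4X": {"t0": 60.0, "k": 1.0},
}

def cell_delay(cell_name, load_ff):
    cell = INVERTER_LIBRARY[cell_name]
    return cell["t0"] + cell["k"] * load_ff

for load in (5.0, 50.0):    # light vs. heavy fanout load, in fF
    best = min(INVERTER_LIBRARY, key=lambda name: cell_delay(name, load))
    print(f"load {load} fF -> fastest choice: {best}, "
          f"delay {cell_delay(best, load):.0f} ps")
# A light load favors the small inverter; a heavy load favors the 4x inverter,
# which is why the library offers several drive strengths per gate type.
```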
To enable automated placement of the cells and routing of inter-cell connections, each cell layout is designed with a fixed height, so that a number of cells can be abutted side-by-side to form rows. The power and ground rails typically run parallel to the upper and lower boundaries of the cell, thus, neighboring cells share a common power and ground bus. The input and output pins are located on the upper and lower boundaries of the cell. Figure 1.21 shows the layout of a typical standard cell. Notice that the nMOS transistors are located closer to the ground rail while the pMOS transistors are placed closer to the power rail.


Figure-1.21: A standard cell layout example.
Figure 1.22 shows a floorplan for standard-cell based design. Inside the I/O frame, which is reserved for I/O cells, the chip area contains rows or columns of standard cells. Between cell rows are channels for dedicated inter-cell routing. As in the case of Sea-of-Gates, with over-the-cell routing the channel areas can be reduced or even removed, provided that the cell rows offer sufficient routing space. The physical design and layout of logic cells ensure that when cells are placed into rows, their heights are matched and neighboring cells can be abutted side-by-side, which provides natural connections for power and ground lines in each row. The signal delay, noise margins, and power consumption of each cell should also be optimized with proper sizing of transistors using circuit simulation.


Figure-1.22: A simplified floorplan of standard-cells-based design.
If a number of cells must share the same input and/or output signals, a common signal bus structure can also be incorporated into the standard-cell-based chip layout. Figure 1.23 shows the simplified symbolic view of a case where a signal bus has been inserted between the rows of standard cells. Note that in this case the chip consists of two blocks, and power/ground routing must be provided from both sides of the layout area. Standard-cell based designs may consist of several such macro-blocks, each corresponding to a specific unit of the system architecture such as ALU, control logic, etc.


Figure-1.23: Simplified floorplan consisting of two separate blocks and a common signal bus.
After the chip logic design is done using standard cells in the library, the most challenging task is to place individual cells into rows and interconnect them in a way that meets stringent design goals in circuit speed, chip area, and power consumption. Many advanced CAD tools for place-and-route have been developed and used to achieve such goals. Also, from the chip layout, circuit models which include interconnect parasitics can be extracted and used for timing simulation and analysis to identify timing-critical paths. For timing-critical paths, proper gate sizing is often practiced to meet the timing requirements. In many VLSI chips, such as microprocessors and digital signal processing chips, standard-cells based design is used to implement complex control logic modules. Some full custom chips can also be implemented exclusively with standard cells.
Finally, Fig. 1.24 shows the detailed mask layout of a standard-cell-based chip with an uninterrupted single block of cell rows, and three memory banks placed on one side of the chip. Notice that within the cell block, the separations between neighboring rows depend on the number of wires in the routing channel between the cell rows. If a high interconnect density can be achieved in the routing channel, the standard cell rows can be placed closer to each other, resulting in a smaller chip area. The availability of dedicated memory blocks also reduces the area, since the realization of memory elements using standard cells would occupy a larger area.


Figure-1.24: Mask layout of a standard-cell-based chip with a single block of cells and three memory banks.
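The dependence of row separation on routing congestion, mentioned above, can be made concrete with a back-of-the-envelope estimate. The Python sketch below uses assumed, purely illustrative cell and track dimensions to show how the number of routing tracks per channel drives the height of a standard-cell block.

```python
# Sketch with assumed numbers: estimating the height of a standard-cell block.
# Row separation is set by the routing channel, whose height grows with the
# number of horizontal wiring tracks it must hold.

CELL_HEIGHT_UM = 12.0      # fixed standard-cell height (assumed)
TRACK_PITCH_UM = 1.4       # channel height consumed per routing track (assumed)

def core_height(num_rows, tracks_per_channel):
    channels = num_rows - 1                      # one channel between adjacent rows
    return (num_rows * CELL_HEIGHT_UM
            + channels * tracks_per_channel * TRACK_PITCH_UM)

for tracks in (30, 10):
    print(f"{tracks} tracks/channel -> core height "
          f"{core_height(num_rows=20, tracks_per_channel=tracks):.0f} um")
# Fewer tracks per channel (denser channel routing, or over-the-cell routing)
# lets the rows sit closer together and shrinks the block.
```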

1.5.4 Full Custom Design

Although the standard-cells based design is often called full custom design, in a strict sense it is somewhat less than fully custom, since the cells are pre-designed for general use and the same cells are utilized in many different chip designs. In a truly full custom design, the entire mask design is done anew without the use of any library. However, the development cost of such a design style is becoming prohibitively high. Thus, the concept of design reuse is becoming popular in order to reduce design cycle time and development cost. The most rigorous application of full custom design is the design of a memory cell, be it static or dynamic; since the same layout is replicated a vast number of times, there is no practical alternative to full custom design for high-density memory chips. For logic chip design, a good compromise can be achieved by using a combination of different design styles on the same chip, such as standard cells, data-path cells and PLAs. In real full-custom layout, in which the geometry, orientation and placement of every transistor is done individually by the designer, design productivity is usually very low - typically 10 to 20 transistors per day, per designer.
In digital CMOS VLSI, full-custom design is rarely used due to the high labor cost. Exceptions to this include the design of high-volume products such as memory chips, high-performance microprocessors and FPGA masters. Figure 1.25 shows the full layout of the Intel 486 microprocessor chip, which is a good example of a hybrid full-custom design. Here, one can identify four different design styles on one chip: memory banks (RAM cache), data-path units consisting of bit-slice cells, control circuitry mainly consisting of standard cells, and PLA blocks.


Figure-1.25: Mask layout of the Intel 486 microprocessor chip, as an example of full-custom design.