Computers are used more and more to provide high-quality and reliable products and services, and to control and optimise production processes. Such computers are often embedded into the products and thus hidden to the human user. Examples are computer-controlled washing machines or gas burners, electronic control units in cars needed for operating airbags and braking systems, signalling systems for high-speed trains, or robots and automatic transport vehicles in industrial production lines.
In these systems the computer continuously interacts with a physical environment or plant. Such systems are thus called reactive systems. Moreover, common to all these applications is that the computer reactions should obey certain timing constraints. For example, an airbag has to unfold within milliseconds, not too early and not too late. Reactive systems with such constraints are called real-time systems. They often appear in safety-critical applications where a malfunction of the controller will cause damage and risk the lives of people. This is immediately clear for all applications in the transport sector where computers control cars, trains and planes.
Therefore the design of real-time systems requires a high degree of precision. Here formal methods based on mathematical models of the system under design are helpful. They allow the designer to specify the system at different levels of abstraction and to formally verify the consistency of these specifications before implementing them. In recent years significant advances have been made in the maturity of formal methods that can be applied to real-time systems.
Structure of this book
In this advanced textbook we shall present three such formal approaches:
Duration Calculus (DC for short), a logic and calculus for specifying high-level requirements of real-time systems;
Timed automata were introduced by R. Alur and D. Dill as an operational model of real-time systems. In their simplest form timed automata extend classical finite automata, having only finitely many control states, by clock variables ranging over the non-negative real numbers (continuous time). Constraints on the values of the clock variables serve as guards of the transitions and as invariants in the control states. Timed automata can be combined into networks by using parallel composition and restriction operators of process algebras like CCS or CSP. One of the most important results on timed automata is that it is decidable whether a given control state is reachable. This led to the development of several tools for the automatic verification of behavioural properties of timed automata. Here we shall present in more detail the tool UPPAAL.
Timed automata
Timed automata engage in transitions from locations to locations when certain timing conditions are satisfied. These transitions either perform input and output actions on channels that will synchronise with other timed automata working in parallel or they perform internal actions that are invisible from the outside.
As a first contact with timed automata let us look at an example.
Example 4.1 (Light controller)
We wish to model a light controller with the following behaviour. Initially, the light is off. When the switch is pressed once, the light goes on (into a dim mode). If the switch is pressed twice quickly the light gets bright. Otherwise, if the switch is pressed only after a while the light goes off again.
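The behaviour described above can be sketched as a small simulation with three locations and a single clock. This is only an illustrative sketch, not the book's own model: the threshold `QUICK` for what counts as pressing "quickly", the names `LightController` and `run`, and the choice that a press in the bright location switches the light off are all assumptions made here for the sake of the example.

```python
from dataclasses import dataclass

# Assumed threshold: a second press counts as "quick" if it happens
# within QUICK time units of the first one (hypothetical value).
QUICK = 3.0


@dataclass
class LightController:
    location: str = "off"   # one of "off", "dim", "bright"
    clock: float = 0.0      # clock x, reset when the light is switched on

    def delay(self, d: float) -> None:
        """Let d time units pass; the clock advances, the location stays."""
        if d < 0:
            raise ValueError("delay must be non-negative")
        self.clock += d

    def press(self) -> None:
        """Take the transition enabled by a press in the current location."""
        if self.location == "off":
            self.location = "dim"
            self.clock = 0.0  # reset x on switching the light on
        elif self.location == "dim":
            # guard x <= QUICK leads to bright, guard x > QUICK leads to off
            self.location = "bright" if self.clock <= QUICK else "off"
        else:
            # assumption: pressing while bright switches the light off
            self.location = "off"


def run(events):
    """Replay ("delay", d) and ("press",) events; return the final location."""
    controller = LightController()
    for event in events:
        if event[0] == "delay":
            controller.delay(event[1])
        else:
            controller.press()
    return controller.location
```

For instance, `run([("press",), ("delay", 1.0), ("press",)])` ends in the bright location, whereas the same sequence with a delay longer than `QUICK` ends with the light off.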
This book is about the design of certain kinds of reactive systems. A reactive system interacts with its environment by reacting to inputs from the environment with certain outputs. Usually, a reactive system is not supposed to stop but should be continuously ready for such interactions. In the real world there are plenty of reactive systems around. A vending machine for drinks should be continuously ready for interacting with its customers. When a customer inputs suitable coins and selects “coffee” the vending machine should output a cup of hot coffee. A traffic light should continuously be ready to react when a pedestrian pushes the button indicating the wish to cross the street. A cash machine of a bank should continuously be ready to react to customers' desire for extracting money from their bank account.
Reactive systems are seen in contrast to transformational systems, which are supposed to compute a single input–output transformation that satisfies a certain relation and then terminate. For example, such a system could input two matrices and compute their product.
We wish to design reactive systems that interact in a well-defined relation to real, physical time. A real-time system is a reactive system which, for certain inputs, has to compute the corresponding outputs within given time bounds. An example of a real-time system is an airbag. When a car is forced into emergency braking, its airbag has to unfold within 300 milliseconds to protect the passenger's head. Thus there is a tight upper time bound for the reaction. However, there is also a lower time bound of 100 milliseconds.
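The two bounds in the airbag example can be written down as a simple check on observed timestamps. This is a minimal sketch: the function name `reaction_ok`, the use of millisecond timestamps, and the decision to treat both bounds as inclusive are assumptions made for illustration.

```python
LOWER_MS = 100  # lower time bound from the airbag example
UPPER_MS = 300  # upper time bound from the airbag example


def reaction_ok(trigger_ms: float, unfold_ms: float) -> bool:
    """Return True if the airbag unfolded neither too early nor too late.

    trigger_ms is the time of the emergency braking, unfold_ms the time
    at which the airbag unfolded. Both bounds are treated as inclusive,
    which is an assumption of this sketch.
    """
    latency = unfold_ms - trigger_ms
    return LOWER_MS <= latency <= UPPER_MS
```

A reaction after 50 ms violates the lower bound and one after 350 ms violates the upper bound, so both are rejected by the check.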
In the earlier chapters of this book we have seen that domino logic is intrinsically faster than static logic. The logic family is, however, more complex to use since every cell is clocked. Furthermore, the cell outputs are only valid during the evaluate phase, with the precharge phase resetting the cell. With domino logic the designer has to consider not only the logical functionality of the circuit, but also the clocking scheme. Domino logic design has traditionally only been available to those design groups who have an absolute need for high speed and can afford to utilize large numbers of engineers to handcraft circuits using this design style. This approach to domino logic design has meant that the design productivity associated with the use of domino logic, measured in terms of cost and turnaround time (TAT, the time needed to complete a task), has lagged that of automated static logic. While the quality-of-results (QoR) generally improves with custom design, this may still lead to an unfavorable tradeoff in terms of cost versus benefit. For many design groups a fully automated solution provides adequate or close to adequate results.
The dynamic behavior of domino logic is part of the challenge in using it. At high speeds the clock and data are involved in a complex timing interplay which must be resolved correctly for proper functionality. The data for every domino cell must be propagated before the precharge signal arrives.
In 1989 a forward-looking paper attempted to determine the characteristics of microprocessors in the year 2000. Called “Microprocessors circa 2000”, the paper hypothesized that a high-performance microprocessor in the year 2000 would have an area of 1 square inch (645 sq mm), contain 50 million transistors, and run at above 250 MHz [1]. The overall performance of the microprocessor was estimated at 2000 million instructions per second (MIPS), achieved by the employment of two or three cores, each with a performance of 750 MIPS. Forward-looking papers often have somewhat fanciful conceits of future developments, illustrating the witticism that predictions tend to be difficult if they involve the future. This prediction, however, was based on many years of microprocessor development, leading to a broadly accurate prediction of things to come. The International Solid State Circuit Conference (ISSCC), held in early 2000, presented a number of microprocessors whose transistor counts and area were within 2× of the prediction. Since much of the area of a microprocessor is composed of on-chip memory, the prediction for transistor count was achieved soon afterwards. The prediction of 2000 MIPS for the maximum performance of the system also proved to be accurate. The interesting discrepancy was in the way that the performance of the microprocessor was achieved. Instead of employing a number of processors operating at 250 MHz, most high end microprocessors were single core designs running at or above 1 GHz.
We start our discussions on designing a domino logic library by reviewing two classical results on sizing static CMOS inverters. While static and domino logic are different circuit families, they are both CMOS digital design styles, with the insight provided by studying static inverters being useful in understanding the general needs required for any library. The first issue relates to how the transistor sizes in inverters should scale to achieve a fast delay through a series of inverters driving a large capacitor. For example, if the first inverter has PMOS and NMOS transistor widths of 2 and 1 μm, what should the transistor sizes be in the next inverter? It seems obvious that the next inverter should have larger transistor sizes to ensure that the final inverter is strong enough to quickly drive the large load. The question that arises is how the transistor sizes should scale from one inverter to the next to minimize total delay. If the next inverter's transistor size increases quickly, it will heavily load down the inverter driving it. This will lead to a large delay. If, on the other hand, there is only a small increase in size between adjacent inverters then a very large number of cells are needed. Again, this will cause a large delay. The inverter sizing question leads us to think about how different drives need to be sized.
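The trade-off described above can be explored numerically with a first-order model. The sketch below assumes that each stage's delay is proportional to the ratio of the capacitance it drives to its own input capacitance, and that self-loading (parasitic) capacitance is ignored. Under these assumptions a fixed number of stages is fastest when every stage uses the same ratio. The function names and the unit delay `tau` are hypothetical and chosen only for this example.

```python
def chain_delay(total_ratio: float, n_stages: int, tau: float = 1.0) -> float:
    """Delay of an inverter chain with equal stage ratios.

    total_ratio is load capacitance divided by the first inverter's input
    capacitance. Each of the n_stages stages drives a load that is
    total_ratio ** (1 / n_stages) times its own input capacitance, and its
    delay is tau times that ratio (first-order model, no self-loading).
    """
    stage_ratio = total_ratio ** (1.0 / n_stages)
    return n_stages * stage_ratio * tau


def best_stage_count(total_ratio: float, max_stages: int = 50) -> int:
    """Number of stages that minimizes chain_delay, by exhaustive search."""
    return min(range(1, max_stages + 1),
               key=lambda n: chain_delay(total_ratio, n))


def stage_widths(first_width: float, total_ratio: float, n_stages: int):
    """Transistor widths along the chain, scaled geometrically."""
    stage_ratio = total_ratio ** (1.0 / n_stages)
    return [first_width * stage_ratio ** i for i in range(n_stages)]
```

With a load 1000 times the input capacitance of the first inverter, this model selects seven stages with a stage ratio of about 2.7. That is close to the value e, which the same model gives when the number of stages is treated as continuous. Including self-loading would push the preferred ratio higher, so the figure is a guide and not a design rule.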
This book stems from my experience over the last few years in designing high-speed digital logic using ASIC design flows. I discovered that while it is possible to significantly improve performance in ASIC implementations with deep pipelining and careful physical design, a speed penalty still had to be paid due to their exclusive use of static logic. This spurred an interest in using domino logic with automated synthesis and place and route tools. This book documents my experiences in automating the use of domino logic, and shows that despite the challenges entailed in the process, it is possible to use domino logic with industry-standard ASIC tools and achieve a significant speed improvement in the process.
Engineering is a group activity. The development of our domino logic synthesis system was possible due to the collaboration of many intelligent, enthusiastic, and dedicated co-workers whose contributions I must acknowledge. First of all I would like to thank my two chapter co-authors, Tommy Zounes and Bernard Bourgin. In addition to being gifted and hard-working engineers, Tommy and Bernard have also always been very generous with their knowledge and time, allowing all of their co-workers, including me, to learn a great deal from them. The domino logic library was possible due to the talents and efforts of Scott Anderson, Shaun Forsting, Judy Alvarez-Gallardo, Roger Boates, Michael Lin, and Juneho Park, who helped design the schematics and also contributed to the myriad other tasks involved with taping out a number of chips.
By the late 1970s complementary metal oxide semiconductor (CMOS) started to become the process of choice for digital semiconductor designs. CMOS had originally been proposed by Frank Wanlass in 1963 as a low standby power technology, since CMOS logic gates dissipate almost no power when the inputs to the gate do not change [1]. This follows as CMOS contains both PMOS field effect transistors (FETs), which can efficiently drive a high voltage, or logic one value, and NMOS transistors, which are good at driving a zero voltage. The presence of complementary transistors allows CMOS logic gates to be implemented so that the output voltage level is connected to the power or ground line, but not both. This ability to avoid contention ensures that if the inputs are not changing, then no power is dissipated. This was a major advantage of CMOS over the other manufacturing processes then available, which dissipated constant leakage or bias currents.
In Figure 1.1 the schematic representation of a CMOS static NAND logic gate is shown. The logic gate has two inputs A and B. A high logic value at inputs A and B turns on transistors MN1 and MN2, while turning off transistors MP1 and MP2. This causes the output Z to be low. When either input A or B is low, however, the path to the ground line is broken, with a path to the power supply (by convention called Vdd) being established. This causes Z to rise.
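The switching behaviour just described can be mirrored in a switch-level sketch, with the two NMOS transistors forming a series path to ground and the two PMOS transistors forming parallel paths to Vdd. The function name `nand_gate` and the assignment of input A to MN1/MP1 and input B to MN2/MP2 are assumptions made here, since the figure itself is not reproduced.

```python
def nand_gate(a: bool, b: bool) -> bool:
    """Switch-level model of a two-input static CMOS NAND gate.

    Returns the logic value at output Z.
    """
    # NMOS transistors conduct when their gate input is high.
    mn1_on, mn2_on = a, b
    # PMOS transistors conduct when their gate input is low.
    mp1_on, mp2_on = (not a), (not b)

    pull_down = mn1_on and mn2_on   # series path from Z to ground
    pull_up = mp1_on or mp2_on      # parallel paths from Z to Vdd

    # Exactly one network conducts for any static input combination,
    # so there is no contention between power and ground.
    assert pull_up != pull_down
    return pull_up
```

The internal assertion holds for all four input combinations, which reflects the point made in the text that the output is connected to the power or ground line, but not both.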
Previous chapters in this book have been devoted to the design of domino logic standard cells and methods to synthesize logic using them. In this chapter we describe some example circuits implemented using different automated domino logic design flows. Since the primary benchmark for synthesizable domino logic is against synthesizable static logic, comparisons are provided between the two. Silicon-measured data is also provided wherever it is available.
Domino integer execution unit
A typical application for high-speed logic is in the execution units of microprocessors. Execution units are the main arithmetic modules in processors, performing integer or floating point arithmetic. In order to understand the speed advantages possible with domino logic, we decided to build a simple integer execution unit. The block has an adder, a shifter, a multiplier, and a bit operations unit. Memory modules interact closely with execution units to provide data and instructions. For this design two 32-entry, 32-bit wide register files are used in each execution unit. One register file supplies the 32-bit wide data operands that are applied to the datapath modules and stores the result. The other register file holds a simple set of instructions. These instructions allow the data operations to start and stop. They also determine the operations to be performed and the data memory locations to be accessed.
A schematic representation of the execution unit data flow is shown in Figure 5.1. Operation starts via instructions sent from the instruction register file. Each arithmetic function receives operands from the data register file.
Digital ASIC design methodologies are now mature technologies. While EDA tools continue to progress and improve, the basic algorithms on which they are based have been well optimized. In addition, the high-speed needs in an ASIC often tend to be focused on small or medium-sized blocks of logic, while the current focus for EDA tools is on dealing with the massive complexity of systems on-chip. Static logic libraries, like EDA tools, have also improved in the last few years, especially with the introduction of pulse-based flip-flops [1, 2]. Beyond that there does not appear to be very much one can do to improve performance significantly beyond the incremental work of increasing the number of cells and type of libraries provided for the synthesis tool. This is common for many maturing industries, where once the low-hanging fruit has been picked further improvements require considerable effort, often for limited gain.
Before the reader decides to accept the limitations in ASIC design flows with the calm serenity with which it is best to accept the unalterable frailties of the human condition, and other such phenomena, it is perhaps useful to remember that custom designs still remain significantly faster than ASIC implementations in the same process generation [3]. This suggests that there still remains scope for further speed improvements in ASIC flows by using custom design techniques.