The processor and its instruction set are the fundamental components of any architecture because they drive its functionality. In some sense the processor is the “brain” of a computer system, and therefore understanding how processors work is essential to understanding the workings of a multiprocessor.
This chapter first covers instruction sets, including exceptions. Exceptions, which can be seen as a software extension to the processor instruction set, are an integral component of the instruction set architecture definition and must be adhered to. They impose constraints on processor architecture. Without the need to support exceptions, processors and multiprocessors could be much more efficient, but they would forgo the flexibility and convenience provided by software extensions to the instruction set in various contexts. A basic instruction set, broadly inspired by the rather simple MIPS instruction set, is used throughout the book. We adopt the MIPS instruction set because the fundamental concepts of processor organization are easier to explain and grasp with simple instruction sets. However, we also explain extensions required for more complex instruction sets, such as the Intel x86, as the need arises.
Since this book is about parallel architectures, we do not cover architectures that execute instructions one at a time. Thus the starting point is the 5-stage pipeline, which processes up to five instructions concurrently in every clock cycle. The 5-stage pipeline is a static pipeline in the sense that the order of instruction execution (or the schedule of instruction execution) is dictated by the compiler, an order commonly referred to as the program, thread, or process order; the hardware makes no attempt to re-order the execution of instructions dynamically.
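For intuition, the following minimal sketch (ours, not the book's; it assumes the classic MIPS-style stage names IF, ID, EX, MEM, WB and a stall-free schedule) prints the cycle-by-cycle occupancy of a 5-stage pipeline, showing five instructions in flight at once by cycle 5:

```python
# Sketch of instruction overlap in a static 5-stage pipeline.
# Assumption: no stalls, so instruction i enters stage s in cycle i + s + 1.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # classic MIPS-style stage names

def pipeline_diagram(num_instructions):
    """Print which stage each instruction occupies in each cycle."""
    total_cycles = num_instructions + len(STAGES) - 1
    print("cycle:  " + "  ".join(f"{c:>3}" for c in range(1, total_cycles + 1)))
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i is in stage s during cycle i + s + 1
        print(f"I{i + 1}:     " + "  ".join(f"{x:>3}" for x in row))

pipeline_diagram(5)
# By cycle 5 all five stages are busy: five instructions are in flight,
# still strictly in program order (a static schedule).
```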
Technology has always played the most important role in the evolution of computer architecture and will continue to do so for the foreseeable future. Technological evolution has fostered rapid innovations in chip architecture. We give three examples, motivated by performance, power, and reliability. In the past, architectural designs were dictated by performance/cost tradeoffs. Several well-known architectural discoveries resulted from the uneven progress of different technological parameters. For instance, caches were invented during the era when processor speed grew much faster than main memory speed. Recently, power has become a primary design constraint. Since the invention of the microprocessor, the amount of chip real estate has soared relentlessly, enabling an exponential rise in clock frequencies and ever more complex hardware designs. However, as the supply voltage approached its lower limit and power consumption became a primary concern, chip architecture shifted from high-frequency uniprocessor designs to chip multiprocessor architectures in order to contain power growth. This shift from uniprocessor to multiprocessor microarchitectures is a disruptive event caused by the evolution of technology. Finally, for decades processor reliability was a concern primarily for high-end server systems. As transistor feature sizes have shrunk over time, transistors have become more susceptible to transient faults. Hence radiation-hardened architectures have been developed to protect computer systems from single-event upsets causing soft errors.
These examples of the impact of technology on computer design demonstrate that it is critical for a reader of this book to understand the basic technological parameters and features, and their scaling with each process generation.
Interconnection networks are an important component of every computer system. Central to the design of a high-performance parallel computer is the elimination of serializing bottlenecks that can cripple the exploitation of parallelism at any level. Instruction-level and thread-level parallelism across processor cores demand a memory system that can feed the processor with instructions and data at high speed through deep cache memory hierarchies. However, even with a modest miss rate of one percent and a 100-cycle miss penalty, half of the execution time can be spent bringing instructions and data from memory to processors. It is imperative to keep the latency of moving instructions and data between main memory and the cache hierarchy short.
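The "half of the execution time" figure follows from a standard back-of-the-envelope calculation. Assuming, for illustration, a base CPI of 1 and one memory access per instruction:

\[
\mathrm{CPI} \;=\; \mathrm{CPI_{base}} + \text{miss rate} \times \text{miss penalty} \;=\; 1 + 0.01 \times 100 \;=\; 2,
\]

so memory stalls account for one of every two execution cycles.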
It is also important that memory bandwidth be sufficient. If memory bandwidth is insufficient, contention among memory requests lengthens the memory-access latency, which, in turn, may affect instruction execution time and throughput. For example, consider a nonblocking cache that has N outstanding misses. If the bus connecting the cache to memory can only transfer one block every T cycles, it takes N × T cycles to service the N misses, as opposed to T cycles if the bus can transfer N blocks in parallel.
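The contrast can be made concrete with a toy model (a sketch under the same idealized assumptions as the example above; the function and its parameters are ours):

```python
# Toy model of the N x T example: cycles to service N outstanding misses
# when the bus moves `parallel_blocks` blocks every T cycles. It ignores
# DRAM timing details and any overlap with computation.

def service_time(n_misses, t_cycles_per_transfer, parallel_blocks=1):
    rounds = -(-n_misses // parallel_blocks)  # ceiling division
    return rounds * t_cycles_per_transfer

N, T = 8, 20
print(service_time(N, T))                     # serial bus: N * T = 160 cycles
print(service_time(N, T, parallel_blocks=N))  # ideal parallel bus: T = 20 cycles
```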
The role of interconnection networks is to transfer information between computer components in general, and between memory and processors in particular. This is important for all parallel computers, whether they are on a single processor chip – a chip multiprocessor or multi-core – or built from multiple processor chips connected to form a large-scale parallel computer.
Improve design efficiency and reduce costs with this practical guide to formal and simulation-based functional verification. Giving you a theoretical and practical understanding of the key issues involved, expert authors including Wayne Wolf and Dan Gajski explain both formal techniques (model checking, equivalence checking) and simulation-based techniques (coverage metrics, test generation). You get insights into practical issues including hardware verification languages (HVLs) and system-level debugging. The foundations of formal and simulation-based techniques are covered too, as are more recent research advances including transaction-level modeling and assertion-based verification, plus the theoretical underpinnings of verification, including the use of decision diagrams and Boolean satisfiability (SAT).
This new, expanded textbook describes all phases of a modern compiler: lexical analysis, parsing, abstract syntax, semantic actions, intermediate representations, instruction selection via tree matching, dataflow analysis, graph-coloring register allocation, and runtime systems. It includes good coverage of current techniques in code generation and register allocation, as well as the compilation of functional and object-oriented languages, which is missing from most books. In addition, more advanced chapters are now included so that it can be used as the basis for a two-semester or graduate course. The most accepted and successful techniques are described in a concise way, rather than as an exhaustive catalog of every possible variant. Detailed descriptions of the interfaces between modules of a compiler are illustrated with actual C header files. The first part of the book, Fundamentals of Compilation, is suitable for a one-semester first course in compiler design. The second part, Advanced Topics, which includes the advanced chapters, covers the compilation of object-oriented and functional languages, garbage collection, loop optimizations, SSA form, loop scheduling, and optimization for cache-memory hierarchies.
Device testing represents the single largest manufacturing expense in the semiconductor industry, costing over $40 billion a year. The most comprehensive and wide-ranging book of its kind, Testing of Digital Systems covers everything you need to know about this vitally important subject. Starting right from the basics, the authors take the reader through automatic test pattern generation, design for testability, and built-in self-test of digital circuits before moving on to more advanced topics such as IDDQ testing, functional testing, delay fault testing, memory testing, and fault diagnosis. The book includes detailed treatment of the latest techniques, including test generation for various fault models, discussion of testing techniques at different levels of the integrated circuit hierarchy, and a chapter on system-on-a-chip test synthesis. Written for students and engineers, it is both an excellent senior/graduate-level textbook and a valuable reference.
This textbook provides a clear and concise introduction to computer architecture and implementation. Two important themes are interwoven throughout the book. The first is an overview of the major concepts and design philosophies of computer architecture and organization. The second is the early introduction and use of analytic modeling of computer performance. The author begins by describing the classic von Neumann architecture, and then presents in detail a number of performance models and evaluation techniques. He goes on to cover user instruction set design, including RISC architecture. A unique feature of the book is its memory-centric approach: memory systems are discussed before processor implementations. The author also deals with pipelined processors, input/output techniques, queuing models, and extended instruction set architectures. Each topic is illustrated with reference to actual IBM and Intel architectures. The book contains many worked examples and over 130 homework exercises. It is an ideal textbook for a one-semester undergraduate course in computer architecture and implementation.
This book gives a comprehensive description of the architecture of microprocessors from simple in-order short pipeline designs to out-of-order superscalars. It discusses topics such as:
- the policies and mechanisms needed for out-of-order processing, such as register renaming, reservation stations, and reorder buffers;
- optimizations for high performance, such as branch predictors, instruction scheduling, and load-store speculation;
- design choices and enhancements to tolerate latency in the cache hierarchy of single and multiple processors;
- state-of-the-art multithreading and multiprocessing, emphasizing single-chip implementations.
Topics are presented as conceptual ideas, with metrics to assess the performance impact, if appropriate, and examples of realization. The emphasis is on how things work at a black-box and algorithmic level. The author also provides sufficient detail at the register transfer level so that readers can appreciate how design features enhance performance as well as complexity.
What makes some computers slow? Why do some digital systems operate reliably for years while others fail mysteriously every few hours? How can some systems dissipate kilowatts while others operate off batteries? These questions of speed, reliability, and power are all determined by the system-level electrical design of a digital system. Digital Systems Engineering presents a comprehensive treatment of these topics. It combines a rigorous development of the fundamental principles in each area with real-world examples of circuits and methods. The book not only serves as an undergraduate textbook, filling the gap between circuit design and logic design, but can also help practising digital designers keep pace with the speed and power of modern integrated circuits. The techniques described in this book, once used only in supercomputers, are essential to the correct and efficient operation of any type of digital system.
This textbook describes all phases of a compiler: lexical analysis, parsing, abstract syntax, semantic actions, intermediate representations, instruction selection via tree matching, dataflow analysis, graph-coloring register allocation, and runtime systems. It includes good coverage of current techniques in code generation and register allocation, as well as the compilation of functional and object-oriented languages, which is missing from most books. The most accepted and successful techniques are described concisely, rather than as an exhaustive catalog of every possible variant, and illustrated with actual Java classes. This second edition has been extensively rewritten to include more discussion of Java and object-oriented programming concepts, such as visitor patterns. A unique feature is the newly redesigned compiler project in Java, for a subset of Java itself. The project includes both front-end and back-end phases, so that students can build a complete working compiler in one semester.
Distributed Object Architectures with CORBA is a guide to creating a software architecture comprising distributed components. While it is based on OMG's Common Object Request Broker Architecture (CORBA) standard, the principles also apply to architectures built with other technologies (such as Microsoft's DCOM). As ORB products evolve to incorporate new additions to CORBA, the knowledge and experience required to build stable and scalable systems is not widespread. With this book the reader can develop the skills and knowledge that are necessary for building such systems. The book assumes a familiarity with object-oriented concepts and the basics of CORBA. Software developers who are new to building systems with CORBA-based technologies will find this book a useful guide to effective development.
Cellular Nonlinear/Neural Network (CNN) technology is both a revolutionary concept and an experimentally proven new computing paradigm. Analogic cellular computers based on CNNs are set to change the way analog signals are processed and are paving the way to an analog computing industry. This unique undergraduate-level textbook includes many examples and exercises, including a CNN simulator and development software accessible via the Internet. It is an ideal introduction to CNNs and analogic cellular computing for students, researchers, and engineers from a wide range of disciplines. Although its prime focus is on visual computing, the concepts and techniques described in the book will be of great interest to those working in other areas of research, including the modeling of biological, chemical, and physical processes. Leon Chua, co-inventor of the CNN, and Tamás Roska are both highly respected pioneers in the field.
This chapter provides a thorough discussion of the issues surrounding the optimization of linear systems. Section 7.2 describes fundamental properties, and presents a list of common linear transforms. Then, Section 7.3 formalizes the problem. Sections 7.4 and 7.5 introduce two important cases of linear system optimization, namely single- and multiple-constant multiplication. While the algorithms to solve these problems are important, they do not fully take advantage of the solution space; thus, they may lead to inferior results. Section 7.6 describes the relationship of these two problems to the linear system optimization problem and provides an overview of techniques used to optimize linear systems. Section 7.7 presents a transformation from a linear system into a set of polynomial expressions. The algorithms presented later in the chapter use this transformation during optimization. Section 7.8 describes an algorithm for optimizing expressions for synthesis using two-operand adders. The results in Section 7.9 describe the synthesis of high-speed FIR filters using this two-operand optimization along with other common techniques for FIR filter synthesis. Then, Section 7.10 focuses on more complex architectures, in particular, those using three-operand adders, and shows how CSAs can speed up the calculation of linear systems. Section 7.11 discusses how to consider timing constraints by modifying the previously presented optimization techniques. Specifically, the section describes ideas for performing delay-aware optimization.
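As a concrete taste of the single-constant multiplication problem, the sketch below (ours, for illustration) replaces a multiplication by a constant with shifts and additions driven by the constant's binary digits. This is the naive baseline; the chapter's algorithms find cheaper decompositions, e.g., by using subtractions and sharing common subexpressions across constants:

```python
# Naive single-constant multiplication: x * c as shift-and-add over the
# set bits of c. Real SCM/MCM algorithms improve on this baseline with
# signed-digit recodings and shared partial terms.

def scm_shift_add(x, c):
    """Compute x * c using only shifts and additions."""
    result, shift = 0, 0
    while c:
        if c & 1:
            result += x << shift  # add x * 2^shift for each set bit of c
        c >>= 1
        shift += 1
    return result

assert scm_shift_add(3, 105) == 3 * 105  # 105 = 0b1101001: four shifted terms
```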
This chapter provides a brief summary of the stages in the hardware synthesis design flow. It is designed to give unfamiliar readers a high-level understanding of the hardware design process. The material in subsequent chapters describes different hardware implementations of polynomial expressions and linear systems. Therefore, we feel that it is important, though not necessarily essential, to have an understanding of the hardware synthesis process.
The chapter starts with a high-level description of the hardware synthesis design flow. It then proceeds to discuss the various components of this design flow. These include the input system specification, the program representation, algorithmic optimizations, resource allocation, operation scheduling, and resource binding. The chapter concludes with a case study using an FIR filter. This provides a step-by-step example of the hardware synthesis process. Additionally, it gives insight into the hardware optimization techniques presented in the following chapters.
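For readers unfamiliar with the case study, an FIR filter is simply a weighted sum of the most recent inputs. A behavioral description such as the following (a hypothetical sketch, not the book's code) is the kind of specification the synthesis flow transforms: the multiplications and additions in the inner loop are mapped onto hardware units (allocation), assigned to clock cycles (scheduling), and bound to specific adders and multipliers (binding):

```python
# Behavioral model of an FIR filter: y[n] = sum_k c[k] * x[n-k].

def fir(samples, coeffs):
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out

print(fir([1, 2, 3, 4], coeffs=[0.5, 0.25, 0.25]))
```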
Hardware synthesis design flow
The initial stages of a hardware design flow are quite similar to the front end of a software compiler. One of the biggest differences is the input system specification language. Hardware description languages must deal with many features that are unnecessary in software, which for the most part models execution in a serial fashion. Such features include the need to model concurrent execution of the underlying resources, to define a variety of data types for different bit widths, and to introduce some notion of time into the language. Figure 4.1 gives a high-level view of the different stages of hardware compilation.
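To illustrate why time and concurrency must be explicit, here is a minimal discrete-event scheduler (a rough Python analogy of our own devising, not a real HDL simulator, which would also handle delta cycles and signal resolution). Two "processes" run concurrently against a shared simulated clock, and a bit mask plays the role of a fixed-width hardware vector:

```python
import heapq

def clock(period):
    level = 0
    while True:
        level ^= 1
        print(f"clock -> {level}")
        yield period // 2               # toggle every half period

def counter(width):
    value, mask = 0, (1 << width) - 1   # fixed bit width, like an HDL vector
    while True:
        value = (value + 1) & mask
        print(f"counter = {value:0{width}b}")
        yield 10                        # advance every 10 time units

def simulate(processes, until):
    # Each process is a generator yielding the delay before its next action.
    queue = [(0, i, p) for i, p in enumerate(processes)]
    heapq.heapify(queue)
    while queue:
        now, i, proc = heapq.heappop(queue)
        if now >= until:
            break
        print(f"[t={now}]", end=" ")
        heapq.heappush(queue, (now + next(proc), i, proc))

simulate([clock(10), counter(3)], until=40)
```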
Arithmetic is one of the oldest topics in computing. It dates back to the many early civilizations that used the abacus to perform arithmetic operations. The seventeenth and eighteenth centuries brought many advances with the invention of mechanical counting machines like the slide rule, Schickard's Calculating Clock, Leibniz's Stepped Reckoner, the Pascaline, and Babbage's Difference and Analytical Engines. The vacuum tube computers of the mid-twentieth century were the first programmable, digital, electronic computing devices. The introduction of the integrated circuit in the 1950s heralded the present era, in which the complexity of computing resources is growing exponentially. Today's computers perform extremely advanced operations such as wireless communication and audio, image, and video processing, and are capable of performing over 10^15 operations per second.
Since computer arithmetic is a well-studied field, it should come as no surprise that there are many books on its various subtopics. This book provides a focused view on the optimization of polynomial functions and linear systems. The book discusses optimizations that are applicable to both software and hardware design flows; e.g., it describes the best way to implement arithmetic operations when your target computational device is a digital signal processor (DSP), a field programmable gate array (FPGA), or an application-specific integrated circuit (ASIC).
Polynomials are among the most important functions in mathematics and are used in algebraic number theory, geometry, and applied analysis.
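Since the optimization of polynomial evaluation is the book's central topic, one classic example of such restructuring is worth showing up front: Horner's rule (a standard identity, illustrated here in a sketch of ours) evaluates a degree-n polynomial with only n multiplications and n additions:

```python
# Horner's rule: a0 + a1*x + ... + an*x^n = a0 + x*(a1 + x*(a2 + ...)),
# needing n multiplications and n additions instead of the naive form's
# roughly 2n - 1 multiplications.

def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# p(x) = 2 + 3x + x^3 at x = 2 -> 2 + 6 + 8 = 16
assert horner([2, 3, 0, 1], 2) == 16
```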