Teaching fundamental design concepts and the challenges of emerging technology, this textbook prepares students for a career designing the computer systems of the future. Self-contained yet concise, the material can be taught in a single semester, making it perfect for use in senior undergraduate and graduate computer architecture courses. This edition has a more streamlined structure, with the reliability and other technology background sections now included in the appendix. New material includes a chapter on GPUs, providing a comprehensive overview of their microarchitectures; sections focusing on new memory technologies and memory interfaces, which are key to unlocking the potential of parallel computing systems; and deeper coverage of memory hierarchies, including DRAM architectures, compression in memory hierarchies, and up-to-date coverage of prefetching. Practical examples demonstrate concrete applications of definitions, while the simple models and codes used throughout ensure the material is accessible to a broad range of computer engineering/science students.
This chapter reviews techniques to address the processor-memory speed gap. We start with the concepts behind modern memory hierarchies: the principle of locality of accesses, coherence in the memory hierarchy, and cache and memory inclusion. We then review the architecture of main memory systems, including the architecture of DRAM devices and DRAM systems. This is followed by the concepts behind cache hierarchies, including cache mapping and access, replacement and write policies, and the classification of cache misses. We cover techniques needed to cope with processors exploiting high degrees of instruction-level parallelism, including lockup-free caches, cache prefetching, and preloading. The chapter reviews data compression in the memory hierarchy, which allows for higher memory capacity and effective bandwidth. Finally, the chapter covers hardware support for virtual memory, page tables and translation lookaside buffers, and virtual address caches.
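As a concrete illustration of cache mapping, the sketch below shows how a direct-mapped cache splits an address into tag, index, and offset fields. The cache parameters (32 KiB capacity, 64-byte lines) are hypothetical and chosen only for the example; this is not code from the chapter.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical direct-mapped cache: 32 KiB capacity, 64-byte lines. */
    #define LINE_SIZE 64                          /* bytes per cache line */
    #define NUM_LINES (32 * 1024 / LINE_SIZE)     /* 512 lines            */

    int main(void) {
        uint64_t addr   = 0x7ffe12345678ULL;      /* example address       */
        uint64_t offset = addr % LINE_SIZE;               /* byte in line  */
        uint64_t index  = (addr / LINE_SIZE) % NUM_LINES; /* selects line  */
        uint64_t tag    = addr / LINE_SIZE / NUM_LINES;   /* disambiguates */
        printf("tag=%llx index=%llu offset=%llu\n",
               (unsigned long long)tag, (unsigned long long)index,
               (unsigned long long)offset);
        return 0;
    }

On a lookup, the index selects a line and the stored tag is compared with the address tag; a mismatch is a cache miss.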
This chapter is devoted to the design principles of multiprocessor systems, focusing on two architectural styles: shared-memory and message-passing. Both styles use multiple processors to achieve a linear speedup of computational power with the number of processors, but differ in the method of data exchange. Processors in shared-memory multiprocessors share the same address space and can exchange data through shared-memory locations using regular load and store instructions. This chapter reviews the programming model abstractions for shared-memory and message-passing multiprocessors, then the semantics of message-passing primitives, the protocols needed, and the architectural support to accelerate message processing. It covers support for the shared-memory model abstraction by reviewing the concept of cache coherence, the design space of snoopy-cache coherence protocols, the classification of communication events, and translation-lookaside buffer consistency strategies. Scalable models of shared memory are also treated, with an emphasis on cache coherence solutions that can be applied at large scale, as well as software techniques that deal with page mappings to exploit locality.
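To make the contrast concrete, here is a minimal shared-memory exchange in C11 with pthreads (an illustrative sketch, not code from the chapter). The producer communicates a value to the consumer with plain stores and loads plus a synchronization flag; in a message-passing system, the same exchange would instead use explicit send/receive primitives such as MPI_Send and MPI_Recv.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static int shared_data;          /* exchanged through a memory location */
    static atomic_int ready;         /* synchronization flag                */

    static void *producer(void *arg) {
        (void)arg;
        shared_data = 42;                          /* plain store */
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    static void *consumer(void *arg) {
        (void)arg;
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                                      /* spin until flag is set */
        printf("received %d\n", shared_data);      /* plain load */
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }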
For the past 30 years we have lived through the information revolution, powered by the explosive growth of semiconductor integration and the internet. The exponential performance improvement of semiconductor devices was predicted by Moore’s law as early as the 1960s. Moore’s law predicts that the computing power of microprocessors will double every 18-24 months at constant cost, so that their cost-effectiveness (the ratio between performance and cost) grows at an exponential rate. It has been observed that the computing power of entire systems also grows at the same pace. The law has endured the test of time and remains valid today, but it will be tested repeatedly, both now and in the future, as many people see strong evidence that the "end of the ride" is near, mostly because the miniaturization of CMOS technology is rapidly reaching its limits. This chapter reviews the technology trends underpinning the evolution of computer systems. It also introduces metrics for the performance comparison of computer systems and fundamental laws that drive the field, such as Amdahl’s law.
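For reference, Amdahl's law bounds the overall speedup when only a fraction f of the execution time benefits from an enhancement that speeds that fraction up by a factor s:

    \[ \text{Speedup} = \frac{1}{(1 - f) + f/s} \]

For example, enhancing 90% of the execution time (f = 0.9) by a factor s = 10 yields an overall speedup of 1/(0.1 + 0.09) ≈ 5.3, far below 10, because the unenhanced fraction quickly dominates.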
This chapter is dedicated to the correct and reliable communication of values in shared-memory multiprocessors. Correctness properties of the memory system of shared-memory multiprocessors include coherence, the memory consistency model, and the reliable execution of synchronization primitives. Since CMPs are designed as shared-memory multi-core systems, this chapter targets correctness issues not only in symmetric multiprocessors (SMPs) and large-scale cache-coherent distributed shared-memory systems, but also in CMPs with core multi-threading. The chapter reviews the hardware components of a shared-memory architecture and why memory correctness properties are so hard to enforce in modern shared-memory multiprocessor systems. We then treat various levels of coherence and the difference between plain memory coherence and store atomicity. We introduce memory models, starting with sequential consistency, the most fundamental memory model, and show how sequential consistency can be enforced by store synchronization. Finally, we review thread synchronization and ISA-level synchronization primitives, relaxed memory models motivated by hardware efficiency, and relaxed memory models relying on synchronization.
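The classic two-thread example below illustrates what is at stake (a sketch, not code from the chapter). Under sequential consistency, the outcome r1 == 0 and r2 == 0 is impossible, because some store must precede both loads in any interleaving; under relaxed models that allow a store to be reordered past a later load, both threads can read 0 unless a fence is inserted between the store and the load.

    int x = 0, y = 0;            /* shared locations */

    void thread1(void) {         /* runs on core 1 */
        x = 1;                   /* store */
        int r1 = y;              /* load  */
        (void)r1;
    }

    void thread2(void) {         /* runs on core 2 */
        y = 1;                   /* store */
        int r2 = x;              /* load  */
        (void)r2;
    }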
This chapter also covers a compiler-centric approach to building computers, known as VLIW computers. Apart from reviewing the design principles of VLIW pipelines, we review compiler techniques to uncover instruction-level parallelism, including loop unrolling, software pipelining, and trace scheduling. Finally, the chapter covers vector machines.
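As a small illustration of loop unrolling (a sketch, not code from the chapter), the unrolled version of the array sum below keeps four independent accumulators, exposing more instruction-level parallelism for a VLIW scheduler, or any wide-issue machine, to pack.

    /* Rolled loop: one addition per iteration, one serial dependence chain. */
    float sum_rolled(const float *a, int n) {
        float s = 0.0f;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Unrolled by 4: four independent dependence chains per iteration. */
    float sum_unrolled4(const float *a, int n) {
        float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
        int i;
        for (i = 0; i + 4 <= n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < n; i++)            /* cleanup for leftover elements */
            s0 += a[i];
        return (s0 + s1) + (s2 + s3);
    }

Note that reassociating the floating-point sum changes rounding slightly, which is why compilers perform this transformation automatically only under relaxed floating-point settings.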
The instruction set is the interface between the hardware and the software and must be followed meticulously when designing a computer. This chapter starts by introducing the instruction set of a computer. A basic instruction set is used throughout the book; it is broadly inspired by the MIPS instruction set, a rather simple ISA that is representative of many others, such as ARM and RISC-V. We then review how one can support a representative instruction set with the concept of static pipelining. We start by reviewing a simple five-stage pipeline and all the issues involved in avoiding hazards. This simple pipeline is gradually augmented to allow for higher instruction execution rates, including out-of-order instruction completion, superpipelining, and superscalar designs.
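One of the hazards such a pipeline must handle is the load-use (read-after-write) hazard, sketched below in C with MIPS-like instructions in comments (an illustration under assumed five-stage timing, not an example from the chapter).

    /* Load-use hazard: in a classic five-stage pipeline the load's result
       is available only after the MEM stage, so the dependent add cannot
       read it in its EX stage without a one-cycle stall, even with
       forwarding. */
    int load_use(const int *p, int b) {
        int a = *p;       /* lw  a, 0(p)   -- value ready after MEM     */
        return a + b;     /* add r, a, b   -- needs a in EX: one bubble */
    }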
Given the widening gaps between processor speed, main memory (DRAM) speed, and secondary memory (disk) speed, it has become more and more difficult in recent years to feed data and instructions at the speed required by the processor while providing the ever-expanding memory space expected by modern applications.
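A standard way to quantify the cost of this gap is the average memory access time seen by the processor:

    \[ \text{AMAT} = t_{\text{hit}} + \text{miss rate} \times \text{miss penalty} \]

With illustrative numbers (a 1-cycle hit time, a 2% miss rate, and a 200-cycle miss penalty), AMAT = 1 + 0.02 × 200 = 5 cycles: even with 98% of accesses hitting, the average access is five times slower than a hit.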
In prior chapters we discussed how Dennard scaling combined with Moore’s law has resulted in a continuous increase in single-threaded performance through innovations that exploit instruction-level parallelism (ILP). Designs such as out-of-order (OoO) execution and speculation have been used to exploit the scaling properties of transistors. Recently, however, Dennard’s voltage scaling has hit its limits, with supply voltage reduction coming to a near halt. Thus, power density grows as more transistors are integrated into a unit area. Meanwhile, Moore’s law seems to keep its momentum, leading to billions of transistors being integrated into chips. Overall, it is fair to say that the density of transistors has been scaling faster than the power density. Recognizing this concern, the chip industry has shifted emphasis (at least partially) toward multi- and even many-core chip multiprocessors (CMPs). While scaling frequency has a cubic relationship to power consumption, scaling the number of cores has a linear relationship to power. Graphics processing units (GPUs) have emerged as a promising many-core architecture for power-efficient throughput computing. With thousands of simple in-order cores that can run thousands of threads in parallel, GPUs deliver several teraflops of peak performance, primarily through thread-level parallelism (TLP).
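The frequency and core-count relationships mentioned above follow from the standard dynamic power model (a simplified model that ignores leakage):

    \[ P_{\text{dynamic}} = \alpha C V^2 f \]

where α is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. Since higher frequency requires a roughly proportional supply voltage, V ∝ f gives P ∝ f³, the cubic cost of frequency scaling. Replicating cores at a fixed frequency and voltage instead multiplies αC, so power grows only linearly with core count.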