During the last few years, wireless data traffic has skyrocketed, driven mainly by the large-scale penetration of smartphones and smart devices. In 2013, an exabyte of data traveled across the global mobile network every month [1]. By 2020, the data traffic served by such networks is expected to increase by up to a factor of 100, including traffic generated by the widespread adoption of device-to-device (D2D) communication and of the Internet of Things (IoT) connected via machine-to-machine (M2M) communications. It is widely recognized that this explosive growth may accelerate even further in the future, and meeting such demand has been one of the most active and rapidly growing areas of academic and industrial research and development in the wireless communication community over the past decade [2].
Facing this unprecedented challenge, the wireless communication community has considered many candidate solutions. A significant portion of these focus on increasing the available communication resources, e.g., deploying more network nodes (which densifies existing networks), utilizing wider bandwidth, increasing the number of antennas, and employing additional resources for traffic offloading. Among them, the dense-network approach stands out for its high scalability, offering orders of magnitude of capacity increase. Extensive research has been devoted to dense networks (see, e.g., [3–8] and references therein).
Indeed, commercial wireless networks are already becoming denser and dense-network deployment will be a critical factor (together with other solutions) to meet the ever-increasing traffic demand. The trends for traffic and network-density growth over a 20-year span are illustrated in Figure 5.1.
In Figure 5.1, the network densities for the years 2000 to 2015 were estimated from 3GPP publications (e.g., [3, 9]), whereas the network density for the year 2020 is a projection based on historical data and recent trends. In more detail, around the year 2000, sparse 3G deployments of macro base stations covered wide areas with typical cell radii of several kilometers. Starting around 2005, the network density increased to about 10 to 20 nodes/km² and cell radii shrank to between one kilometer and several hundred meters, according to the studies for 3GPP LTE Rel-8/9; however, macro eNBs (evolved NodeBs, also known as base stations, BS, BTS, etc.) were still the main focus.
Asynchronous sequential circuits have state that is not synchronized with a clock. Like the synchronous sequential circuits we have studied up to this point, they are realized by adding state feedback to combinational logic that implements a next-state function. Unlike synchronous circuits, the state variables of an asynchronous sequential circuit may change at any point in time. This asynchronous state update – from next state to current state – complicates the design process. We must be concerned with hazards in the next-state function, since a momentary glitch may result in an incorrect final state. We must also be concerned with races between state variables on transitions between states whose encodings differ in more than one variable.
In this chapter we look at the fundamentals of asynchronous sequential circuits. We start by showing how to analyze combinational logic with feedback by drawing a flow table. The flow table shows us which states are stable, which are transient, and which are oscillatory. We then show how to synthesize an asynchronous circuit from a specification by first writing a flow table and then reducing the flow table to logic equations. We see that state assignment is quite critical for asynchronous sequential machines, since it determines when a potential race may occur. We show that some races can be eliminated by introducing transient states.
After the introduction given in this chapter, we continue our discussion of asynchronous circuits in Chapter 27 by looking at latches and flip-flops as examples of asynchronous circuits.
FLOW-TABLE ANALYSIS
Recall from Section 14.1 that an asynchronous sequential circuit is formed when a feedback path is placed around combinational logic, as shown in Figure 26.1(a). To analyze such circuits, we break the feedback path as shown in Figure 26.1(b) and write the equations for the next-state variables as a function of the current-state variables and the inputs. We can then reason about the dynamics of the circuit by exploring what happens when the current-state variables are updated (in arbitrary order if multiple bits change) with their new values.
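To make this settling process concrete, it can be sketched in ordinary code. This is an illustrative sketch only, written in Python rather than hardware form; the next-state function below is a hypothetical set-reset feedback loop, not a circuit from the text:

```python
# Sketch (not from the text): exploring the dynamics of an asynchronous
# circuit by iterating its next-state function until the state is stable.

def next_state(state, inp):
    """Hypothetical next-state function for one feedback variable:
    a simple set-reset behavior with inputs (s, r)."""
    s, r = inp
    return (state or s) and not r

def settle(state, inp, max_iters=10):
    """Apply the next-state function until the state stops changing.
    Returns the stable state, or None if no fixed point is found."""
    for _ in range(max_iters):
        nxt = next_state(state, inp)
        if nxt == state:
            return state          # stable: next state equals current state
        state = nxt               # transient: keep iterating
    return None                   # oscillatory: state never settles

print(settle(False, (True, False)))   # the set input drives the state high
```

A state for which the next-state function returns the current state is stable; a state that keeps changing is transient; and if no fixed point is ever reached the behavior is oscillatory. These are exactly the three cases a flow table distinguishes.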
System-level timing is generally driven by the flow of information through the system. Because this information flows through the interfaces between modules, system timing is tightly tied to interface specification. Timing is determined by how modules sequence information over these interfaces. In this chapter we will discuss interface timing and illustrate how the operation of the overall system is sequenced via these interfaces, drawing on the examples introduced in Chapter 21.
INTERFACE TIMING
Interface timing is a convention for sequencing the transfer of data. To transfer a datum from a source module S to a destination module D, we need to know when the datum is valid (i.e., when the source module S has produced the datum and placed it on its interface pins) and when D is ready to receive the datum (i.e., when D samples the datum from its interface pins). We have already seen examples of interface timing in the interfaces between the modules of factored state machines in Chapter 17. In the remainder of this section, we look at interface timing in more depth.
Always valid timing
As the name implies, an always valid signal (Figure 22.1) is always valid. An interface with only always valid signals does not require any sequencing signals.
It is important to distinguish an always valid signal from a periodically valid signal (Section 22.1.2) with a period of one clock cycle. An always valid signal represents a value that can be dropped or duplicated. A temperature sensor that constantly outputs an eight-bit digital value representing the current temperature is an example of such a signal. We could pass this signal to a module operating at twice the clock rate (duplicating temperature signals) or a module operating at half the clock rate (dropping temperature values), and the output of the module still represents the current temperature (with perhaps a slight lag).
The state interfaces in the Pong game of Figure 21.2 are another example of always valid interfaces. The mode, ballPos, leftPadY, rightPadY, and score signals are all examples of always valid signals. Each of these signals always represents the current value of a state variable.
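The tolerance of always valid signals to clock-rate mismatch can be sketched in a few lines of Python. This is an illustrative model, not from the text; the sensor values and the resampling scheme are invented:

```python
# Sketch: why an always valid signal tolerates dropping or duplication.

def sample(signal, ratio):
    """Resample an always valid signal.
    ratio > 1 models a faster consumer (values duplicated);
    ratio < 1 models a slower consumer (values dropped)."""
    out = []
    t = 0.0
    while t < len(signal):
        out.append(signal[int(t)])   # latest valid value at this instant
        t += 1.0 / ratio
    return out

temps = [20, 21, 22, 23]    # sensor output, one reading per cycle
print(sample(temps, 2))     # consumer at 2x the rate: values duplicated
print(sample(temps, 0.5))   # consumer at half the rate: values dropped
```

In either case the consumer always sees the current temperature, which is why no sequencing signal is needed.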
How fast will an FSM run? Could making our logic too fast cause our FSM to fail? In this chapter, we will see how to answer these questions by analyzing the timing of our finite-state machines and the flip-flops used to build them.
Finite-state machines are governed by two timing constraints – a maximum delay constraint and a minimum delay constraint. The maximum speed at which we can operate an FSM depends on two flip-flop parameters (the setup time and propagation delay) along with the maximum propagation delay of the next-state logic. On the other hand, the minimum delay constraint depends on the other two flip-flop parameters (hold time and contamination delay) and the minimum contamination delay of the next-state logic. We will see that if the minimum delay constraint is not met, our FSM may fail to operate at any clock speed due to hold-time violations. Clock skew, the delay between the clocks arriving at different flip-flops, affects both maximum and minimum delay constraints.
PROPAGATION AND CONTAMINATION DELAY
In a synchronous system, logic signals advance from the stable state at the end of one clock cycle to a new stable state at the end of the next clock cycle. Between these two stable states, they may go through an arbitrary number of transitions.
In analyzing timing of a logic block we are concerned with two times. First, we would like to know for how long the output retains its initial stable value (from the last clock cycle) after an input first changes (in the new clock cycle). We refer to this time as the contamination delay of the block – the time it takes for the old stable value to become contaminated by an input transition. Note that this first change in the output value does not in general leave the output in its new stable state. The second time we would like to know is how long it takes the output to reach its new stable state after the input has stopped changing. We refer to this time as the propagation delay of the block – the time it takes for the stable value of the input to propagate to a stable value at the output.
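The two FSM timing constraints described above can be written as simple inequality checks. This is a simplified sketch in Python; the delay numbers are invented, and clock skew is folded in only in the pessimistic direction for each constraint:

```python
# Sketch of the two FSM timing constraints. Symbols are the standard
# flip-flop parameters; the numeric values below are made-up examples.

def max_delay_ok(t_cycle, t_pd_ff, t_pd_logic, t_setup, t_skew=0.0):
    # The clock period must cover the flip-flop propagation delay, the
    # worst-case next-state logic delay, and the setup time (plus skew).
    return t_cycle >= t_pd_ff + t_pd_logic + t_setup + t_skew

def min_delay_ok(t_cd_ff, t_cd_logic, t_hold, t_skew=0.0):
    # The earliest the flip-flop input can change after a clock edge
    # (sum of contamination delays) must not violate the hold time.
    return t_cd_ff + t_cd_logic >= t_hold + t_skew

print(max_delay_ok(t_cycle=5.0, t_pd_ff=0.5, t_pd_logic=3.0, t_setup=0.6))
print(min_delay_ok(t_cd_ff=0.2, t_cd_logic=0.3, t_hold=0.4))
```

Note that slowing the clock relaxes only the first check; a design that fails the second check fails at every clock speed, which is why hold-time violations are so serious.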
A pipeline is a sequence of modules, called stages, that each perform part of an overall task. Each stage is like a station along an assembly line – it performs a part of the overall assembly and passes the partial result to the next stage. By passing the incomplete task down the pipeline, each stage is able to start work on a new task before waiting for the overall task to be completed. Thus, a pipeline may be able to perform more tasks per unit time (i.e., it has greater throughput) than a single module that performs the whole task from start to finish.
The throughput, or work done per unit time, of a pipeline is limited by that of its slowest stage. Designers must load-balance pipelines to avoid idling stages and wasting resources. Stages with variable latency can stall all upstream stages, halting the flow of data down the pipeline. Queues can be used to make a pipeline elastic, tolerating latency variance better than global stall signals.
BASIC PIPELINING
Suppose you have a factory that assembles toy cars. Assembling each car takes four steps. In step 1 the body is shaped from a block of wood. In step 2 the body is painted. In step 3 the wheels are attached. Finally, in step 4, the car is placed in a box. Suppose each of the four steps takes 5 minutes. With one employee, your factory can assemble one toy car every 20 minutes. With four employees, your factory can assemble, on average, one car every 5 minutes in one of two ways. You could have each employee perform all four steps, with each employee producing a toy car every 20 minutes. Alternatively, you could arrange your employees in an assembly line, with each employee performing one step and passing partially completed cars down to the next employee.
In a digital system, a pipeline is like an assembly line. We take an overall task (like building a toy car) and break it into subtasks (each of the four steps). We then have a separate unit, called a pipeline stage (like each employee along the assembly line), perform each subtask. The stages are tied together in a linear manner so that the output of each unit is the input of the next unit – like the employees along the assembly line passing their output (partially assembled cars) to the next employee down the line.
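The assembly-line arithmetic generalizes: once a pipeline is full, its throughput is set by the slowest stage. A quick sketch in Python, using the 5-minute steps from the toy-car example:

```python
# Sketch: pipeline throughput is determined by the slowest stage.

def throughput(stage_times):
    """Tasks completed per unit time once the pipeline is full."""
    return 1.0 / max(stage_times)

stages = [5, 5, 5, 5]       # minutes per step in the toy-car line
print(throughput(stages))   # cars per minute: one car every 5 minutes

unbalanced = [5, 5, 8, 5]   # one slow stage drags the whole line down
print(1.0 / throughput(unbalanced))   # now one car every 8 minutes
```

The unbalanced case shows why load balancing matters: the three fast stages sit idle 3 minutes out of every 8.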
In this appendix we provide a summary of the VHDL syntax employed in this book. Before using this appendix you should first have read the introductions to VHDL in Section 1.5 and Section 3.6. In addition to this appendix you may find the subject index at the end of this book helpful when looking for information about specific VHDL syntax features. Excellent references documenting the complete VHDL language can be found elsewhere [3, 55]. However, due to the complexity of the VHDL language such references tend to lack detailed discussion of hardware design topics. An abridged summary of the key aspects of VHDL syntax, such as that found in this appendix, can be very helpful when learning hardware design.
This book uses VHDL syntax features from the most recent standard, VHDL-2008, that enable greater designer productivity and are supported by the FPGA CAD tools typically used in introductory courses on digital design. Many CAD tools by default still assume an earlier version of VHDL even though they have support for VHDL-2008. Hence, you should consult your CAD tool's documentation to learn how to enable support for VHDL-2008 before trying the examples in this book.
To keep descriptions brief yet precise, Extended Backus–Naur Form (EBNF) is employed in this appendix. Non-terminals are surrounded by angle brackets (“<” and “>”) and definitions of non-terminals are denoted by the symbol “::=”. A list of choices is separated by a pipe symbol (“|”) and the interpretation is that only one of the items should appear. Zero or more repetitions of a construct are indicated by surrounding the construct with curly braces (“{” and “}”). An optional construct (i.e., zero or one instances) is indicated by surrounding it with square brackets (“[” and “]”). The EBNF descriptions in this appendix are simplified versions of those found in the VHDL language standard [55]. The simplified EBNF descriptions here correspond to the VHDL syntax subset commonly used for synthesis.
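As an illustration of this notation, here is a hypothetical, heavily simplified pair of productions in the style used in this appendix (not exact entries from the VHDL standard): a signal declaration with a repeated identifier list and an optional initializer, and a choice among port modes.

```
<signal_declaration> ::= signal <identifier> { , <identifier> } : <subtype> [ := <expression> ] ;
<mode> ::= in | out | inout
```

The curly braces allow any number of additional identifiers, the square brackets make the initializer optional, and the pipe symbols mean exactly one of the three modes appears.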
The use of hardware description languages (HDLs) for hardware design differs from the use of programming languages for software development. Software is implemented by converting a program written in a programming language into computer instructions that then appear to execute one at a time.
Implementing the next-state and output logic of a finite-state machine using a memory array gives a flexible way of realizing an FSM. The function of the FSM can be altered by changing the contents of the memory. We refer to the contents of the memory array as microcode, and a machine realized in this manner is called a microcoded FSM. Each word of the memory array determines the behavior of the machine for a particular state and input combination, and is referred to as a microinstruction.
We can reduce the required size of a microcode memory by augmenting the memory with special logic to compute the next state, and by selectively updating infrequently changing outputs. A single microinstruction can be provided for each state, rather than one for each state × input combination, by adding an instruction sequencer and a branch microinstruction to cause changes in control flow. Bits of the microinstruction can be shared by different functions by defining different microinstruction types for control, output, and other functions.
SIMPLE MICROCODED FSM
Figure 18.1 shows a block diagram of a simple microcoded FSM. A memory array holds the next-state and output functions. Each word of the array holds the next state and output for a particular combination of input and current state. The array is addressed by the concatenation of the current state and the inputs. A pair of registers holds the current state and current output.
In practice, the memory could be realized as a RAM or EEPROM allowing software to reprogram the microcode. Alternatively, the memory could be a ROM. With a ROM, a new mask set is required to reprogram the microcode. However, this is still advantageous because changing the program of the ROM does not otherwise alter the layout of the chip. Some ROM designs even allow the program to be changed by altering only a single metal-level mask – reducing the cost of the change. Some designs take a hybrid approach, putting most of the microcode into ROM (to reduce cost) but keeping a small portion of microcode in RAM. A method is provided to redirect an arbitrary state sequence into the RAM portion of the microcode to allow any state to be patched using the RAM.
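The addressing scheme of Figure 18.1 can be sketched as a lookup table in Python. This is an illustrative model only; the two-state microcode contents below are invented (in this example the output word simply reports the current state):

```python
# Sketch of a microcoded FSM: a memory array addressed by
# {current state, input}, each word holding {next state, output}.

microcode = {
    # (state, input): (next_state, output)
    (0, 0): (0, 0),
    (0, 1): (1, 0),
    (1, 0): (0, 1),
    (1, 1): (1, 1),
}

def step(state, inp):
    """One clock cycle: the array is read at address {state, input},
    and the word it returns is loaded into the state and output
    registers."""
    return microcode[(state, inp)]

state, out = 0, 0
for inp in [1, 1, 0]:
    state, out = step(state, inp)
print(state, out)
```

Reprogramming the machine means nothing more than rewriting the entries of the dictionary, which is exactly the flexibility the RAM/ROM discussion above describes.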
In cellular networks, handover refers to the mechanism by which the set of radio links between an active mode mobile device and base station cells is modified. Mobility in the idle mode (when the mobile has no data bearers established and is not transmitting or receiving user plane traffic), termed cell selection/reselection, typically ensures that the UE selects the strongest available cell in preparation for an outgoing or incoming call/data session. Handover reliability is a key performance indicator (KPI) since it directly impacts the perceived quality of experience (QoE) of the end user. In contrast, cell reselection is less important since no bearers are established and suboptimal performance is apparent only on call establishment and as a signaling cost to the network operator. For this reason, the remainder of this chapter focuses on handover.
In GSM and LTE the mobile supports only a single radio link such that the handover swaps this link from one cell (the serving cell) to another (the target cell). In WCDMA, however, multiple links (on the same frequency) may be established (known as “soft handover”). Handovers can be classified as:
• intra-RAT, meaning within the same radio access technology (RAT), for example, LTE to LTE
• intra-frequency (serving and target cells are on the same frequency)
• inter-frequency (serving and target cells are not on the same frequency)
• inter-RAT (between cells of different RATs).
Handover may be triggered for a number of reasons:
• to maintain the connectivity of the mobile and support data transfer (often called a “coverage handover”)
• to balance the loading of cells with overlapping coverage or to handover a mobile between overlapping cells to ensure data rates demanded by an ongoing service are met (often called a “vertical handover”).
Vertical handovers target stationary mobiles, implying that the radio conditions of links to serving and target cells are relatively stable. More challenging are coverage handovers that result from the motion of the mobile, leaving the coverage of the serving cell and entering that of the target cell. Since indoor users are usually stationary, the focus of coverage handovers is on outdoor users, on foot or in vehicles.
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
— DONALD E. KNUTH
When you first start to program, your emphasis is usually on correctness, that is, getting your programs to run and return accurate and error-free results, and rightly so. There is little point in trying to speed up a program that returns incorrect answers! You develop your programs, prototyping with simple inputs so that you can see at a glance how things are progressing. At some point in the development process you start to increase the size or complexity of the inputs to your program and, if all goes well, the program scales well. But commonly there are bottlenecks at various stages of the computation that slow things down, or there may be a large increase in the amount of memory needed to represent or store an expression or result. Some of these situations may be unavoidable, but often you can find optimizations that improve the efficiency and running time of your programs. This chapter introduces some of the optimization principles to think about, both during the development process and after your programs are complete and you are satisfied that they produce the desired output.
How, you might ask, does one quantify efficiency? There are two measures we will focus on: timing and memory footprint. The importance of these two measures is highly subjective. Squeezing another tenth of a second out of a computation that is only going to be run once or twice does not make a lot of sense. However, if that computation is part of a loop that is going to be evaluated thousands of times, little things really start to add up. You will be the best judge of where to focus your efforts.
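As a language-neutral sketch of these two measures (shown here in Python; in Mathematica the analogous built-ins are AbsoluteTiming and ByteCount):

```python
import sys
import timeit

# Sketch: the two measures discussed above, timing and memory footprint.
data = list(range(100_000))

# Timing: average over many runs so per-call overhead is smoothed out.
t = timeit.timeit(lambda: sum(data), number=100) / 100
print(f"sum of 100k ints: {t:.6f} s per call")

# Memory footprint: the (shallow) size of the container itself.
print(f"list object size: {sys.getsizeof(data)} bytes")
```

Measuring inside the loop you actually care about, rather than a one-off call, is what tells you whether a tenth of a second is noise or a bottleneck.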
In this chapter we will see how to build logic circuits (gates) using complementary metal–oxide–semiconductor (CMOS) transistors. We start in Section 4.1 by examining how logic functions can be realized using switches. A series combination of switches performs an AND function while a parallel combination of switches performs an OR function. We can build up more complex switch-logic functions by building more complex series–parallel switch networks.
In Section 4.2 we present a very simple switch-level model of an MOS transistor. CMOS transistors come in two flavors: NMOS and PMOS. For purposes of analyzing the function of logic circuits, we consider an NMOS transistor to be a switch that is closed when its gate is a logic “1” and that passes only a logic “0.” A PMOS transistor is complementary – it is a switch that is closed when its gate is a logic “0” and that passes only a logic “1.” To model the delay and power of logic circuits (which we defer to Chapter 5), we add a resistance and a capacitance to our basic switch. This switch-level model is much simpler than the models used for MOS circuit design, but is perfectly adequate to analyze the functionality and performance of digital logic circuits.
Using our switch-level model, we see how to build gate circuits in Section 4.3 by building a pull-down network of NMOS transistors and a complementary pull-up network of PMOS transistors. A NAND gate, for example, is realized with a series pull-down network of NMOS transistors and a parallel pull-up network of PMOS transistors.
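The switch-level story can be captured in a few lines of Python. This is an illustrative model, not a circuit simulator: each transistor is reduced to a boolean "conducting" test, and the assertion checks that the pull-up and pull-down networks are complementary, as the gate construction of Section 4.3 requires:

```python
# Switch-level sketch: an NMOS switch conducts when its gate is 1,
# a PMOS switch when its gate is 0. A 2-input NAND gate has a series
# NMOS pull-down network and a parallel PMOS pull-up network.

def nmos(gate):
    return gate == 1   # closed when gate is 1; passes only a logic "0"

def pmos(gate):
    return gate == 0   # closed when gate is 0; passes only a logic "1"

def nand(a, b):
    pull_down = nmos(a) and nmos(b)   # series network: both must conduct
    pull_up = pmos(a) or pmos(b)      # parallel network: either conducts
    assert pull_down != pull_up       # exactly one network drives output
    return 0 if pull_down else 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand(a, b))
```

The series/parallel duality is visible directly in the code: the AND of the pull-down switches mirrors the OR of the pull-up switches.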
SWITCH LOGIC
In digital systems we use binary variables to represent information and switches controlled by these variables to process information. Figure 4.1 shows a simple switch circuit. When binary variable a is false (0), Figure 4.1(a), the switch is open and the light is off. When a is true (1), the switch is closed, current flows in the circuit, and the light is on (Figure 4.1(b)).
Before we dive into the technical details of digital system design, it is useful to take a high-level look at the way systems are designed in industry today. This will allow us to put the design techniques we learn in subsequent chapters into the proper context. This chapter examines four aspects of contemporary digital system design practice: the design process, implementation technology, computer-aided design tools, and technology scaling.
We start in Section 2.1 by describing the design process – how a design starts with a specification and proceeds through the phases of concept development, feasibility studies, detailed design, and verification. Except for the last few steps, most of the design work is done using English-language documents. A key aspect of any design process is a systematic – and usually quantitative – process of managing technical risk.
Digital designs are implemented on very-large-scale integrated (VLSI) circuits (often called chips) and packaged on printed-circuit boards (PCBs). Section 2.2 discusses the capabilities of contemporary implementation technology.
The design of highly complex VLSI chips and boards is made possible by sophisticated computer-aided design (CAD) tools. These tools, described in Section 2.3, amplify the capability of the designer by performing much of the work associated with capturing a design, synthesizing the logic and physical layout, and verifying that the design is both functionally correct and meets timing.
Approximately every two years, the number of transistors that can be economically fabricated on an integrated-circuit chip doubles. We discuss this growth rate, known as Moore's law, and its implications for digital systems design in Section 2.4.
THE DESIGN PROCESS
As in other fields of engineering, the digital design process begins with a specification. The design then proceeds through phases of concept development, feasibility, partitioning, and detailed design. Most texts, like this one, deal with only the last two steps of this process. To put the design and analysis techniques we will learn into perspective, we will briefly examine the other steps here. Figure 2.1 gives an overview of the design process.
After you have developed several programs for some related tasks, you will find it convenient to group them together and make them available as a cohesive whole. Packages are designed to make it easy to distribute your programs to others, but they also provide a framework for you to write programs that integrate with Mathematica seamlessly.
A package is simply a text file containing Mathematica code. Typically you put related functions in a package. So there might be a computational geometry package or a random walks package that includes functions in support of those tasks. The package framework includes a name-localizing construct, analogous to Module, but for entire files of definitions. The idea is to allow you, the programmer, to define a collection of functions for export. These exported functions are what the users of your package will work with and are often referred to as public functions. Other functions, those that are not for export, are auxiliary, or private, functions and are not intended to be accessible to users. The package framework, and contexts specifically, provide a convenient way to declare some functions public and others private. In this chapter we will describe this framework and show how to write, install, and use the packages developed with it.
Working with packages
Loading and using packages
Upon starting a Mathematica session, the built-in functions are immediately available for you to use. There are, however, many more functions that you can access that reside in files supplied with Mathematica. The definitions in those files are placed in special structures called packages. Indeed, these files themselves are often called “packages” instead of “files.”
Mathematica packages have been written for many different domains. They are provided with each version of Mathematica and are referred to as the Standard Extra Packages. Their documentation is available in the Documentation Center (under the Help menu) and they provide a good set of examples for learning about package creation and usage.
The idea of representing an idealized version of an object with something called a pattern is central to mathematics and computer science. For the purposes of search, patterns provide a template with which to compare expressions. They can be used to filter data by selecting only those parts that match the pattern. Because of the wide applicability of patterns, many modern programming languages have extensive pattern-matching capabilities that enable them to identify objects that meet some criteria in order to classify, select, or transform those objects through the use of rules.
Pattern matching in Mathematica is done through a powerful yet flexible pattern language. Patterns are used in rules to transform expressions from one form to another; they can be applied to broad classes of expressions or limited to very narrowly defined objects through the use of conditional and structured pattern matching. Pattern matching is the key to identifying which rules should be applied to the expressions that you wish to transform.
If you have used regular expressions in languages such as Perl or Ruby, or via libraries in Java, Python, or C++, then you are already familiar with pattern matching on strings (discussed further in Chapter 7). Mathematica's pattern language generalizes this to arbitrary objects and expressions. Although the syntax may be new to you, with practice it becomes natural, providing a direct connection between the statement of a problem and its expression in a program. This chapter starts with an introduction to patterns and pattern matching and then proceeds to a discussion of transformation rules in which patterns are used to identify the parts of an expression that are to be transformed. The chapter concludes with several concrete examples that make use of pattern matching and transformation rules to show their application to some common programming tasks.
Lists are the key data structure used in Mathematica to group objects together. They share some features with arrays in other languages such as C and Java, but they are more general and can be used to represent a wide range of objects: vectors, matrices, tensors, iterator and parameter specifications, and much more.
Because lists are so fundamental, an extensive set of built-in functions is available to manipulate them in a variety of ways. In this chapter, we start by looking at the structure and syntax of lists before moving on to constructing, measuring, and testing lists. We then introduce some of the built-in functionality used to manipulate lists such as sorting and partitioning. Finally, we will discuss associations, a feature first introduced in Mathematica 10. Associations provide a framework for efficient representation and lookup of large data structures such as associative arrays (for example, a large database of article and book references or a music library).
Many of the things you might wish to do with a list or association can be accomplished using built-in functions and the programming concepts in this book. And most of these operations extend to arbitrary expressions in a fairly natural way, as we will see in later chapters. As such, it is important to have a solid understanding of these functions before going further, since a key to efficient programming in Mathematica is to use the built-in functions whenever possible to manipulate lists and associations as well as general expressions.
Creating and displaying lists
List structure and syntax
The standard input form of a list is a sequence of elements separated by commas and enclosed in curly braces:
{e1, e2, …, en}
Internally, lists are stored in the functional form using the List function with an arbitrary number of arguments.
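A rough analogy in Python (illustrative only, not Mathematica's actual representation): the head and the arguments can be stored together, mirroring the internal List[e1, e2, e3] form:

```python
# A Mathematica list {1, 2, 3} is internally List[1, 2, 3]: the head
# List applied to three arguments. A crude Python analogue keeps the
# head alongside the elements.
expr = ("List", 1, 2, 3)

head, args = expr[0], expr[1:]
print(head)   # compare Mathematica's Head[{1, 2, 3}], which gives List
print(args)   # the arguments, analogous to the list's elements
```

This head-plus-arguments view is what lets the same built-in functions operate uniformly on lists and on general expressions.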
Mobile device data rates are increasing effectively at the rate of Moore's law [1], because mobile devices are integrated in silicon and thus take advantage of shrinking geometries and the growing functionality and number of transistors per die. The current cellular approach of using large outdoor base station towers to provide mobile broadband via wireless communications, however, does not scale efficiently to cope with the 13-fold increase in traffic forecast by 2017 [2]. Thus a new approach is required. As we will discuss in this chapter, small cell deployments have the potential to provide a scalable solution to this demand, and they are beginning to change the network topology into a so-called heterogeneous network (HetNet) containing a mix of different cell sizes and cell power levels, as shown in Figure 8.1. This leads to a mix of macro cells, micro cells, pico cells, and femto cells. The resulting deployment, with its varying cell sizes, is much less uniform and more irregular than that of outdoor macro cells alone.
As we will show in this chapter, small cell network (SCN) deployments provide, for the first time, a low-cost, efficient, scalable architecture to meet the expected demand. The technology was first deployed at large scale when femto cells (residential small cells) were rolled out by a number of leading wireless operators (including Vodafone and AT&T). The solution was enabled by two key developments, namely:
• low-cost chip-sets (so-called “systems-on-chip”), which included the entire signal processing chain and most of the radio software stack, and
• widely available high-speed internet access (greater than Mbps), which provided the so-called “backhaul” (the connectivity into the network) for these access points.
These developments mean that with volume the effective capital costs of small cells are negligible compared to the cost of deployment and operating costs.
Small cell deployments have a range of topologies and configurations that will depend on the location and the demand requirements. For example, downtown city areas will consist of a combination of small cells where they will exist:
• on outdoor light poles to serve “hotspot” traffic needs around cafes and other places where users congregate
• in city buildings to serve enterprise customers with high demands on throughput and reliability
• in apartments or residential homes to serve private users' needs.