There is a perpetual need for faster computation that is unlikely ever to be satisfied. With device technologies hitting physical limits, alternative computational models are being explored. The Big Data phenomenon precedes the coinage of the term by many decades. One of the earliest and most natural directions for speeding up computation was to deploy multiple processors instead of a single processor to run the same program. The ideal objective is to speed up a program p-fold by using p processors simultaneously. A common caveat is that an egg cannot be boiled faster by employing multiple cooks! Analogously, a program cannot be executed indefinitely faster by using more and more processors. This is not just because of physical limitations but also because of dependencies between various fragments of the code, imposed by precedence constraints.
At a lower level, namely, in digital hardware design, parallelism is inherent – any circuit can be viewed as a parallel computational model. Signals travel along different paths and through different components and combine to yield the desired result. In contrast, a program is written in a very sequential manner, and its data flows are often dependent on one another – just think of a loop that executes in sequence. Moreover, for a given problem, one may have to re-design a sequential algorithm to extract more parallelism. In this chapter, we focus on designing fast parallel algorithms for fundamental problems.
A very important facet of parallel algorithm design is the underlying architecture of the computer, viz., how the processors communicate with each other and access data concurrently. Moreover, is there a common clock against which we can measure the actual running time? Synchronization is an important property that makes parallel algorithm design somewhat more tractable. In more general asynchronous models, there are additional issues like deadlock and even convergence, which are very challenging to analyze.
In this chapter, we will consider synchronous parallel models (sometimes called SIMD) and look at two important models – the parallel random access machine (PRAM) and the interconnection network model. The PRAM model is the parallel counterpart of the popular sequential RAM model, in which p processors can simultaneously access a common memory called shared memory.
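To make the model concrete, here is a minimal sketch (our own illustration, not from the text) that emulates a synchronous PRAM summing n numbers held in shared memory: in each round, every active processor combines two cells that are a fixed stride apart, so the sum is obtained in O(log n) rounds using up to n/2 processors per round.

```python
# Emulation of a PRAM-style parallel sum reduction (hypothetical sketch).
# In a real PRAM, all iterations of the inner loop would run
# simultaneously as one synchronous time step.
def pram_sum(shared):
    n = len(shared)
    stride = 1
    while stride < n:
        # One synchronous round: "processor" i adds the cell 'stride' away.
        for i in range(0, n - stride, 2 * stride):
            shared[i] += shared[i + stride]
        stride *= 2
    return shared[0]

print(pram_sum(list(range(1, 9))))  # 36, computed in 3 rounds for n = 8
```

The sequential inner loop only stands in for one parallel step; the round count, log n, is what the PRAM model charges as running time.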
Designing the memory architecture is an important component of computer organization that tries to achieve a balance between computational speed and memory speed, viz., the time to fetch operands from memory. Computational speeds are much faster since the processing happens within the chip, whereas a memory access could involve off-chip memory units. To bridge this disparity, the modern computer has several layers of memory, called cache memory, that provide faster access to the operands. Because of technological and cost limitations, cache memories offer a range of speed–cost tradeoffs. For example, the L1 cache, the fastest cache level, is usually also the smallest in size. The L2 cache is larger, say by a factor of ten, but also considerably slower. The secondary memory, e.g., the disk, which is the largest in terms of size, could be 10,000 times slower than the L1 cache. For any large application, most of the data resides on disk and is transferred to the faster levels of cache when required.
This movement of data is usually beyond the control of the programmer and is managed by the operating system and the hardware. Replacement policies based on the empirical principles of temporal and spatial locality of memory access are used to maximize the chances of keeping the operands in the faster cache levels. However, there will inevitably be occasions when the required operand is not present in L1; one then has to reach out to L2 and beyond and pay the penalty of a higher access cost. In other words, memory access cost is not uniform, as discussed at the beginning of this book; for simplicity of analysis, we had pretended that it remains the same.
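As a small illustration of spatial locality (our own example, not the book's), summing a large row-major matrix row by row touches memory contiguously and is typically noticeably faster than summing it column by column, even though both loops compute the same result:

```python
# Hypothetical timing demo: contiguous vs. strided traversal of a
# row-major array. Exact timings depend on the machine and cache sizes.
import time
import numpy as np

a = np.random.rand(4000, 4000)  # stored in row-major (C) order

t0 = time.perf_counter()
s1 = sum(a[i, :].sum() for i in range(a.shape[0]))   # contiguous sweeps
t1 = time.perf_counter()
s2 = sum(a[:, j].sum() for j in range(a.shape[1]))   # strided sweeps
t2 = time.perf_counter()

print(f"row-wise {t1 - t0:.3f}s, column-wise {t2 - t1:.3f}s")
```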
In this chapter, we will do away with this assumption; however, for simpler exposition, we will deal with only two levels of memory – slow and fast – where the slower memory has infinite size while the faster one is limited, say, of size M, and is significantly faster. Consequently, we can pretend that the faster memory has zero (negligible) access cost and the slower memory has access cost 1.
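Under these assumptions, the cost of an algorithm is simply the number of accesses served by the slow memory. The following sketch (a hypothetical illustration; the replacement policy shown is least-recently-used, one common choice) counts that cost for a given trace of operand accesses:

```python
# Two-level memory cost model: fast memory holds at most M operands at
# zero cost; every miss costs one slow-memory access.
from collections import OrderedDict

def slow_accesses(trace, M):
    fast = OrderedDict()   # operands currently in fast memory, by recency
    cost = 0
    for x in trace:
        if x in fast:
            fast.move_to_end(x)           # hit: free
        else:
            cost += 1                     # miss: charged slow access
            if len(fast) == M:
                fast.popitem(last=False)  # evict least recently used
            fast[x] = None
    return cost

print(slow_accesses([0, 1, 2, 0, 1, 3, 0], M=3))  # 4 misses: 0, 1, 2, 3
```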
In the previous chapters, we surveyed many well-known algorithmic techniques and successfully applied them to obtain efficient algorithmic solutions for problems from varied domains. Yet, we cannot claim that there is some general methodology for obtaining efficient algorithms for any given problem. On the contrary, any new problem often presents unknown challenges that require new insights, and the question that is uppermost in anyone's mind is: what are the problems that are notoriously difficult? We need to first set some target before we can assign a notion of difficulty to a problem. For reasons that will become clear later on, the quest is to design a polynomial time algorithm for any given well-defined problem. This may appear too liberal initially, since by definition even $n^{100}$ is a polynomial. However, even this has proved elusive for a large number of problems, among which we have come across one in the preceding chapters, namely, the 0-1 Knapsack problem. Despite promising starts, we could never claim a truly polynomial time algorithm.
In addition to fixing polynomial time as the limit of practical computation, we need to specify the underlying computational model, since that will also affect what is achievable in polynomial time. Fortunately, the notion of polynomial time is a robust concept that is not significantly affected by the choice of computational model, except for some constant factors in the exponent of n. We will discuss a large class of natural and important problems that admit a characterization that is very intuitive and has resulted in a very interesting theoretical framework. Since the reader is familiar with the Knapsack problem, we will illustrate this framework in its context. Consider a Prover–Verifier interactive game regarding a given Knapsack instance. The Prover is trying to convince the Verifier that she has an efficient (polynomial-time) algorithm without actually revealing the technique. As proof, given an instance of Knapsack, she provides the Verifier with the list of objects chosen in the optimal solution, and the Verifier can easily check the feasibility of, and the total profit obtained from, this solution.
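The Verifier's side of this game is easy to make precise. A minimal sketch (our own illustration, with hypothetical inputs): given the Prover's claimed solution as a list of object indices, checking feasibility and the claimed profit takes time linear in the number of objects, even though finding an optimal solution is believed to be hard.

```python
# Verifier for a Knapsack certificate (hypothetical sketch).
# 'chosen' is the list of object indices the Prover supplies.
def verify_knapsack(weights, profits, capacity, chosen, claimed_profit):
    total_weight = sum(weights[i] for i in chosen)
    total_profit = sum(profits[i] for i in chosen)
    # Feasible and at least as profitable as claimed?
    return total_weight <= capacity and total_profit >= claimed_profit

# The Prover claims profit 220 is achievable within capacity 50.
print(verify_knapsack([10, 20, 30], [60, 100, 120], 50, [1, 2], 220))  # True
```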
We provide a deterministic algorithm that finds, in $\varepsilon^{-O(1)}n^2$ time, an ɛ-regular Frieze–Kannan partition of a graph on n vertices. The algorithm outputs an approximation of a given graph as a weighted sum of $\varepsilon^{-O(1)}$ many complete bipartite graphs.
As a corollary, we give a deterministic algorithm for estimating the number of copies of H in an n-vertex graph G up to an additive error of at most $\varepsilon n^{v(H)}$, in time $\varepsilon^{-O_H(1)}n^2$.
The text covers important algorithm design techniques, such as greedy algorithms, dynamic programming, and divide-and-conquer, and gives applications to contemporary problems. Techniques including the fast Fourier transform, the KMP algorithm for string matching, the CYK algorithm for context-free parsing, and gradient descent for convex function minimization are discussed in detail. The book's emphasis is on computational models and their effect on algorithm design. It gives insights into algorithm design techniques in the parallel, streaming, and memory hierarchy computational models. The book also emphasizes the role of randomization in algorithm design, and gives numerous applications ranging from data structures such as skip lists to dimensionality reduction methods.
We consider the asymptotics of the difference between the empirical measures of the β-Hermite tridiagonal matrix and its minor. We prove that this difference has a deterministic limit and Gaussian fluctuations. Through a correspondence between measures and continual Young diagrams, this deterministic limit is identified with the Vershik–Kerov–Logan–Shepp curve. Moreover, the Gaussian fluctuations are identified with a sectional derivative of the Gaussian free field.
Given complex numbers w1,…,wn, we define the weight w(X) of a set X of 0–1 vectors as the sum of $w_1^{x_1} \cdots w_n^{x_n}$ over all vectors (x1,…,xn) in X. We present an algorithm which, for a set X defined by a system of homogeneous linear equations with at most r variables per equation and at most c equations per variable, computes w(X) within relative error $\epsilon > 0$ in $(rc)^{O(\ln n - \ln \epsilon)}$ time, provided $|w_j| \leq \beta (r \sqrt{c})^{-1}$ for an absolute constant β > 0 and all j = 1,…,n. A similar algorithm is constructed for computing the weight of a linear code over ${\mathbb F}_p$. Applications include counting weighted perfect matchings in hypergraphs, counting weighted graph homomorphisms, computing weight enumerators of linear codes with sparse code generating matrices, and computing the partition functions of the ferromagnetic Potts model at low temperatures and of the hard-core model at high fugacity on biregular bipartite graphs.
An (improper) graph colouring has defect d if each monochromatic subgraph has maximum degree at most d, and has clustering c if each monochromatic component has at most c vertices. This paper studies defective and clustered list-colourings for graphs with given maximum average degree. We prove that every graph with maximum average degree less than $\frac{2d+2}{d+2}k$ is k-choosable with defect d. This improves upon a similar result by Havet and Sereni (J. Graph Theory, 2006). For clustered choosability of graphs with maximum average degree m, no (1-ɛ)m bound on the number of colours was previously known. The above result with d=1 solves this problem. It implies that every graph with maximum average degree m is $\lfloor{\frac{3}{4}m+1}\rfloor$-choosable with clustering 2. This extends a result of Kopreski and Yu (Discrete Math., 2017) to the setting of choosability. We then prove two results about clustered choosability that explore the trade-off between the number of colours and the clustering. In particular, we prove that every graph with maximum average degree m is $\lfloor{\frac{7}{10}m+1}\rfloor$-choosable with clustering 9, and is $\lfloor{\frac{2}{3}m+1}\rfloor$-choosable with clustering O(m). For example, the latter result implies that every biplanar graph is 8-choosable with bounded clustering. This is the best known result for the clustered version of the earth–moon problem. The results extend to the setting where we only consider the maximum average degree of subgraphs with at least some number of vertices. Several applications are presented.