We combine a new data model, where the random classification is subjected to rather weak restrictions which in turn are based on the Mammen–Tsybakov [E. Mammen and A.B. Tsybakov, Ann. Statist. 27 (1999) 1808–1829; A.B. Tsybakov, Ann. Statist. 32 (2004) 135–166] small margin conditions, and the statistical query (SQ) model due to Kearns [M.J. Kearns, J. ACM 45 (1998) 983–1006] into what we refer to as the PAC + SQ model. We generalize the class-conditional constant noise (CCCN) model introduced by Decatur [S.E. Decatur, in ICML '97: Proc. of the Fourteenth Int. Conf. on Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1997) 83–91] to the noise model orthogonal to a set of query functions. We show that every polynomial-time PAC + SQ learning algorithm can be efficiently simulated provided that the random noise rate is orthogonal to the query functions used by the algorithm, given the target concept. Furthermore, we extend the constant-partition classification noise (CPCN) model due to Decatur [S.E. Decatur, in ICML '97: Proc. of the Fourteenth Int. Conf. on Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1997) 83–91] to what we call the constant-partition piecewise orthogonal (CPPO) noise model. We show how statistical queries can be simulated in the CPPO scenario, given that the partition is known to the learner. We show how to use PAC + SQ simulators in practice in the noise model orthogonal to the query space by presenting two examples from bioinformatics and software engineering. In this way, we demonstrate that our new noise model is realistic.
We prove the existence of a function $f :\mathbb{N} \to \mathbb{N}$ such that the vertices of every planar graph with maximum degree Δ can be 3-coloured in such a way that each monochromatic component has at most f(Δ) vertices. This is best possible (the number of colours cannot be reduced and the dependence on the maximum degree cannot be avoided) and answers a question raised by Kleinberg, Motwani, Raghavan and Venkatasubramanian in 1997. Our result extends to graphs of bounded genus.
We show that the expected time for a random walk on a (multi-)graph G to traverse all m edges of G, and return to its starting point, is at most $2m^2$; if each edge must be traversed in both directions, the bound is $3m^2$. Both bounds are tight and may be applied to graphs with arbitrary edge lengths. This has interesting implications for Brownian motion on certain metric spaces, including some fractals.
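As a quick, hedged illustration (not part of the paper), the following sketch estimates this expected time by simulating a simple random walk on a small unit-length graph and compares the estimate with the $2m^2$ bound; the graph, seed, and trial count are arbitrary choices.

```python
import random

# Illustrative Monte Carlo estimate (not from the paper): expected number of
# steps for a simple random walk to traverse every edge at least once and then
# be back at its starting vertex, with unit edge lengths.
def cover_edges_and_return(adj, start, rng):
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    seen, v, steps = set(), start, 0
    while len(seen) < len(edges) or v != start:
        u = rng.choice(adj[v])
        seen.add(frozenset((v, u)))
        v, steps = u, steps + 1
    return steps

# A 4-cycle with one chord: m = 5 edges, so the stated bound is 2 * 5**2 = 50.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
rng = random.Random(0)
trials = 20000
estimate = sum(cover_edges_and_return(adj, 0, rng) for _ in range(trials)) / trials
print(f"estimated expected time: {estimate:.1f} (bound: {2 * 5**2})")
```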
The preferential attachment network with fitness is a dynamic random graph model. New vertices are introduced consecutively and a new vertex is attached to an old vertex with probability proportional to the degree of the old one multiplied by a random fitness. We concentrate on the typical behaviour of the graph by calculating the fitness distribution of a vertex chosen proportional to its degree. For a particular variant of the model, this analysis was first carried out by Borgs, Chayes, Daskalakis and Roch. However, we present a new method, which is robust in the sense that it does not depend on the exact specification of the attachment law. In particular, we show that a peculiar phenomenon, referred to as Bose–Einstein condensation, can be observed in a wide variety of models. Finally, we also compute the joint degree and fitness distribution of a uniformly chosen vertex.
Let ${\mathcal H}$ denote a collection of subsets of $\{1,2,\ldots,n\}$, and assign independent random variables uniformly distributed over [0,1] to the n elements. Declare an element p-present if its corresponding value is at most p. In this paper, we quantify how much the observation of the r-present (r>p) set of elements affects the probability that the set of p-present elements is contained in ${\mathcal H}$. In the context of percolation, we find that this question is closely linked to the near-critical regime. As a consequence, we show that for every r>1/2, bond percolation on the subgraph of the square lattice given by the set of r-present edges is almost surely noise sensitive at criticality, thus generalizing a result due to Benjamini, Kalai and Schramm.
For d ≥ 2, let $H_d(n,p)$ denote a random d-uniform hypergraph with n vertices in which each of the $\binom{n}{d}$ possible edges is present with probability p=p(n) independently, and let $H_d(n,m)$ denote a uniformly distributed d-uniform hypergraph with n vertices and m edges. Let either $H=H_d(n,m)$ or $H=H_d(n,p)$, where m/n and $\binom{n-1}{d-1}p$ need to be bounded away from $(d-1)^{-1}$ and 0 respectively. We determine the asymptotic probability that H is connected. This yields the asymptotic number of connected d-uniform hypergraphs with given numbers of vertices and edges. We also derive a local limit theorem for the number of edges in $H_d(n,p)$, conditioned on $H_d(n,p)$ being connected.
Let $H_d(n,p)$ signify a random d-uniform hypergraph with n vertices in which each of the $\binom{n}{d}$ possible edges is present with probability p=p(n) independently, and let $H_d(n,m)$ denote a uniformly distributed d-uniform hypergraph with n vertices and m edges. We derive local limit theorems for the joint distribution of the number of vertices and the number of edges in the largest component of $H_d(n,p)$ and $H_d(n,m)$ in the regime $(d-1)\binom{n-1}{d-1}p>1+\varepsilon$, resp. $d(d-1)m/n>1+\varepsilon$, where $\varepsilon>0$ is arbitrarily small but fixed as $n \to \infty$. The proofs are based on a purely probabilistic approach.
This chapter reduces the inference problem in probabilistic graphical models to an equivalent maximum weight stable set problem on a graph. We discuss methods for recognizing when the latter problem can be solved efficiently by appealing to perfect graph theory. Furthermore, practical solvers based on convex programming and message-passing are presented.
Tractability is the study of computational tasks with the goal of identifying which problem classes are tractable or, in other words, efficiently solvable. The class of tractable problems is traditionally assumed to be solvable in polynomial time by a deterministic Turing machine and is denoted by P. The class contains many natural tasks such as sorting a set of numbers, linear programming (the decision version), determining if a number is prime, and finding a maximum weight matching. Many interesting problems, however, lie in another class that generalizes P and is known as NP: the class of languages decidable in polynomial time on a non-deterministic Turing machine. We trivially have that P is a subset of NP (many researchers also believe that it is a strict subset). It is believed that many problems in the class NP are, in the worst case, intractable and do not admit efficient inference. Problems such as maximum stable set, the traveling salesman problem and graph coloring are known to be NP-hard (at least as hard as the hardest problems in NP). It is, therefore, widely suspected that there are no polynomial-time algorithms for NP-hard problems.
This chapter covers methods for identifying islands of tractability for NP-hard combinatorial problems by exploiting suitable properties of their graphical structure. Acyclic structures are considered, as well as nearly-acyclic ones identified by means of so-called structural decomposition methods. In particular, the chapter focuses on the tree decomposition method, which is the most powerful decomposition method for graphs, and on the hypertree decomposition method, which is its natural counterpart for hypergraphs. These problem-decomposition methods give rise to corresponding notions of width of an instance, namely, treewidth and hypertree width. It turns out that many NP-hard problems can be solved efficiently over classes of instances of bounded treewidth or hypertree width: deciding whether a solution exists, computing a solution, and even computing an optimal solution (if some cost function over solutions is specified) are all polynomial-time tasks. Example applications include problems from artificial intelligence, databases, game theory, and combinatorial auctions.
Many NP-hard problems in different areas such as AI [42], Database Systems [6, 81], Game Theory [45, 31, 20], and Network Design [34] are known to be efficiently solvable when restricted to instances whose underlying structures can be modeled via acyclic graphs or acyclic hypergraphs. For such restricted classes of instances, solutions can usually be computed via dynamic programming. However, the (graphical) structures arising from real applications are in most relevant cases not actually acyclic. Yet they are often not very intricate, exhibiting only a limited degree of cyclicity, which suffices to retain most of the nice properties of acyclic instances.
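As a small, hedged illustration of these ideas (not taken from the chapter), the sketch below uses the min-degree heuristic from the networkx library, assumed to be installed, to compute a tree decomposition of a mildly cyclic graph; the heuristic returns an upper bound on the treewidth together with the bags of the decomposition.

```python
import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree

# A 6-cycle with one chord: cyclic, but only mildly so (its treewidth is 2).
G = nx.cycle_graph(6)
G.add_edge(0, 3)

# The heuristic returns (width upper bound, tree decomposition); the nodes of
# the decomposition are "bags" (frozensets of vertices of G).
width, decomposition = treewidth_min_degree(G)
print("treewidth upper bound:", width)
for bag in decomposition.nodes:
    print("bag:", sorted(bag))
```

On classes of instances whose width stays bounded in this way, the dynamic-programming algorithms mentioned above run in polynomial time.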
Machine learning and data analysis have driven explosive growth of interest in the methods of large-scale optimization. Many commonly used techniques, such as stochastic gradient methods, date back several decades, but owing to their practical success they have gained great importance in machine learning. Before interior-point methods came to dominate the field of optimization, first-order methods had already been studied and theoretically analyzed in substantial detail, but interest in these techniques skyrocketed after the rapid rise of applications in machine learning, signal processing, and related areas. This chapter is a brief introduction to this vast and flourishing area of large-scale optimization.
Introduction
Machine Learning (ML) broadly encompasses a variety of adaptive, autonomous, and intelligent tasks where one must “learn” to predict from observations and feedback. Throughout its evolution, ML has drawn heavily and successfully on optimization algorithms; this relation to optimization is not surprising as “learning” and “adapting” ultimately involve problems where some quality function must be optimized.
But the interaction between ML and optimization is now undergoing rapid change. The increased size, complexity, and variety of ML problems not only prompt a refinement of existing optimization techniques, but also spur the development of new methods tuned to the specific needs of ML applications.
In particular, ML applications must usually cope with large-scale data, which forces us to prefer “simpler,” perhaps less accurate but more scalable algorithms. Such methods can also crunch through more data, and may actually be better suited for learning – for a more precise characterization see [11]. The use of possibly less accurate methods is also grounded in pragmatic concerns: modeling limitations, observational noise, uncertainty, and computational errors are pervasive in real data.
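As a minimal sketch of such a "simple but scalable" method (not code from the chapter), the example below runs plain stochastic gradient descent on a synthetic least-squares problem; the data, step size, and epoch count are illustrative assumptions.

```python
import numpy as np

# Synthetic linear regression data: y = X @ w_true + noise.
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Plain SGD on the average of the per-example squared losses.
w = np.zeros(d)
step = 0.01
for epoch in range(20):
    for i in rng.permutation(n):
        grad_i = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x_i . w - y_i)^2
        w -= step * grad_i

print("estimation error:", np.linalg.norm(w - w_true))
```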
Optimization problems are often hard to solve exactly. However, solutions that are only nearly optimal are often good enough in practical applications, and approximation algorithms can find such solutions efficiently for many interesting problems. Profound theoretical results additionally help us understand which problems are approximable. This chapter gives an overview of existing approximation techniques, organized into five broad categories: greedy algorithms, linear and semi-definite programming relaxations, metric embeddings and special techniques. It concludes with an overview of the main inapproximability results.
Introduction
NP-hard optimization problems are ubiquitous, and unless P=NP, we cannot expect algorithms that find optimal solutions on all instances in polynomial time. This intractability thus forces us to relax one of these three requirements: optimality, generality over all instances, or polynomial running time. Approximation algorithms relax the optimality requirement, and aim to do so by as small an amount as possible. We shall concern ourselves with discrete optimization problems, where the goal is to find, amongst the set of feasible solutions, the one that minimizes (or maximizes) the value of the objective function. Usually, the space of feasible solutions is defined implicitly, e.g. the set of cuts in a graph on n vertices. The objective function associates with each feasible solution a real value; this usually has a succinct representation as well, e.g. the number of edges in the cut. We measure the performance of an approximation algorithm on a given instance by the ratio of the value of the solution output by the algorithm to that of the optimal solution.
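For concreteness, the sketch below (not from the chapter) implements the classical greedy 2-approximation for minimum vertex cover: repeatedly pick an uncovered edge and take both of its endpoints. The edge list is an arbitrary example.

```python
# Greedy 2-approximation for minimum vertex cover (illustrative sketch).
def vertex_cover_2approx(edges):
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))  # take both endpoints of an uncovered edge
    return cover

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
print(sorted(vertex_cover_2approx(edges)))  # size at most twice the optimum
```

The chosen endpoint pairs form a matching, and any vertex cover must contain at least one endpoint of every matching edge, which yields the factor-2 guarantee.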
In this chapter we will introduce submodularity and some of its generalizations, illustrate how it arises in various applications, and discuss algorithms for optimizing submodular functions.
Submodularity is a property of set functions with deep theoretical consequences and far-reaching applications. At first glance it seems very similar to concavity; in other ways it resembles convexity. It appears in a wide variety of applications: in Computer Science it has recently been identified and utilized in domains such as viral marketing [39], information gathering [44], image segmentation [10, 40, 36], document summarization [56], and speeding up satisfiability solvers [73]. Our emphasis in this chapter is on maximization; there are many important results and applications related to minimizing submodular functions that we do not cover.
As a concrete running example, we will consider the problem of deploying sensors in a drinking water distribution network (see Figure 3.1) in order to detect contamination. In this domain, we may have a model of how contaminants, accidentally or maliciously introduced into the network, spread over time. Such a model then allows us to quantify the benefit f(A) of deploying sensors at a particular set A of locations (junctions or pipes in the network) in terms of the detection performance (such as average time to detection).
Based on this notion of utility, we then wish to find an optimal subset A ⊆ V of locations maximizing the utility, $\max_{A} f(A)$, subject to some constraints (such as bounded cost). This application requires solving a difficult real-world optimization problem that can be handled with the techniques discussed in this chapter (Krause et al. [49] show in detail how submodular optimization can be applied in this domain).
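The following hedged sketch (not taken from the chapter) shows the standard greedy algorithm for maximizing a monotone submodular function under a cardinality constraint, which achieves a (1 - 1/e) approximation guarantee; a toy coverage function stands in for the sensor-placement utility f(A), and all sets and names are illustrative.

```python
# Greedy maximization of a monotone submodular (coverage) function,
# an illustrative stand-in for choosing k sensor locations.
def greedy_max_coverage(candidates, k):
    chosen, covered = [], set()
    for _ in range(k):
        # pick the location with the largest marginal gain (diminishing returns)
        best = max(candidates, key=lambda s: len(candidates[s] - covered))
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered

# Each candidate location "detects" a set of contamination events.
candidates = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
print(greedy_max_coverage(candidates, k=2))  # e.g. (['a', 'c'], {1, 2, 3, 4, 5, 6})
```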
This chapter discusses recent advances in modern coding theory, in particular the use of popular graph-based codes and their low complexity decoding algorithms. We describe absorbing sets as the key object for characterizing the performance of iteratively-decoded graph-based codes and we propose several directions for future investigation in this thriving discipline.
Chapter Overview
Every engineered communication system, ranging from satellite communications to hard disk drives to Ethernet, must operate under noisy conditions. The key to reliable communication and storage is to add an appropriate amount of redundancy. The field of channel coding is concerned with constructing channel codes and their decoding algorithms: controlled redundancy is introduced into a message prior to its transmission over a noisy channel (the encoding step), and this redundancy is removed from the received noisy string to unveil the intended message (the decoding step). The encoded message is referred to as the codeword, and the collection of all codewords is a channel code. Assuming all the messages have the same length, and all the codewords have the same length, the ratio of message length to codeword length is the code rate. To make coding systems implementable in practice, channel codes must provide the best possible protection against noise while their decoding algorithms must be of acceptable complexity.
There is a clear tension between these two goals: if a channel code protects a fairly long encoded message with relatively few but carefully derived redundancy bits (as needed for high performance), the optimal, maximum-likelihood decoding algorithm has exponential complexity.
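To make this tension concrete, here is a small hedged example (not from the chapter): four-bit messages are encoded with the (7,4) Hamming code, and decoding is done by brute-force maximum-likelihood search over all 2^k codewords, which is only feasible because k is tiny.

```python
import numpy as np
from itertools import product

# Systematic generator matrix of the (7,4) Hamming code; code rate 4/7.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

# Enumerate all 2**4 codewords (exponential in the message length in general).
codebook = {tuple(np.dot(m, G) % 2): m for m in product([0, 1], repeat=4)}

def ml_decode(received):
    # Maximum-likelihood decoding over a binary symmetric channel amounts to
    # choosing the codeword at minimum Hamming distance from the received word.
    best = min(codebook, key=lambda c: sum(a != b for a, b in zip(c, received)))
    return codebook[best]

msg = (1, 0, 1, 1)
codeword = np.dot(msg, G) % 2
noisy = codeword.copy()
noisy[2] ^= 1                   # flip one bit
print(ml_decode(tuple(noisy)))  # recovers (1, 0, 1, 1)
```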
Boolean Satisfiability (SAT) can be considered a success story of Computer Science. Since the mid-90s, SAT has evolved from a decision problem of mainly theoretical interest to a problem with key practical benefits, finding a wide range of practical applications. From the early 60s until the mid-90s, SAT solvers were able to solve only small instances with a few tens of variables and hundreds of clauses. In contrast, modern SAT solvers are able to solve practical instances with hundreds of thousands of variables and millions of clauses. This chapter describes the techniques implemented in SAT solvers, aiming to explain why SAT solvers work (so well) in practice. These techniques range from efficient search techniques to dedicated data structures, among others. Whereas some techniques are commonly implemented in modern SAT solvers, others are more specific in the sense that only some instances benefit from their implementation. Furthermore, a tentative glimpse of the future is presented.
Introduction
Boolean Satisfiability (SAT) is an NP-complete decision problem [14]; indeed, SAT was the first problem to be shown NP-complete. There are no known polynomial-time algorithms for SAT. Moreover, it is believed that any algorithm that solves SAT is exponential in the number of variables in the worst case.
Although SAT is in theory an NP-complete problem, in practice it can be seen as a success story of Computer Science. There have been remarkable improvements since the mid 90s, namely clause learning and unique implication points (UIPs) [43], search restarts [15, 26], lazy data structures [48], adaptive branching heuristics [48], clause minimization [59] and preprocessing [18].
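As a minimal point of reference (not code from the chapter), the sketch below implements a bare-bones DPLL procedure with unit propagation, the common ancestor of the CDCL solvers built from the techniques listed above; clauses are lists of non-zero integers as in the DIMACS convention, and the example formula is arbitrary.

```python
# Bare-bones DPLL with unit propagation (illustrative sketch, no clause learning).
def dpll(clauses, assignment=None):
    assignment = dict(assignment or {})
    changed = True
    while changed:                                   # unit propagation
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                             # clause already satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if not unassigned:
                return None                          # conflict: clause falsified
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0       # forced (unit) assignment
                changed = True
    free = {abs(l) for c in clauses for l in c} - set(assignment)
    if not free:
        return assignment                            # all variables assigned
    var = min(free)
    for value in (True, False):                      # branch and backtrack
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None

formula = [[1, 2], [-1, 3], [-2, -3], [2, 3]]
print(dpll(formula))                                 # a satisfying assignment, or None
```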
In this chapter, we will survey recent results on the broad family of optimisation problems that can be cast as valued constraint satisfaction problems (VCSPs). We discuss general methods for analysing the complexity of such problems, and give examples of tractable cases.
Introduction
Computational problems from many different areas involve finding values for variables that satisfy certain specified restrictions and optimise certain specified criteria.
In this chapter, we will show that it is useful to abstract the general form of such problems to obtain a single generic framework. Bringing all such problems into a common framework draws attention to common aspects that they all share, and allows very general analytical approaches to be developed. We will survey some of these approaches, and the results that have been obtained by using them.
The generic framework we shall use is the valued constraint satisfaction problem (VCSP), defined formally in Section 4.3. We will show that many combinatorial optimisation problems can be conveniently expressed in this framework, and we will focus on finding restrictions to the general problem which are sufficient to ensure tractability.
An important and well-studied special case of the VCSP is the constraint satisfaction problem (CSP), which deals with combinatorial search problems which have no optimisation criteria. We give a brief introduction to the CSP in Section 4.2, before defining the more general VCSP framework in Section 4.3. Section 4.4 then presents a number of examples of problems that can be seen as special cases of the VCSP.
The remainder of the chapter discusses what happens to the complexity of the valued constraint satisfaction problem when we restrict it in various ways.
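As a concrete, if naive, closing illustration (not taken from the chapter), the brute-force sketch below treats a tiny MAX-CUT instance as a VCSP with soft "not equal" cost functions; the instance, variable names, and cost values are illustrative.

```python
from itertools import product

# Brute-force VCSP solver: minimize the total cost of the valued constraints.
def solve_vcsp(domains, constraints):
    variables = list(domains)
    best_cost, best_assignment = float("inf"), None
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        cost = sum(f(*(assignment[v] for v in scope)) for scope, f in constraints)
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment

# MAX-CUT on a triangle as a VCSP: Boolean variables, cost 1 per uncut edge.
domains = {"x": [0, 1], "y": [0, 1], "z": [0, 1]}
not_equal = lambda a, b: 0 if a != b else 1
constraints = [(("x", "y"), not_equal), (("y", "z"), not_equal), (("x", "z"), not_equal)]
print(solve_vcsp(domains, constraints))  # minimum cost 1: a triangle cannot be fully cut
```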