The subject of this book is automated learning, or, as we will more often call it, Machine Learning (ML). That is, we wish to program computers so that they can “learn” from input available to them. Roughly speaking, learning is the process of converting experience into expertise or knowledge. The input to a learning algorithm is training data, representing experience, and the output is some expertise, which usually takes the form of another computer program that can perform some task. Seeking a formal-mathematical understanding of this concept, we'll have to be more explicit about what we mean by each of the involved terms: What is the training data our programs will access? How can the process of learning be automated? How can we evaluate the success of such a process (namely, the quality of the output of a learning program)?
WHAT IS LEARNING?
Let us begin by considering a couple of examples from naturally occurring animal learning. Some of the most fundamental issues in ML arise already in that context, which we are all familiar with.
Bait Shyness – Rats Learning to Avoid Poisonous Baits: When rats encounter food items with a novel look or smell, they will first eat very small amounts, and subsequent feeding will depend on the flavor of the food and its physiological effect. If the food produces an ill effect, the novel food will often be associated with the illness, and subsequently the rats will not eat it.
Let us begin our mathematical analysis by showing how successful learning can be achieved in a relatively simplified setting. Imagine you have just arrived on some small Pacific island. You soon find out that papayas are a significant ingredient in the local diet. However, you have never before tasted papayas. You have to learn how to predict whether a papaya you see in the market is tasty or not. First, you need to decide which features of a papaya your prediction should be based on. On the basis of your previous experience with other fruits, you decide to use two features: the papaya's color, ranging from dark green, through orange and red, to dark brown, and the papaya's softness, ranging from rock hard to mushy. Your input for figuring out your prediction rule is a sample of papayas that you have examined for color and softness and then tasted, finding out whether or not they were tasty. Let us analyze this task as a demonstration of the considerations involved in learning problems.
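To make the setting concrete, here is a minimal sketch of the papaya task in code. The 0-1 encoding of the two features, the particular sample values, and the rectangle-shaped prediction rule are illustrative assumptions of ours, not something prescribed by the text.

```python
# Illustrative papaya data: each example is ((color, softness), tasty?),
# with both features encoded on a 0-1 scale:
#   color:    0.0 = dark green ... 1.0 = dark brown
#   softness: 0.0 = rock hard  ... 1.0 = mushy
training_sample = [
    ((0.60, 0.40), True),   # orange-red, slightly soft -> tasty
    ((0.10, 0.10), False),  # dark green, rock hard     -> not tasty
    ((0.95, 0.90), False),  # dark brown, mushy         -> not tasty
    ((0.70, 0.50), True),
]

def predict(papaya, color_range=(0.4, 0.8), soft_range=(0.3, 0.7)):
    """A hypothetical 'rectangle' rule: predict tasty exactly when both
    features fall inside intervals inferred from the sample."""
    color, softness = papaya
    return (color_range[0] <= color <= color_range[1]
            and soft_range[0] <= softness <= soft_range[1])

print(predict((0.65, 0.45)))  # True under the assumed thresholds
```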
Our first step is to describe a formal model aimed to capture such learning tasks.
A FORMAL MODEL – THE STATISTICAL LEARNING FRAMEWORK
The learner's input: In the basic statistical learning setting, the learner has access to the following:
Domain set: An arbitrary set, $\mathcal{X}$. This is the set of objects that we may wish to label.
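For instance, in the papaya problem one natural (though not the only) choice of domain set is the set of feature vectors; under the 0-1 encoding assumed above this reads:

$$\mathcal{X} = [0,1] \times [0,1], \qquad x = (\text{color}, \text{softness}) \in \mathcal{X}.$$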
In the previous chapter we introduced the families of convex-Lipschitz-bounded and convex-smooth-bounded learning problems. In this section we show that all learning problems in these two families are learnable. For some learning problems of this type it is possible to show that uniform convergence holds; hence they are learnable using the ERM rule. However, this is not true for all learning problems of this type. Yet, we will introduce another learning rule and will show that it learns all convex-Lipschitz-bounded and convex-smooth-bounded learning problems.
The new learning paradigm we introduce in this chapter is called Regularized Loss Minimization, or RLM for short. In RLM we minimize the sum of the empirical risk and a regularization function. Intuitively, the regularization function measures the complexity of hypotheses. Indeed, one interpretation of the regularization function is via the structural risk minimization paradigm we discussed in Chapter 7. Another view of regularization is as a stabilizer of the learning algorithm. An algorithm is considered stable if a slight change of its input does not change its output much. We will formally define the notion of stability (what we mean by “slight change of input” and by “does not change the output much”) and prove its close relation to learnability. Finally, we will show that using the squared $\ell_2$ norm as a regularization function stabilizes all convex-Lipschitz or convex-smooth learning problems. Hence, RLM can be used as a general learning rule for these families of learning problems.
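As a concrete, hedged illustration of the RLM rule with the squared $\ell_2$ regularizer, the sketch below minimizes a regularized empirical risk for linear regression with the squared loss by plain gradient descent; the toy data, the step size, and the regularization weight lam are arbitrary choices made for demonstration, not values taken from the chapter.

```python
import numpy as np

def rlm_squared_l2(X, y, lam=0.1, lr=0.05, steps=2000):
    """Regularized Loss Minimization with the squared l2 regularizer:
        argmin_w  (1/m) * sum_i (<w, x_i> - y_i)^2  +  lam * ||w||^2,
    solved here by plain gradient descent (an illustrative choice)."""
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        residual = X @ w - y                                  # empirical-risk term
        grad = (2.0 / m) * (X.T @ residual) + 2.0 * lam * w   # plus regularizer gradient
        w -= lr * grad
    return w

# Toy data: labels are a noisy linear function of the features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
print(rlm_squared_l2(X, y, lam=0.1))
```

Larger values of lam pull the returned weight vector towards zero, which is exactly the stabilizing effect of the regularizer discussed above.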
We consider the popular and well-studied push model, which is used to spread information in a given network with n vertices. Initially, some vertex owns a rumour and passes it to one of its neighbours, which is chosen randomly. In each of the succeeding rounds, every vertex that knows the rumour informs a random neighbour. It has been shown on various network topologies that this algorithm succeeds in spreading the rumour within O(log n) rounds. However, many studies are quite coarse and involve huge constants that do not allow for a direct comparison between different network topologies. In this paper, we analyse the push model on several important families of graphs, and obtain tight runtime estimates. We first show that, for any almost-regular graph on n vertices with small spectral expansion, rumour spreading completes after $\log_2 n + \log n + o(\log n)$ rounds with high probability. This is the first result that exhibits a general graph class for which rumour spreading is essentially as fast as on complete graphs. Moreover, for the random graph G(n,p) with $p = c\log n/n$, where c > 1, we determine the runtime of rumour spreading to be $\log_2 n + \gamma(c)\log n$ with high probability, where $\gamma(c) = c\log(c/(c-1))$. In particular, this shows that the assumption of almost regularity in our first result is necessary. Finally, for a hypercube on $n = 2^d$ vertices, the runtime is with high probability at least $(1+\beta)\cdot(\log_2 n + \log n)$, where $\beta > 0$. This reveals that the push model on hypercubes is slower than on complete graphs, and thus shows that the assumption of small spectral expansion in our first result is also necessary. In addition, our results combined with the upper bound of O(log n) for the hypercube (see [11]) imply that the push model is faster on hypercubes than on a random graph $G(n, c\log n/n)$, where c is sufficiently close to 1.
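The protocol is simple enough to simulate in a few lines; the sketch below is our own illustrative code (it assumes a connected graph given as an adjacency list and is not taken from the paper).

```python
import math
import random

def push_rounds(adj, start=0):
    """Simulate the push model on a connected graph given as an adjacency list:
    in every round each informed vertex picks a uniformly random neighbour and
    informs it; returns the number of rounds until all vertices are informed."""
    n = len(adj)
    informed = {start}
    rounds = 0
    while len(informed) < n:
        newly = {random.choice(adj[v]) for v in informed}
        informed |= newly
        rounds += 1
    return rounds

# Example: the complete graph on n vertices, averaged over a few runs;
# the average is empirically close to log2(n) + ln(n).
n = 256
complete = [[u for u in range(n) if u != v] for v in range(n)]
print(sum(push_rounds(complete) for _ in range(20)) / 20, math.log2(n) + math.log(n))
```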
We call a graph H Ramsey-unsaturated if there is an edge in the complement of H such that the Ramsey number r(H) of H does not change upon adding it to H. This notion was introduced by Balister, Lehel and Schelp in [J. Graph Theory 51 (2006), pp. 22–32], where it is shown that cycles (except for $C_4$) are Ramsey-unsaturated, and conjectured that, moreover, one may add any chord without changing the Ramsey number of the cycle $C_n$, unless n is even and adding the chord creates an odd cycle.
We prove this conjecture for large cycles by showing a stronger statement. If a graph H is obtained by adding a linear number of chords to a cycle $C_n$, then $r(H)=r(C_n)$, as long as the maximum degree of H is bounded, H is either bipartite (for even n) or almost bipartite (for odd n), and n is large.
This motivates us to call cycles strongly Ramsey-unsaturated. Our proof uses the regularity method.
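For context, the classical values of the cycle Ramsey numbers (due to Rosta and to Faudree and Schelp, and not restated in the abstract) to which the equality $r(H)=r(C_n)$ refers are:

$$r(C_n)=\begin{cases}2n-1, & n\ge 5 \text{ odd},\\ 3n/2-1, & n\ge 6 \text{ even},\end{cases}\qquad r(C_3)=r(C_4)=6.$$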
In this paper we prove that two local conditions involving the degrees and co-degrees in a graph can be used to determine whether a given vertex partition is Frieze–Kannan regular. With a more refined version of these two local conditions we provide a deterministic algorithm that obtains a Frieze–Kannan regular partition of any graph G in time $O(|V(G)|^2)$.
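For readers unfamiliar with the notion, the standard definition of Frieze–Kannan (weak) regularity, which the abstract takes for granted, is the following: a partition $V_1,\dots,V_k$ of $V(G)$ is $\varepsilon$-Frieze–Kannan regular if for all sets $S,T\subseteq V(G)$,

$$\Bigl|e(S,T)-\sum_{i,j=1}^{k} d(V_i,V_j)\,|S\cap V_i|\,|T\cap V_j|\Bigr|\le \varepsilon\,|V(G)|^2,$$

where $d(V_i,V_j)=e(V_i,V_j)/(|V_i||V_j|)$ is the edge density between the parts.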
independent sets. This improves a recent result of the first and third authors [8]. In particular, it implies that as n → ∞, every triangle-free graph on n vertices has at least ${e^{(c_1-o(1)) \sqrt{n} \ln n}}$ independent sets, where $c_1 = \sqrt{\ln 2}/4 = 0.208138 \ldots$. Further, we show that for all n, there exists a triangle-free graph with n vertices which has at most ${e^{(c_2+o(1))\sqrt{n}\ln n}}$ independent sets, where $c_2 = 2\sqrt{\ln 2} = 1.665109 \ldots$. This disproves a conjecture from [8].
Let H be a (k+1)-uniform linear hypergraph with n vertices and average degree t. We also show that there exists a constant $c_k$ such that the number of independent sets in H is at least
This is tight apart from the constant $c_k$ and generalizes a result of Duke, Lefmann and Rödl [9], which guarantees the existence of an independent set of size
Let $(G_m)_{0\le m\le \binom{n}{2}}$ be the random graph process starting from the empty graph on vertex set [n] and with a random edge added in each step. Let $m_k$ denote the minimum integer such that $G_{m_k}$ contains a k-regular subgraph. We prove that for all sufficiently large k, there exist two constants $\varepsilon_k \ge \sigma_k > 0$, with $\varepsilon_k \to 0$ as $k \to \infty$, such that asymptotically almost surely any k-regular subgraph of $G_{m_k}$ has size between $(1 - \varepsilon_k)|{\mathcal C}_k|$ and $(1 - \sigma_k)|{\mathcal C}_k|$, where ${\mathcal C}_k$ denotes the k-core of $G_{m_k}$.
Some of the most interesting and important results concerning quantum finite automata are those showing that they can recognize certain languages with (much) fewer resources than corresponding classical finite automata. This paper shows three results of such a type that are stronger in some sense than other ones because (a) they deal with models of quantum finite automata with very little quantumness (so-called semi-quantum one- and two-way finite automata); (b) differences, even comparing with probabilistic classical automata, are bigger than expected; (c) a trade-off between the number of classical and quantum basis states needed is demonstrated in one case; and (d) the languages (or the promise problem) used to show the main results are very simple and often explored ones in automata theory or in communication complexity, with seemingly little structure that could be utilized.