A family 𝒜 of sets is said to be intersecting if A ∩ B ≠ ∅ for all A, B ∈ 𝒜. It is a well-known and simple fact that an intersecting family of subsets of [n] = {1, 2, . . ., n} can contain at most 2^(n−1) sets. Katona, Katona and Katona ask the following question. Suppose instead 𝒜 ⊂ 2^[n] satisfies |𝒜| = 2^(n−1) + i for some fixed i > 0. Create a new family 𝒜_p by choosing each member of 𝒜 independently with some fixed probability p. How do we choose 𝒜 to maximize the probability that 𝒜_p is intersecting? They conjecture that there is a nested sequence of optimal families for i = 1, 2, . . ., 2^(n−1). In this paper, we show that the families [n]^(≥r) = {A ⊂ [n]: |A| ≥ r} are optimal for the appropriate values of i, thereby proving the conjecture for this sequence of values. Moreover, we show that for intermediate values of i there exist optimal families lying between those we have found. It turns out that the optimal families we find simultaneously maximize the number of intersecting subfamilies of each possible order.
Standard compression techniques appear inadequate to solve the problem as they do not preserve intersection properties of subfamilies. Instead, our main tool is a novel compression method, together with a way of ‘compressing subfamilies’, which may be of independent interest.
For each of us who appear to have had a successful experiment there are many to whom their own experiments seem barren and negative.
Melvin Calvin, 1961 Nobel Lecture
An experiment is not considered “barren and negative” when it disproves your conjecture: an experiment fails by being inconclusive.
Successful experiments are partly the product of good experimental designs, as described in Chapter 2; there is also an element of luck (or savvy) in choosing a well-behaved problem to study. Furthermore, computational research on algorithms provides unusual opportunities for “tuning” experiments to yield more successful analyses and stronger conclusions. This chapter surveys techniques for building better experiments along these lines.
We start with a discussion of what makes a data set good or bad in this context. The remainder of this section surveys strategies for tweaking experimental designs to yield more successful outcomes.
If tweaks are not sufficient, stronger measures can be taken; Section 6.1 surveys variance reduction techniques, which modify test programs to generate better data, and Section 6.2 describes simulation shortcuts, which produce more data per unit of computation time.
The key idea is to exploit the fact, pointed out in Section 5.1, that the application program that implements an algorithm for practical use is distinct from the test program that describes algorithm performance. The test program need not resemble the application program at all; it is only required to reproduce faithfully the algorithm properties of interest.
Richard Hamming, Numerical Methods for Scientists and Engineers
Some questions:
You are a working programmer given a week to reimplement a data structure that supports client transactions, so that it runs efficiently when scaled up to a much larger client base. Where do you start?
You are an algorithm engineer, building a code repository to hold fast implementations of dynamic multigraphs. You read papers describing asymptotic bounds for several approaches. Which ones do you implement?
You are an operations research consultant, hired to solve a highly constrained facility location problem. You could build the solver from scratch or buy optimization software and tune it for the application. How do you decide?
You are a Ph.D. student who just discovered a new approximation algorithm for graph coloring that will make your career. But you're stuck on the average-case analysis. Is the theorem true? If so, how can you prove it?
You are the adviser to that Ph.D. student, and you are skeptical that the new algorithm can compete with state-of-the-art graph coloring algorithms. How do you find out?
One good way to answer all these questions is: run experiments to gain insight.
This book is about experimental algorithmics, which is the study of algorithms and their performance by experimental means. We interpret the word algorithm very broadly, to include algorithms and data structures, as well as their implementations in source code and machine code.
In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selection amongst them for the purposes of a Calculating Engine. One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation.
Ada Byron, Memoir on the Analytic Engine, 1843
This chapter considers an essential question raised by Lady Byron in her famous memoir: How to make it run faster?
This question can be addressed at all levels of the algorithm design hierarchy sketched in Figure 1.1 of Chapter 1, including systems, algorithms, code, and hardware. Here we focus on tuning techniques that lie between the algorithm design and hardware levels. We start with the assumption that the system analysis and abstract algorithm design work has already taken place, and that a basic implementation of an algorithm with good asymptotic performance is in hand. The tuning techniques in this chapter are meant to improve upon the abstract design work, not replace it.
Tuning exploits the gaps between practical experience and the simplifying assumptions necessary to theory, by focusing on constant factors instead of asymptotics, secondary instead of dominant costs, and performance on “typical” inputs rather than theoretical classes. Many of the ideas presented here are known in the folklore under the general rubric of “code tuning.”
This guidebook is written for anyone – student, researcher, or practitioner – who wants to carry out computational experiments on algorithms (and programs) that yield correct, general, informative, and useful results. (We take the wide view and use the term “algorithm” to mean “algorithm or program” from here on.)
Whether the goal is to predict algorithm performance or to build faster and better algorithms, the experiment-driven methodology outlined in these chapters provides insights into performance that cannot be obtained by purely abstract means or by simple runtime measurements. The past few decades have seen considerable developments in this approach to algorithm design and analysis, both in terms of number of participants and in methodological sophistication.
In this book I have tried to present a snapshot of the state of the art in this field (which is known as experimental algorithmics and empirical algorithmics), at a level suitable for the newcomer to computational experiments. The book is aimed at a reader with some undergraduate computer science experience: you should know how to program, and ideally you have had at least one course in data structures and algorithm analysis. Otherwise, no previous experience is assumed regarding the other topics addressed here, which range widely from architectures and operating systems, to probability theory, to techniques of statistics and data analysis.
A note to academics: The book takes a nuts-and-bolts approach that would be suitable as a main or supplementary text in a seminar-style course on advanced algorithms, experimental algorithmics, algorithm engineering, or experimental methods in computer science.
Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat.
Sun Tzu, The Art of War
W. I. B. Beveridge, in his classic guidebook for young scientists [7], likens scientific research “to warfare against the unknown”:
The procedure most likely to lead to an advance is to concentrate one's forces on a very restricted sector chosen because the enemy is believed to be weakest there. Weak spots in the defence may be found by preliminary scouting or by tentative attacks.
This chapter is about developing small- and large-scale plans of attack in algorithmic experiments.
To make the discussion concrete, we consider algorithms for the graph coloring (GC) problem. The input is a graph G containing n vertices and m edges. A coloring of G is an assignment of colors to vertices such that no two adjacent vertices have the same color. Figure 2.1 shows an example graph with eight vertices and ten edges, colored with four colors. The problem is to find a coloring that uses the minimum number of colors – is 4 the minimum in this case?
When restricted to planar graphs, this is the famous map coloring problem, which is to color the regions of a map so that adjacent regions have different colors. Only four colors are needed for any map, but in the general graph problem, as many as n colors may be required.
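The greedy heuristic below is a minimal illustration of the coloring problem (not an algorithm analyzed in this chapter): it scans vertices in a fixed order and gives each the smallest color not already used by a colored neighbor. The graph and vertex ordering are made up for the example.

```python
def greedy_coloring(adj):
    """Color vertices greedily in iteration order; returns vertex -> color.

    adj maps each vertex to the set of its neighbors. Greedy coloring is
    a heuristic: it always produces a valid coloring, but not necessarily
    one with the minimum number of colors.
    """
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:            # smallest color not used by a neighbor
            c += 1
        color[v] = c
    return color

# A 5-cycle: an odd cycle needs 3 colors, and greedy finds 3 here.
cycle5 = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
coloring = greedy_coloring(cycle5)
```

Note that the number of colors greedy uses can depend heavily on the vertex order, which is one reason coloring heuristics make interesting experimental subjects.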
Really, the slipshod way we deal with data is a disgrace to civilization.
M. J. Moroney, Facts from Figures
Information scientists tell us that data, alone, have no value or meaning [1]. When organized and interpreted, data become information, which is useful for answering factual questions: Which is bigger, X or Y? How many Z's are there? A body of information can be further transformed into knowledge, which reflects understanding of how and why, at a level sufficient to direct choices and make predictions: Which algorithm should I use for this application? How long will it take to run?
Data analysis is a process of inspecting, summarizing, and interpreting a set of data to transform it into something useful: information is the immediate result, and knowledge the ultimate goal.
This chapter surveys some basic techniques of data analysis and illustrates their application to algorithmic questions. Section 7.1 presents techniques for analyzing univariate (one-dimensional) data samples. Section 7.2 surveys techniques for analyzing bivariate data samples, which are expressed as pairs of (X, Y) points. No statistical background is required of the reader.
One chapter is not enough to cover all the data analysis techniques that are useful to algorithmic experiments – something closer to a few bookshelves would be needed. Here we focus on describing a small collection of techniques that address the questions most commonly asked about algorithms, and on knowing which technique to apply in a given scenario.
Write your workhorse program well; instrument your program; your experimental results form a database: treat it with respect; keep a kit full of sharp tools.
Jon Louis Bentley, Ten Commandments for Experiments on Algorithms
They say the workman is only as good as his tools; in experimental algorithmics the workman must often build his tools.
The test environment is the collection of programs and files assembled together to support computational experiments on algorithms. This collection includes test programs that implement the algorithms of interest; code to generate input instances, and files containing instances; scripts to control and document tests; tools for measuring performance; and data analysis software.
This chapter presents tips for assembling and building these components to create a reliable, efficient, and flexible test environment. We start with a survey of resources available to the experimenter. Section 5.1 surveys aspects of test program design, and Section 5.2 presents a cookbook of methods for generating random numbers and combinatorial objects to use as test inputs or inside randomized algorithms.
Most algorithm researchers prefer to work in Unix-style operating systems, which provide excellent tools for conducting experiments, including:
Utilities such as time and gprof for measuring elapsed and CPU times.
Shell scripts and makefiles. Shell scripting makes it easy to automate batches of tests, and makefiles make it easy to mix and match compilation units. Scripts and makefiles also create a document trail that records the history of an experimental project.
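The same kind of batch automation can also be written in a general-purpose language. The sketch below uses Python rather than a shell script; the command being timed is a do-nothing placeholder standing in for a real test program, and the output file name is arbitrary.

```python
import csv
import subprocess
import sys
import time

# Batch driver: run a test program several times, record wall-clock
# times to a CSV file that later analysis scripts can read.
# The command is a placeholder; substitute your own solver and inputs.
command = [sys.executable, "-c", "pass"]   # stand-in for ./solver <input>
trials = 3

with open("times.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["trial", "seconds"])
    for trial in range(trials):
        start = time.perf_counter()
        subprocess.run(command, check=True)
        elapsed = time.perf_counter() - start
        writer.writerow([trial, f"{elapsed:.6f}"])
```

Writing results to a file, rather than reading them off the screen, is what turns a batch of runs into the document trail mentioned above.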
Let k_r(n, δ) be the minimum number of r-cliques in graphs with n vertices and minimum degree at least δ. We evaluate k_r(n, δ) for δ ≤ 4n/5 and some other cases. Moreover, we give a construction which we conjecture to give all extremal graphs (subject to certain conditions on n, δ and r).
Simple families of increasing trees were introduced by Bergeron, Flajolet and Salvy. They include random binary search trees, random recursive trees and random plane-oriented recursive trees (PORTs) as important special cases. In this paper, we investigate the number of subtrees of size k on the fringe of some classes of increasing trees, namely generalized PORTs and d-ary increasing trees. We use a complex-analytic method to derive precise expansions of mean value and variance as well as a central limit theorem for fixed k. Moreover, we propose an elementary approach to derive limit laws when k is growing with n. Our results have consequences for the occurrence of pattern sizes on the fringe of increasing trees.
More than forty years ago, Erdős conjectured that every k-uniform hypergraph on n vertices without t disjoint edges has at most max{(kt−1 choose k), (n choose k) − (n−t+1 choose k)} edges. Although this appears to be a basic instance of the hypergraph Turán problem (with a t-edge matching as the excluded hypergraph), progress on this question has remained elusive. In this paper, we verify this conjecture for all n sufficiently large in terms of k and t. This improves upon the best previously known range, which dates back to the 1970s.
A causal set is a countably infinite poset in which every element is above finitely many others; causal sets are exactly the posets that have a linear extension with the order-type of the natural numbers; we call such a linear extension a natural extension. We study probability measures on the set of natural extensions of a causal set, especially those measures having the property of order-invariance: if we condition on the set of the bottom k elements of the natural extension, each feasible ordering among these k elements is equally likely. We give sufficient conditions for the existence and uniqueness of an order-invariant measure on the set of natural extensions of a causal set.
We discuss the connection between the expansion of small sets in graphs, and the Schatten norms of their adjacency matrices. In conjunction with a variant of the Azuma inequality for uniformly smooth normed spaces, we deduce improved bounds on the small-set isoperimetry of Abelian Alon–Roichman random Cayley graphs.
We construct a sequence of finite graphs that weakly converge to a Cayley graph, but there is no labelling of the edges that would converge to the corresponding Cayley diagram. A similar construction is used to give graph sequences that converge to the same limit, and such that a Hamiltonian cycle in one of them has a limit that is not approximable by any subgraph of the other. We give an example where this holds, but convergence is meant in a stronger sense. This is related to whether having a Hamiltonian cycle is a testable graph property.
Any amicable pair ϕ, ψ of Sturmian morphisms enables a construction of a ternary morphism η which preserves the set of infinite words coding 3-interval exchange. We determine the number of amicable pairs with the same incidence matrix in SL±(2,ℕ) and we study incidence matrices associated with the corresponding ternary morphisms η.