Princeton's polymath John Tukey (1915–2000) often declared that a graph is the best, and sometimes the only, way to find what you were not looking for. Tukey was giving voice to what all data scientists now accept as gospel – statistical graphs are powerful tools for the discovery of quantitative phenomena, for their communication, and even for the efficient storage of information.
Yet, despite its ubiquity in modern life, graphical display is a relatively recent invention. Its origins are not shrouded in history like the invention of the wheel or of fire. There was no reason for the invention of a method to visually display data until the use of data, as evidence, was an accepted part of scientific epistemology. Thus it isn't surprising that graphs only began to appear during the eighteenth-century Enlightenment, after the writings of the British empiricists John Locke (1632–1704), George Berkeley (1685–1753), and David Hume (1711–76) popularized and justified empiricism as a way of knowing things.
Graphical display did not emerge from the preempirical murk in bits and pieces. Once the epistemological ground was prepared, its birth was more like Botticelli's Venus – arising fully adult. The year 1786 is the birthday of modern statistical graphics, when the Scottish iconoclast William Playfair (1759–1823) devised an almost entirely new way to communicate quantitative phenomena. Playfair showed the rise and fall over time of imports and exports between nations with line charts; the extent of Turkey that lay in each of three continents with the first pie chart; and the characteristics of Scotland's trade in a single year with a bar chart. Thus in a single remarkable volume, an atlas that contained not a single map, he provided spectacular versions of three of the four most important graphical forms. His work was celebrated and has subsequently been seized upon by data scientists as a crucial tool to communicate empirical findings to one another and, indeed, even to oneself.
In this section we celebrate graphical display as a tool to communicate quantitative evidence by telling four stories. In Chapter 8 we set the stage with a discussion of the empathetic mind-set required to design effective communications of any sort, although there is a modest tilting toward visual communications.
Not all of data science requires the mastery of deep ideas; sometimes aid in thinking can come from some simple rules of thumb. We start with a couple of warm-up chapters to get us thinking about evidence. In the first, I show how the Rule of 72, long used in finance, can have much wider application. Chapter 2 examines a puzzle posed by a New York Times music critic: why are there so many piano virtuosos? By adjoining this puzzle with a parallel one in athletics, I unravel both with one simple twist of my statistical wrist. In these two chapters we also meet two important statistical concepts: (1) the value of an approximate answer and (2) the fact that the likelihood of extreme observations increases apace with the size of the sample. This latter idea – that, for example, the tallest person in a group of one hundred is likely not as tall as the tallest in a group of one thousand – can be expressed explicitly with a little mathematics, but it can also be understood intuitively without any, and so can be used to explain phenomena we encounter every day.
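To make both warm-up ideas concrete, here is a minimal sketch in Python. The Rule of 72 approximates the doubling time of anything growing at r percent per period as 72/r periods; the height distribution used in the second half (mean 170 cm, standard deviation 10 cm) is an assumption for illustration, not a figure from the text.

```python
import math
import random

# Rule of 72: years to double ~ 72 / (percent growth per year).
# Compare against the exact answer, ln(2) / ln(1 + r).
for pct in (2, 4, 6, 9, 12):
    approx = 72 / pct
    exact = math.log(2) / math.log(1 + pct / 100)
    print(f"{pct}% growth: rule of 72 says {approx:.1f} years, exact is {exact:.1f}")

# Extremes grow with sample size: the tallest of 1,000 people is very
# likely taller than the tallest of 100 (distribution assumed for illustration).
random.seed(1)
sample = lambda n: [random.gauss(170, 10) for _ in range(n)]  # heights in cm
print(f"tallest of 100:   {max(sample(100)):.1f} cm")
print(f"tallest of 1,000: {max(sample(1000)):.1f} cm")
```

Running it shows why the rule earns its keep: for everyday growth rates the approximation and the exact doubling time differ by at most about a year.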
I consider the most important contribution to scientific thinking since David Hume to be Donald B. Rubin's Model for Causal Inference. Rubin's Model is the heart of this section and of this book. Although the fundamental ideas of Rubin's Model are easy to state, the deep contemplation of counterfactual conditionals can give you a headache. Yet the mastery of it changes you. In a very real sense learning this approach to causal inference is closely akin to learning how to swim or how to read. Both are difficult tasks, but once mastered you are changed forever. After learning to read or to swim, it is hard to imagine what it was like not being able to do so. In the same way, once you absorb Rubin's Model your thinking about the world will change. It will make you powerfully skeptical, pointing the way to truly find things out. In Chapter 3 I illustrate how to use this approach to assess the causal effect of school performance on happiness as well as the opposite: the causal effect happiness has on school performance.
There have been many remarkable changes in the world over the last century, but few have surprised me as much as the transformation in public attitude toward my chosen profession, statistics – the science of uncertainty. Throughout most of my life the word boring was the most common adjective associated with the noun statistics. In the statistics courses that I have taught, stretching back almost fifty years, by far the most prevalent reason that students gave for why they were taking the course was “it's required.” This dreary reputation nevertheless gave rise to some small pleasures. Whenever I found myself on a plane, happily involved with a book, and my seatmate inquired, “What do you do?” I could reply, “I'm a statistician,” and confidently expect the conversation to come to an abrupt end, whereupon I could safely return to my book. This attitude began to change among professional scientists decades ago as the realization grew that statisticians were the scientific generalists of the modern information age. As Princeton's John Tukey, an early convert from mathematics, so memorably put it, “as a statistician, I can play in everyone's backyard.”
Statistics, as a discipline, grew out of the murk of applied probability as practiced in gambling dens into wide applicability in demography, agriculture, and the social sciences. But that was only the beginning. The rise of quantum theory made clear that even physics, that most deterministic of sciences, needed to understand uncertainty. The health professions joined in as Evidence-Based Medicine became a proper noun. Prediction models combined with exit polls let us go to sleep early with little doubt about election outcomes. Economics and finance were transformed as "quants" joined the investment teams, and their success made it clear that you ignore statistical rigor in devising investment schemes at your own peril.
These triumphs, as broad and wide ranging as they were, still did not capture the public's attention until Nate Silver showed up and started predicting the outcomes of sporting events with uncanny accuracy. His success at this gave him an attentive audience for his early predictions of the outcomes of elections. Talking heads and pundits would opine, using their years of experience and deeply held beliefs, but anyone who truly cared about what would happen went to FiveThirtyEight, Silver's website, for the unvarnished truth.
A famous paradox, attributed to the Greek mathematician Zeno, involves a race between the great hero Achilles and a lowly tortoise. In view of their vastly different speeds, the tortoise was granted a substantial head start. The race began, and in a short time Achilles had reached the tortoise's starting spot. But in that short time, the tortoise had moved slightly ahead. In the second stage of the race Achilles quickly covered that short distance, but the tortoise was not stationary and had moved a little farther onward. And so they continued – Achilles would reach where the tortoise had been, but the tortoise would always inch ahead, just out of his reach. From this example the great Aristotle concluded that, "In a race, the quickest runner can never overtake the slowest, since the pursuer must first reach the point whence the pursued started, so that the slower must always hold a lead."
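Where Aristotle went wrong can be shown with a little arithmetic: the infinitely many stages add up to a finite distance. Here is a minimal sketch in Python, with the 100-metre head start and the 10:1 speed ratio as assumed numbers for illustration:

```python
# Zeno's stages form a geometric series. Suppose a 100 m head start and
# an Achilles who runs ten times as fast as the tortoise: each stage is
# then one-tenth the length of the one before it.
head_start, ratio = 100.0, 0.1  # assumed numbers for illustration

total, stage = 0.0, head_start
for _ in range(30):   # sum the first 30 stages
    total += stage
    stage *= ratio

print(f"sum of 30 stages:  {total:.6f} m")
print(f"closed-form limit: {head_start / (1 - ratio):.6f} m")  # 1000/9 m
```

The stages pile up not forever in time but within a fixed, finite distance – about 111.1 metres – past which Achilles is comfortably ahead.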
The lesson that we should take from this paradox is that when we focus only on the differences between groups, we too easily lose track of the big picture. Nowhere is this more obvious than in the current public discussions of the size of the gap in test scores between racial groups. In New Jersey the gap between the average scores of white and black students on the well-developed scale of the National Assessment of Educational Progress (NAEP) has shrunk by only about 25 percent over the past two decades. The conclusion drawn was that even though the change is in the right direction, it is far too slow.
But focusing on the difference blinds us to a remarkable success in education over the past twenty years. Although improvements of similar direction and size occurred across many subject areas and many age groups, I will describe just one – fourth grade mathematics. The dots in Figure 12.1 represent the average scores for all available states on NAEP's fourth grade mathematics test (with the nation as a whole, as well as the state of New Jersey, labeled for emphasis) for black students and white students in 1992 and 2011. Both racial groups made steep gains over this time period (somewhat steeper gains for blacks than for whites).
From 1996 until 2001 I served as an elected member of the Board of Education for the Princeton Regional Schools. Toward the end of that period a variety of expansion projects were planned that required the voters to pass a $61 million bond issue. Each board member was assigned to appear in several public venues to describe the projects and try to convince those in attendance of their value so that they would agree to support the bond issue. The repayment of the bond was projected to add about $500 to the annual school taxes for the average house, which would continue for the forty years of the bond. It was my misfortune to be named as the board representative to a local organization of senior citizens.
At their meeting I was treated repeatedly to the same refrain: that they had no children in the schools, that the schools were more than good enough, and that they were living on fixed incomes and any substantial increase in taxes could constitute a hardship, which would likely continue for the rest of their lives. During all of this I wisely remained silent. Then, when a pugnacious octogenarian strode to the microphone, I feared the worst. He glared out at the gathered crowd and proclaimed, “You're all idiots.” He then elaborated, “What can you add to your house for $500/year that would increase its value as much as this massive improvement to the schools? Not only that, you get the increase in your property value immediately, and you won't live long enough to pay even a small portion of the cost. You're idiots.” Then he stepped down. A large number of the gray heads in the audience turned to one another and nodded in agreement. The bond issue passed overwhelmingly.
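The octogenarian's arithmetic is easy to check. A minimal sketch in Python – the $500-per-year tax and forty-year term come from the story, while the 4 percent discount rate is an assumption for illustration:

```python
# Present value of the senior citizens' tax burden. The $500/year and
# 40-year term are from the story; the 4% discount rate is assumed.
tax_per_year, years, discount = 500, 40, 0.04

def present_value(n_years):
    """Today's cost of paying the tax for n_years at the assumed discount rate."""
    return sum(tax_per_year / (1 + discount) ** t for t in range(1, n_years + 1))

print(f"cost of all 40 years, in today's dollars:  ${present_value(40):,.0f}")
print(f"cost a senior paying only 10 years bears:  ${present_value(10):,.0f}")
```

Roughly $9,900 versus $4,100 in today's dollars – while the increase in property value, whatever its size, arrives immediately.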
Each year, when the real estate tax bill arrives, every homeowner is reminded how expensive public education is. Yet, when the school system works well, it is money well spent, even for those residents without children in the schools. For as surely as night follows day, real estate values march in lockstep with the reputation of the local schools. Of course, the importance of education to all of us goes well beyond the money spent on it.
The naturalist Gilbert White (1720–93) was known for his meticulous observations of flora and fauna in their natural environment, primarily around his village of Selborne in Hampshire. This posthumous 1795 publication, edited by the physician and writer John Aikin (1747–1822), comprises a collection of extracts from White's previously unpublished papers from 1768 to his death. Presented here for 'lovers of natural knowledge' is a full year of White's observations. Following the month-by-month record of natural events, the book contains brief studies of birds, quadrupeds, insects, plants and the weather. A lifelong lover of the outdoors, White had kept a near daily record of his activities for more than forty years. Regarded as one of the fathers of ecology, inspiring others to appreciate the natural world, White is best known for The Natural History and Antiquities of Selborne (1789), which is also reissued in the Cambridge Library Collection.
Reginald J. Farrer (1880–1920) was a horticulturalist and plant finder who made a lasting contribution to British gardening, the rockery designs for which he is best known having been greatly influenced by those he discovered in Asia. First published in 1909, this study eloquently describes the author's own garden and its surrounding countryside in his home town of Clapham, Yorkshire. Focusing on the early spring, Farrer reveals through figurative prose the awakening of the flowers and shrubs, the character of the garden as winter disappears, and the aesthetics inherent to the natural world. The study shows his passion for horticulture, and his dedication to an aesthetic that led him to influence generations of gardeners. Featuring an extensive index of plant names and illustrated with photographs taken by the author, it is as informative as it is descriptive, and offers a wealth of anecdotal advice that remains of great interest.
With Thomas Edison's invention of the phonograph, the beautiful music that was the preserve of the wealthy became a mass-produced consumer good, cheap enough to be available to all. In 1877 Edison dreamed that one day there would be a talking machine in every home. America on Record: A History of Recorded Sound, first published in 2006, provides a history of sound recording from the first thin sheet of tinfoil that was manipulated into retaining sound to the home recordings of rappers in the 1980s and the high-tech studios of the 1990s. This book examines the important technical developments of acoustic, electric, and digital sound reproduction while outlining the cultural impact of recorded music and movies. This second edition updates the story, describing the digital revolution of sound recording with the rise of computers, Napster, DVD, MP3, and iPod.
An Irish-born gardener and writer, William Robinson (1838–1935) travelled widely to study gardens and gardening in Europe and America. He founded a weekly illustrated periodical, The Garden, in 1871, which he owned until 1919, and published numerous books on different aspects of horticulture. Topics included annuals, hardy perennials, alpines and subtropical plants, as well as accounts of his travels. This book, his most famous work, was first published in 1883, and fifteen editions were issued in his lifetime. It has been described as 'the most widely read and influential gardening book ever written'. Aimed at both amateurs and experienced gardeners, it sets out clearly the different types of plant suitable for each type of situation, and how to grow them. Robinson advocated a revolution in garden design, rejecting the more formal flower-beds which had long been popular in favour of a more natural and individual style.
James Shirley Hibberd (1825–90) was a journalist and horticultural writer who worked as a bookseller before devoting his time to researching, lecturing, and publishing on gardening. An active member of the Royal Horticultural Society, he edited several gardening magazines including Floral World, and his writing was widely enjoyed and respected. This book, first published in 1856, is Hibberd's carefully researched and practical guide to decorating the home and garden. Hibberd explains the practical aspects of garden design, the pleasures of bee-keeping, and how to construct a pond or aquarium. Full of useful advice on everything from preserving cut flowers to the ideal species of bird to keep in an aviary, this is a charming and enjoyable manual for the Victorian gardener which was very popular in its time, and remains a useful source for the cultural historian as well as an entertaining treat for the general reader.
The first day we connected our NES to our TV and Mario appeared
The first day I instant messaged a friend using MSN Messenger from France to England
The first day I was doing a presentation and said I could get online without a cable
The first day I was carrying my laptop between rooms and an email popped up on my computer
The first day I tentatively spoke into my computer and my friend’s voice came back
The first day the map on my phone automatically figured out where I was
Each of these moments separately blew my mind on the day. It was like magic when they happened. The closest I have had recently was probably the first successful call using FaceTime and waving my hand at a Kinect sensor. (Another, that most people probably haven’t experienced, was watching a glass door instantly turn opaque at the touch of a button. Unbelievable.)
Each of these moments blew me away because things happened that weren’t even part of my expectations. I suspect our expectations have now risen so high that it’ll probably take teleportation to get a similar effect from a teenager. Maybe life would be more fun if we kept our expectations low?
As I prepared for this event, I began to have serious doubts about my sanity. My calculations were telling me that, contrary to all the current lore in the field, we could scale down the technology such that everything got better: the circuits got more complex, they ran faster, and they took less power – WOW!
Carver Mead
Silicon and semiconductors
When we left the early history of computers in Chapter 2, we had seen that logic gates were first implemented using electromechanical relays – as in the Harvard Mark I – and then with vacuum tubes – as in the ENIAC and the first commercial computers. These early computers with many thousands of vacuum tubes actually worked much better and more reliably than many engineers had expected. Nevertheless, the hunt was on for a more dependable technology. After World War II, Bell Labs (Fig. 7.1) initiated a research program to develop solid-state devices as a replacement for vacuum tubes. The focus of the program was not on materials that were metals or insulators but on strange, “in-between” materials called semiconductors.
In a solid, it is the flow of electrons that gives rise to electric currents when a voltage is applied. One of the great successes of quantum physics has been in giving us an understanding of the way in which different types of solids – metals, insulators, and semiconductors – conduct electricity. This quantum mechanical understanding of materials has led directly to the present technological revolution, with its accompanying avalanche of stereo systems, color TVs, computers, and mobile phones. A good conductor, such as copper, must have many conduction electrons that are able to move and thus constitute a current when a voltage is applied. By contrast, an insulator such as glass or rubber has very few conduction electrons, and little or no current flows when a voltage is applied. Semiconductors are solids that conduct electricity much better than insulators but much worse than metals. The elements germanium and silicon are two examples. The importance of silicon for computer technology is evident in the naming of California’s “Silicon Valley,” home to many of the earliest electronic component manufacturers (Fig. 7.2).
In the last chapter, we saw that it was possible to logically separate the design of the actual computer hardware – the electromechanical relays, vacuum tubes, or transistors – from the software – the instructions that are executed by the hardware. Because of this key abstraction, we can either go down into the hardware layers and see how the basic arithmetic and logical operations are carried out by the hardware, or go up into the software levels and focus on how we tell the computer to perform complex tasks. Computer architect Danny Hillis says:
This hierarchical structure of abstraction is our most important tool in understanding complex systems because it lets us focus on a single aspect of a problem at a time.
We will also see the importance of “functional abstraction”:
Naming the two signals in computer logic 0 and 1 is an example of functional abstraction. It lets us manipulate information without worrying about the details of its underlying representation. Once we figure out how to accomplish a given function, we can put the mechanism inside a “black box,” or a “building block” and stop thinking about it. The function embodied by the building block can be used over and over, without reference to the details of what’s inside.
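To see the black-box idea in miniature, here is a sketch in Python rather than in hardware – building every gate from NAND is standard logic design, though the code itself is illustrative and not from the book:

```python
# Functional abstraction in miniature: once NAND works, it becomes a
# black box, and every other gate is built without looking back inside.
def nand(a: int, b: int) -> int:
    return 0 if (a == 1 and b == 1) else 1

def not_(a):    return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b):  return nand(not_(a), not_(b))

def xor_(a, b):                      # the classic four-NAND XOR
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def half_adder(a, b):                # one level further up the hierarchy
    return xor_(a, b), and_(a, b)    # (sum bit, carry bit)

for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"{a} + {b} -> sum={s}, carry={c}")
```

Nothing above the first function ever asks how nand is implemented – relays, tubes, or transistors would all do – which is exactly Hillis's point about hierarchical abstraction.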
In this chapter, like Strata Smith going down the mines, we’ll travel downward through the hardware layers (Fig. 2.1) and see these principles in action.
… one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say “This is really written in English, but it has been coded in some strange symbols.”
Warren Weaver
Ideas of probability: The frequentists and the Bayesians
We are all familiar with the idea that a fair coin has an equal chance of coming down heads or tails when tossed. Mathematicians say that the coin has a probability of 0.5 of being heads and 0.5 of being tails. Because heads and tails are the only possible outcomes, their probabilities must add up to one. A coin toss is an example of physical probability – probability that arises in a physical process, such as rolling a pair of dice or the decay of a radioactive atom. Physical probability means that in such systems any given event, such as the dice landing on snake eyes, tends to occur at a persistent rate, or relative frequency, in a long run of trials. We are also familiar with the idea of probabilities as a result of repeated experiments or measurements. When we make repeated measurements of some quantity, we do not get the same answer each time because there may be small random errors in each measurement. Given a set of measurements, classical or frequentist statisticians have developed a powerful collection of statistical tools to estimate the most probable value of the quantity and to give an indication of its likely error.
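Both halves of that description are easy to simulate. A minimal sketch in Python – the true value of 9.81, the error size, and the sample counts are assumed numbers for illustration:

```python
import random
import statistics

random.seed(42)

# Relative frequency: toss a fair coin many times and watch the
# fraction of heads settle near the physical probability of 0.5.
heads = sum(random.random() < 0.5 for _ in range(10_000))
print(f"fraction of heads: {heads / 10_000:.3f}")

# Repeated measurement: a true value observed with small random errors
# (true value 9.81 and error size 0.05 are assumed for illustration).
true_value, noise = 9.81, 0.05
measurements = [random.gauss(true_value, noise) for _ in range(100)]

mean = statistics.mean(measurements)
sem = statistics.stdev(measurements) / len(measurements) ** 0.5  # standard error
print(f"frequentist estimate: {mean:.3f} ± {sem:.3f}")
```

The sample mean and its standard error are the simplest members of the frequentist toolkit the paragraph describes: an estimate of the most probable value together with an indication of its likely error.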