Why you care: What is the point of running an experiment if you cannot analyze it in a trustworthy way? Variance is at the core of experiment analysis: almost all of the key statistical concepts we have introduced, such as statistical significance, p-value, power, and confidence interval, are related to it. It is imperative not only to estimate variance correctly, but also to understand how to reduce it and thereby improve the sensitivity of statistical hypothesis tests.
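To make the connection between variance and the other concepts concrete, here is a minimal sketch (using only the Python standard library, with made-up sample data) of how the variance of each sample feeds into a confidence interval for the difference in means between a treatment and a control group; lower variance yields a narrower interval and hence a more sensitive test:

```python
import statistics

def diff_ci_95(control, treatment):
    """Approximate 95% confidence interval for the difference in means
    (normal approximation; assumes independent samples)."""
    diff = statistics.mean(treatment) - statistics.mean(control)
    # The variance of the difference of two independent sample means is
    # the sum of each sample's variance divided by its sample size.
    se = (statistics.variance(control) / len(control)
          + statistics.variance(treatment) / len(treatment)) ** 0.5
    return diff - 1.96 * se, diff + 1.96 * se

# Hypothetical per-user metric values for each variant.
control = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2]
treatment = [10.6, 10.3, 10.8, 10.5, 10.4, 10.7]

low, high = diff_ci_95(control, treatment)
# An interval that excludes zero suggests a statistically significant effect.
print(low > 0)  # → True
```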
In the previous chapter we were introduced to the concept of learning, both for humans and for machines. In either case, a primary way one learns is by first knowing the correct outcome, or label, of a given data point or behavior. As it happens, there are many situations in which we have training examples with correct labels; in other words, we have data for which we know the correct outcome value. This set of data problems collectively falls under supervised learning.
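As a toy illustration of the idea, here is a minimal sketch (with entirely hypothetical data and labels) of learning from labeled examples: a one-nearest-neighbor classifier that predicts the label of a new data point from the training example closest to it:

```python
def predict(train, query):
    """Return the label of the training example whose input value
    is closest to the query (1-nearest-neighbor)."""
    return min(train, key=lambda example: abs(example[0] - query))[1]

# Each training example pairs an input (here, a height in cm) with a
# known, correct label -- the hallmark of supervised learning.
train = [(150, "short"), (160, "short"), (180, "tall"), (190, "tall")]

print(predict(train, 185))  # → tall
print(predict(train, 152))  # → short
```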
Sherlock Holmes would have loved living in the twenty-first century. We are drenched in data, and so many of our problems (including a murder mystery) can be solved using large amounts of data existing at personal and societal levels.
These days it is fair to assume that most people are familiar with the term "data." We see it everywhere, and if you have a cellphone, chances are you have encountered it frequently. Assuming you are a "connected" person with a smartphone, you probably have a data plan from your phone service provider.
Why you care: In most experiment analyses, we assume that the behavior of each unit in the experiment is unaffected by variant assignment to other units. This is a plausible assumption in most practical applications. However, there are also many cases where this assumption fails.
Why you care: To design and run a good online controlled experiment, you need metrics that meet certain characteristics. They must be measurable in the short term (the experiment duration) and computable, as well as sufficiently sensitive and timely to be useful for experimentation. If you use multiple metrics to measure success for an experiment, you should ideally combine them into an Overall Evaluation Criterion (OEC): a single measure believed to causally impact long-term objectives. It often takes multiple iterations to adjust and refine the OEC, but as the quotation above by Eliyahu Goldratt highlights, it provides a clear alignment mechanism for the organization.
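One common way to combine multiple success metrics into a single OEC is a weighted sum of normalized component metrics. The metric names and weights below are hypothetical, purely to illustrate the mechanics:

```python
def oec(metrics, weights):
    """Combine normalized component metrics (each scaled to 0..1)
    into a single Overall Evaluation Criterion score."""
    assert set(metrics) == set(weights), "every metric needs a weight"
    return sum(weights[name] * metrics[name] for name in metrics)

# Hypothetical normalized metric values and weights reflecting how much
# each component is believed to drive long-term objectives.
metrics = {"sessions_per_user": 0.62, "task_success_rate": 0.80}
weights = {"sessions_per_user": 0.7, "task_success_rate": 0.3}

print(round(oec(metrics, weights), 3))  # → 0.674
```

Refining the OEC then amounts to revisiting which components belong in `metrics` and how the weights are set, which is where the iteration mentioned above happens.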
Why you care: Randomized controlled experiments are the gold standard for establishing causality, but sometimes running such an experiment is not possible. Given that organizations are collecting massive amounts of data, observational causal studies can sometimes be used to assess causality, although with a lower level of trust. Understanding the space of possible designs and their common pitfalls is useful when an online controlled experiment is not possible.
We started this book with a glimpse into data and data science. Then we spent the rest of the book, especially Parts II and III, learning various tools and techniques to solve data problems of different kinds. Our approach to all of this has been hands-on. And now we have come full circle. As we wrap up, it is important to take a look at where that data comes from, and how we should broadly think about analyzing it. This final chapter, therefore, is dedicated to those two goals, as you will see in the next two sections. One section is an overview of some of the most common methods for collecting/soliciting data, and the other provides information and ideas about how to approach a data analysis problem with broad methods. Then the final section provides a commentary on evaluation and experimentation.
“Just as trees are the raw material from which paper is produced, so too, can data be viewed as the raw material from which information is obtained.” To present and interpret information, one must start with a process of gathering and sorting data. And for any kind of data analysis, one must first identify the right kinds of information sources.
In the previous chapter, we discussed different forms of data. The height–weight data we saw was numerical and structured. When you post a picture using your smartphone, that is an example of multimedia data. The datasets mentioned in the section on public policy are government or open data collections.
Why you care: We begin with an end-to-end example of the design (with explicit assumptions), execution, and interpretation of an experiment to assess the importance of speed. Many examples of experiments focus on the User Interface (UI) because it is easy to show examples, but there are many breakthroughs on the back-end side, and as multiple companies have discovered, speed matters a lot! Of course, faster is better, but how important is it to improve performance by a tenth of a second? Should you have a person focused on performance? Maybe a team of five? The return-on-investment (ROI) of such efforts can be quantified by running a simple slowdown experiment. In 2017, every tenth-of-a-second improvement for Bing was worth $18 million in incremental annual revenue, enough to fund a sizable team. Based on these results and multiple replications at several companies over the years, we recommend using latency as a guardrail metric.