In the previous chapter, we saw how to learn from data when the labels or true values associated with it are available. In other words, we knew what was right or wrong, and we used that information to build a regression or classification model that could then make predictions for new data. That process falls under supervised learning. Now we consider the other big area of machine learning, where the data comes without true labels or values, yet we still want to learn its underlying structure and be able to explain it. This is called unsupervised learning.
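As a rough illustration of learning structure without labels, the sketch below groups one-dimensional values with k-means clustering. The abstract does not name any particular algorithm, so k-means is only an assumed example, and the function name `k_means_1d`, its parameters, and the simple evenly spaced initialization are all hypothetical choices made for this sketch.

```python
def k_means_1d(values, k, iterations=20):
    """Cluster 1-D values into k groups (illustrative sketch, not a library API)."""
    ordered = sorted(values)
    # Assumed initialization: pick k starting centroids spread across the sorted data.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
    centers, groups = k_means_1d(data, k=2)
    print(centers, groups)
```

No labels are supplied at any point; the two groups emerge only from how close the values are to one another.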
Why you care: Running A/A tests is a critical part of establishing trust in an experimentation platform. A/A tests are so useful because they often fail in practice, which leads to re-evaluating assumptions and identifying bugs.
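One way to see what an A/A test checks is to simulate many of them. In the sketch below, both variants draw from the same distribution, so a healthy analysis pipeline should flag roughly a fraction `alpha` of the tests as significant. Everything here is an assumption made for illustration: the normally distributed metric, the large-sample z-test, and the names `simulate_aa_p_values` and `false_positive_rate` do not come from the chapter.

```python
import math
import random


def _mean_and_variance(xs):
    """Return the sample mean and the unbiased sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, variance


def two_sample_p_value(a, b):
    """Two-sided p-value from a large-sample z-test on the difference in means."""
    mean_a, var_a = _mean_and_variance(a)
    mean_b, var_b = _mean_and_variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    z = (mean_a - mean_b) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_p_values(num_tests=500, users_per_variant=100, seed=0):
    """Run simulated A/A tests in which both variants share one distribution."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(num_tests):
        # Hypothetical metric: normally distributed, identical for both variants.
        a = [rng.gauss(10.0, 2.0) for _ in range(users_per_variant)]
        b = [rng.gauss(10.0, 2.0) for _ in range(users_per_variant)]
        p_values.append(two_sample_p_value(a, b))
    return p_values


def false_positive_rate(p_values, alpha=0.05):
    """Fraction of tests that would be declared significant at level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


if __name__ == "__main__":
    print(false_positive_rate(simulate_aa_p_values()))
```

If the observed rate on a real platform is far from `alpha`, that points to a violated assumption or a bug somewhere in assignment, logging, or the variance computation.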
As discussed in Chapter 1, running trustworthy controlled experiments is the scientific gold standard for evaluating many (but not all) ideas and making data-informed decisions. What may be less clear is that making controlled experiments easy to run also accelerates innovation by decreasing the cost of trying new ideas and learning from them in a virtuous feedback loop, as the quotation from Moran above shows. In this chapter, we focus on what it takes to build a robust and trustworthy experimentation platform. We start by introducing experimentation maturity models that show the various phases an organization generally goes through when starting to run experiments, and then we dive into the technical details of building an experimentation platform.
Why you care: Understanding the ethics of experiments is critical for everyone, from leadership to engineers to product managers to data scientists; all should be informed and mindful of the ethical considerations. Controlled experiments, whether in technology, anthropology, psychology, sociology, or medicine, are conducted on actual people. Here are questions and concerns to consider when determining when to seek expert counsel regarding the ethics of your experiments.
Why you care: Guardrail metrics are critical metrics designed to alert experimenters about violated assumptions. There are two types of guardrail metrics: organizational and trust-related. Chapter 7 discusses organizational guardrails that are used to protect the business, and this chapter describes the Sample Ratio Mismatch (SRM) in detail, which is a trust-related guardrail. The SRM guardrail should be included for every experiment, as it is used to ensure the internal validity and trustworthiness of the experiment results. A few other trust-related guardrail metrics are also described here.
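A Sample Ratio Mismatch check can be sketched as a chi-squared goodness-of-fit test that compares the observed user counts with the split the experiment was designed for. This is a minimal sketch under stated assumptions: the function name `srm_p_value`, the two-variant setup, and the default 50/50 design split are hypothetical, and the p-value threshold used to raise an alert is a choice left to the experimenter.

```python
import math


def srm_p_value(control_users, treatment_users, expected_control_share=0.5):
    """P-value of a chi-squared test (1 degree of freedom) for the observed split."""
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control_users - expected_control) ** 2 / expected_control
        + (treatment_users - expected_treatment) ** 2 / expected_treatment
    )
    # With one degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical counts for an experiment designed as a 50/50 split.
    print(srm_p_value(50_000, 51_000))
```

A very small p-value means the observed ratio is unlikely under the designed split, so the experiment's results should not be trusted until the cause is found.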
William Anthony Twyman was a UK radio and television audience measurement veteran (MR Web 2014) credited with formulating Twyman’s law, although he apparently never explicitly put it in writing, and multiple variants of it exist, as shown in the above quotations.
So far, we have seen data that comes in a file, whether as a table, a CSV, or an XML document. But text files (including CSV) are not the best way to store or transfer data when we are dealing with large amounts of it. We need something better: something that not only lets us store data more effectively and efficiently but also provides additional tools to process that data. That is where databases come in. There are several databases in use today, but MySQL tops them all in the free, open-source category. It is widely available and widely used, and thanks to the powerful Structured Query Language (SQL), it is also a comprehensive solution for data storage and processing.
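To give a flavor of storing and processing data with SQL, the sketch below loads a few rows into a table and aggregates them with a query. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a server. That substitution is an assumption made for this example, since the chapter itself covers MySQL, and the table name `measurements`, its columns, and the function `average_score_by_group` are hypothetical.

```python
import sqlite3


def average_score_by_group(rows):
    """Store (group, score) rows in a table and return the average score per group."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE measurements (grp TEXT, score REAL)")
        # Parameterized inserts keep the data separate from the SQL text.
        conn.executemany(
            "INSERT INTO measurements (grp, score) VALUES (?, ?)", rows
        )
        cursor = conn.execute(
            "SELECT grp, AVG(score) FROM measurements GROUP BY grp ORDER BY grp"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(average_score_by_group([("a", 1.0), ("a", 3.0), ("b", 5.0)]))
```

The `SELECT ... GROUP BY` statement is standard SQL, so the same query should carry over to MySQL largely unchanged, although the connection code would differ.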