To save content items to your account,
please confirm that you agree to abide by our usage policies.
If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your account.
Find out more about saving content to .
To save content items to your Kindle, first ensure no-reply@cambridge.org
is added to your Approved Personal Document E-mail List under your Personal Document Settings
on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part
of your Kindle email address below.
Find out more about saving to your Kindle.
Note you can select to save to either the @free.kindle.com or @kindle.com variations.
‘@free.kindle.com’ emails are free but can only be saved to your device when it is connected to wi-fi.
‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.
This chapter describes how to model multiple discrete quantities as discrete random variables within the same probability space and manipulate them using their joint pmf. We explain how to estimate the joint pmf from data, and use it to model precipitation in Oregon. Then, we introduce marginal distributions, which describe the individual behavior of each variable in a model, and conditional distributions, which describe the behavior of a variable when other variables are fixed. Next, we generalize the concepts of independence and conditional independence to random variables. In addition, we discuss the problem of causal inference, which seeks to identify causal relationships between variables. We then turn our attention to a fundamental challenge: It is impossible to completely characterize the dependence between all variables in a model, unless they are very few. This phenomenon, known as the curse of dimensionality, is the reason why independence assumptions are needed to make probabilistic models tractable. We conclude the chapter by describing two popular models based on such assumptions: Naive Bayes and Markov chains.
Le Liang, Southeast University, Nanjing,Shi Jin, Southeast University, Nanjing,Hao Ye, University of California, Santa Cruz,Geoffrey Ye Li, Imperial College of Science, Technology and Medicine, London
This chapter discusses how to build probabilistic models that include both discrete and continuous variables. Mathematically, this is achieved by defining them as random variables within the same probability space. In practice, the variables are manipulated using their marginal and conditional distributions. We define the conditional pmf of a discrete random variable given a continuous variable, and the conditional probability density of a continuous random variable given a discrete variable. We use these objects to build mixture models and apply them to model height in a population. Next, we describe Gaussian discriminant analysis, a classification method based on mixture models with Gaussian conditional distributions, and apply it to diagnose Alzheimer's disease. Then, we explain how to perform clustering using Gaussian mixture models and leverage the approach to cluster NBA players. Finally, we introduce the framework of Bayesian statistics which enables us to explicitly encode our uncertainty about model parameters, and use it to analyze poll data from the 2020 United States presidential election.
This chapter introduces random variables and explains how to use them to model uncertain numerical quantities that are discrete. We first provide a mathematical definition of random variables, building upon the framework of probability spaces. Then, we explain how to manipulate discrete random variables in practice, using their probability mass function (pmf), and describe the main properties of the pmf. Motivated by an example where we analyze Kevin Durant's free-throw shooting, we define the empirical pmf, a nonparametric estimator of the pmf that does not make strong assumptions about the data. Next, we define several popular discrete parametric distributions (Bernoulli, binomial, geometric, and Poisson), which yield parametric estimators of the pmf, and explain how to fit them to data via maximum-likelihood estimation. We conclude the chapter by comparing the advantages and disadvantages of nonparametric and parametric models, illustrated by a real-data example, where we model the number of calls arriving at a call center.
Le Liang, Southeast University, Nanjing,Shi Jin, Southeast University, Nanjing,Hao Ye, University of California, Santa Cruz,Geoffrey Ye Li, Imperial College of Science, Technology and Medicine, London
This chapter begins by defining an averaging procedure for random variables, known as the mean. We show that the mean is linear, and also that the mean of the product of independent variables equals the product of their means. Then, we derive the mean of popular parametric distributions. Next, we caution that the mean can be severely distorted by extreme values, as illustrated by an analysis of NBA salaries. In addition, we define the mean square, which is the average squared value of a random variable, and the variance, which is the mean square deviation from the mean. We explain how to estimate the variance from data and use it to describe temperature variability at different geographic locations. Then, we define the conditional mean, a quantity that represents the average of a variable when other variables are fixed. We prove that the conditional mean is an optimal solution to the problem of regression, where the goal is to estimate a quantity of interest as a function of other variables. We end the chapter by studying how to estimate average causal effects.
Le Liang, Southeast University, Nanjing,Shi Jin, Southeast University, Nanjing,Hao Ye, University of California, Santa Cruz,Geoffrey Ye Li, Imperial College of Science, Technology and Medicine, London
This chapter covers principal component analysis and low-rank models, which are popular techniques to process high-dimensional datasets with many features. We begin by defining the mean of random vectors and random matrices. Then, we introduce the covariance matrix which encodes the variance of any linear combination of the entries in a random vector, and explain how to estimate it from data. We model the geographic location of Canadian cities as a running example. Next, we present principal component analysis (PCA), a method to extract the directions of maximum variance in a dataset. We explain how to use PCA to find optimal low-dimensional representations of high-dimensional data and apply it to a dataset of human faces. Then, we introduce low-rank models for matrix-valued data and describe how to fit them using the singular-value decomposition. We show that this approach is able to automatically identify meaningful patterns in real-world weather data. Finally, we explain how to estimate missing entries in a matrix under a low-rank assumption and apply this methodology to predict movie ratings via collaborative filtering.
This chapter introduces continuous random variables which enable us to model uncertain continuous quantities. We again begin with a formal definition, but quickly move on to describe how to manipulate continuous random variables in practice. We define the cumulative distribution function and quantiles (including the median) and explain how to estimate them from data. We then introduce the concept of probability density and describe its main properties. We present two approaches to obtain nonparametric models of probability densities from data: The histogram and kernel density estimation. Next, we define two celebrated continuous parametric distributions – the exponential and the Gaussian – and show how to fit them to data using maximum-likelihood estimation. We use these distributions to model the interarrival time of calls at a call center, and height in a population, respectively. Finally, we discuss how to simulate continuous random variables via inverse transform sampling.
Le Liang, Southeast University, Nanjing,Shi Jin, Southeast University, Nanjing,Hao Ye, University of California, Santa Cruz,Geoffrey Ye Li, Imperial College of Science, Technology and Medicine, London
Le Liang, Southeast University, Nanjing,Shi Jin, Southeast University, Nanjing,Hao Ye, University of California, Santa Cruz,Geoffrey Ye Li, Imperial College of Science, Technology and Medicine, London
Le Liang, Southeast University, Nanjing,Shi Jin, Southeast University, Nanjing,Hao Ye, University of California, Santa Cruz,Geoffrey Ye Li, Imperial College of Science, Technology and Medicine, London
This chapter introduces probability. We begin with an informal definition which enables us to build intuition about the properties of probability. Then, we present a more rigorous definition, based on the mathematical framework of probability spaces. Next, we describe conditional probability, a concept that makes it possible to update probabilities when additional information is revealed. In our first encounter with statistics, we explain how to estimate probabilities and conditional probabilities from data, as illustrated by an analysis of votes in the United States Congress. Building upon the concept of conditional probability, we define independence and conditional independence, which are critical concepts in probabilistic modeling. The chapter ends with a surprising twist: In practice, probabilities are often impossible to compute analytically! Fortunately, the Monte Carlo method provides a pragmatic solution to this challenge, allowing us to approximate probabilities very accurately using computer simulations. We apply w 3 × 3 basketball tournament from the 2020 Tokyo Olympics.