Spatial Data Aggregation is defined in our book as the integration of heterogeneous spatial or spatio-temporal data sources, with the aim of predicting the occurrence of hazards and resources. Much of the transition to net zero carbon relies on changing from fossil fuels to materials for renewable energy and batteries. Mineral exploration is therefore key to achieving this goal. Readers will engage in an active mineral exploration for battery metals in Cape Smith, Canada, and will learn how data science can be an effective guide in geological fieldwork. To achieve this, several spatial information sources will be used, such as remote sensing, geophysical, and geochemical data. These sources need to be aggregated to guide field geologists in locating areas of interest with the aim of collecting samples. In that context, we introduce Bayes’ rule and Bayesian reasoning about knowledge and information. We emphasize the counterintuitive results of Bayesian reasoning: rare events are very difficult to predict even with very accurate data. Next, we cover the alternative to Bayes’ rule: logistic regression. We emphasize the advantages and disadvantages of these opposite approaches.
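The counterintuitive behaviour of rare events under Bayes’ rule can be illustrated with a minimal sketch. The prior, sensitivity, and specificity below are hypothetical numbers chosen for illustration; they are not taken from the Cape Smith case study.

```python
def posterior(prior, sensitivity, specificity):
    """P(event | positive signal) from Bayes' rule."""
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)

# Hypothetical values: a deposit occurs in 1 of 1000 locations, and the
# data flag 99% of deposits and clear 99% of barren locations.
p = posterior(prior=0.001, sensitivity=0.99, specificity=0.99)
print(f"P(deposit | positive signal) = {p:.3f}")  # prints 0.090
```

Even with 99% accurate data, a positive signal under these assumed numbers leaves the probability of a deposit at roughly 9%, because false positives from the many barren locations outnumber the true positives.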
In this chapter we introduce developing and interpreting multilevel models. We first define multilevel models and explore how this approach is an improvement on disaggregation and aggregation of data across multiple levels. We then work through four different multilevel models. We provide examples of what kinds of questions can be answered by each model and how to interpret the statistical output. We then explore some additional issues in fitting multilevel models in Stata and consider additional applications of multilevel models.
Logistic regression is not limited to the modeling of binary dependent variables. It may be extended to the modeling of dependent variables with three or more categories that are either ordered or unordered. In this chapter we discuss logistic regression of a multi-categorical dependent variable with ordered categories. An ordinal variable is one that is multi-categorical and whose categories are ordered. For example, one’s quality of life might be classified as “excellent,” “very good,” “good,” “fair,” or “poor.” Although these categories might be coded consecutively, for example from 1 = “poor” to 5 = “excellent,” the dependent variable is not continuous: we do not know that the distances between each contiguous pair of responses are the same. Even though the responses might be coded as 1 to 5, we should not use an OLS regression model to predict a dependent variable such as the person’s categorical response to a quality of life question. We should use a statistical model that does not assume that the distances between any pair of categories are the same. This chapter focuses on ordinal logistic regression.
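As a minimal sketch of how an ordinal logistic model avoids assuming equal distances between categories, the code below uses one common parameterization (the proportional-odds model), in which P(Y ≤ k) is a logistic function of a cutpoint minus the linear predictor. The cutpoints and the linear predictor value are hypothetical, not estimates from the chapter.

```python
import math

def ordinal_probs(eta, cutpoints):
    """Category probabilities under a proportional-odds model.

    P(Y <= k) = logistic(cut_k - eta), and the last category closes at 1.
    """
    cumulative = [1.0 / (1.0 + math.exp(eta - c)) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

labels = ["poor", "fair", "good", "very good", "excellent"]
# Hypothetical, unevenly spaced cutpoints: the model estimates these
# instead of assuming the codes 1..5 are equally spaced.
cutpoints = [-2.0, -0.5, 0.3, 2.5]
probs = ordinal_probs(eta=0.4, cutpoints=cutpoints)
for label, prob in zip(labels, probs):
    print(f"{label:>9}: {prob:.3f}")
```

The spacing between cutpoints is free to differ from one pair of categories to the next, which is exactly what an OLS model on the codes 1 to 5 cannot represent.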
This review of mathematical and statistical concepts covers foundational material such as probability densities, Monte Carlo methods, and Bayes’ rule. We provide concept reviews that supplement the previous chapters. We aim first to generate an intuitive understanding of statistical concepts and then, if the student is interested, to dive deeper into the mathematical derivations. For example, principal component analysis can be taught by deriving the equations and making the link with the eigenvalue decomposition of the covariance matrix. Instead, we start from simple two- and three-dimensional datasets and appeal to the student’s insight into the geometrical aspect: the study of an ellipse, and how we can transform it into a circle. This geometric aspect is explained without equations, using plots and figures that appeal to intuition starting from geometry. In general, it is our experience that students in the geosciences retain much more practical knowledge when presented with material starting from case studies and intuitive reasoning.
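The ellipse-to-circle picture of principal component analysis can be sketched numerically. The example below assumes NumPy and a hypothetical two-dimensional covariance matrix; it is an illustration of the geometric idea, not the book’s own code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D data: the point cloud forms a tilted ellipse.
cov = np.array([[3.0, 1.2], [1.2, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

# Eigenvalue decomposition of the sample covariance matrix.
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)

# Rotate onto the principal axes (the ellipse becomes axis-aligned),
# then rescale each axis to unit variance (the ellipse becomes a circle).
scores = Xc @ eigvecs
sphered = scores / np.sqrt(eigvals)

S_sphered = np.cov(sphered, rowvar=False)
print(np.round(S_sphered, 6))
```

The eigenvectors give the directions of the ellipse’s axes and the square roots of the eigenvalues give their relative lengths; after rotating and rescaling, the sample covariance is the identity matrix.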
Dental occlusion is the way in which teeth fit into the mouth, side by side in the same jaw and between their chewing surfaces when the jaws are closed. It shows much variation in living people, many of whom have their occlusion adjusted by dentists. In archaeological and fossil remains there is much less variation and there has been considerable research and discussion aimed at finding the explanation. This chapter provides a concise introduction to the clinical background and outlines methods for recording occlusal variation. It follows this with a critical review of the evidence for a cause of the high prevalence of occlusal anomalies today.
Extreme Value Statistics focuses on predicting extremes larger than those observed in datasets. An important area of application is natural hazards. In the chapter, we use diamond sizes and volcano eruptions as two specific examples. We start this chapter by focusing on graphical techniques, in particular quantile–quantile plots, to analyze extremal data. We show why the exponential quantile plot is an essential tool in extremal analysis. One challenge in extremal analysis is to select a suitable probability distribution model to estimate extremes. Instead of deriving theoretical models, we illustrate how these models emerge from intuitive Monte Carlo experiments. The key statistical parameter in extreme value statistical models is the extreme value index. We link this index to quantile plots such as the Pareto quantile plot. We conclude with practical examples of predicting rare large diamonds, as well as the return period of large volcano eruptions from a historical dataset.
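The link between the extreme value index and the Pareto quantile plot can be sketched with a small Monte Carlo experiment. The code below simulates strict Pareto data with an assumed index of 0.5 and applies the Hill estimator, which is a standard estimate of the slope of the upper part of the Pareto quantile plot. The sample size, the number of upper order statistics `k`, and the function names are illustrative assumptions, not values from the chapter.

```python
import math
import random

random.seed(42)
gamma_true = 0.5  # assumed extreme value index for the simulation
n = 20000

# Inverse-transform sampling: U ** (-gamma) with U uniform on (0, 1]
# follows a strict Pareto distribution with extreme value index gamma.
sample = sorted((1.0 - random.random()) ** (-gamma_true) for _ in range(n))

def pareto_quantile_coords(sorted_x):
    """Points (-log(1 - i/(n+1)), log x_(i)) of the Pareto quantile plot."""
    m = len(sorted_x)
    return [(-math.log(1.0 - i / (m + 1)), math.log(x))
            for i, x in enumerate(sorted_x, start=1)]

def hill(sorted_x, k):
    """Hill estimator: mean log-excess of the k largest values."""
    m = len(sorted_x)
    threshold = math.log(sorted_x[m - k - 1])
    return sum(math.log(x) for x in sorted_x[m - k:]) / k - threshold

coords = pareto_quantile_coords(sample)
estimate = hill(sample, k=2000)
print(f"Hill estimate of the extreme value index: {estimate:.3f}")
```

For data with a Pareto-type tail, the upper end of the Pareto quantile plot is approximately linear, and its slope estimates the extreme value index; the choice of `k` trades bias against variance in real datasets.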
This first chapter introduces our unique approach to teaching statistics. We note that while we review the statistical formulas for each method, we focus on the practical component of statistical analysis. We teach the readers how to apply and interpret the statistical methods and results. We then briefly describe the book’s content, which includes a concise explanation of the statistical techniques covered in each chapter. We end the chapter with suggestions on using the book to gain maximum benefit.
This chapter covers how to develop and interpret statistical tables and cross-tabulations. We begin by exploring the basic structure and components of tables starting with univariate tables. We then describe how to develop and interpret bivariate tables and introduce multivariate tables. Finally, we conclude the chapter with general recommendations about table design and how best to communicate statistical information in table form.
This chapter reviews several methods for addressing the statistical problem of missing data. We first explain how missing data can affect different components of the study design and the statistical analyses in such a way that the validity of the findings may become questionable. We next describe several methods to address the missing data problem and show why some may be problematic. We explain why multiple imputation (MI) and maximum likelihood (ML) are the preferred methods for addressing missing data issues. We then present an example using Stata, focusing on one of the preferred methods, multiple imputation. Lastly, within the context of an analysis of adolescent pregnancy, we use several methods to handle missing data and show how the analysis results may differ depending on which missing data method is used.