In this chapter we present brief discussions of a few statistical topics not covered in earlier chapters. We first cover structural equation models, factor analysis, and path analysis. In work fitting regression models in the social sciences, one frequently sees references to one or more of these methods. In the second section of the chapter, we return in summary form to a few topics already discussed that we believe require some additional attention. For instance, as part of our discussion of ordinary least squares regression, we covered the topic of regression diagnostics in Chapter 8. But regression diagnostics are not applicable only to OLS regression, so we present a further discussion here. Similarly, we expand with some additional commentary our earlier discussions of survey design (covered in Chapter 10) and multilevel models (covered in Chapter 16).
This chapter covers the two topics of descriptive statistics and the normal distribution. We first discuss the role of descriptive statistics and the measures of central tendency, variance, and standard deviation. We also provide examples of the kinds of graphs often used in descriptive statistics. We next discuss the normal distribution, its properties and its role in descriptive and inferential statistical analysis.
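The measures the chapter introduces can be illustrated with a short sketch in Python (the data are invented for illustration; the book's own worked examples use statistical software):

```python
import statistics
from math import exp, pi, sqrt

# Hypothetical sample of test scores (invented for illustration)
scores = [62, 70, 75, 75, 80, 84, 88, 91]

mean = statistics.mean(scores)      # measure of central tendency
var = statistics.variance(scores)   # sample variance (n - 1 denominator)
sd = statistics.stdev(scores)       # standard deviation

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and s.d. sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# The density peaks at the mean; under a normal distribution roughly 95%
# of values fall within two standard deviations of the mean
print(mean, sd, normal_pdf(mean, mean, sd))
```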
This chapter is an introduction to Stata. We note the essential features and commands of the Stata statistical software package. Our objective is to familiarize readers with the skills that will allow them to understand and complete the examples in the later chapters of our book. We describe the main components of the interface, followed by the main Stata file types, e.g., do-files, log files, and graph files. The third section of the chapter gives examples that readers can use to practice the most commonly used commands. Last, we summarize best practices for data management.
When we use ordinary least squares (OLS) regression with data sampled from a larger population, several assumptions need to be met for the results to be reliably extended to that population. In the first part of this chapter, we discuss each of these assumptions and note some of the problems that occur if one or more of them are violated. In the second part of the chapter, we turn to regression diagnostics, that is, methods and approaches for determining whether the assumptions are met in the sample data. In the third and last section of the chapter, we discuss robust regression. We note that, under ideal conditions, OLS regression is preferred over other regression methods. But when some of the OLS regression assumptions are not met, OLS regression can break down and should not be used for the analysis. In such situations, regression methods less demanding than OLS may be introduced. Robust regression is one such method; it sometimes performs more satisfactorily than OLS when some of the OLS assumptions are not met and when there are other statistical problems in the analysis.
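The sensitivity of OLS that motivates robust alternatives can be seen in a minimal sketch (the data and the single outlier are invented for illustration):

```python
# Simple OLS fit via the usual closed-form least-squares formulas
def ols_fit(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return intercept, slope

x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 30.0]   # the last observation is a deliberate outlier

_, slope_all = ols_fit(x, y)               # fit with the outlier included
_, slope_clean = ols_fit(x[:-1], y[:-1])   # fit with the outlier removed

# One aberrant point roughly triples the estimated slope, which is why
# robust estimators that downweight extreme observations exist
print(slope_all, slope_clean)
```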
In this chapter we cover the modeling of a dependent variable that is neither continuous nor categorical, but is a count of the number of events. Dependent count variables measure the number of times an event has occurred. An example from demography is the number of children ever born to a woman or man in their lifetime. Frequently, count variables are treated as though they are continuous and unbounded, and ordinary least squares (OLS) models are used to estimate the effects of independent variables on their occurrence. But if the OLS assumptions we discussed in Chapter 8 are not met, then the use of OLS for count outcomes may result in inefficient, inconsistent, and biased estimates. There are many kinds or classes of models that may be used to model count dependent variables. In this chapter we consider five: (1) the Poisson regression model; (2) the negative binomial regression model; (3) the zero-inflated count model; (4) the zero-truncated count model; and (5) the hurdle regression model.
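As a minimal hint of why the plain Poisson model can fail, the sketch below (with invented counts) compares the observed share of zeros with the share a fitted Poisson distribution predicts; an excess of observed zeros is the usual motivation for zero-inflated models:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with rate lam."""
    return lam ** k * exp(-lam) / factorial(k)

# Hypothetical count data (e.g., children ever born); invented for illustration
counts = [0, 0, 0, 0, 1, 2, 3, 0, 0, 2]

# The Poisson maximum-likelihood estimate of the rate is the sample mean
lam_hat = sum(counts) / len(counts)

obs_zeros = counts.count(0) / len(counts)
pred_zeros = poisson_pmf(0, lam_hat)

# Many more observed zeros than the Poisson model predicts suggests
# that a zero-inflated specification may fit better
print(obs_zeros, pred_zeros)
```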
A major concern in the social sciences is understanding and explaining the relationship between two variables. We showed in Chapter 5 how to address this issue using tabular presentations. In this chapter we show how to address the issue statistically via regression and correlation. We first cover the two concepts of regression and correlation. We then turn to the issue of statistical inference and ways of evaluating the statistical significance of our results. Since most social science research is undertaken using sample data, we need to determine whether the regression and correlation coefficients we calculate using the sample data are statistically significant in the larger population from which the sample data were drawn.
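The two concepts the chapter pairs can be computed side by side; this sketch (with invented paired observations) shows the standard least-squares slope and Pearson correlation formulas:

```python
from math import sqrt

def regression_and_correlation(x, y):
    """Least-squares slope and Pearson correlation for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx             # change in y per unit change in x
    r = sxy / sqrt(sxx * syy)     # strength of the linear association
    return slope, r

# Invented paired observations for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
slope, r = regression_and_correlation(x, y)
print(slope, r)
```

Whether such sample coefficients generalize to the population is the inference question the chapter takes up.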
Dental enamel is the most resistant tissue in fossils and archaeological remains. Seen under the microscope, its complex mineral structure preserves a wealth of detailed information about the owner of the teeth. This chapter introduces the biology underlying enamel development and describes its structure in detail, with many images. It explains the daily developmental rhythm which creates this structure and shows how it can be used to reconstruct a chronology of the formation of the tooth crown. This has important applications for establishing the developmental schedule of an individual, and for understanding the variations that lead to a variety of defects in the smooth crown surface (commonly known as enamel hypoplasia). It is also an important method for estimating chronological age in children’s remains.
In living children, teeth develop on a clear schedule that shows only limited variation with sex, population, diet, and disease experience. By contrast, bone development is greatly affected by these factors. This makes dental development an important focus for recording the maturity of an individual represented by human remains. The chapter outlines the large range of different approaches to recording dental development and reviews their results. Dental development is widely used to estimate age-at-death in archaeological remains, but the chapter cautions about the applicability of modern clinical standards and about the differences between a chronological age and a developmental stage. The pace of childhood development in living humans is very slow in comparison with non-human primates, and the chapter ends with a discussion of the central role that histological studies of dental development in hominid fossils play in understanding this situation.
In this chapter we explore the statistical software program called R along with the integrated development environment known as RStudio. We provide a brief review of the history of R followed by guidance on how to download and install R and RStudio. Next, we explore features of the program including a basic review of the graphical interface. We then turn to working with basic commands, loading and saving data, and provide examples of working with packages in R.
Anatomy is the foundation of any work in dental anthropology, which starts with the correct identification of teeth. This chapter outlines general anatomical concepts and schemes for the labelling of teeth. Detailed anatomical descriptions are given, with sections for deciduous and permanent incisors and canines, permanent premolars, permanent molars, and deciduous molars. They are based on human teeth but make comparisons with living non-human primates and fossil hominids. Problems and confusions in identification are highlighted, including differences between sides, similarities between different teeth, variation in form, and the effects of wear. Roots as well as crowns are included, together with the form of the pulp chamber.
This chapter extends to the multivariate context our discussion of ordinary least squares (OLS) regression in Chapter 6. We first address the logic of multiple regression. We next cover the interpretation of the multiple regression intercept and slope coefficients, paying particularly close attention to the interpretation of the b slopes. We then address model fit in the multivariate context and extend our discussions of the F-test and the coefficient of multiple determination (R²) by including the standard error of estimate and the Bayesian information criterion (BIC).
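The fit statistics named above can be sketched from observed and fitted values alone. The BIC formula below is one common form for Gaussian errors (up to an additive constant); software packages may use slightly different conventions, and the data are invented for illustration:

```python
from math import log, sqrt

def fit_statistics(y, yhat, n_params):
    """Model-fit summaries from observed (y) and fitted (yhat) values.

    n_params counts the intercept plus the slope coefficients.
    """
    n = len(y)
    ybar = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))   # sum of squared errors
    sst = sum((a - ybar) ** 2 for a in y)              # total sum of squares
    r2 = 1 - sse / sst                     # coefficient of multiple determination
    see = sqrt(sse / (n - n_params))       # standard error of estimate
    bic = n * log(sse / n) + n_params * log(n)   # Gaussian BIC, up to a constant
    return r2, see, bic

# Invented observed and fitted values for illustration
r2, see, bic = fit_statistics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], n_params=2)
print(r2, see, bic)
```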
Multivariate Analysis focuses on the most essential tools for analyzing compositional and/or multivariate data sets that often emerge when performing geochemical analysis. The chapter starts by introducing groundwater contamination in one of the world’s largest agricultural areas: the Central Valley of California. The goal is to use data science to discover the processes that caused the contamination, whether geogenic or anthropogenic. Knowing these causes aids in deciding on mitigation actions. The reader will take a path of discovery through several protocols for applying data-scientific tools to unmask the processes, including principal component analysis, multivariate outlier detection, and factor analysis. The key to using these tools is to understand the compositional nature of geochemical data sets, and how compositions need to be treated appropriately to draw meaningful conclusions, a field termed compositional data analysis. The chapter emphasizes the need for data scientists to work with domain experts.
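The compositional treatment the chapter refers to can be hinted at with the standard centred log-ratio (clr) transform; the three-part composition below is invented for illustration:

```python
from math import log

def clr(composition):
    """Centred log-ratio transform of a composition (parts of a whole).

    Maps constrained compositional parts into unconstrained coordinates,
    so that tools such as principal component analysis can be applied
    meaningfully to geochemical data.
    """
    logs = [log(p) for p in composition]
    g = sum(logs) / len(logs)   # log of the geometric mean
    return [l - g for l in logs]

parts = [0.7, 0.2, 0.1]   # hypothetical three-part geochemical composition
z = clr(parts)
# clr coordinates always sum to (numerically) zero
print(z, sum(z))
```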
Most social science research analyzes data from samples drawn from larger populations. However, most of the standard statistical methods used for analyzing such data assume that the sample data were drawn as simple random samples. But few probability samples are completely random. Some sample respondents may be more heavily weighted than others, and some respondents may be included in the sample by virtue of their membership in groups based on race, sex, age, and other characteristics. Nonetheless, many investigators treat their samples as if they were simple random samples, in which each person in the larger population has an equal chance of being included. We discuss in this chapter the methods that researchers need to follow to make correct inferences to the larger population when the sample data are not completely random. We review the three main types of probability samples. Then we discuss how and why researchers need to address and take into account the design of their samples.
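The effect of ignoring the sample design can be shown with a minimal weighted-mean sketch; the values and survey weights here are invented for illustration:

```python
def weighted_mean(values, weights):
    """Mean in which each observation counts in proportion to its survey weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical sample in which one group was undersampled: the first two
# respondents each represent 1 person in the population, the last two 3 each
values = [10, 10, 20, 20]
weights = [1, 1, 3, 3]

naive = sum(values) / len(values)          # treats the sample as simple random
adjusted = weighted_mean(values, weights)  # respects the sample design

# Ignoring the design understates the population mean in this example
print(naive, adjusted)
```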