This chapter focuses on using Python for statistical analysis in data science. It begins with statistics essentials, teaching how to calculate descriptive statistics like mean, median, variance, and standard deviation using NumPy. The chapter covers data visualization techniques using Matplotlib to create histograms, bar charts, and scatterplots for exploring data patterns. Key topics include importing data using Pandas DataFrames, performing correlation analysis to measure relationships between variables, and conducting statistical inference through hypothesis testing. Students learn to implement t-tests for comparing means between two groups and ANOVA for comparing multiple groups. The chapter emphasizes practical applications through hands-on examples, from analyzing family age data to comparing exam scores across different classes. These statistical techniques form the foundation for more advanced data science work, enabling students to extract meaningful insights from datasets and make data-driven decisions.
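As a rough illustration of the descriptive statistics named above, here is a minimal sketch. It uses Python's standard-library `statistics` module rather than NumPy so that it runs without any installs, and the ages are hypothetical values made up for the example, not data from the chapter.

```python
import statistics

# Hypothetical ages for illustration only; not data from the chapter.
ages = [4, 8, 30, 34, 60, 62]

mean_age = statistics.mean(ages)      # arithmetic mean
median_age = statistics.median(ages)  # average of the two central values here
var_age = statistics.variance(ages)   # sample variance (divides by n - 1)
sd_age = statistics.stdev(ages)       # sample standard deviation

print(mean_age, median_age, round(var_age, 1), round(sd_age, 2))
```

When doing the same with NumPy, note that `np.var` and `np.std` default to the population form (`ddof=0`); passing `ddof=1` gives the sample versions computed here.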
Both qualitative and quantitative methods provide rigorous ways to establish and assess causality, though their understandings of causality and techniques for doing so differ. Qualitative and quantitative techniques are generally complementary rather than substitutes or opposites. Common quantitative tools include cross-tabulation, regression, and logit/probit models; which is appropriate typically depends on the level of measurement of the dependent variable. Common qualitative tools include two within-case techniques, process tracing and analytic narratives, and three between-case techniques, case control, structured focused comparison, and content analysis. “Intermediate-N” techniques exist for research questions whose number of cases is greater than that typically used for qualitative analysis but below the threshold for successful quantitative analysis, along with Big Data approaches for extreme-N analysis, content analysis techniques for corpora, and mixed methods approaches.
This chapter demonstrates R’s capabilities for statistical analysis and data science applications. It covers data importing from CSV/TSV files into R dataframes and computing basic statistics (mean, median, mode, variance, standard deviation) using built-in functions.
The chapter explores data visualization with ggplot2, creating histograms, bar charts, pie charts, and scatterplots for effective data presentation. Key statistical concepts include correlation analysis to measure variable relationships and statistical inference through hypothesis testing.
Practical statistical tests covered include t-tests for comparing two group means and ANOVA for comparing multiple groups. The chapter emphasizes R’s strengths in statistical computing, providing hands-on examples with real datasets and demonstrating how to interpret results for data-driven decision making.
Students are guided through comparing the means of two groups (levels) using a t-test. The differences between a paired-samples t-test and an independent-samples t-test are reviewed, along with the assumptions of each test. When two independent groups do not have equal variances, students are coached through completing a Mann–Whitney U test. Students are also guided through creating charts that can accompany their results in SPSS or R.
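The chapter itself works in SPSS or R; as a language-neutral illustration, the U statistic behind the Mann–Whitney test can be sketched in a few lines of Python. The function name `mann_whitney_u` and the two small groups are hypothetical, and the sketch computes only the statistic, not a p-value.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs (x, y) with x from a and y from b where x > y;
    tied pairs count as 0.5. U_b is the complementary count.
    """
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Hypothetical scores for two independent groups.
group_a = [1, 4, 6]
group_b = [2, 3, 5, 7]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)
```

The two counts always sum to the number of pairs, `len(a) * len(b)`; the smaller of the two is the value usually compared against tables or a normal approximation.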
A common and unfortunate error in statistical analysis is the failure to account for dependencies in the data. In many studies, there is a set of individual participants or experimental objects where two observations are made on each individual or object. This leads to a natural pairing of data. This editorial discusses common situations where paired data arises and gives guidance on selecting the correct analysis plan to avoid statistical errors.
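To make the effect of ignoring pairing concrete, the following sketch computes a paired t statistic and, for contrast, an unpaired (Welch) t statistic on the same numbers. The measurements and function names are hypothetical, only test statistics are computed (no p-values), and this is an illustration rather than the editorial's own analysis.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic and degrees of freedom from within-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1


def welch_t(x, y):
    """Unpaired (Welch) t statistic, which treats the two samples as independent."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(y) - statistics.mean(x)) / se


# Hypothetical measurements: two observations on each of five participants.
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 17, 13]

t_paired, df = paired_t(before, after)
t_unpaired = welch_t(before, after)
print(round(t_paired, 3), df, round(t_unpaired, 3))
```

In this made-up example the paired statistic is several times larger than the unpaired one, because differencing removes the between-participant variation that the unpaired analysis wrongly counts as noise.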
From observed data, statistical inference infers the properties of the underlying probability distribution. For hypothesis testing, the t-test and some non-parametric alternatives are covered. Ways to infer confidence intervals and estimate goodness of fit are followed by the F-test (for test of variances) and the Mann-Kendall trend test. Bootstrap sampling and field significance are also covered.
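Bootstrap sampling can be sketched briefly. The example below builds a percentile bootstrap confidence interval for a mean; the sample values, the function name `bootstrap_mean_ci`, the number of resamples, and the choice of the percentile method are all assumptions made for illustration, not details taken from the chapter.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement n_boot times and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * n_boot)]
    hi = means[int((1.0 - tail) * n_boot) - 1]
    return lo, hi


# Hypothetical observations.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]

lo, hi = bootstrap_mean_ci(sample)
print(lo, hi)
```

The same resampling loop works for other statistics (a median or a trend slope, say) by swapping out the function applied to each resample.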
The previous chapter considered the following problem: given a distribution, deduce the characteristics of samples drawn from that distribution. This chapter goes in the opposite direction: given a random sample, infer the distribution from which the sample was drawn. It is impossible to infer the distribution exactly from a finite sample. Our strategy is more limited: we propose a hypothesis about the distribution, then decide whether or not to accept the hypothesis based on the sample. Such procedures are called hypothesis tests. In each test, a decision rule for deciding whether to accept or reject the hypothesis is formulated. The probability that the rule gives the wrong decision when the hypothesis is true leads to the concept of a significance level. In climate studies, the most common questions addressed by hypothesis tests are whether two random variables (1) have the same mean, (2) have the same variance, or (3) are independent. This chapter discusses the corresponding tests for normal distributions, called the (1) t-test (or difference-in-means test), (2) F-test (or difference-in-variance test), and (3) correlation test.
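As a small illustration of two of these tests, the sketch below computes the variance ratio used by the F-test and the t statistic used by the correlation test. The data and function names are hypothetical, and only the test statistics are computed; comparing them with a critical value or converting them to p-values is left out.

```python
import math
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(x, y):
    """t statistic for testing zero correlation (n - 2 degrees of freedom)."""
    r = pearson_r(x, y)
    n = len(x)
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def f_ratio(x, y):
    """Ratio of sample variances, the statistic of the F-test."""
    return statistics.variance(x) / statistics.variance(y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
t_stat = correlation_t(x, y)
f_stat = f_ratio(x, y)
print(round(r, 3), round(t_stat, 3), f_stat)
```

For normally distributed data under the null hypothesis, the correlation statistic follows a t distribution with n - 2 degrees of freedom, and the variance ratio follows an F distribution with (n_x - 1, n_y - 1) degrees of freedom.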
Biostatistics with R provides a straightforward introduction on how to analyse data from the wide field of biological research, including nature protection and global change monitoring. The book is centred around traditional statistical approaches, focusing on those prevailing in research publications. The authors cover t-tests, ANOVA and regression models, but also the advanced methods of generalised linear models and classification and regression trees. Chapters usually start with several useful case examples, describing the structure of typical datasets and proposing research-related questions. All chapters are supplemented by example datasets, step-by-step R code demonstrating analytical procedures and interpretation of results. The authors also provide examples of how to appropriately describe statistical procedures and results of analyses in research papers. This accessible textbook will serve a broad audience, from students, researchers or professionals looking to improve their everyday statistical practice, to lecturers of introductory undergraduate courses. Additional resources are provided on www.cambridge.org/biostatistics.
In this chapter, the reader is given a survey of two basic approaches for statistical analysis, the quantitative approach focused on measures of effect size and confidence intervals, and the qualitative approach based on significance values and null hypothesis significance testing.
The t-test is a workhorse of much statistical analysis in HCI. There are many myths about how robust it is to deviations from normality and other assumptions. However, when faced with practical data, particularly those coming from usability studies, the claims of robustness do not stand up. This chapter reevaluates the t-test as a test for an effect on the location of data. This leads to considering robust measures of location, such as trimmed or Winsorized means, and the associated Yuen–Welch test as a robust alternative to the traditional t-test.
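The two robust measures of location mentioned above can be sketched as follows. The 20% trimming proportion, the function names, and the data (with one deliberate outlier) are assumptions chosen for illustration; the Yuen–Welch test itself is not implemented here.

```python
import statistics


def trimmed_mean(data, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of the values."""
    xs = sorted(data)
    g = int(proportion * len(xs))  # number of values cut from each tail
    return statistics.fmean(xs[g:len(xs) - g])


def winsorized_mean(data, proportion=0.2):
    """Mean after replacing each tail's values with the nearest retained value."""
    xs = sorted(data)
    n = len(xs)
    g = int(proportion * n)
    low, high = xs[g], xs[n - g - 1]
    return statistics.fmean([min(max(x, low), high) for x in xs])


# Hypothetical task times with a single extreme outlier.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print(statistics.fmean(times), trimmed_mean(times), winsorized_mean(times))
```

With these made-up values the ordinary mean is pulled up to 14.5 by the single outlier, while both robust measures stay at 5.5.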
Likert items and questionnaires are widely used in HCI, particularly to measure user experience. However, there is some confusion over which is the right test to use to analyse data arising from these instruments. Furthermore, this book has proposed several more modern alternatives to traditional statistical tests, but there is little evidence as to whether they are better in the context of this particular sort of data. This chapter therefore reports on several simulation studies to compare the variety of tests that can be used to analyse Likert item and questionnaire data. The results suggest that this sort of data best reveals dominance effects and therefore that tests of dominance are the most suitable, and most robust, tests to use.