Non-parametric tests, in particular rank-based tests, are often proposed as robust alternatives to parametric tests such as t-tests when the assumptions of parametric tests are violated. However, non-parametric tests have assumptions of their own which, when not considered, can lead to misinterpretation and unsound conclusions. This chapter explores these problems and differentiates between the more and less robust non-parametric tests. More modern, robust non-parametric tests are suggested as replacements for the less robust ones.
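To make the contrast concrete, here is a minimal sketch in Python with SciPy. It uses the Brunner-Munzel test as one example of a modern, more robust rank-based test; the data and the choice of test are illustrative assumptions, not necessarily the chapter's recommendation.

```python
# Sketch: a classic rank test next to a more robust alternative.
# The Brunner-Munzel test is one modern option; treating it as the
# chapter's recommendation is an assumption of this example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Two groups with unequal variances: a case where the Mann-Whitney
# test's assumption of identically shaped distributions is violated.
a = rng.normal(loc=0.0, scale=1.0, size=30)
b = rng.normal(loc=0.5, scale=3.0, size=30)

u_stat, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")
bm_stat, bm_p = stats.brunnermunzel(a, b, alternative="two-sided")

print(f"Mann-Whitney U: p = {u_p:.3f}")
print(f"Brunner-Munzel: p = {bm_p:.3f}")
```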
Traditional statistical testing, sometimes called null hypothesis significance testing (NHST), and the use of p-values have come under strong criticism. This chapter looks at the social pressures to achieve significance in statistics that have led to the problems of NHST. It also discusses, however, how the framework of severe testing provides a way to understand NHST as a means of uniting experiments, statistics and evidence for research ideas. Viewed this way, NHST can still be an important approach to data analysis in HCI.
The focus of statistical tests on significance can lead researchers to desperately seek significance, particularly when an experiment has 'failed'. This chapter uses the framework of severe testing to make clear the problem of seeking significance at any cost and how over-testing, or fishing for significance, weakens results. The chapter proposes some rules to guide researchers both to explore data thoroughly and to avoid going too far in pursuit of significance.
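A small simulation makes the underlying problem vivid. The sketch below, in Python with SciPy, runs many all-null 'experiments' and counts how often fishing across twenty tests produces at least one spurious 'discovery'; the numbers of tests and sample sizes are invented for illustration.

```python
# Sketch: why fishing for significance weakens results. With 20
# independent t-tests on pure noise, the chance of at least one
# "significant" p < .05 is far above 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_tests, n = 2000, 20, 25
false_hits = 0
for _ in range(n_experiments):
    pvals = [stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
             for _ in range(n_tests)]
    if min(pvals) < 0.05:
        false_hits += 1

# Roughly 1 - 0.95**20, about 0.64: a 'discovery' in most null experiments.
print(f"At least one false positive in {false_hits / n_experiments:.0%} of experiments")
```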
Analysis of variance (ANOVA) is a family of tests widely used in HCI, but these tests are not as robust as their users often claim. This chapter looks at exactly what ANOVAs test and, therefore, what makes a suitable robust alternative when the assumptions of ANOVA are not met.
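As a minimal sketch in Python with SciPy, the example below places classical one-way ANOVA next to the rank-based Kruskal-Wallis test, one commonly used alternative when ANOVA's normality and equal-variance assumptions are doubtful. Whether this is the alternative the chapter favours is an assumption of the example.

```python
# Sketch: one-way ANOVA vs the Kruskal-Wallis rank test on skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Three skewed (log-normal) groups: a plausible shape for task times in HCI.
g1 = rng.lognormal(mean=0.0, sigma=0.6, size=25)
g2 = rng.lognormal(mean=0.3, sigma=0.6, size=25)
g3 = rng.lognormal(mean=0.3, sigma=0.6, size=25)

f_stat, f_p = stats.f_oneway(g1, g2, g3)  # assumes normality, equal variances
h_stat, h_p = stats.kruskal(g1, g2, g3)   # rank-based alternative

print(f"One-way ANOVA:  p = {f_p:.3f}")
print(f"Kruskal-Wallis: p = {h_p:.3f}")
```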
This chapter discusses how statistics support scientific practice by providing evidence for new ideas despite the natural variation we see between people, systems and contexts. In particular, the frameworks of severe testing and new experimentalism are used to show how experiments can add to knowledge in HCI, even in the absence of strong theories of interaction.
Likert items are widely used in HCI research, both as convenient measures on their own and combined in questionnaires to give a unified instrument. Researchers often have questions about how best to format Likert items, and this chapter looks at the most common issues: the number of response options, whether to include a midpoint and how to label the options. Clear recommendations are given based on the state-of-the-art research on these topics.
Correlations are important for seeing connections between different facets of people and their experiences with systems. However, considering correlation coefficients alone can be misleading; in particular, this chapter discusses how outliers and clustering can distort the interpretation of a correlation. It also sounds a note of caution for the many other methods that implicitly rely on correlation.
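The following sketch, in Python with SciPy, shows how a single extreme point can manufacture a strong Pearson correlation between otherwise unrelated variables; the data are invented for illustration.

```python
# Sketch: one outlier creates a spurious Pearson correlation; a
# rank-based coefficient (Spearman) is far less affected.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=30)
y = rng.normal(size=30)            # unrelated to x

r0, _ = stats.pearsonr(x, y)
x_out = np.append(x, 10.0)         # one extreme point on both axes
y_out = np.append(y, 10.0)
r1, _ = stats.pearsonr(x_out, y_out)
rho, _ = stats.spearmanr(x_out, y_out)

print(f"Pearson without outlier:  r = {r0:.2f}")
print(f"Pearson with one outlier: r = {r1:.2f}")
print(f"Spearman, same outlier:   rho = {rho:.2f}")
```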
Outliers are a problem for statistical analysis because they can have a disproportionate effect on means and on statistical tests that rely on means. This chapter looks first at how to identify outliers and then, by thinking about what might cause an outlier, at how best to analyse data when outliers are present.
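As a minimal sketch in Python with SciPy, the example below shows one common way to flag outliers (robust z-scores based on the median and MAD) and one way to analyse data in their presence (a 20% trimmed mean). The threshold and the trimming proportion are conventional assumptions, not the chapter's prescription.

```python
# Sketch: flagging outliers with median/MAD z-scores, then comparing
# the ordinary mean with a trimmed mean.
import numpy as np
from scipy import stats

data = np.array([3.1, 2.8, 3.4, 3.0, 2.9, 3.3, 3.2, 9.8])  # one suspect value

mad = stats.median_abs_deviation(data, scale="normal")  # scaled to match SD
robust_z = (data - np.median(data)) / mad
outliers = data[np.abs(robust_z) > 3.0]  # |z| > 3 is a common convention

print(f"Flagged outliers: {outliers}")
print(f"Mean:             {data.mean():.2f}")
print(f"20% trimmed mean: {stats.trim_mean(data, 0.2):.2f}")
```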
A common question from researchers learning statistics is which test to use. However, this depends greatly on the research question and on what experiments have been or might be done, so it is not possible to give a simple answer. Instead, this chapter provides three principles that can guide researchers to devise the right experiment and choose the right test for their particular research question: articulation, simplicity and honesty.
Likert items and questionnaires are widely used in HCI, particularly to measure user experience. However, there is some confusion over which test is right for analysing data arising from these instruments. Furthermore, this book has proposed several more modern alternatives to traditional statistical tests, but there is little evidence as to whether they are better in the context of this particular sort of data. This chapter therefore reports on several simulation studies comparing the variety of tests that can be used to analyse Likert item and questionnaire data. The results suggest that this sort of data best reveals dominance effects and, therefore, that tests of dominance are the most suitable, and most robust, tests to use.
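The sketch below, in Python with SciPy, conveys the flavour of such a simulation study: two groups of 5-point Likert responses differ by a shift in the response distribution, and we count how often each test detects it. The response distributions, sample size and choice of tests are invented assumptions, not the chapter's design.

```python
# Sketch: a toy power simulation for tests on 5-point Likert data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
p_a = [0.10, 0.20, 0.40, 0.20, 0.10]   # group A: centred responses
p_b = [0.05, 0.10, 0.30, 0.30, 0.25]   # group B: shifted toward agreement
n, reps = 30, 1000
hits_t = hits_u = 0
for _ in range(reps):
    a = rng.choice([1, 2, 3, 4, 5], size=n, p=p_a)
    b = rng.choice([1, 2, 3, 4, 5], size=n, p=p_b)
    hits_t += stats.ttest_ind(a, b).pvalue < 0.05
    hits_u += stats.mannwhitneyu(a, b, alternative="two-sided").pvalue < 0.05

print(f"t-test power:       {hits_t / reps:.0%}")
print(f"Mann-Whitney power: {hits_u / reps:.0%}")
```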
Bayesian statistics are often presented as a better, modern alternative to the Frequentist approaches centred on NHST and the resulting obsession with statistical significance. This chapter outlines the basic ideas of Frequentist and Bayesian statistics. It raises critiques of the Frequentist approach but also points out constraints on the Bayesian approach that are often omitted or overlooked. In particular, the chapter discusses how both Bayesian and Frequentist approaches rely on a move from statistical hypotheses to substantive hypotheses that cannot be justified by the statistics alone. Instead, both approaches can lead to sound knowledge through careful data analysis tied to the experiments that generate the data.
Many commonly used statistical tests rely on the assumption that data are normally distributed. This chapter discusses why normality commonly arises in statistics but also why, in many practical situations in HCI, it is not safe to assume. It also shows why tests for normality are neither meaningful nor useful. Instead, where normality is in doubt, analysis should be more careful and use suitable alternative tests.
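A short sketch in Python with SciPy illustrates why significance tests of normality are a poor guide: with a large sample, Shapiro-Wilk tends to flag trivial deviations, while with a small sample it tends to miss clear skew. The sample sizes and distributions are illustrative assumptions.

```python
# Sketch: normality tests answer the wrong question. Large n detects
# negligible deviations; small n misses substantial ones.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
big_near_normal = stats.t.rvs(df=10, size=5000, random_state=rng)  # almost normal
small_skewed = rng.exponential(size=15)                            # clearly skewed

_, p_big = stats.shapiro(big_near_normal)
_, p_small = stats.shapiro(small_skewed)

print(f"n=5000, nearly normal: p = {p_big:.3f}  (often 'significant')")
print(f"n=15, clearly skewed:  p = {p_small:.3f}  (often 'not significant')")
```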
Effects and their sizes are proposed as a key way to overcome the limitations of significance testing in modern statistics. Traditionally, however, effects are merely added on to the statistical testing procedure. This chapter proposes instead thinking first about what effects to look for in the context of the research and its maturity within the discipline of HCI. The main types of effect are described: changes in location, stochastic dominance and (co)variation. Considering which of these effects current best knowledge can predict leads to choosing the test most suitable for finding them.
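As a minimal sketch in Python with SciPy, the example below computes one simple estimate for each of the three kinds of effect named above. The estimators (Cohen's d, the probability of superiority, Pearson's r) are standard textbook choices, not necessarily the chapter's preferred ones.

```python
# Sketch: one estimate per effect type: location, dominance, (co)variation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
a = rng.normal(0.0, 1.0, size=40)
b = rng.normal(0.6, 1.0, size=40)

# 1. Change in location: Cohen's d with a pooled standard deviation.
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd

# 2. Stochastic dominance: P(B > A), recovered from the Mann-Whitney U.
u, _ = stats.mannwhitneyu(b, a, alternative="two-sided")
prob_superiority = u / (len(a) * len(b))

# 3. (Co)variation: Pearson's r between paired measures.
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(scale=0.8, size=40)
r, _ = stats.pearsonr(x, y)

print(f"Cohen's d = {d:.2f}, P(B > A) = {prob_superiority:.2f}, r = {r:.2f}")
```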
Cronbach's alpha is often used as a key measure of reliability for questionnaires developed in HCI. However, the state-of-the-art literature on Cronbach's alpha shows that it is itself not a reliable measure. This chapter demonstrates the problems and develops a more nuanced interpretation of reliability, using both Cronbach's alpha and more modern alternatives.
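For reference, here is a minimal sketch in Python computing Cronbach's alpha from its standard definition for a respondents-by-items matrix. The example data are invented; the more modern alternatives the chapter discusses (such as factor-model-based coefficients) require more machinery and are not shown.

```python
# Sketch: Cronbach's alpha = k/(k-1) * (1 - sum of item variances /
# variance of scale totals), for a (respondents x items) matrix.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: array of shape (n_respondents, n_items)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of scale totals
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
latent = rng.normal(size=(100, 1))                     # shared trait
items = latent + rng.normal(scale=1.0, size=(100, 4))  # four noisy items

print(f"alpha = {cronbach_alpha(items):.2f}")
```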