Biostatistics with R provides a straightforward introduction to analysing data from the wide field of biological research, including nature protection and global change monitoring. The book is centred around traditional statistical approaches, focusing on those prevailing in research publications. The authors cover t-tests, ANOVA and regression models, as well as the advanced methods of generalised linear models and classification and regression trees. Chapters usually start with several useful case examples, describing the structure of typical datasets and proposing research-related questions. All chapters are supplemented by example datasets, step-by-step R code demonstrating the analytical procedures, and interpretation of the results. The authors also provide examples of how to appropriately describe statistical procedures and the results of analyses in research papers. This accessible textbook will serve a broad audience, from students, researchers or professionals looking to improve their everyday statistical practice, to lecturers of introductory undergraduate courses. Additional resources are provided on www.cambridge.org/biostatistics.
This chapter covers more advanced types of ANOVA models, those that contain multiple explanatory variables (factors). We start with hierarchical ANOVA, illustrated by two example studies, and describe how the variation of the response variable is decomposed, introducing the concept of variance components. We then set apart and discuss the properties of the split-plot ANOVA model and illustrate its use by evaluating the results of a field experiment. Finally, we discuss repeated measurements ANOVA, a very important model for analysing both monitoring data and data from manipulative experiments. Although it is typically analysed as a type of split-plot ANOVA, the repeated measurements ANOVA model has further assumptions that are discussed in the text. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, including the nlme, lme4, effects, and car packages.
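As a minimal sketch of a split-plot analysis (with simulated data; the block, mowing and fert variables are invented for this illustration, not taken from the book's examples), the model could be specified in R as follows:

set.seed(1)
dat <- expand.grid(block = factor(1:4),
                   mowing = factor(c("no", "yes")),
                   fert = factor(c("no", "yes")))
dat$y <- 5 + (dat$mowing == "yes") * 1 + (dat$fert == "yes") * 2 + rnorm(16)

## Split-plot ANOVA: mowing applied to whole plots nested in blocks,
## fertilisation applied to sub-plots; Error() defines the error strata
summary(aov(y ~ mowing * fert + Error(block/mowing), data = dat))

## The same design expressed as a mixed-effects model with nlme
library(nlme)
m <- lme(y ~ mowing * fert, random = ~ 1 | block/mowing, data = dat)
anova(m)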
We start by comparing the use of correlation (namely the Pearson linear correlation coefficient) for describing the relationship between two variables with the approach based on simple linear regression. Then we describe how to test the hypothesis that there is no correlation between the two variables within the sampled population. We conclude this topic by discussing the power of this test. We then move to the nonparametric correlation coefficients suitable for measuring the strength of a monotonic relationship between two variables. An additional section focuses on how to appropriately interpret correlation strength and significance, factoring in the specific questions being asked. Finally, we discuss the differences between correlation-based and causal relationships. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, including the pwr package.
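A brief illustrative sketch of these procedures, using simulated data rather than the book's examples:

set.seed(1)
x <- rnorm(30)
y <- 0.5 * x + rnorm(30)

cor.test(x, y)                        # Pearson correlation, tests H0: rho = 0
cor.test(x, y, method = "spearman")   # nonparametric, monotonic relationship

library(pwr)
pwr.r.test(n = 30, r = 0.5, sig.level = 0.05)   # power of the correlation test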
We start by outlining how the generalised linear models (GLM) extend the classical linear models, namely by the use of the link function, transforming the values of the response variable predicted by the model. We also present the types of statistical distribution we can choose for the unexplained (residual) variation and relate them to the most commonly encountered forms of biological data. The decomposition of the variation in the response variable, using the analysis of deviance, is described together with the concepts of maximum likelihood and of the null model. We also explain how to handle overdispersion, which is the larger-than-expected residual variation in GLMs with an assumed Poisson or binomial distribution. We show the ways we can select predictors for inclusion in our model, focusing on the idea of model parsimony, measured by the AIC criterion. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use.
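For illustration only (simulated count data; the ph and richness variables are hypothetical), a Poisson GLM with a check for overdispersion could be fitted as:

set.seed(1)
ph <- runif(50, 4, 8)
richness <- rpois(50, lambda = exp(0.2 + 0.3 * ph))

m1 <- glm(richness ~ ph, family = poisson)   # log link is the default
summary(m1)                 # compare residual deviance with residual df
anova(m1, test = "Chisq")   # analysis of deviance
AIC(m1)                     # parsimony criterion

## If overdispersed, a quasi-Poisson family with F tests is one remedy
m2 <- glm(richness ~ ph, family = quasipoisson)
anova(m2, test = "F")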
The t distribution plays an important role in many statistical tests and in the estimation of parametric confidence intervals. We introduce properties of the t distribution and its relationship to the normal distribution. The single sample t test is described, as well as the related paired t test. The concept of a one-sided test is then introduced and compared with the two-sided test. We explain the meaning of confidence intervals and show their calculation. We discuss the assumptions of the t tests introduced in this chapter. A separate section is devoted to a detailed treatment of how to present the variability in our data, and the precision of mean value estimation, both numerically and visually. The reporting of standard deviations, standard errors, and confidence intervals is compared and discussed. We round off this chapter by outlining how to calculate the sample size required to attain a specified precision for the mean estimate. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use.
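An illustrative sketch of these steps with a single simulated sample (not the book's example):

set.seed(1)
x <- rnorm(20, mean = 10.4, sd = 1)

t.test(x, mu = 10)                            # two-sided single sample t test
t.test(x, mu = 10, alternative = "greater")   # one-sided alternative

## 95% confidence interval computed directly from the t distribution (n = 20)
mean(x) + qt(c(0.025, 0.975), df = 19) * sd(x) / sqrt(20)

## Sample size needed to detect a mean shift of 0.5 with power 0.8
power.t.test(delta = 0.5, sd = 1, power = 0.8, type = "one.sample")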
Although there are many ways of describing nonlinear relationships among variables, this chapter focuses primarily on the polynomial regression, which is related to the multiple linear regression model. We pay particular attention to models using the second-order polynomial. These models are often employed in the field of community ecology to describe unimodal changes of species abundances along environmental gradients. The downsides of using polynomial regression are also addressed. We bring this chapter to a close by touching on the non-linear least-squares regression models and the appropriate context in which they should be applied. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, including the nlme package.
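As an illustrative sketch (simulated unimodal data; the parameter names h, opt and w are invented for this example):

set.seed(1)
grad <- runif(60, 0, 10)                        # environmental gradient
abund <- 20 - (grad - 5)^2 + rnorm(60, sd = 2)  # unimodal species response

m.poly <- lm(abund ~ poly(grad, 2))   # second-order polynomial regression
summary(m.poly)

## Non-linear least squares for an explicitly specified functional form
m.nls <- nls(abund ~ h - (grad - opt)^2 / w,
             start = list(h = 20, opt = 5, w = 1))
summary(m.nls)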
This chapter compares two major families of ordination methods, unconstrained and constrained ordination. We start by describing the tasks achieved with the help of unconstrained ordination and illustrate how to interpret the resulting ordination diagrams. The methods of constrained ordination allow us to build and test statistical models describing the effects of predictors (such as environmental descriptors) on multivariate response data (such as the composition of biotic communities). We separately discuss linear discriminant analysis, which aims to use a set of numerical variables to predict the membership of observations in a priori defined classes. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, in this case employing the vegan package.
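For illustration, a sketch using the dune example data shipped with vegan (and, for discriminant analysis, the classic iris data; neither is necessarily the book's own example):

library(vegan)
data(dune, dune.env)   # community table and environmental descriptors

ord <- rda(dune)       # unconstrained ordination (PCA of the community table)
plot(ord)              # ordination diagram

m <- cca(dune ~ Management, data = dune.env)   # constrained ordination
anova(m, permutations = 199)                   # permutation test of the model

## Linear discriminant analysis: predicting a priori defined classes
library(MASS)
lda(Species ~ ., data = iris)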
The analysis of variance is introduced as a method of testing differences among the means of more than two groups of observations. We outline the basic assumptions of ANOVA models, focusing on the expected homogeneity of variances across the compared groups, which is assessed by the Bartlett test. The decomposition of variability in the response variable (its total sum of squares) into among-group and within-group (residual) variation leads to the definition of the F-ratio, which is the central test statistic in ANOVA models. We also introduce the distinction between fixed and random effects and discuss the power of the F test as well as its robustness to violations of ANOVA model assumptions. The first part of the chapter, dealing with one-way ANOVA, concludes with a description of the multiple comparisons procedure; we focus on two types, Tukey's test and Dunnett's test. The chapter closes by presenting a nonparametric counterpart of one-way ANOVA, the Kruskal-Wallis test. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, including the multcomp package.
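A compact illustrative sketch of this workflow on simulated data (the group names are invented):

set.seed(1)
dat <- data.frame(group = factor(rep(c("ctrl", "low", "high"), each = 10)),
                  y = rnorm(30, mean = rep(c(5, 6, 8), each = 10)))

bartlett.test(y ~ group, data = dat)   # homogeneity of variances
m <- aov(y ~ group, data = dat)
summary(m)                             # F-ratio test
TukeyHSD(m)                            # Tukey's pairwise comparisons

library(multcomp)                      # Dunnett's comparisons with the control
summary(glht(m, linfct = mcp(group = "Dunnett")))

kruskal.test(y ~ group, data = dat)    # nonparametric counterpart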
After a general introduction to multivariate statistical analyses, we focus on describing the task of multivariate classification, distinguishing its non-hierarchical and hierarchical forms. Focusing on hierarchical agglomerative classification methods (cluster analysis), we highlight the important decisions that must be made regarding the measurement of dissimilarity (distance) among objects. Following this, we explain the construction of dendrograms representing this hierarchical classification. We also briefly mention divisive classification methods, focusing on the TWINSPAN method. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, in this case employing the cluster package.
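As a brief sketch, using R's built-in USArrests data rather than the book's examples:

d <- dist(scale(USArrests))          # Euclidean distances on standardised data
hc <- hclust(d, method = "average")  # agglomerative (UPGMA) clustering
plot(hc)                             # dendrogram of the hierarchy

library(cluster)
ag <- agnes(scale(USArrests), method = "average")
ag$ac                                # agglomerative coefficient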
The two-way ANOVA (and its extensions to more factors) is applied to data with a factorial arrangement and is an important tool for analysing data from experimental studies. We start by characterising the properties of a factorial design and compare it with a hierarchical design. We introduce two important experimental concepts here: the ideas of a balanced design and of a proportional design. We then describe the two-way ANOVA model, including an explanation of the interaction term and its use in ANOVA models. We outline some basic types of correct experimental designs, including randomised complete blocks, and contrast them with incorrect designs resulting in pseudo-replicated observations. Separate sections deal with ANOVA model specification for randomised blocks and Latin square designs, and with the specific issues of the multiple comparisons procedure in ANOVA models with multiple factors. A nonparametric counterpart of the randomised complete block ANOVA, the Friedman test, is also introduced. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, including the multcomp package.
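An illustrative sketch with a simulated factorial experiment in blocks (the water and fert factors are invented for this example):

set.seed(1)
dat <- expand.grid(block = factor(1:5),
                   water = factor(c("low", "high")),
                   fert = factor(c("none", "NPK")))
dat$y <- 10 + (dat$water == "high") * 2 + (dat$fert == "NPK") * 3 + rnorm(20)

## Factorial two-way ANOVA in randomised complete blocks:
## block enters as an additive term, the factors with their interaction
summary(aov(y ~ block + water * fert, data = dat))

## Friedman test: nonparametric counterpart for one factor in blocks
friedman.test(y ~ water | block, data = subset(dat, fert == "none"))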
The simple linear regression is one of the most frequently employed statistical models. Linear regression is used to describe the relationship between two numerical variables, but it also serves as a building block for more complex statistical methods, such as multivariate ordination. We start by comparing the concepts of regression and correlation, before introducing the equation of the simple linear regression. We also explain the decomposition of the observed values of the response variable into fitted values and regression residuals. Following this is a discussion of the hypotheses that can be tested for a regression model, distinguishing the F-ratio based test from the t tests of individual regression coefficients. The calculation of confidence and prediction intervals allows us to enhance diagrams displaying the fitted model. A separate section is devoted to the graphs of regression diagnostics and their interpretation, as well as to the effects of log-transforming the variables to linearise their relationship. Additional specialised sections deal with regression through the origin and its possible dangers, regression using a predictor with random variation, and with linear calibration. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, including the effects and lmodel2 packages.
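A minimal sketch with simulated data (the nitro and height variables are illustrative, not the book's example):

set.seed(1)
nitro <- runif(40, 0, 10)                      # hypothetical nitrogen supply
height <- 3 + 1.5 * nitro + rnorm(40, sd = 2)

m <- lm(height ~ nitro)
summary(m)   # t tests of coefficients and the overall F test

new <- data.frame(nitro = seq(0, 10, length.out = 50))
predict(m, new, interval = "confidence")   # confidence band for the mean
predict(m, new, interval = "prediction")   # prediction band for new values

par(mfrow = c(2, 2)); plot(m)              # regression diagnostics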
Contingency tables are used to quantify and test relationships between two or more categorical (qualitative) variables. Taking a simple example of a two-way contingency table (relating two categorical variables), we illustrate the process of calculating the frequencies of category combinations expected under the assumption of variable independence, and show how observed and expected frequencies are compared within the chi-square test statistic. We also briefly describe the task of measuring the strength of association between two categorical variables, which is important for evaluating the co-occurrence of biological taxa. We illustrate the differences between statistical and causal relationships between variables, highlighting the essential role of manipulative experiments for revealing causality. Finally, we demonstrate the possible ways of visualising contingency tables and their test results. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, including the vcd package.
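For illustration only, with invented frequencies rather than data from the book:

## Hypothetical 2 x 2 table: joint occurrence of two taxa on sample plots
tab <- matrix(c(30, 10, 5, 25), nrow = 2,
              dimnames = list(taxonA = c("present", "absent"),
                              taxonB = c("present", "absent")))
chisq.test(tab)            # chi-square test of independence
chisq.test(tab)$expected   # frequencies expected under independence

library(vcd)
assocstats(tab)            # strength of association (phi, Cramer's V)
mosaic(tab, shade = TRUE)  # mosaic plot with residual shading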
We explain linear regression models with multiple predictors, including an overview of partial regression coefficients. The related concept of partial correlation is discussed in a separate section. We also contrast the overall model test using the F-ratio statistic and the t tests of partial effects of individual predictors. The adjusted coefficient of determination is presented as a more accurate way of conveying the explanatory power of a regression model. Finally, we characterise the family of general linear models, focusing specifically on analysis of covariance (ANCOVA). We provide examples of ANCOVA models and demonstrate their usefulness when applied to the analysis of biological experiments. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use, including the effects and ppcor packages.
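A brief sketch on simulated data (variable and group names are illustrative):

set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 1 + 0.8 * dat$x1 + 0.4 * dat$x2 + rnorm(50)

m <- lm(y ~ x1 + x2, data = dat)
summary(m)   # overall F test, t tests of partial coefficients, adjusted R^2

library(ppcor)
pcor(dat)    # partial correlations among all numeric variables

## ANCOVA: one factor plus one covariate in a general linear model
dat$grp <- factor(rep(c("ctrl", "trt"), each = 25))
summary(lm(y ~ grp + x1, data = dat))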
The core of this chapter is the two-sample t test, which compares the means of two groups of observations, but we start by comparing the variances of the two groups using the F test. We discuss the assumptions of the two-sample t test and also present the approximate Welch test, used when the assumption of variance homogeneity is violated. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use.
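A minimal sketch with two simulated samples:

set.seed(1)
a <- rnorm(15, mean = 10, sd = 1)
b <- rnorm(15, mean = 12, sd = 1)

var.test(a, b)                   # F test of equal variances
t.test(a, b, var.equal = TRUE)   # classical two-sample t test
t.test(a, b)                     # Welch approximation (R's default)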
After a general introduction to nonparametric tests, we review two useful nonparametric tests for comparing two samples. The Mann-Whitney test is a counterpart of the two-sample t test, but it uses the ranks of the recorded values instead. Although this test is often described as a test of the differences between mean values, this interpretation only applies when both samples come from distributions of the same shape. The Wilcoxon test for paired observations corresponds to the parametric paired t test. We also introduce permutation tests, which represent another group of non-parametric methods for hypothesis testing. The methods described in this chapter are accompanied by a carefully-explained guide to the R code needed for their use.
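An illustrative sketch with simulated skewed samples (not the book's data):

set.seed(1)
a <- rexp(20, rate = 1)
b <- rexp(20, rate = 0.5)

wilcox.test(a, b)                  # Mann-Whitney test, independent samples
wilcox.test(a, b, paired = TRUE)   # Wilcoxon test for paired observations

## A simple permutation test of the difference in means
obs <- mean(a) - mean(b)
perm <- replicate(999, { z <- sample(c(a, b))
                         mean(z[1:20]) - mean(z[21:40]) })
mean(abs(c(perm, obs)) >= abs(obs))   # two-sided permutation p-value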