Analysis of Variance and Covariance

How to Choose and Construct Models for the Life Sciences
C. Patrick Doncaster, Andrew J. H. Davey
Published online:

13 November 2009

Print publication:

30 August 2007
- Book
Analysis of variance (ANOVA) is a core technique for analysing data in the Life Sciences. This reference book bridges the gap between statistical theory and practical data analysis by presenting a comprehensive set of tables for all standard models of analysis of variance and covariance with up to three treatment factors. The book will serve as a tool to help post-graduates and professionals define their hypotheses, design appropriate experiments, translate them into a statistical model, validate the output from statistics packages and verify results. The systematic layout makes it easy for readers to identify which types of model best fit the themes they are investigating, and to evaluate the strengths and weaknesses of alternative experimental designs. In addition, a concise introduction to the principles of analysis of variance and covariance is provided, alongside worked examples illustrating issues and decisions faced by analysts.

7 - Unreplicated designs
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 229-236
- Chapter
Summary

Every model in Chapters 2 and 3 has one or more equivalents without full replication. For model 2.1 it is 1.1, for 2.2 it is 2.1, for 3.1 it is 4.1 or 6.1, for 3.2 it is 4.2 or 6.2, for 3.3 it is 5.6 or 6.3, and for 3.4 it is 3.1. Here we give two further versions of factorial models 3.1 and 3.2 without full replication. The lack of replicated sampling units means that at least one of the factors must be random, as demonstrated by model 7.1(i) below in comparison to (ii) and (iii). Factorial designs that lack full replication must further assume that there are no significant higher-order interactions between factors, which cannot be tested by the model since there is no measure of the residual error among replicate observations (subjects). This is problematic because lower-order effects can only be interpreted fully with respect to their higher-order interactions (Chapter 3). Falsely assuming an absence of higher-order interactions will cause tests of lower-order effects to overestimate the Type I error (rejection of a true null hypothesis) and to underestimate the Type II error (acceptance of a false null hypothesis). Without testing for interactions, causality cannot be attributed to significant main effects, and no conclusion can be drawn about non-significant main effects. For some analyses, the existence of a significant main effect when levels of an orthogonal random block are pooled together may hold interest regardless of whether or not the effect also varies with block; the main effect indicates an overall trend averaged across levels of the random factor.

Choosing experimental designs
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 248-257
- Chapter
Summary

Empirical research invariably requires making informed choices about the design of data collection. Although the number and identity of experimental treatments is determined by the question(s) being addressed, the investigator must decide at what spatial and temporal scales to apply them and whether to include additional fixed or random factors to extend the generality of the study. The investigator can make efficient use of resources by balancing the cost of running the experiment against the power of the experiment to detect a biologically significant effect. In practice this means either minimising the resources required to achieve a desired level of statistical power or maximising the statistical power that can be attained using the finite resources available. An optimum design can be achieved only by careful planning before data collection, particularly in the selection of an appropriate model and allocation of sampling effort at appropriate spatial and temporal scales.
Inadequate statistical power continues to plague biological research (Jennions and Møller 2003; Ioannidis 2005), despite repeated calls to incorporate it into planning (Peterman 1990; Greenwood 1993; Thomas and Juanes 1996). Yet efficient experimentation has never been more in demand.

2 - Nested designs
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 67-75
- Chapter
Summary

Nested designs extend one-factor ANOVA to consider two or more factors in a hierarchical structure. Nested factors cannot be cross-factored with each other because each level of one factor exists only in one level of another (but see models 3.3 and 3.4 for cross-factored models with nesting). Nested designs allow us to quantify and compare the magnitudes of variation in the response at different spatial, temporal or organisational scales. They are used particularly for testing a factor of interest without confounding different scales of variation. For example, spatial variation in the infestation of farmed salmon with sea lice could be compared at three scales – among farms (A′), among cages within each farm (B′) and among fish within each cage (S′) – by sampling n fish in each of b cages on each of a farms. Similarly, seasonal variation (A) in infestation of farmed salmon by sea lice, over and above short-term fluctuations in time (B′), could be measured by sampling n independent fish on b random occasions in each of a seasons.
Designs are inherently nested when treatments are applied across one organisational scale and responses are measured at a finer scale. For example, the genotype of a plant may influence the mean length of its parasitic fungal hyphae. A test of this hypothesis must recognise the fact that hyphae grow in colonies (S′) that are nested within leaves (C′), which in turn are nested within plants (B′), which in turn are nested in genotype (A′) (discussed further on page 23).
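As a rough illustration of how sampling effort spreads across the scales of a nested design, the sketch below computes the degrees of freedom at each level of a balanced hierarchy. It uses the standard rule that a level with c units inside each unit of the level above contributes (c − 1) d.f. per enclosing unit. The function name nested_df and the numbers of farms, cages and fish are hypothetical, chosen only to mirror the salmon example.

```python
def nested_df(counts):
    """Degrees of freedom for each level of a balanced nested hierarchy.

    counts lists the number of units at each scale, outermost first,
    e.g. [a, b, n] for a farms, b cages per farm, n fish per cage.
    """
    dfs = []
    units_above = 1  # number of enclosing units at the scale above
    for c in counts:
        dfs.append(units_above * (c - 1))
        units_above *= c
    return dfs


# Hypothetical effort: a = 3 farms, b = 4 cages per farm, n = 5 fish per cage
a, b, n = 3, 4, 5
df_farms, df_cages, df_fish = nested_df([a, b, n])
print("A' (farms):", df_farms)
print("B'(A') (cages in farms):", df_cages)
print("S'(B'(A')) (fish in cages):", df_fish)
print("total:", a * b * n - 1)
```

The three components sum to the total d.f. of abn − 1, which gives a quick check that no scale of variation has been left out of the model.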

References
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 281-283
- Chapter

Contents
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp v-viii
- Chapter

Categories of model
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 288-288
- Chapter

How to request models in a statistics package
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 258-259
- Chapter
Summary

You will need to declare any random factors and covariates as such. For balanced designs you may have an option to use the restricted form of the model (see page 242).
For a fully replicated design, most packages will give you all main effects and their interactions if you request the model in its abbreviated form. For example, the design Y = C|B|A+ε (model 3.2) can be requested as: ‘C|B|A’. Where a model has nested factors, you may need to request it with expansion of the nesting. For example, the design Y = C|B′(A)+ε (model 3.3) is requested with ‘C|A+C|B(A)’.
Repeated-measures and unreplicated designs have no true residual variation. The package may require residual variation nevertheless, in which case declare all the terms except the highest-order term (always the last row with non-zero d.f. in the ANOVA tables in this book). For example, for the design Y = B|S′(A) (model 6.3) request: ‘B|A+B|S(A)–B*S(A)’, and the package will take the residual from the subtracted term. Likewise, for the design Y = S′|A (model 4.1) request: ‘S|A–S*A’, and the package will take the residual from the subtracted term; or equally, request ‘A+S’, and the package will take the residual from the one remaining undeclared term: S*A.
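To make the abbreviated notation concrete, the following sketch expands a set of fully crossed factors into the main effects and interactions that a request such as ‘C|B|A’ stands for, and then drops the highest-order term as described for unreplicated designs. It is an illustration only: expand_crossed is a hypothetical helper, not the syntax of any particular statistics package, and it does not handle nested terms.

```python
from itertools import combinations


def expand_crossed(factors):
    """List every main effect and interaction among fully crossed factors."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append("*".join(combo))
    return terms


# Fully replicated three-factor design, requested in short as 'C|B|A'
full = expand_crossed(["A", "B", "C"])
print(full)

# Unreplicated design S'|A: declare everything except the highest-order term,
# which the package can then use as the residual
unreplicated = expand_crossed(["S", "A"])
highest = unreplicated[-1]
declared = [term for term in unreplicated if term != highest]
print("declare:", "+".join(declared), "| residual from:", highest)
```

For the two-factor case the declared terms reduce to the main effects alone, which is why ‘S|A–S*A’ and ‘A+S’ lead to the same analysis.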

Troubleshooting problems during analysis
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 264-270
- Chapter
Summary

Correctly identifying the appropriate model to use (see page 57) is the principal hurdle in any analysis, but running the chosen model in your favourite statistics package also presents a number of potential pitfalls. If you encounter problems when using a statistics package, do refer to its help routines and tutorials in order to understand the input requirements and output formats, and to help you interpret error messages. If that fails then look to see if you have encountered one of these common problems.
Problems with sampling design
If I just want to identify any differences amongst a suite of samples, can I do t tests on all sample pairs? No, the null hypothesis of no difference requires a single test yielding a single P-value. Multiple P-values are problematic in any unplanned probing of the data with more than one test of the same null hypothesis, because the repeated testing inflates the Type I error rate (illustrated by an example on page 252). If an ANOVA reveals a general difference between samples, explore where the significance lies using post hoc tests designed to account for the larger family-wise error (page 245).
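The inflation of the Type I error rate under repeated testing can be sketched numerically. The calculation below assumes that the k tests are independent, each at the same significance level alpha, so that the chance of at least one false rejection is 1 − (1 − alpha)^k. Pairwise t tests on a common set of samples are not strictly independent, so the figures are an approximation rather than an exact rate; the function names are hypothetical.

```python
from math import comb


def pairwise_tests(m):
    """Number of distinct sample pairs among m samples."""
    return comb(m, 2)


def familywise_error(alpha, k):
    """Chance of at least one Type I error in k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


for m in (3, 5, 10):
    k = pairwise_tests(m)
    rate = familywise_error(0.05, k)
    print(f"{m} samples -> {k} pairwise tests, family-wise error about {rate:.3f}")
```

Even a modest number of samples generates many pairs, which is why a single ANOVA followed by post hoc tests is preferred to unplanned pairwise testing.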

5 - Split-plot designs
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 141-178
- Chapter

Glossary
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 271-280
- Chapter

Preface
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp ix-xiv
- Chapter
Summary

Hypothesis testing in the life sciences often involves comparing samples of observations, and analysis of variance is a core technique for analysing such information. Parametric analysis of variance, abbreviated as ‘ANOVA’, encompasses a generic methodology for identifying sources of variation in continuous data, from the simplest test of trend in a single sample, or difference between two samples, to complex tests of multiple interacting effects. Whilst simple one-factor models may suffice for closely controlled experiments, the inherent complexities of the natural world mean that rigorous tests of causality often require more sophisticated multi-factor models. In many cases, the same hypothesis can be tested using several different experimental designs, and alternative designs must be evaluated to select a robust and efficient model. Textbooks on statistics are available to explain the principles of ANOVA and statistics packages will compute the analyses. The purpose of this book is to bridge between the texts and the packages by presenting a comprehensive selection of ANOVA models, emphasising the strengths and weaknesses of each and allowing readers to compare between alternatives.
Our motivation for writing the book comes from a desire for a more systematic comparison than is available in textbooks, and a more considered framework for constructing tests than is possible with generic software. The obvious utility of computer packages for automating otherwise cumbersome analyses has a downside in their uncritical production of results. Packages adopt default options until instructed otherwise, which will not suit all types of data.

Introduction to model structures
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 42-60
- Chapter
Summary

In the following Chapters 1 to 7, we will describe all common models with up to three treatment factors for seven principal classes of ANOVA design:
One-factor – replicate measures at each level of a single explanatory factor;
Nested – one factor nested in one or more other factors;
Factorial – fully replicated measures on two or more crossed factors;
Randomised blocks – repeated measures on spatial or temporal groups of sampling units;
Split plot – treatments applied at multiple spatial or temporal scales;
Repeated measures – subjects repeatedly measured or tested in temporal or spatial sequence;
Unreplicated factorial – a single measure per combination of two or more factors.
For each model we provide the following information:
The model equation;
The test hypothesis;
A table illustrating the allocation of factor levels to sampling units;
Illustrative examples;
Any special assumptions;
Guidance on analysis and interpretation;
Full analysis of variance tables showing all sources of variation, their associated degrees of freedom, components of variation estimated in the population, and appropriate error mean squares for the F-ratio denominator;
Options for pooling error mean square terms.
As an introduction to Chapters 1 to 7, we first describe the notation used, explain the layout of the allocation tables, present some worked examples and provide advice on identifying the appropriate statistical model.
Notation
Chapters 1 to 3 describe fully randomised and replicated designs. This means that each combination of levels of categorical factors (A, B, C) is assigned randomly to n sampling units (S′), which are assumed to be selected randomly and independently from the population of interest.
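A fully randomised and replicated allocation of this kind can be sketched in a few lines. The example assumes two crossed factors with hypothetical level names, n = 4 sampling units per combination, and a fixed random seed so that the allocation can be reproduced; none of these values comes from the book.

```python
import random
from collections import Counter
from itertools import product


def allocate(levels_a, levels_b, n, seed=0):
    """Randomly assign each combination of factor levels to n sampling units."""
    combos = list(product(levels_a, levels_b))
    treatments = combos * n              # each combination appears n times
    rng = random.Random(seed)            # seeded for a reproducible allocation
    rng.shuffle(treatments)
    # Sampling units are numbered 1..N in the order they were listed
    return {unit: combo for unit, combo in enumerate(treatments, start=1)}


allocation = allocate(["A1", "A2"], ["B1", "B2", "B3"], n=4)
replication = Counter(allocation.values())
for unit in sorted(allocation)[:5]:
    print("unit", unit, "->", allocation[unit])
print("units per combination:", set(replication.values()))
```

Shuffling the complete list of treatment labels, rather than drawing a treatment at random for each unit, keeps the design balanced with exactly n units per combination.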

Further Topics
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 237-247
- Chapter
Summary

Balanced and unbalanced designs
Balanced designs have the same number of replicate observations in each sample. Thus a one-factor model Y = A+ε will be balanced if sample sizes all take the same value n at each of the a levels of factor A. Balanced designs are generally straightforward to analyse because factors are completely independent of each other and the total sum of squares (SS) can be partitioned completely among the various terms in the model. The SS explained by each term is simply the improvement in the residual SS as that term is added to the model. These are often termed ‘sequential SS’ or ‘Type I SS’.
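The complete partitioning of the total SS can be checked by hand in the simplest case, the one-factor model Y = A+ε. The sketch below uses made-up data with a = 3 levels and n = 3 observations per level; the function name one_factor_ss is hypothetical, and the code is meant only to show what the partition looks like, not to replace a statistics package.

```python
def mean(values):
    return sum(values) / len(values)


def one_factor_ss(samples):
    """Return (total SS, SS explained by factor A, residual SS).

    samples holds one list of observations per level of A.
    """
    all_obs = [y for sample in samples for y in sample]
    grand_mean = mean(all_obs)
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    ss_factor = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_residual = sum((y - mean(s)) ** 2 for s in samples for y in s)
    return ss_total, ss_factor, ss_residual


# Made-up balanced data: a = 3 levels of A, n = 3 observations each
samples = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
ss_total, ss_factor, ss_residual = one_factor_ss(samples)
print("total SS:", ss_total)
print("SS for A:", ss_factor)
print("residual SS:", ss_residual)
```

Here the SS for A equals the drop in residual SS when A is added to a model containing only the grand mean, which is the sense in which the SS are sequential.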
Designs become unbalanced when some sampling units are lost, destroyed or cannot be measured, or when practicalities mean that it is easier to sample some populations than others. For nested models, imbalance may result from unequal nesting as well as unequal sample sizes. Thus a nested model Y = B′(A)+ε will be balanced only if each of the a levels of factor A has b levels of factor B′, and each of the ba levels of B′ has n replicate observations. For factorial models, an imbalance means that some combinations of treatments have more observations than others. An extreme case of unbalanced data arises in factorial designs where there are no observations for one or more combinations of treatments, resulting in missing samples and a substantially more complicated analysis.

6 - Repeated-measures designs
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 179-228
- Chapter
Summary

Repeated-measures designs involve measuring each sampling unit repeatedly over time or applying treatment levels in temporal or spatial sequence to each sampling unit. Because these designs were developed primarily for use in medical research, sampling units are often referred to as subjects. Those factors for which each subject participates in every level are termed ‘within-subject’ or ‘repeated-measures’ factors; levels of the within-subject factor are applied in sequence to each subject. Conversely, ‘between-subjects’ factors are grouping factors, for which each subject participates in only one level. Repeated-measures models are classified into two types, subject-by-trial and subject-by-treatment models, according to the nature of the within-subject factors (Kirk 1994).
Subject-by-trial designs apply the levels of the within-subject factor to each subject in an order that cannot be randomised, because time or space is an inherent component of the factor. Subjects (sampling units) may be measured repeatedly over time to track natural temporal changes in some measurable trait – for example, blood pressure of patients at age 40, 50 and 60, biomass of plants in plots at fixed times after planting, or build-up of lactic acid in muscle during exercise. Likewise, subjects may be measured repeatedly through space to determine how the response varies with position – for example, barnacle density in plots at different shore elevations, or lichen diversity on the north and south sides of trees.

Index of all ANOVA models with up to three factors
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 284-285
- Chapter

Frontmatter
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp i-iv
- Chapter

Index
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 286-287
- Chapter

Introduction to analysis of variance
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 1-41
- Chapter
Summary

What is analysis of variance?
Analysis of variance, often abbreviated to ANOVA, is a powerful statistical method and a core technique for testing causality in biological data. Researchers use ANOVA to explain variation in the magnitude of a response variable of interest. For example, an investigator might be interested in the sources of variation in patients' blood cholesterol level, measured in mg/dL. Factors that are hypothesised to contribute to variation in the response may be categorical or continuous. A categorical factor has levels – the categories – that are each applied to a different group of sampling units. For example, sampling units of hospital patients may be classified as male or female, representing two levels of the factor ‘Gender’. By contrast, a continuous factor has a continuous scale of values and is therefore a covariate of the response. For example, age of patients may be quantified by the covariate ‘Age’. ANOVA determines the influence of these effects on the response by testing whether the response differs among levels of the factor, or displays a trend across values of the covariate. Thus, blood cholesterol level of patients may be deemed to differ between male and female patients, or to increase or decrease with age of the patient.
A factor of interest can be experimental, with sampling units that are manipulated to impose contrasting treatments. For example, patients may be given a cholesterol-lowering drug or a placebo, which represent two levels of the factor ‘Drug’.

3 - Fully replicated factorial designs
C. Patrick Doncaster, University of Southampton, Andrew J. H. Davey
Book:

Analysis of Variance and Covariance

Published online:

13 November 2009

Print publication:

30 August 2007, pp 76-114
- Chapter
