One, two, three: portable sample size in agricultural research

Published online by Cambridge University Press: 25 August 2022

Hans-Peter Piepho*
Affiliation:
Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
Doreen Gabriel
Affiliation:
Institute for Crop and Soil Science, Julius Kühn-Institut (JKI), Braunschweig, Germany
Jens Hartung
Affiliation:
Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
Andreas Büchse
Affiliation:
BASF SE, Ludwigshafen am Rhein, Germany
Meike Grosse
Affiliation:
Research Institute of Organic Agriculture FiBL, Frick, Switzerland
Sabine Kurz
Affiliation:
Fachgebiet Pflanzenbau, Hochschule für Wirtschaft und Umwelt Nürtingen-Geislingen (HfWU), Nürtingen, Germany
Friedrich Laidig
Affiliation:
Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany
Volker Michel
Affiliation:
Mecklenburg-Vorpommern Research Centre for Agriculture and Fisheries, Gülzow, Germany
Iain Proctor
Affiliation:
BASF SE, Ludwigshafen am Rhein, Germany
Jan Erik Sedlmeier
Affiliation:
Applied Entomology, Institute of Phytomedicine, University of Hohenheim, Stuttgart, Germany
Kathrin Toppel
Affiliation:
Fachgebiet Tierhaltung und Produkte, Hochschule Osnabrück, Osnabrück, Germany
Dörte Wittenburg
Affiliation:
Research Institute for Farm Animal Biology (FBN), Institute of Genetics and Biometry, Dummerstorf, Germany
*Author for correspondence: Hans-Peter Piepho, E-mail: piepho@uni-hohenheim.de

Abstract

Determination of sample size (the number of replications) is a key step in the design of an observational study or randomized experiment. Statistical procedures for this purpose are readily available. Their treatment in textbooks is often somewhat marginal, however, and frequently the focus is on just one particular method of inference (significance test, confidence interval). Here, we provide a unified review of approaches and explain their close interrelationships, emphasizing that all approaches rely on the standard error of the quantity of interest, most often a pairwise difference of two means. The focus is on methods that are easy to compute, even without a computer. Our main recommendation based on standard errors is summarized as what we call the 1-2-3 rule for a difference of two treatment means.

Information

Type
Crops and Soils Research Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press

Table 1. Mean, variance, smallest relevant difference, and required sample size per treatment for α = 5% and a power of 85% for four traits in a dairy cow population

Table 2. Overview of procedures for determining sample size for a single mean

Table 3. Expected value and median of the range (maximum − minimum) for samples of different size n from the standard normal distribution

Table 4. Frequency distribution of the number of plants infested with the potato weevil (c) in samples of m = 20 plants at n = 347 control points (Trommer, 1986)

Table 5. Frequency distribution of the number y of microsclerotia (Cylindrocladium crotalariae) per quadrat on n = 96 quadrats in a peanut field (Hau et al., 1982; Figure 3(b))

Table 6. Values of $C_{\alpha,\beta} = (z_{1-\alpha/2} + z_{1-\beta})^2$ for typical choices of α and β
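
The quantity in the Table 6 caption is the squared sum of two standard normal quantiles, so it can be recomputed directly. The following is a minimal sketch, not the authors' code: the function name `c_alpha_beta` and the grid of α and β values are illustrative assumptions, and the values actually tabulated in Table 6 may differ.

```python
from statistics import NormalDist


def c_alpha_beta(alpha: float, beta: float) -> float:
    """Return (z_{1-alpha/2} + z_{1-beta})^2 for a two-sided test.

    alpha is the significance level, beta the type II error rate
    (power = 1 - beta). The name is hypothetical, not from the paper.
    """
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2


# Illustrative grid only; not necessarily the choices shown in Table 6.
for alpha in (0.10, 0.05, 0.01):
    for beta in (0.20, 0.10, 0.05):
        c = c_alpha_beta(alpha, beta)
        print(f"alpha={alpha:.2f}  beta={beta:.2f}  C={c:.2f}")
```

Because the constant depends only on α and β, it can be computed once and reused for any variance and any smallest relevant difference.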

Table 7. Required sample size for unpaired t-test at α = 5% with a power of 90% for σ² = 2199 lb² and a range of values for the smallest relevant difference δ in lb
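
As a rough illustration of how entries like those in Table 7 arise, the sketch below uses the standard normal approximation n ≈ 2σ²(z₁₋α/₂ + z₁₋β)²/δ² for the sample size per treatment when comparing two means. This is an assumption about the calculation, not a reproduction of the table: the δ values are hypothetical, the function name `n_per_treatment` is invented for this example, and a calculation based on the t-distribution can give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_treatment(sigma2: float, delta: float,
                    alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate sample size per treatment for a two-sided,
    unpaired comparison of two means (normal approximation).

    sigma2: error variance; delta: smallest relevant difference.
    """
    z = NormalDist().inv_cdf
    c = (z(1 - alpha / 2) + z(power)) ** 2   # (z_{1-alpha/2} + z_{1-beta})^2
    return math.ceil(2 * sigma2 * c / delta ** 2)


sigma2 = 2199.0  # variance in lb^2, taken from the Table 7 caption
# Hypothetical delta values in lb; the caption does not list the ones used.
for delta in (10, 20, 30, 40):
    print(f"delta={delta:>3} lb  n per treatment ~ {n_per_treatment(sigma2, delta)}")
```

Halving δ roughly quadruples the required n, since δ enters the formula squared.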

Table 8. Overview of procedures for determining sample size for a mean difference

Table 9. Total counts of Bromus sterilis on 1.25 m² per plot in an experiment laid out as a randomized complete block design with four treatments and four blocks (Büchse and Piepho, 2006)

Table 10. Expected counts in a 2 × 2 classification of groups and treatments

Table 11. Lying times (min day⁻¹) of 13 cows indoors and outdoors

Table 12. Values of $D_v$ (see near Eqn 44) for v = 2, 3, …, 8 treatment levels

Table 13. Number of plots ($n_e$) required, depending on the number of samples per plot ($n_o$), for α = 5%, β = 20%, $\mu_1 = 0.1$ and $\mu_2 = 0.2$, based on the angular transformation for an intermediate crop experiment (Example 13)

Table 14. Number of ears per section within rows (2 metres) (Example 14)

Table 15. Variance component estimates (obtained by residual maximum likelihood) and treatment mean estimates (Example 15)

Fig. 1. Colour online. Plot of SED versus $n_e$ for $n_o$ = 40, 80, 120, 160, 212. Trait: Stem circumference.

Fig. 2. Colour online. Plot of SED versus $n_e$ for $n_o$ = 40, 150, 300, 450, 672. Trait: Stem weight.

Fig. 3. Colour online. Plot of SED versus $n_e$ for $n_o$ = 2, 5, 10, 20, 59. Trait: Plant height.

Fig. 4. Colour online. Plot of SED versus $n_e$ for $n_o$ = 2, 5, 15, 50, 72. Trait: Culm number.

Table 16. Variance component estimates (obtained by residual maximum likelihood) for yield (× 10⁻² t²/ha²) in post-registration wheat variety trials in Mecklenburg-Vorpommern (Landesforschungsanstalt für Landwirtschaft und Fischerei)

Fig. 5. Colour online. Standard error of a difference (SED; in $10^{-1}$ t/ha) as per (61) with $SED = \sqrt {VD}$ for variance components $\sigma _{gs}^2$, $\sigma _{gy}^2$, $\sigma _{gsy}^2$, and $\sigma _e^2$ as shown in Table 16 for different values of $n_y$ (1, 2, 3, 4, 5) = number of years, $n_s$ (1, 3, 5, 7, 9) = number of sites, and $n_r$ (1, 2, 3, 4) = number of replicates. Reference lines on the ordinate at SED = 4 (× $10^{-1}$ t/ha) (desirable for early assessment) and SED = 2 (× $10^{-1}$ t/ha) (desirable for recommendations).