Contemporary life and research are peppered with factoids that are accepted as universal truths but rest on quicksand. One of these concerns power calculations, which are widely regarded as a cornerstone of rigorous clinical trial design. Reflecting this orthodoxy, they are mandated by funding bodies, ethics committees and journals as evidence that a study is adequately planned and ethically justified. Yet in many areas of contemporary clinical research, such as early phase trials and studies of heterogeneous or poorly understood conditions, power calculations function less as a scientific safeguard and more as an institutionalised and often ill-informed ritual. Their guise of precision masks substantial uncertainty surrounding the estimates on which participant numbers depend, often lending false authority to study designs that are epistemically tenuous. The essence of the conundrum is that an accurate power calculation requires knowing the magnitude of the clinical effect; but that magnitude can only be formally established via definitive trial data, and for any truly novel research worth doing, it remains unknown.
Hence, the routine use of conventional power calculations is frequently misguided, and their primacy reflects institutional convention rather than methodological utility or necessity. At a basic level, classical power calculations depend on several inputs: an assumed effect size, the significance level, the desired power and the analysis model. Of these, the effect size is the most consequential and the least reliable. In many clinical contexts, effect sizes are inferred from small pilot studies or open-label studies that are often replete with inflated estimates. Published pilot trials are frequently subject to publication bias, to underpowering that risks effect size inflation, or to minimally important differences selected to render a trial feasible.¹ In addition, pilot studies are not capable of generating meaningful effect size estimates for planning definitive studies because of the inherent imprecision of small samples.
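To make the arithmetic concrete, the standard two-sample approximation shows how each of these inputs feeds into the required sample size. This is an illustrative sketch only; the function name and the assumed effect size are our own, not drawn from any trial cited here:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(d, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-arm comparison of
    means, given a standardised effect size d (Cohen's d), a two-sided
    significance level alpha and a target power."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

# A 'medium' assumed effect (d = 0.5) at 80% power, alpha = 0.05:
print(n_per_arm(0.5))  # 63 participants per arm
```

Everything here except alpha and power must be assumed in advance, and d, the input the formula is most sensitive to, is precisely the quantity least likely to be known for novel research.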
These assumptions about effect size rarely reflect the true underlying effect – if that can even be known. Power calculations often invert their intended logic: rather than determining the sample size required to detect a plausible effect, they cynically determine the effect size that must be assumed to justify a feasible sample size. When the assumed effect does not reflect reality, the resulting power risks being a fiction.
Effect size estimates are also highly sensitive to population heterogeneity,² measurement error and imprecision,³ disease stage and severity, and site-level and behavioural effects. Small changes in either the anticipated mean difference in treatment effect or the pooled standard deviation can dramatically alter required sample sizes, yet most power calculations rely on single-point estimates derived from prior studies conducted under distinct conditions. Uncertainty is rarely explicitly modelled. Consequently, power calculations often serve as statistical theatre, projecting authority while resting on unstable foundations. For complex, multifactorial diseases, treatment effects are often highly heterogeneous. Average treatment effects may be small even when substantial benefit exists within identifiable subgroups. Powering a trial to detect a modest mean effect can therefore be counterproductive, obscuring biologically meaningful signals through aggregation. Conventional power calculations implicitly assume a homogeneity that does not exist. The result is a trial that may be well powered yet poorly aligned with the structure of the underlying biology.
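This sensitivity is easy to demonstrate. In the sketch below (the planning values are hypothetical, not taken from any study cited here), modest and individually defensible shifts in the assumed mean difference and pooled standard deviation multiply the required sample size several-fold:

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-arm comparison of means, given an
    assumed raw mean difference delta and pooled standard deviation sigma."""
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)

# Vary the two planning assumptions over plausible ranges:
for delta in (5.0, 4.0, 3.0):
    for sigma in (10.0, 12.0):
        print(f"delta={delta}, sigma={sigma}: n={n_per_arm(delta, sigma)} per arm")
```

Shrinking the assumed difference from 5 to 3 while the standard deviation drifts from 10 to 12 quadruples the requirement, from 63 to 252 participants per arm, yet every pairing in that grid could plausibly be read off a small pilot study.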
Power calculations are routinely defended on ethical grounds, with underpowered trials portrayed as inherently unethical.⁴ Ethical justification should depend on expected knowledge gain, not adherence to an arbitrary power threshold. A trial that is ‘adequately powered’ under incorrect assumptions may expose participants to intervention and burden while producing results that are uninterpretable or misleading. Conversely, smaller exploratory studies – if designed to characterise heterogeneity, refine outcomes or generate mechanistic insight – may offer greater ethical value despite low nominal power.
Power calculations are typically anchored to null hypothesis significance testing, framed as detecting a departure from a null of zero effect. In many clinical settings this null is implausible, and small non-zero effects may be either clinically trivial or, conversely, highly important, depending on context. As a result, designing studies around a null of zero effect can misalign statistical aims with clinical decision thresholds, encouraging trials that are powered to detect any difference rather than differences that are scientifically or clinically meaningful.⁵ Power calculations also offer no protection against conceptual errors in trial design. A study may be perfectly powered and still incapable of detecting an effect of interest because the outcome measure is insensitive, the measurement window is misaligned with illness progression, or the analysis model is inappropriate. Power cannot compensate for flawed assumptions about what should be measured, or how or when it should be measured.
None of this implies that power calculations are universally inappropriate. They are most defensible, and most helpful, in late phase confirmatory trials in which effect sizes are well characterised, outcome measures are validated and stable, and populations are relatively homogeneous. However, this scenario represents a shrinking minority of contemporary clinical research, not the dominant state. In mental health and psychiatry, such scenarios are rare.
Rather than ritualised power calculations, trial design would benefit from approaches that foreground uncertainty, emphasising confidence intervals (or posterior intervals) rather than point estimates of sample size, and explicit modelling of population heterogeneity and sensitivity analyses over plausible parameter ranges. When appropriate, clinical research should also embrace adaptive and sequential designs that incorporate interim analyses, allowing key assumptions to be refined in light of accumulating real-world data and thus improving the credibility of final inferences. Currently, effect sizes are often an amalgam of aspirations, wish fulfilment and plea bargaining. Putting power analyses in their place requires putting the objectives and aspirations of individual trials in their rightful place: there’s more to trials than P < 0.05!
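One way to foreground uncertainty along these lines is to plan for estimation precision rather than for power: choose the sample size that makes the confidence interval around the treatment effect acceptably narrow. A minimal sketch of that approach follows; the target half-width and standard deviation are hypothetical values of our own choosing:

```python
from math import ceil
from statistics import NormalDist

def n_for_precision(sigma, half_width, alpha=0.05):
    """Per-arm sample size so that the (1 - alpha) confidence interval
    for a two-arm difference in means has roughly the target half-width,
    given an assumed pooled standard deviation sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(2 * (z * sigma / half_width) ** 2)

# Aim to estimate the difference to within +/- 3 points (sigma = 10):
print(n_for_precision(10.0, 3.0))  # 86 participants per arm
```

Because the planning target is an interval width rather than a hypothesis test, it remains meaningful even when the true effect size is unknown, which is the usual situation in early phase research.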
The routine use of power calculations in clinical trials persists not because it reliably advances scientific understanding but because it satisfies institutional expectations of rigour. In many contexts, power calculations offer precision without knowledge: numerical confidence built, at best, on speculative assumptions. For early phase trials in particular, power calculations are essentially circular: a power calculation cannot be performed without knowing the magnitude of effect with some degree of certainty, yet for an early phase or pilot trial that magnitude cannot be known, let alone whether the intervention will prove effective. So, for the trials that are most critical, traditional power calculations are essentially a fiction, a pretence that the unknowable is known. They retain utility precisely where they are least needed: in confirmatory trials replicating existing findings. We need to cease treating power calculations as having universal significance, instead remaining cognisant of their limitations, recognising the circumstances in which they have utility and being honest when they are a mirage.
Funding
M.B. is supported by a National Health and Medical Research Council (NHMRC) Leadership 3 Investigator grant (GNT2017131).
Declaration of interest
M.B. has received grant funding from NHMRC, Medical Research Future Fund (MRFF), Patient-Centered Outcomes Research Institute, Wellcome Trust, Stanley Medical Research Institute, Danmarks Frie Forskningsfond, Psykiatrisk Center Kobenhavn, Congressionally Directed Medical Research Programs USA, Equity Trustees Limited and HCF Health Services Research; given lectures for International Society for Bipolar Disorders, Precision Psych Fondamental, Penn State College of Medicine, East Meets West webinar, International College of Neuropsychopharmacology, NeuroSAS, World Federation of Societies of Biological Psychiatry, PsychScene, Specialised Treatment Australia, Lundbeckfonden, Actinogen Medical Limited, The Royal Australian and New Zealand College of Psychiatrists and Neurotorium; received royalties from Cambridge University Press and Allen and Unwin; and served on advisory boards for Servier, Janssen, Johnson & Johnson and Actinogen (all unrelated to this work). R.I. is supported by an MRFF grant (MRFF2006296) and has received grant funding from NHMRC, Wellcome Trust and Suicide Prevention Australia, all unrelated to this work.