
Null Hypothesis Significance Testing, p-values, Effects Sizes and Confidence Intervals

  • Michael Perdices

Controversy has surrounded Null Hypothesis Significance Testing (NHST) since the first quarter of the 20th century, and misconceptions about it still abound. The first section of this paper briefly discusses some of the problems and limitations of NHST. Overwhelmingly, the ‘holy grail’ of researchers has been to obtain significant p-values. In 1999 the American Psychological Association (APA) recommended that, if NHST is used in data analysis, researchers should report effect sizes (ESs) and their confidence intervals (CIs) as well as p-values. The APA recommendations are summarised in the next section of the paper. For neuropsychological rehabilitation clinicians, however, the primary interest is (or should be) to determine whether the effect of an intervention is clinically important, not just statistically significant. In this context, ESs and their CIs provide information relevant to clinicians. The following section of the paper reviews common ESs and provides worked examples for calculating three commonly used ESs (Cohen's d, Hedges' g and Glass's delta). Web-based resources for calculating other ESs and their CIs are also reviewed.
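As a rough illustration of the three effect sizes named in the abstract, the sketch below computes them for two independent groups. It is not taken from the paper's worked examples: the function names and sample scores are hypothetical, and it assumes the common textbook definitions (pooled sample standard deviation with n - 1 denominators for Cohen's d, the approximate small-sample correction 1 - 3/(4df - 1) for Hedges' g, and the control group's standard deviation as the standardiser for Glass's delta). Definitions vary somewhat between sources.

```python
import math
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Mean difference divided by the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample SDs (n - 1 denominator)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


def hedges_g(treatment, control):
    """Cohen's d multiplied by an approximate small-sample bias correction."""
    df = len(treatment) + len(control) - 2
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d(treatment, control) * correction


def glass_delta(treatment, control):
    """Mean difference divided by the control group's standard deviation only."""
    return (mean(treatment) - mean(control)) / stdev(control)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    treatment_scores = [2, 4, 6, 8]
    control_scores = [1, 3, 5, 7]
    print("Cohen's d:    ", round(cohens_d(treatment_scores, control_scores), 3))
    print("Hedges' g:    ", round(hedges_g(treatment_scores, control_scores), 3))
    print("Glass's delta:", round(glass_delta(treatment_scores, control_scores), 3))
```

Glass's delta is usually preferred when the intervention is expected to change the variability of scores as well as their mean, because the control group's standard deviation is unaffected by the treatment. Hedges' g differs noticeably from Cohen's d only in small samples.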

Corresponding author
Address for correspondence: Department of Neurology, Royal North Shore Hospital, The University of Sydney Medical School, Northern Clinical School, Discipline of Psychiatry, New South Wales, Australia. E-mail:
APA Publications and Communications Board Working Group on Journal Article Reporting Standards (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63 (9), 839–851.
Bakeman, R. (2005). Recommended effect size statistics for repeated measures designs. Behavior Research Methods, 37 (3), 379–384.
Berben, L., Sereika, S.M., & Engberg, S. (2012). Effect size estimation: Methods and examples. International Journal of Nursing Studies, 49, 1039–1047.
Berkson, J. (1938). Some difficulties of interpretation encountered in application of Chi squared. Journal of the American Statistical Association, 33 (203), 526–536.
Carver, R.P. (1978). The case against statistical significance. Harvard Educational Review, 48 (3), 378–399.
Castro Sotos, A.E., Vanhoof, S., Van den Noortgate, W., & Onghena, P. (2007). Students' misconceptions of statistical inference: A review of the empirical evidence from research on statistics education. Educational Research Review, 2 (2), 98–113.
Clark, C.A. (1963). Hypothesis testing in relation to statistical methodology. Review of Educational Research, 33, 455–473.
Cohen, J. (1962). The statistical power of abnormal–social psychological research. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45 (12), 1304–1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49 (12), 997–1003.
Cooper, H., Hedges, L.V., & Valentine, J.C. (2009). The handbook of research synthesis and meta-analysis. New York: Russell Sage Foundation.
Cumming, G. & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61 (4), 532–574.
Draper, S.W. (2016). Effect Size. Retrieved from
Ellis, P.D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. New York: Cambridge University Press.
Falk, R. & Greenbaum, C.W. (1995). Significance tests die hard. The amazing persistence of a probabilistic misconception. Theory and Psychology, 5 (1), 76–98.
Ferguson, C.J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40 (5), 532–538.
Fethney, J. (2010). Statistical and clinical significance, and how to use confidence intervals to help interpret both. Australian Critical Care, 23, 93–97.
Fisher, R.A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In Keren, G. & Lewis, C. (Eds.), A handbook for data analysis in the behavioral sciences. Methodological issues (pp. 311–339). Hillsdale, NJ: Lawrence Erlbaum Associates.
Glaser, D.N. (1999). The controversy of significance testing: Misconceptions and alternatives. American Journal of Critical Care, 8 (5), 291–296.
Glass, G.V., McGaw, B., & Smith, M.L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
Gliner, J.A., Leech, N.L., & Morgan, G.A. (2002). Problems with null hypothesis significance testing (NHST): What do the textbooks say? The Journal of Experimental Education, 71 (1), 83–92.
Halsey, L.G., Curran-Everett, D., Vowler, S.L., & Drummond, G.B. (2015). The fickle P value generates irreproducible results. Nature Methods, 12 (3), 179–185.
Hedges, L.V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6 (2), 106–128.
Howell, D.C. (2010). Confidence intervals on effect size. Retrieved from:
Huberty, C.J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62 (2), 227–240.
Huberty, C.J., & Pike, C.J. (1999). On some history regarding statistical testing. Advances in Social Science Methodology, 5, 1–22.
Keselman, H.J., Huberty, C.J., Lix, L.M., Olejnik, S., Cribbie, R., Donahue, B., . . . Levin, J.R. (1998). Statistical practices of educational researchers: An analysis of their ANOVA, MANOVA, and ANCOVA analyses. Review of Educational Research, 68, 350–386.
Kirk, R.E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56 (5), 746–759.
Kraemer, H.C., Morgan, G.A., Leech, N.L., Gliner, J.A., Vaske, J.J., & Harmon, R.J. (2003). Measures of clinical significance. Journal of the American Academy of Child and Adolescent Psychiatry, 42 (12), 1524–1529.
Krishnan, S. & Idris, N. (2014). Students’ misconceptions about hypothesis test. REDIMAT: Journal of Research in Mathematics Education, 3 (3), 276–293.
Lambdin, C. (2012). Significance tests as sorcery: Science is empirical-significance tests are not. Theory and Psychology, 22 (1), 67–90.
Li-Ting, C., & Chao-Ying, J.P. (2013). Constructing confidence intervals for effect sizes in ANOVA designs. Journal of Modern Applied Statistical Methods, 12 (2), 82–104.
Meehl, P.E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34 (2), 103–115.
Meyer, G.J., McGrath, R.E., & Rosenthal, R. (2003). Basic effect size guide with SPSS® and SAS® syntax. Retrieved from
Neyman, J., & Pearson, E. (1928a). On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika, 20A, 175–240.
Neyman, J., & Pearson, E. (1928b). On the use and interpretation of certain test criteria for purposes of statistical inference: Part II. Biometrika, 20A, 263–294.
Nickerson, R.S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5 (2), 241–301.
Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25, 241–286.
Peng, C.-Y. J., Chen, L.-T., Chiang, H.-M., & Chiang, Y.-C. (2013). The impact of APA and AERA guidelines on effect size reporting. Educational Psychology Review, 25, 157–209.
Prentice, D.A., & Miller, D.T. (1992). When small effects are impressive. Psychological Bulletin, 112 (1), 160–164.
Rea, L.M., & Parker, R.A. (1992). Designing and conducting survey research. San Francisco: Jossey-Bass.
Richardson, J.T.E. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6 (12), 135–147.
Rozeboom, W.W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57 (5), 416–428.
Sainani, K.L. (2012). Clinical versus statistical significance. American Academy of Physical Medicine and Rehabilitation, 4 (6), 442–445.
Schatz, P., Jay, K.A., McComb, J., & McLaughlin, J.R. (2005). Misuse of statistical tests in archives of clinical neuropsychology publications. Archives of Clinical Neuropsychology, 20, 1053–1059.
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61 (4), 605–632.
Thompson, B. (2002). “Statistical,” “Practical,” and “Clinical”: How many kinds of significance do counselors need to consider? Journal of Counselling and Development, 80, 64–71.
Torchiano, M. (2017). Efficient effect size computation. Retrieved from
Turner, H.M., & Bernard, R.M. (2006). Calculating and synthesizing effect sizes. Contemporary Issues in Communication Science and Disorders, 33, 42–55.
Vallecillos, A. (2001). Cuestiones metodológicas en la investigación educativa. Quinto Simposio de la Sociedad Española de Investigación en Educación Matemática, Almería, Spain.
Vallecillos, A., & Batanero, C. (1997b). Conceptos activados en el contraste de hipótesis estadísticas y su comprensión por estudiantes universitarios. Recherches en Didactique des Mathématiques, 17 (1), 29–48.
Vallecillos, A., & Batanero, M.C. (1997a). Aprendizaje y enseñanza del contraste de hipotesis: Concepciones y errores. Enseñanza de las Ciencias, 15 (2), 189–197.
Wilkinson, L. and the Task Force on Statistical Inference APA Board of Scientific Affairs. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54 (8), 594–604.
Wilson, D.B. (2011). Interpretation.ppt. Retrieved from

Brain Impairment
  • ISSN: 1443-9646
  • EISSN: 1839-5252
  • URL: /core/journals/brain-impairment