Fact or fiction: reducing the proportion and impact of false positives

  • D. Stahl and A. Pickles

False-positive findings in science are inevitable, but are they particularly common in psychology and psychiatry? The evidence that we review suggests that, although the problem is not restricted to our field, it is acute here. We describe the concept of researcher ‘degrees of freedom’ to explain how many false-positive findings arise, and how the various strategies of registration, pre-specification, and reporting standards that are being adopted both reduce these degrees of freedom and make them visible. We review the possible benefits and harms of proposed statistical solutions, from tougher requirements for significance to Bayesian and machine learning approaches to analysis. Finally, we consider the organisation and methods for replication and systematic review in psychology and psychiatry.
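
As a rough illustration of how researcher degrees of freedom can inflate false positives, the sketch below computes the chance of at least one nominally significant result when several analyses are tried and all null hypotheses are true. It assumes the tests are independent, which is a simplification that is not taken from the article; the function name, the numbers of tests, and the stricter threshold of 0.005 are illustrative choices only.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests independent
    tests of true null hypotheses, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Hypothetical scenario: a researcher can choose among 1, 5, 10 or 20
    # outcome/analysis combinations and reports whichever reaches significance.
    for n_tests in (1, 5, 10, 20):
        conventional = familywise_error_rate(0.05, n_tests)
        # 0.005 is an illustrative "tougher" threshold, assumed for this sketch.
        stricter = familywise_error_rate(0.005, n_tests)
        print(f"{n_tests:2d} tests: alpha=0.05 -> {conventional:.3f}, "
              f"alpha=0.005 -> {stricter:.3f}")
```

Under these assumptions, ten undisclosed analytic choices at the conventional 0.05 level give a chance of roughly 0.40 of at least one false positive, whereas the stricter threshold brings it back to about 0.05. Real analyses are usually correlated, so the true inflation will differ from this simple bound.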

Corresponding author
Author for correspondence: A. Pickles, E-mail:

Psychological Medicine
  • ISSN: 0033-2917
  • EISSN: 1469-8978
  • URL: /core/journals/psychological-medicine

Supplementary materials

Stahl and Pickles supplementary material 1

 Word (151 KB)