
The reliability of peer review for manuscript and grant submissions: A cross-disciplinary investigation

Domenic V. Cicchetti

The reliability of peer review of scientific documents, and the evaluative criteria scientists use to judge the work of their peers, are critically reexamined, with special attention to the consistently low levels of reliability that have been reported. Referees of grant proposals agree much more about what is unworthy of support than about what does have scientific value. For manuscript submissions, the pattern seems to depend on whether a discipline (or subfield) is general and diffuse (e.g., cross-disciplinary physics, general fields of medicine, cultural anthropology, social psychology) or specific and focused (e.g., nuclear physics, medical specialty areas, physical anthropology, and behavioral neuroscience). In the former, there is likewise much more agreement on rejection than on acceptance; in the latter, both the wide differential in manuscript rejection rates and the high correlation between referee recommendations and editorial decisions suggest that reviewers and editors agree more on acceptance than on rejection. Several suggestions are made for improving the reliability and quality of peer review. Further research is needed, especially in the physical sciences.
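The contrast between agreeing on rejection and agreeing on acceptance can be made concrete with standard two-rater statistics: Cohen's kappa for overall chance-corrected agreement, plus the proportions of specific agreement on each verdict, p_pos = 2a/(2a+b+c) and p_neg = 2d/(2d+b+c), where a and d count the papers both referees accept or both reject and b, c count the split verdicts. The sketch below is a minimal illustration under stated assumptions: the class name, function names, and counts are hypothetical, invented for this example, and are not data or procedures taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoRaterTable:
    """Hypothetical 2x2 cross-tabulation of two referees' accept/reject verdicts."""
    both_accept: int        # a: both referees recommend acceptance
    a_accept_b_reject: int  # b: referee A accepts, referee B rejects
    a_reject_b_accept: int  # c: referee A rejects, referee B accepts
    both_reject: int        # d: both referees recommend rejection

    @property
    def n(self) -> int:
        return (self.both_accept + self.a_accept_b_reject
                + self.a_reject_b_accept + self.both_reject)


def cohen_kappa(t: TwoRaterTable) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = t.n
    p_observed = (t.both_accept + t.both_reject) / n
    # Marginal proportion of "accept" verdicts for each referee.
    p_a_accept = (t.both_accept + t.a_accept_b_reject) / n
    p_b_accept = (t.both_accept + t.a_reject_b_accept) / n
    # Agreement expected if the two referees' verdicts were independent.
    p_chance = p_a_accept * p_b_accept + (1 - p_a_accept) * (1 - p_b_accept)
    return (p_observed - p_chance) / (1 - p_chance)


def specific_agreement(t: TwoRaterTable) -> tuple[float, float]:
    """Return (p_pos, p_neg): agreement specific to acceptance and to rejection."""
    split = t.a_accept_b_reject + t.a_reject_b_accept
    p_pos = 2 * t.both_accept / (2 * t.both_accept + split)
    p_neg = 2 * t.both_reject / (2 * t.both_reject + split)
    return p_pos, p_neg


if __name__ == "__main__":
    # Made-up counts for 100 manuscripts, each read by two referees.
    table = TwoRaterTable(both_accept=10, a_accept_b_reject=15,
                          a_reject_b_accept=15, both_reject=60)
    p_pos, p_neg = specific_agreement(table)
    print(f"kappa={cohen_kappa(table):.2f} p_pos={p_pos:.2f} p_neg={p_neg:.2f}")
    # prints: kappa=0.20 p_pos=0.40 p_neg=0.80
```

With these illustrative counts, overall kappa is low (0.20) while agreement on rejection (0.80) is twice that on acceptance (0.40), which is the shape of result the abstract describes for grant proposals and general fields. Reporting p_pos and p_neg alongside kappa keeps that asymmetry visible instead of folding it into one summary number.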



Behavioral and Brain Sciences
  • ISSN: 0140-525X
  • EISSN: 1469-1825