How Strong Is the Evidence? How Clear Are the Conclusions?

Published online by Cambridge University Press: 20 May 2002

Jeannette Ezzo
University of Maryland
Barker Bausell
University of Maryland
Daniel E. Moerman
University of Michigan – Dearborn
Brian Berman
University of Maryland
Victoria Hadhazy
Social and Scientific Systems


Objectives: The objectives of this paper were: a) to determine what can be learned from conclusions of systematic reviews about the evidence base of medicine; and b) to determine whether two readers draw similar conclusions from the same review, and whether these match the authors' conclusions.

Methods: Three methodologists (two per review) rated 160 Cochrane systematic reviews (issue 1, 1998) using pre-established conclusion categories. Disagreements were resolved by discussion to arrive at a consensus score for each review. The reviews' authors were asked to use the same categories to designate their intended conclusion. Interrater agreements were calculated.

Results: Interrater agreement between two readers was 0.68 and 0.72, and between readers and authors, 0.32. The largest categories assigned by methodologists were “positive effect” (22.5%), “insufficient evidence” (21.3%), and “evidence of no effect” (20.0%). The largest categories assigned by authors were “insufficient evidence” (32.4%), “possibly positive” (28.6%), and “positive effect” (26.7%).

Conclusions: The number of reviews indicating that modern biomedical interventions show either no effect or insufficient evidence is surprisingly high. Interrater disagreements suggest a surprising degree of subjective interpretation in systematic reviews. Where patterns of disagreement emerged between authors and readers, authors tended to be more optimistic in their conclusions than the readers. Policy implications are discussed.

Research Article
© 2001 Cambridge University Press

