Appraisal of meta-analysis

Meta-analysis is a quantitative technique for combining the results from several clinical trials to provide an objective overview of their conclusions, usually in the form of one or more global measures. Large randomised, controlled trials are still considered the gold standard in evaluating the efficacy of clinical interventions (Le Lorier et al, 1997). However, large trials are difficult to undertake, especially when the disorder being studied is uncommon, and the best available evidence is often found in a series of small trials. Meta-analysis is a technique frequently used to overcome the lack of power of small trials. Unlike a review, in which an author appraises some of the existing trials and comes to an individual conclusion about the results, meta-analysis involves combining the data from separate trials and reanalysing these data. With its ability to summarise all information from several trials into a single result, meta-analysis appears attractive to the reader but may lead to false conclusions. Therefore, the results of meta-analyses should not be accepted uncritically. This example of a critique of a meta-analysis, taken from our journal club, shows how the evidence-based approach can simplify and clarify a complex article and provide the appraiser with a framework for assessing the validity and results.

Does brain imaging help distinguish mood disorders from schizophrenia?

Vignette
A 36-year-old musician has been living in a rehabilitation unit for 11 months. He currently complains of 'difficulty in connecting thoughts', having 'no power to control things' and feeling 'mentally and physically tired'. He has been treated for psychotic illnesses in the past and has responded equally well to antipsychotics and antidepressants. Careful clinical evaluation over time has left us uncertain about whether the diagnosis is schizophrenia or depression. During discussion of this case, the question arose of whether any investigations, particularly brain imaging, would help to clarify the diagnosis.

Question
For patients in whom the diagnosis is unclear, does computerised tomography (CT) scanning have a role in distinguishing mood disorder from schizophrenia?

Literature search
Knowing that there have been many published studies on brain imaging in schizophrenia, we began by searching the Medline database for the years 1992 to 1996, specifically looking for a meta-analysis. We felt the best way to identify relevant papers was to limit the 'publication type' menu to meta-analysis. Often, the database will interpret the word or phrase you wish to search and decide on the most appropriate subject heading. For example, we tried to search using the phrase 'CT scan' and the database converted this to 'tomography, X-ray computed' (23350 articles found). Likewise, using the term 'depression' yields fewer articles than 'depressive disorder' (6964 articles found). The 'tomography, X-ray computed' search was limited to meta-analysis, and 12 articles were found. Combining these 12 articles with the 'depressive disorder' search produced one article.
The article identified looked relevant. However, in order to check that no important studies had been missed, we scanned all 12 abstracts of the meta-analyses identified. We then scrutinised the 33 articles identified by combining the 'tomography, X-ray computed' and 'depressive disorder' searches. This identified two comprehensive reviews, but no further meta-analysis. Finally, we repeated the search substituting 'schizophrenia' for 'depressive disorder'. No additional relevant articles were identified.
The article selected for appraisal was entitled 'Meta-analysis of studies of ventricular enlargement and cortical sulcal prominence in mood disorders - comparisons with controls or patients with schizophrenia' (Elkis et al, 1995). The paper was obtained from our library.

Brief outline of the article
The paper states that the evidence for higher rates of ventricular enlargement and/or cortical sulcal widening is already well documented in schizophrenia. Similarly conclusive findings are not recorded for mood disorders. This is because a 'vote-counting' method is often used, which fails to reveal an effect if the true effect and the sample size are no more than moderate (often the case). A meta-analytic review is more likely to detect such a difference.
To identify relevant studies to be included in the meta-analysis, the authors conducted a Medline search for 1966 to 1994. To be included, a study had to have compared people with mood disorders with normal controls or with people with schizophrenia on some measure of ventricular enlargement or cortical sulcal prominence. The data had to be unique and provide sufficient information for the extraction of an effect size. Thirty-three studies were selected and four meta-analytic reviews conducted (see Table 1).
Effect size was calculated as the difference between the mean of the mood-disordered group and the mean of the comparison group, divided by the pooled standard deviation. It was deemed 'large', 'moderate' or 'small', depending on its value. The direction of the effect size was positive if the results supported the hypothesis that those with mood disorders have more ventricular enlargement or more sulcal prominence than the comparison group. Analysis of variance was used to assess the influence of potential moderators of effect size (year, type of control, proportion of mood-disordered subjects who were male, mean age of mood-disordered subjects, proportion of subjects with unipolar depression, imaging modality, interrater reliability, and ease of effect size extraction).
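The effect size definition above can be illustrated with a short Python sketch. It assumes raw measurements are available for both groups, which is rarely the case when extracting from published papers; the function names and the ventricle-to-brain ratios are invented for illustration and are not taken from Elkis et al (1995).

```python
from math import sqrt
from statistics import mean, stdev


def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    return sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))


def effect_size(mood, comparison):
    """Difference between group means divided by the pooled SD.

    Positive when the mood-disordered group has the larger mean,
    matching the sign convention described in the text.
    """
    return (mean(mood) - mean(comparison)) / pooled_sd(mood, comparison)


# Hypothetical ventricle-to-brain ratios (%), invented for illustration only.
mood = [6.1, 7.4, 5.8, 6.9, 7.2, 6.5]
controls = [5.9, 6.3, 5.2, 6.8, 6.0, 5.6]

d = effect_size(mood, controls)
print(f"effect size = {d:.2f}")
```

Swapping the two arguments reverses the sign, which is why the direction of the comparison has to be fixed before effect sizes from different studies are combined.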
Publication bias was examined graphically by using 'funnel plots'. Total sample size (ordinate) was plotted against effect size (abscissa). Large studies approximate the true population effect, with random scatter increasing about this central effect as study size decreases, producing an inverted funnel. If the pattern of results does not conform to the inverted funnel shape, there is evidence that certain results are not represented. Often this occurs when small negative studies are not published.
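The funnel shape can be reproduced with a small simulation. The following Python sketch is not the authors' method: it assumes a true effect of 0.4, normally distributed measurements with unit standard deviation, and two arbitrary study sizes, all chosen only for illustration. It shows that estimates from small studies scatter more widely than those from large ones, and that discarding small negative studies inflates the apparent effect.

```python
import random
from statistics import mean, stdev


def simulate_effect(n_per_group, true_effect, rng):
    """Observed standardised difference from one simulated two-group study."""
    patients = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    controls = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    # With equal group sizes the pooled variance is the mean of the two variances.
    pooled = ((stdev(patients) ** 2 + stdev(controls) ** 2) / 2) ** 0.5
    return (mean(patients) - mean(controls)) / pooled


rng = random.Random(1)   # fixed seed so the run is repeatable
TRUE_EFFECT = 0.4        # assumed value, for illustration only

small = [simulate_effect(10, TRUE_EFFECT, rng) for _ in range(200)]
large = [simulate_effect(200, TRUE_EFFECT, rng) for _ in range(200)]

# The wide base and narrow tip of the funnel.
spread_small = stdev(small)
spread_large = stdev(large)

# Crude model of publication bias: small studies with a negative
# effect size never reach print.
published_small = [d for d in small if d > 0]

print(f"scatter of small studies: {spread_small:.2f}")
print(f"scatter of large studies: {spread_large:.2f}")
print(f"mean of all small studies: {mean(small):.2f}")
print(f"mean of 'published' small studies: {mean(published_small):.2f}")
```

Plotting sample size against effect size for the full set of simulated studies would give a symmetrical inverted funnel; plotting only the 'published' subset would leave the lower left corner empty.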
Meta-analyses 1 and 2 yielded highly statistically significant composite effect sizes (P<0.001), supporting the notion that patients with mood disorders do have larger ventricles and more prominent sulci when compared with controls (Table 1). Meta-analysis 3 indicated a small but highly statistically significant effect size (P=0.002), implying that people with schizophrenia have greater ventricular enlargement than mood-disordered subjects. There were too few studies in meta-analysis 4 to permit useful quantitative analysis.

Critical appraisal
The article was appraised using guidelines by Oxman et al (1994).

Are the results valid?
Did the overview address a focused question?
Yes. The questions addressed in this paper were clearly defined.

Were the criteria used to select articles for inclusion appropriate?
Yes. The authors used strict inclusion criteria for selecting the papers. There had to be assessment of ventricular size or sulcal prominence. The studies had to include subjects with mood disorders, compared either with normal controls or with individuals with schizophrenia, and had to provide adequate statistical information. The data had to be unique, i.e. not a duplicate publication. A drawback to these rigorous criteria is that some studies that did not show significant results were rejected from the analysis because of insufficient statistical information.

Is it likely that relevant studies were missed?
Yes. The authors based their analysis on papers identified by searching one electronic database (Medline). They did not appear to contact researchers in the field for unpublished studies or attempt to search other databases. The failure to identify all studies may be an important source of bias, since small negative studies are often not published, or are published in journals not referenced in Medline. However, the funnel plot of the meta-analyses suggested that several studies with a negative effect size were identified, reducing the likelihood of publication bias.
Publication bias refers to an editorial preference for studies with positive results, and a meta-analysis that is unable to locate unpublished studies is prone to overestimate effect size. Meta-analyses may reach erroneous conclusions as a result of publication bias (Egger & Davey-Smith, 1998). Rigorous meta-analysis should include a comprehensive trawl for published trials and an exhaustive search for 'grey data', i.e. results of unpublished trials. Although the authors did not do this, the funnel plot suggests few studies were missed.

Was the validity of the included studies appraised?
No. The authors did not give explicit information about assessment of the methodological quality (and hence the validity) of the studies in the meta-analysis.

Were the assessments of studies reproducible?
No. The authors did not appear to use independent reviewers. Selection of studies for a meta-analysis is subject to bias, and should ideally be done by at least two individuals using the same selection criteria. There should be good concordance between the reviewers.

What are the results?
What are the overall results of the meta-analyses?
(a) Mood disorders compared with normal controls
The combined sample sizes in meta-analyses 1 and 2 were large: they evaluated nearly 40 studies with over 1800 and 600 subjects respectively (see Table 1). Computed effect sizes demonstrated significantly larger ventricles and more prominent sulci in those with mood disorders compared with controls. In both cases, the effect size of around 0.4 corresponds to a moderate difference (Cohen, 1988). In other words, assuming a normal distribution, 50% of the depressed group would score above the depressed group's mean measurement, compared with 33% of controls. In both, vote-counting alone would not display such an effect. Despite the rigorous statistical methods used, the lack of appraisal of validity of individual studies and the potential for selection bias should be kept in mind before too much significance is attached to the results.
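The translation of an effect size into overlapping percentages can be checked with the standard normal distribution. This Python sketch assumes both groups are normally distributed with equal standard deviations; the function name is ours.

```python
from statistics import NormalDist


def controls_above_patient_mean(effect_size):
    """Fraction of controls scoring above the patient group's mean.

    Assumes both groups are normally distributed with the same standard
    deviation, so the patient mean lies `effect_size` SDs above the
    control mean.
    """
    return 1.0 - NormalDist().cdf(effect_size)


for d in (0.40, 0.44):
    share = controls_above_patient_mean(d)
    print(f"effect size {d:.2f}: {share:.1%} of controls exceed the patient mean")
```

Under these assumptions an effect size of exactly 0.4 gives roughly 34% of controls above the patient mean, and a value of about 0.44 gives the 33% quoted; by definition, 50% of patients lie above their own mean.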
(b) Mood disorders compared with schizophrenia
Meta-analysis 3 showed that patients with mood disorders have slightly smaller ventricles than those with schizophrenia. Although the combined effect was statistically significant, each of the nine individual studies reported this comparison as non-significant. Only three studies met the inclusion criteria for meta-analysis 4 and the authors did not analyse the aggregated results.

How precise are the results?
The confidence intervals for the effect sizes are given for meta-analyses 1, 2 and 3 (see Table 1). These show significant effect sizes in all three cases.

Will the results help me in patient care?
Can the results be applied to my patient?
However statistically significant these structural differences may be, they are not yet robust enough to be employed as reliable diagnostic markers to change management in the clinical setting. The authors did not set out to refine the diagnostic utility of brain imaging, and the sensitivity, specificity and likelihood ratio of this investigation as a diagnostic test are not calculated. Unfortunately, this paper brings us no nearer to a clear-cut diagnosis of our patient.
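For readers unfamiliar with the diagnostic measures mentioned above, the following Python sketch shows how sensitivity, specificity and likelihood ratios would be derived from a two-by-two table, and how a likelihood ratio shifts a pre-test probability. The counts are entirely hypothetical: no such table appears in Elkis et al (1995), and the function names are ours.

```python
def diagnostic_summary(true_pos, false_pos, false_neg, true_neg):
    """Sensitivity, specificity and likelihood ratios from a 2x2 table.

    Raises ZeroDivisionError if specificity is exactly 0 or 1.
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical counts: 'enlarged ventricles' as a test for schizophrenia
# versus mood disorder in 100 patients with each diagnosis.
result = diagnostic_summary(true_pos=30, false_pos=20, false_neg=70, true_neg=80)
updated = post_test_probability(0.5, result["lr_positive"])

print(result)
print(f"post-test probability after a positive scan: {updated:.2f}")
```

With these invented counts a positive scan moves a 50% pre-test probability only to 60%, which illustrates how a statistically significant group difference can still make a weak diagnostic test.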

Were all clinically important outcomes considered?
This question is more relevant to meta-analyses on therapies. Decisions about whether to adopt a particular intervention depend on an appraisal of safety as well as efficacy. The article by Elkis et al (1995) does not consider potential drawbacks to scanning, although in reality these are likely to be minimal.

Are the benefits worth the harms and costs?
There was no explicit analysis of the costs involved. Again, this question is more applicable when a meta-analysis suggests an intervention such as a diagnostic test or a therapy.

Discussion
Accurate searching of electronic databases is a cornerstone of evidence-based medicine. It is important to use the correct medical subject headings (MeSH) and syntax (Greenhalgh, 1997). For example, searching Medline using the word 'depression' will miss a vast amount of literature logged on the database as 'depressive disorder'. Our search strategy could have been improved by using terms such as 'systematic review' rather than limiting to meta-analysis. Also, in retrospect we should have searched the Cochrane database of systematic reviews (Bero & Rennie, 1995), which is a source of reviews carried out to specified standards.
A common method of reviewing cumulated studies addressing a single question is to employ vote-counting. In this method, a reviewer comes to a conclusion based on the number of studies showing a positive effect, a negative effect or no effect. This is unsound as it ignores differences in research design, sample size and effect size. Meta-analysis circumvents some of these difficulties by enabling several small comparable studies to be considered in a way that identifies relatively small effect sizes (as shown here). Despite this rigorous statistical approach, a systematic comparison with randomised clinical trials shows that some meta-analyses may have poor predictive ability (Le Lorier et al, 1997).
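The contrast between vote-counting and pooling can be made concrete with a sketch. The Python example below uses a fixed-effect, inverse-variance method with a common large-sample approximation for the variance of a standardised mean difference; this is one standard approach and not necessarily the one used by Elkis et al (1995). The nine studies are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical studies: (effect size, n in patient group, n in comparison group).
STUDIES = [
    (0.30, 30, 30), (0.45, 25, 25), (0.20, 40, 35),
    (0.35, 30, 28), (0.10, 20, 22), (0.40, 35, 30),
    (0.25, 28, 30), (0.50, 20, 20), (0.15, 45, 40),
]


def variance_of_d(d, n1, n2):
    """Large-sample approximation to the variance of a standardised mean difference."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Vote-counting: how many studies are individually significant at P < 0.05?
significant_votes = sum(
    1 for d, n1, n2 in STUDIES
    if two_sided_p(d / sqrt(variance_of_d(d, n1, n2))) < 0.05
)

# Fixed-effect pooling: weight each study by the inverse of its variance.
weights = [1 / variance_of_d(d, n1, n2) for d, n1, n2 in STUDIES]
pooled = sum(w * d for w, (d, _, _) in zip(weights, STUDIES)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))
pooled_p = two_sided_p(pooled / pooled_se)

print(f"studies individually significant: {significant_votes} of {len(STUDIES)}")
print(f"pooled effect size: {pooled:.2f} (SE {pooled_se:.2f}), P = {pooled_p:.4f}")
```

In this invented data set none of the nine studies reaches significance on its own, so a vote count would report no effect, yet the pooled estimate is clearly different from zero. The sketch ignores heterogeneity between studies, which a real analysis would need to examine.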
The problem with pooling results in this way is that, from a clinical standpoint, an 'average' population effect may not be relevant to the clinician's particular patient, although the same criticism can be levelled at single trials. Databases may be biased towards publications in English. Citation bias may occur in that trials that are supportive of a positive effect are more easily published than negative trials. The inclusion of duplicate data may lead to an overestimation of treatment effects. Occasionally, more than one meta-analysis exists on a particular topic, and concordance of conclusions from meta-analyses may be reassuring.
Although the evidence-based approach highlighted possible weaknesses of this article, notably incomplete identification of relevant studies, possible selection bias and lack of appraisal of the validity of the studies analysed, this meta-analysis should not be rejected out of hand. It probably represents the best available evidence regarding CT scanning and mood disorders. An evidence-based approach certainly helped us to assess the usefulness of this study, and provided a structured method of appraising what was at first sight a complex paper. We have not been able to answer our initial question, probably because, in retrospect, the question was too ambitious and is not yet answerable. Furthermore, the fact that the article we identified did not address the question of diagnostic utility in terms of validity or a likelihood ratio further hampered our ability to answer our question.
Most meta-analyses are of controlled trials of a therapeutic intervention and we felt that choosing one of these would have made this exercise easier. However, meta-analyses of observational studies, including aetiological associations and diagnostic tests, are becoming increasingly common (Egger et al, 1998), and the principle of pooling the data remains the same. The system of appraisal in terms of validity, results and applicability is also unchanged. However, Egger et al (1998) warn that meta-analyses of observational studies may produce spurious results and should be treated with caution.
This journal club exercise was not a good example of the evidence-based medicine process at work. However, this is probably more a reflection of an over-ambitious question and an absence of relevant articles than a flaw in the process itself. Our understanding of the appraisal of meta-analytic studies improved.

Table 1. Summary of results of neuroimaging studies on patients with depression selected for meta-analysis (Elkis et al, 1995)
1. Effect size is expressed as a fraction: difference between groups/pooled standard deviation.