Systematic reviews and (where appropriate) meta-analysis have great potential value in combining evidence from primary studies, informing policy- and decision-making with more accurate evidence syntheses than narrative reviews can provide. Such methods have been used extensively to summarise treatment evidence in clinical psychology and psychiatric therapeutics. Relatively little is known about their use in non-randomised psychiatric epidemiology studies, which have a vital role in aetiological research, planning and policy-making. Reviews of the use of systematic reviews and meta-analysis parallel to the one performed here exist in fields as diverse as acupuncture and animal experiments, the latter including reporting guidelines.[1,2] More general guidelines for the reporting of meta-analyses of observational studies are also available.[3]
Systematic review methods have been developed for use in medical research, including observational epidemiological studies.[4] However, the field of mental and behavioural disorders may pose particular challenges to systematic reviewers, because understanding of mental health outcomes and determinants is relatively fragile compared with physical health,[5,6] and because biologically based gold-standard measures are not available. Outcomes include, for example, anxiety (and ‘stress symptoms’), depression, functional psychosis (hallucinations, delusional beliefs) and physically unexplained somatic symptoms (unexplained pain); these rely on largely theoretically based definitions and measures, and are harder to assess validly than substance use and organically induced disorders. Observational studies also assess risk factors or determinants based on similarly theory-driven definitions and measures (adversity, personality, functioning), which could give rise to heterogeneity across different studies. This challenge to study comparability may stem from the range of different measures used to assess such constructs.[7] Differences in study design and, in particular, sample design may also limit the comparability of different studies.
We anticipated that synthesis methods could usefully be applied to two kinds of epidemiological result: associations with disorder (i.e. risk factors) and prevalence estimates. Reliable inference from syntheses of prevalence estimates may be the more difficult of the two because of their potential sensitivity to differences between study contexts and between methods for measuring whether diagnostic criteria for a given disorder are met, which may result in heterogeneity and caveats about combinability.
In this paper we report findings of a systematic review of systematic reviews of studies of non-organic mental disorders that make use of representative epidemiological samples, to estimate disease frequency and/or association with potential risk factors. We aimed to review the uses – good and bad – of synthesis methods in published reviews, giving reasons with examples for the recommended use of such methods. Our objective was to review all such methods, and not all the literature in which such methods are used. Having examined systematic reviews published up to 2005, we decided that it would be more useful for a scientific article to compare that initial period with more recent reviews and to consider whether the quality of more recent systematic reviews differed from earlier ones, although this was not an original aim of our study.
We searched EMBASE, MEDLINE and PsycLIT to identify reviews of psychiatric epidemiological studies (including two or more primary population studies) that employed synthesis methods such as systematic review or meta-analysis or other forms of quantitative review. Initially, we searched the period from 1996 to July 2005 (a summary of the findings contributed to a European Public Health Action Report);[8] the second period considered reviews up to April 2009. Search terms are shown in Appendix 1, and were designed to be sensitive (potentially over-inclusive) to avoid missing any relevant articles. The search strategy was developed by one of the authors (a subject specialist, T.S.B.) and an information officer (Mary Edmunds Otter, Department of Health Sciences, University of Leicester, UK). The filter used to identify systematic reviews was adapted from two strategies recommended by the Centre for Reviews and Dissemination.[9]
Abstracts were obtained for all papers identified in the electronic searches, and reviewed independently by three authors (R.M., T.S.B., Z.M.) for inclusion, according to criteria drawn up by T.S.B. The review included studies in which the health outcomes included the functional psychoses (ICD-10, Chapter V, code F2), mood disorders (ICD-10 code F3) and neurotic disorders (ICD-10 code F4). Studies of ‘hard outcomes’ for which there are sufficiently clear and established approaches to and examples of synthesis review methods (survival, suicide, organic brain disorder such as dementia and brain damage, drug or alcohol misuse) were not included in this review, although research on these more clearly definable outcomes may also be needed. Further details of inclusion and exclusion criteria are available on request. Any disagreements were discussed by the authors undertaking data extraction, and a consensus reached on whether the paper should be included. Full-text articles were retrieved for all studies identified as potentially relevant from the abstracts, as well as for those where their relevance was unclear. Each full-text paper was reviewed by one author (R.M., Z.M. or T.H.) to establish whether it met inclusion criteria. Where it was unclear whether an article should be included, a separate reviewer (T.S.B.) also read the paper and a consensus was reached on its inclusion. The selected reviews were evaluated using guidelines drawn up by the authors, based on the literature.[3,10,11] Reviews were classified as prevalence or association studies and according to whether meta-analyses had been performed; papers using weighted averages were included with those described as ‘meta-analyses’ even where the term was not employed.
Classification of reviews
Classification of articles as studies of either prevalence or association was occasionally problematic. Two articles by Saha et al and one from Singer appeared at first glance to be concerned with the association of a mental disorder with an outside factor, and would be expected to be coded accordingly.[12–14] However, on further scrutiny it became clear that these reviews were meta-analysing prevalence data, and then performing further non-meta-analytical methods to produce summary rates. For the purposes of this review these articles have been treated and classified in the tables as prevalence studies.
Coding of reviews
During the process of extracting information from the articles (online Appendix DS1) and its inclusion in our tables there were many instances where reviews failed to mention important issues such as whether publication bias was assessed or how (and whether) study quality was assessed. In these cases the article was coded as not having mentioned that particular method. In addition, several reviews by Saha et al that re-analysed data from two previous reviews did not include this information[12,13,15] and directed the reader to the original reports.[16,17] Therefore these reviews have been coded in the same way as original reviews that did not mention such criteria.
We identified 1153 articles from the search strategies, of which we classed 245 as potentially relevant after reading the abstracts (Fig. 1). After these 245 papers had been read in full, 103 were selected as relevant. A further 4 papers were identified through searching the reference lists of the reviewed papers. Of the 107 articles, 32 focused on prevalence only, 17 looked at prevalence and association, and 58 reported only on associations with disorder. One article by McGrath et al,[18] focusing on prevalence, was a review of previous reviews,[16,17] and is discussed here but not included in the tables of results. The total number of papers therefore represented is 106. A summary of these papers is given as an online supplement.
Authors generally gave comprehensive details of search strategies employed, including details of electronic databases searched, exact search terms, dates covered by search and other methods used. Only four studies gave no details about the search strategy. One hundred and one (95%) reviews searched at least one electronic database, with eighty-six (81%) searching two or more databases (Table 1). The most common database searched was MEDLINE, although PsycLIT, PsycINFO, EMBASE, Ovid, PubMed and CINAHL were also searched frequently.
The majority of studies gave details of inclusion and exclusion criteria used to select individual studies for detailed review (96 studies, 91%). Only 35 gave details of the method used to apply these criteria; of these, 3 used a single reviewer, 28 used several reviewers, reaching a consensus where disagreements arose, and 4 took a sample of studies for consensus review. Just under half of the reviews (50 studies) gave the proportion of initially identified studies that met selection criteria.
Extraction of data and assessment of study quality
Sixty-eight (64%) studies gave details of guidelines used to abstract data, although only 45 (42%) described the method of abstraction (Table 2). Of the 106 studies, 63 (59%) did not describe the method of abstraction: 38 (36%) made no mention of data abstraction at all and 25 (24%) mentioned data abstraction but not the method used (studies may have mentioned more than one method, so totals exceed 106). Seventy (66%) studies made some mention of study quality, 19 formally assessed the quality of primary studies, and 4 carried out a sensitivity analysis excluding studies of poor quality.
Methods of synthesis
In the 48 papers concerned with meta-analysing prevalence, the most common method (13 papers) was the use of random or fixed effects meta-analysis models. Calculation of means weighted for study size was used in 10 studies, whereas 3 studies used a Bayesian approach to synthesis.[19–21] Of 22 studies that stated they did not perform a meta-analysis, 13 gave a summary measure such as the median prevalence or a range of prevalences, with 7 of these also giving the prevalence for individual studies. Nine studies gave individual results only.
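The simplest of these approaches, a mean weighted by the inverse of each study's variance, can be sketched as follows. The prevalence figures and sample sizes are hypothetical, invented for illustration rather than drawn from any review discussed here.

```python
import math

# Hypothetical (prevalence, sample size) pairs -- illustrative only.
studies = [(0.08, 1200), (0.11, 850), (0.06, 2100)]

# Fixed-effect inverse-variance pooling: each proportion's variance is
# p(1 - p)/n, and its weight is the reciprocal of that variance.
weights = [n / (p * (1 - p)) for p, n in studies]
pooled = sum(w * p for w, (p, _) in zip(weights, studies)) / sum(weights)
se = math.sqrt(1 / sum(weights))

print(f"pooled prevalence = {pooled:.4f}")
print(f"95% CI = ({pooled - 1.96 * se:.4f}, {pooled + 1.96 * se:.4f})")
```

Random effects and Bayesian approaches replace these fixed-effect weights with ones that also reflect between-study variation.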
For studies of risk factors associated with psychiatric disorder or outcome, the most commonly employed methods of analysis were fixed or random effects meta-analysis models (24 studies), with 14 further papers using approaches described as weighted averages and two others the Mantel–Haenszel pooled odds ratio.[22,23]
Table 1

|  | Number of studies |
| --- | --- |
| Number of databases searched |  |
| Five or more | 22 |
| Were search terms given? |  |
| Exact search terms | 69 |
| Details given, but not exact terms | 18 |
| No information given | 19 |
| Other methods used for identifying primary papers^a |  |
| Reference lists of primary papers searched for additional papers | 79 |
| Contacted authors of primary papers identified in initial search | 20 |
| Contacted lead researchers in subject area | 23 |
| Searched unpublished data or websites | 11 |

a. Total exceeds 106 as studies used more than one method. Table does not include the study by McGrath et al.[18]
Other methods used were mixed effects Poisson regression,[24] unconditional logistic regression,[25] weighted least squares regression,[26,27] and a Bayesian hierarchical random effects model.[28] Two studies did not specify the exact method of analysis.[29,30] Of the 24 studies that did not employ a meta-analysis, 5 gave individual study results and 2 gave a narrative summary of the results. Eleven studies gave both a narrative summary and individual study results.

Table 2

|  | Number of studies |
| --- | --- |
| Method stated for abstraction | 68 |
| Method of abstraction^a |  |
| Two or more independent reviewers abstracted data | 39 |
| Random sample abstracted by two independent reviewers | 3 |
| Abstracted data checked by another reviewer | 3 |
| Study quality^a |  |
| Inclusion criteria included factors related to quality | 40 |
| Rated quality of primary studies | 19 |
| Discussed quality of primary studies | 18 |
| Sensitivity analysis excluding studies of poor quality | 4 |

a. Studies may have used more than one method of data abstraction or of assessing quality, so totals exceed 106. Table does not include data from McGrath et al.[18]
Testing and exploring heterogeneity
Between-study heterogeneity is a common feature of synthesis and has important implications for inferences drawn from it.
Table 3

|  | Number of studies |
| --- | --- |
| Did not mention heterogeneity | 5 |
| Discussed heterogeneity (but no formal test performed) | 10 |
| Tested for heterogeneity | 58 |
| Significant heterogeneity found^a | 47 |
| Random effects model used | 30 |
| No significant heterogeneity found | 11 |

a. More than one may apply to each study.
Only 5 of the 73 studies that employed a meta-analysis made no reference to heterogeneity. Fifty-eight studies formally tested for heterogeneity (Table 3). The most common tests used were the Q statistic and the chi-squared test. Other methods included testing whether the mean effects variance was null,[31] testing for an interaction with study design,[25] testing whether individual results differed from others,[32] and the I² statistic.[33–35]
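The Q and I² statistics mentioned above can be computed directly from study estimates and their standard errors. The sketch below, using invented log odds ratios rather than data from any review discussed here, also derives the DerSimonian–Laird between-study variance (τ²) on which random effects models rest.

```python
import math

# Hypothetical log odds ratios and standard errors -- illustrative only.
effects = [0.40, 0.10, 0.65, 0.25]
ses = [0.12, 0.15, 0.20, 0.10]

w = [1 / s ** 2 for s in ses]  # inverse-variance (fixed-effect) weights
fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)

# Cochran's Q: weighted squared deviations from the fixed-effect estimate.
Q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
df = len(effects) - 1
I2 = max(0.0, (Q - df) / Q) * 100  # Higgins' I^2, as a percentage

# DerSimonian-Laird moment estimate of the between-study variance tau^2.
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / c)

# Random-effects weights add tau^2 to each within-study variance.
w_re = [1 / (s ** 2 + tau2) for s in ses]
random_effect = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)

print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.0f}%, tau^2 = {tau2:.4f}")
print(f"fixed = {fixed:.3f}, random = {random_effect:.3f}")
```

Note how the random effects estimate sits closer to the unweighted mean than the fixed-effect one: the added τ² evens out the study weights.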
Of the 26 studies that synthesised prevalence, 14 found significant heterogeneity in their estimates and 5 discussed heterogeneity without formally testing for it. Of the 40 studies synthesising association only which tested for heterogeneity, heterogeneity was not statistically significant in 8. However, the limited power of tests for heterogeneity is well known.[36] Significant heterogeneity was found in all but 2 (Singer and Costello et al) of the 16 reviews of prevalence studies in which tests for it were reported.[14,37] In only 4 of these reviews of prevalence studies[34,38–40] was heterogeneity largely eliminated, for example by providing estimates for women only,[39] or by removing outliers that appeared to explain heterogeneity,[34,40] such as high- and low-risk samples, differences in period or point prevalence definition, or differences in the diagnostic measure used.[38] Several reviews carefully grouped studies that could be said to be homogeneous and then performed a formal meta-analysis taking account of error in each estimate.[20,38,39,41,42] Grouping more homogeneous studies in this way appeared to improve precision in three of these reviews.[20,38,39] However, in one review it led to larger confidence intervals, possibly because of the limited number of studies available for inclusion, with the result that the review was less conclusive.[38]
Studies that detected significant heterogeneity made some allowance for this in analysis through the use of random effects models, by removing results that were outliers, or through controlling for moderator variables. Skeem et al made a thorough examination of heterogeneity by removing outliers, exploring the effects of moderator variables and using random effects models, as well as performing sensitivity analyses.[43]
Thirty-three studies did not perform a meta-analysis; fifteen of these gave heterogeneity between individual studies as the reason for not doing so, and four stated that they were unable to assess heterogeneity and so did not combine the data in a meta-analysis. The remaining articles either did not mention heterogeneity at all or mentioned it but were unclear as to why they did not attempt a meta-analysis.
Table 4

|  | Number of studies |
| --- | --- |
| Method of detecting publication bias |  |
| Publication bias detected |  |
| Analysis to explore bias | 16 |
| Steps taken to limit bias (e.g. exclusion criteria for review) | 9 |
| Adjusted for confounders in analysis | 35 |
| Steps taken to limit confounding (e.g. exclusion criteria for review) | 7 |

a. Studies used more than one method, so total exceeds 106. Table does not include data from McGrath et al.[18]
Just over half of the studies did not mention publication bias (Table 4). Nine studies did not assess publication bias in any formal way, but stated that it was unlikely to have affected their results. The most frequent reasons given were that unpublished studies had been included (3 studies) and that the question asked in the review was not the main research question of the papers identified (2 studies), although in fact publication bias is not limited to the primary outcome. Three studies stated that their results were likely to be affected by publication bias, without formally assessing it. Twenty-nine studies assessed publication bias, the majority using funnel plots[44,45] or the fail-safe (or ‘file drawer’) method.[46,47] Other methods of assessing publication bias were the Begg–Mazumdar adjusted rank correlation test,[48,49] the Egger test,[34,50] and examination of the correlation between the variance and the log odds ratio.[51] Of the studies that detected publication bias, only four discussed the effects this might have had on their findings.
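Of the formal methods listed above, the Egger test is essentially a small regression: the standardised effect (effect divided by its standard error) is regressed on precision (the reciprocal of the standard error), and an intercept far from zero suggests funnel-plot asymmetry. A minimal sketch, with invented effect sizes rather than data from any review discussed here:

```python
import math

# Hypothetical effect sizes and standard errors -- illustrative only.
effects = [0.50, 0.42, 0.35, 0.60, 0.20]
ses = [0.30, 0.22, 0.15, 0.35, 0.08]

# Egger's test: regress standardised effect on precision by ordinary
# least squares; a non-zero intercept suggests small-study asymmetry.
y = [e / s for e, s in zip(effects, ses)]
x = [1 / s for s in ses]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx

# Standard error of the intercept from the residual variance.
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s2 = sum(r ** 2 for r in resid) / (n - 2)
se_int = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
t = intercept / se_int
print(f"Egger intercept = {intercept:.3f} (SE {se_int:.3f}), t = {t:.2f}")
```

In practice the t statistic would be referred to a t distribution with n − 2 degrees of freedom, and with so few studies the test has little power, echoing the limitations noted above.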
Bias and confounding in observational studies
Bias and confounding pose particular challenges for observational studies, and may thus affect the conclusions of meta-analyses and reviews of such studies. Around half of the studies made no mention of any bias (53 studies) or confounding factor (46 studies) that might have affected the results (Table 4). Of those that did mention bias, 9 studies took steps to prevent bias affecting their results through inclusion or exclusion criteria for the papers in their review, and 16 explored bias by examining the effect of various factors, including sample type and type of assessment.[14,27,32,41,48,49,52–61] Studies adjusted for confounding factors including age, gender, education, work status, social support, and severity and duration of symptoms and disability, and examined whether methodological factors, including source of recruitment, sample size, diagnostic criteria and type of interview, could account for differences. Several studies also adjusted for publication year and geographic location.
The two-phase review allowed us to note apparent trends in the quality and characteristics of meta-analyses in psychiatric epidemiology between the two time periods. Few such trends were clear-cut, however, with perhaps only four worthy of note. Although only around a quarter of reviews (15 of 61 papers) mentioned or discussed confounding in the initial review period, more than half (25 of 45 papers) in the second period did so. Reviews in the more recent period were also more likely to give their exact search terms, with few (3 of 45 papers) omitting this information, as opposed to around a quarter (16 of 61) in the earlier period. Recent reviews were also much more likely to state the actual method of abstraction, with around three times as many using two or more independent reviewers to abstract the data compared with reviews in the initial period (20% v. 60%). Conversely, around twice as many reviews in the first period rated the quality of primary studies compared with the recent, update period (23% v. 11%).
This review found a number of deficiencies in the conduct and reporting of systematic reviews and meta-analyses of observational psychiatric epidemiology studies that could have serious implications for inferences drawn or decisions made on the basis of these reviews. There were frequent omissions of descriptions of method of abstraction, study quality, publication bias, bias and confounding. Many of these deficiencies are simple and potentially remediable. Of the 106 studies examined, 73 performed a meta-analysis but only 58 tested for heterogeneity. There were also some terminological issues, most importantly in how the quantitative synthesis method was described: to the several examples described by their authors as meta-analyses (and yielding weighted average estimates of prevalences or odds ratios) must be added reviews that calculated weighted averages but did not describe their method as a meta-analysis.[62,63] As indicated above, some dimensions of meta-analytical practice appear to be improving between the two periods considered.
In 47 reviews heterogeneity was detected and reported; this needs to be followed by exploration of the sources of heterogeneity and by its modelling. In half of the meta-analyses of studies of association with mental disorder in which heterogeneity was tested, it occurred for some but not all risk factors; in a minority, heterogeneity was completely absent. All studies that detected heterogeneity made some allowance for it in analysis, through the use of random effects models, removal of outlying results, or control for moderator variables.
Limitations of meta-analysis
The majority of reviews reported pooled mean prevalence estimates, in most cases with narrow confidence intervals, but the range of prevalence estimates across such heterogeneous studies was considerable (estimates ranging from 5% to over 40% were not unusual),[62,64] with occasionally much lower rates, as in Somers et al, whose estimates for individual anxiety disorders ranged from less than 1% to around 6%.[21] In these circumstances a single summary estimate is less appropriate and less useful for planning and economic projections than a set of more specific estimates reflecting important dimensions of the heterogeneity. Such limitations call for far greater caution in interpreting the comparisons of prevalence estimates often reported in the scientific literature. Most of these reviews, if studied carefully, provided some additional information on the limitations of the available data; but unless read by those with specialist knowledge of the data synthesis methods used, headline overall prevalence estimates could easily mislead policy makers and the wider community concerned with the burden and cost of mental disorder in the general population. McGrath et al and Saha et al also cautioned against standard methods of meta-analysis in favour of simpler representations of prevalence findings from different studies,[15,18] for example median values[18] and graphical representations of the variation of estimates around the central value.[15]
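The contrast drawn by McGrath et al and Saha et al can be illustrated with a few invented prevalence figures: a size-weighted pooled mean collapses a wide spread into one headline number, whereas a median with a range keeps the between-study variation visible.

```python
import statistics

# Hypothetical prevalence estimates (%) and sample sizes from
# heterogeneous studies -- illustrative only.
prevalences = [5.2, 7.8, 12.4, 21.0, 40.5]
sample_sizes = [3000, 1500, 900, 400, 250]

# A size-weighted pooled mean collapses the spread into one number...
pooled = sum(p * n for p, n in zip(prevalences, sample_sizes)) / sum(sample_sizes)

# ...whereas a median and range keep the between-study variation visible.
median = statistics.median(prevalences)
lo, hi = min(prevalences), max(prevalences)
print(f"pooled mean = {pooled:.1f}%, median = {median:.1f}%, range = {lo:.1f}-{hi:.1f}%")
```

Here the pooled mean sits well below the median because the largest studies happen to report the lowest rates, exactly the kind of artefact a headline estimate can conceal.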
Sensitivity analyses have an important role in meta-analysis, for example investigating whether dropping or adding primary studies with (say) slightly non-standard disease definitions makes a difference. A major obstacle facing reviewers, which many acknowledge or discuss, is the wide variation in the instruments used to measure outcomes and in the sampling methods used. Thus quantitative reviews may sometimes be based on combining different measures which should not always be so combined – ‘apples and oranges’, in a classic phrase of the meta-analytical literature.[65] One review of post-traumatic stress disorder prevalence studies found a consistent threefold difference in estimates between two commonly used diagnostic methods.[64] Systematic reviews are usually useful and valid if well performed, but meta-analyses may be unduly common in this branch of the psychiatry literature given the considerable debate about the validity of meta-analysis in observational studies. Sometimes a single high-quality, well-reported study can be recommended instead of a statistical synthesis of heterogeneous studies.
All systematic reviews found by us, apart from our own,[66] included studies using different instruments and/or definitions of disorder, or failed to specify how outcomes were defined or measured. Heterogeneity was not significant in only seven studies of associations, four of which addressed a relatively reliable risk factor (season of birth, complications of pregnancy and labour, gender).
Need for guidelines
Currently there are no recommended guidelines for good-quality reporting of meta-analyses of observational studies specifically in psychiatric epidemiology, but more general guidelines for meta-analysis of observational studies such as the Meta-analysis of Observational Studies in Epidemiology (MOOSE) guidelines are relevant.[3] Systematic review and meta-analysis in this (psychiatric) field share many issues with the use of these techniques in other fields in which the results of observational as opposed to experimental primary studies are to be synthesised. In particular, although meta-analysis may improve the precision of estimates – of prevalence, or of odds ratios for association – it does so at the potential cost of conflating results of different primary studies subject to different types and degrees of bias, rendering the greater precision largely worthless.[67,68] Some of these biases may be associated with the use of varying definitions and/or measures of outcomes and perhaps of exposure variables; in such cases coordinated studies across several centres, using uniform approaches, will almost always be preferable if feasible.[69,70]
Comparing the two periods of our review for trends in the quality and characteristics of meta-analyses in psychiatric epidemiology, few clear-cut trends emerged. Although our review allowed comparison of two periods, a possible limitation is that the second phase ran only to 2009; a secular trend of improvement in study execution and reporting since then may have taken place, and would indeed be hoped for.
Other between-study differences, arising from differing populations and contexts, may be of interest and importance to identify and quantify. Thus the exploration and explanation of sources of heterogeneity is as important here as elsewhere; in these circumstances meta-analyses are better deployed as exploratory tools than to establish definitive estimates. Where they are appropriately employed, reference to guidelines and checklists for their implementation should promote quality in their execution.[3,67,68,71,72] Consideration should be given to developing guidelines for systematic review and meta-analysis in psychiatry, extending existing guidelines with more emphasis on the issues of disease definition, measurement instruments used, and population sampling, which are especially important in psychiatry. Initial proposals for guidelines are given in online Appendix DS2. These are modified from a comparable review of the use of systematic reviews and meta-analysis and guidelines on reporting,[2] and from guidelines based on the Quality of Reporting of Meta-analyses (QUOROM) statement,[73] incorporating features of the MOOSE statement and further modifications making the guidelines more specific to our subject.
Recommendations for primary studies
The recommendations by Fryers et al also point to a need, apparently unmet, for recommendations as to the design, conduct, analysis and reporting of the primary studies on which systematic reviews or meta-analyses draw.[66] These could include:
(a) desirability of prospective registration of primary studies, to include specification of key hypotheses and analyses proposed;
(b) use of (including reporting of results in terms of) standard definitions of disease (defined by collaborative groups and networks) as well as the authors’ own;
(c) full reporting of factors for which allowance has been made in design (e.g. by restricting samples) or analysis (e.g. by inclusion as regression covariates) – or agreement on a standard list always to be used at least in secondary analyses, as in (b) above;
(d) full quantitative reporting of key results to facilitate meta-analyses, such as numbers at risk and numbers of cases for prevalences, numbers at risk and events in each group for odds ratios, etc., and (adjusted) odds ratios and confidence intervals.
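Point (d) matters because, given the raw counts, any reviewer can recompute an effect size and its confidence interval. The sketch below does so for an odds ratio from a hypothetical 2×2 table; the counts are invented for illustration.

```python
import math

# Hypothetical 2x2 table: cases/non-cases by exposure -- illustrative only.
exposed_cases, exposed_noncases = 30, 170
unexposed_cases, unexposed_noncases = 20, 280

# Odds ratio and its 95% CI via the standard log-OR variance
# (the sum of reciprocals of the four cell counts).
or_ = (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)
se_log = math.sqrt(1 / exposed_cases + 1 / exposed_noncases +
                   1 / unexposed_cases + 1 / unexposed_noncases)
log_or = math.log(or_)
ci_lo = math.exp(log_or - 1.96 * se_log)
ci_hi = math.exp(log_or + 1.96 * se_log)
print(f"OR = {or_:.2f}, 95% CI ({ci_lo:.2f}, {ci_hi:.2f})")
```

A review reporting only "OR = 2.5, p < 0.05" cannot be re-pooled this way; one reporting the four counts can.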
Reviewers should also consider whether and when a quantitative synthesis method such as meta-analysis is the correct approach to studies using heterogeneous methods; a minority of reviewers reported their decision not to use such quantitative synthetic methods and employed alternative methods. In the case of systematic reviews of treatment studies for which meta-analytic methods are not appropriate, effects may often still be examined to provide a systematic assessment of the evidence available.[72] Some combinations of research objectives, evidence types, contexts and resources may be better matched by alternative approaches.[74] Boaz et al indicated wide-ranging interest in synthesis methods of different types in different fields,[74] and psychiatric epidemiologists might want to explore which would be useful in the challenging area of the study of mental disorders in populations.
This project was funded by the European Commission (contract ), Fondo de Investigación Sanitaria (), the Ministry of Science and Technology, Spain (), the Piedmont Region, Italy, other local agencies and by an unrestricted educational grant from GlaxoSmithKline. The European Study of the Epidemiology of Mental Disorders (ESEMeD) survey is carried out in conjunction with the World Health Organization World Mental Health Survey Initiative.
Appendix 1

| To identify | Search terms |
| --- | --- |
| Studies of psychiatric illness | Mental disorders (exploded term) OR mental illness OR mental disease OR psychiatr* |
| Observational studies | Birth cohort OR longitudinal study OR cohort analysis OR epidemiologic methods OR follow up studies OR follow-up studies OR prospective studies OR incidence stud* OR ep* OR epidemiology OR epidemiological studies |
| Systematic reviews or studies using synthetic methods | Meta analys* OR quantitative review* OR quantitative synthes* OR review synthe* OR research synthes* OR “Systematic review” (Keyword) OR quantitative overview |