Introduction
Generalized anxiety disorder (GAD) has a significant impact on affected individuals, causing persistent worry about many aspects of life and physical symptoms such as restlessness and fatigue (Center for Behavioral Health Statistics and Quality, 2016). GAD typically emerges in late adolescence or early adulthood, yet diagnosis is often delayed for over a decade due to its subtle onset (Kessler et al., 2001). Even in the absence of comorbid anxiety disorders, GAD results in compromised role functioning, social engagement and overall satisfaction (Mendlowicz and Stein, 2000). Despite its chronicity, periods of symptom remission offer respite, although recurrence is frequent (Yonkers et al., 2003). Early onset and untreated symptoms are linked to heightened disability, further compounded by comorbidities (Bandelow, 2015). Many patients endure years of suffering from GAD before receiving a correct diagnosis, with 45% experiencing symptoms for 2 years or longer before referral to a specialist (Bandelow and Michaelis, 2015).
Before the significant alterations to the GAD definition introduced with the Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III) in 1980, GAD was commonly regarded as a ‘wastebasket’ diagnosis, garnering limited academic and clinical scrutiny (Kendell, 2006; Ruscio et al., 2017). Since the introduction of DSM-III, there have been notable changes in diagnostic criteria, such as reducing the symptom criteria from 18 items to 6 in DSM-IV and emphasizing the uncontrollability of worry (Crocq, 2017). In DSM-5, while the definition remains largely similar to its predecessor, one criterion highlights that GAD is typically diagnosed by excluding other anxiety disorders such as panic disorder, phobia, social anxiety or obsessive-compulsive disorder, and it cannot stem directly from stressors or trauma, distinguishing it from adjustment disorders and post-traumatic stress disorder (PTSD; Crocq, 2017).
The diagnostic challenge of GAD is mirrored in global prevalence estimates, as evidenced by Ruscio et al. (2017). Conducting a study across 29 nations and regions using the World Health Organization Composite International Diagnostic Interview (CIDI), the authors found a combined lifetime prevalence of GAD of 3.7%, a 12-month prevalence of 1.8% and a 30-day prevalence of 0.8%. Lifetime prevalence estimates varied widely, ranging from less than 1% to approximately 8% of the population (Ruscio et al., 2017).
Diagnosing anxiety disorders, particularly GAD, presents challenges in both clinical settings and population-based studies aimed at estimating disease prevalence. In clinical settings, GAD is usually diagnosed through a clinical evaluation by a mental health professional, following the DSM-5 diagnostic criteria (Penninx et al., 2021). In research, especially in large epidemiological population studies, it is more common to use structured or semi-structured interviews and sometimes self-report questionnaires, as these methods are more practical and less costly (Davies et al., 2022; Shabani et al., 2021; Stein et al., 2021). These interviews and tools are considered standard references for diagnosis, but there is no single ‘gold standard’.
Several Standardized Diagnostic Interviews (SDIs) have been developed, with the most prominent ones being the CIDI, the Schedules for Clinical Assessment in Neuropsychiatry (SCAN) and the Mini International Neuropsychiatric Interview (MINI; Verhoeven et al., 2017). Each of these tools reports varying levels of concordance with diagnostic evaluations using the Structured Clinical Interview for DSM-5 (SCID-5), ranging from very low to excellent (Verhoeven et al., 2017).
Numerous self-report questionnaires, such as the Generalized Anxiety Disorder Scale-7 (GAD-7), the Hamilton Anxiety Rating Scale (HAM-A) and the Hospital Anxiety and Depression Scale (HADS), among others, are employed to assess anxiety levels, each varying in accuracy (Sapra et al., 2020). Spitzer et al. (2006) introduced the GAD-7 as a concise instrument aimed at identifying potential instances of GAD. However, it is crucial to note that these screening tools alone are neither intended nor sufficient for diagnosing anxiety disorders, necessitating a confirmatory diagnostic evaluation if a screening test indicates the presence of such a disorder (O’Connor et al., 2023).
All SDIs and self-report questionnaires used to assess the prevalence of GAD exhibit varying degrees of agreement with clinical diagnoses. For example, Verhoeven et al. (2017) found that the MINI reported a 1.61 times higher prevalence of anxiety disorder compared to clinical assessments, suggesting a notable number of false positive cases in real-life datasets (n = 7,016). Additionally, the widely used cut-off score of 10 for the GAD-7 questionnaire yields a sensitivity of 0.74 (range 0.61–0.84) and a specificity of 0.83 (range 0.68–0.92) (Plummer et al., 2016). This indicates that while the GAD-7 is effective at identifying individuals with GAD, it may also misclassify some individuals without GAD as positive, highlighting the importance of further investigation and clinical follow-up to ensure accurate diagnoses. The varying accuracy of these tools calls for careful scrutiny: given the disparities between tool-based assessments and clinical diagnoses, GAD prevalence data should be approached with caution and discernment.
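To make these accuracy figures concrete, the expected screen-positive rate and positive predictive value implied by them can be sketched in a few lines. This is an illustrative calculation only, not an analysis from any reviewed study; it combines the GAD-7 accuracy at cut-off 10 reported by Plummer et al. (2016) with the pooled 3.7% lifetime prevalence from Ruscio et al. (2017):

```python
# GAD-7 accuracy at cut-off 10, as reported by Plummer et al. (2016):
SENSITIVITY = 0.74
SPECIFICITY = 0.83

def apparent_prevalence(true_prev, se=SENSITIVITY, sp=SPECIFICITY):
    """Expected proportion screening positive, given the true prevalence."""
    return true_prev * se + (1 - true_prev) * (1 - sp)

def positive_predictive_value(true_prev, se=SENSITIVITY, sp=SPECIFICITY):
    """Probability that a screen-positive individual truly has GAD."""
    true_pos = true_prev * se
    false_pos = (1 - true_prev) * (1 - sp)
    return true_pos / (true_pos + false_pos)

# Using the pooled 3.7% lifetime prevalence reported by Ruscio et al. (2017):
p = 0.037
print(f"apparent prevalence: {apparent_prevalence(p):.1%}")       # ~19.1%
print(f"PPV at cut-off 10:   {positive_predictive_value(p):.1%}") # ~14.3%
```

At a true prevalence this low, most screen-positives are false positives, which is one reason prevalence estimates built directly on screen-positive counts can be substantially inflated.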
Despite the importance of accurately estimating the prevalence of GAD at regional or national levels, there is a lack of comprehensive research that systematically evaluates the array of tools employed for this purpose. Such an evaluation would provide an overview of the existing methods. The extent to which existing prevalence studies use SDIs or self-report tools is unclear, revealing a critical gap in research. To address this gap, more research is needed to understand the methods used in population-level prevalence studies and how they affect accuracy and reliability.
Hence, this study aims to investigate tools used for estimating the prevalence of GAD, defined by DSM-5, ICD-11 or older case definitions, at a population level, specifically examining the choice between self-reported questionnaire-based screening tools and SDIs. By providing an overview of the tools commonly used in epidemiological studies for prevalence estimation, we seek to understand the prevailing trends in research choices. Additionally, our goal is to assess whether researchers consider and report the test accuracy of their chosen tool in their publications. The findings can also inform guidelines for standardizing methodology and reporting practices for prevalence estimation studies.
Methods
We conducted a scoping review to capture the extent to which researchers use SDIs and self-report questionnaires in epidemiological studies that estimate the prevalence of GAD. This scoping review’s search and reporting followed the guidelines set forth by the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR; Tricco et al., 2018) and the JBI methodology for scoping reviews (Peters et al., 2020).
The protocol for this study was registered on the Open Science Framework and is readily accessible at https://doi.org/10.17605/OSF.IO/QUHJE. The PRISMA-ScR checklist is provided in Supplementary material 2.
Search strategy
A literature search was performed in MEDLINE, Embase and PsycINFO. Employing Medical Subject Headings (MeSH) and keywords associated with ‘Generalized anxiety disorder’, ‘GAD’, ‘Prevalence’ and ‘General population’, the search strategies were defined and translated for each of the selected databases. No specific date range was enforced in the search, in order to capture differences in the tools used to estimate the prevalence of GAD at the population level. In addition, the reference lists of included studies were evaluated for potential studies that could also be included in the review. Full search strategies can be found in Supplementary material 1. The initial search was conducted on 8 November 2023, followed by an additional search on 23 February 2025 to identify any newly published articles.
Study/source of evidence selection
The eligibility criteria of the scoping review required studies to utilize nationally or regionally representative samples for population-level estimation of GAD prevalence, with GAD defined based on DSM-5, ICD-11 or older case definitions. Detailed inclusion criteria can be found in the registered protocol. The review examined all population/national-level studies but excluded studies focusing solely on specific sub-groups of the population. Systematic reviews, meta-analyses, case reports and intervention studies were not included. Two reviewers (D.Y. and P.S.M.) assessed all studies individually using Rayyan software (Ouzzani et al., 2016) and subsequently reached a consensus on whether to include each one.
Data extraction and analysis plan
Two reviewers (D.Y. and P.S.M.) independently extracted data and collectively reached a consensus on the extracted information. The review synthesized evidence by capturing specific information, which was organized into a table format. The following information was captured: (i) authors, (ii) year of publication, (iii) country, (iv) study design, (v) diagnostic tool(s) utilized and (vi) reporting of accuracy of diagnostic tool(s).
Results
The search strategies initially yielded 537 studies, from which 161 duplicates were identified and removed. Following a thorough screening of 376 abstracts, 37 studies were deemed eligible for full-text screening based on the predefined criteria; the remaining 339 studies were excluded for not meeting the inclusion criteria, with detailed reasons recorded by reviewers in the Rayyan screening tool. Some studies were excluded for multiple reasons. Six articles were also excluded because they were not published in peer-reviewed journals. Seventeen studies were identified through reference list screening of the included articles, resulting in a total of 48 articles in the review. Figure 1 depicts the process of study identification, exclusion and inclusion in the final review in the format of the PRISMA flow diagram (Tricco et al., 2018). Table 1 provides an overview of the included studies.

Figure 1. PRISMA flow diagram: diagnostic tools in GAD prevalence estimation.
Table 1. Characteristics of studies included in the review

Publication years of the included articles ranged from 1994 to 2024. Only one study (Ruscio et al., 2017) was cross-national, conducted across 26 countries. Most studies were carried out in Europe (21/48, 43.75%). Most studies (44/48, 91.67%) were cross-sectional, while four were longitudinal. Studies had an average of approximately 19,552 participants, ranging from 800 to 147,261.
Utilized tools
Table 2 illustrates the distribution of tools used in the included studies, while Figure 2 further categorizes these tools by year of publication. The majority of studies (37/48, 77.08%) employed diagnostic interviews for assessing GAD prevalence, with the CIDI being the most frequently used tool (25/48, 52.08%), followed by the MINI (7/48, 14.58%) and other clinical interviews (4/48, 8.33%). Additionally, 16 studies (16/48, 33.33%) used self-report questionnaires, either as the sole basis for prevalence estimation or as a first step in longitudinal studies. The GAD-7 was the most commonly used self-report tool (7/48, 14.58%). In three longitudinal studies, self-report questionnaires were utilized in at least one stage of the study. Before 2005 (1994–2004), all studies used diagnostic interviews; after 2005, however, the popularity of self-report tools grew rapidly. Since 2020, four of the seven population-based studies included in this review have used the GAD-7 to assess the prevalence of GAD, as shown in Figure 2.

Figure 2. Use of tool types for GAD estimation over time.
Table 2. Type of the tool used in the studies

a The percentages presented in the table were calculated using the total number of articles (48) as the denominator.
b CIS-R and AUDADIS-IV.
c CID-S, Bristol Social Adjustment Guides, self-reported questionnaire with three questions, PSWQ-A, SADS, HLEQ and GMS-AGECAT.
Test accuracy reporting
Of the 48 articles included in the review, 30 (62.5%) commented or reported on test accuracy in the methods section, the discussion or both, while the remaining 18 (37.5%) did not. These studies typically mentioned the validity and reliability of the tools, often using measures such as the kappa statistic, or provided references to the respective validation studies. Seventeen studies (35.42%) conducted in non-English-speaking countries also addressed, to varying extents, the translation and adaptation of assessment tools into local languages. Additionally, four studies acknowledged the imperfect accuracy of the tools in their discussion sections, noting potential limitations such as reduced accuracy of prevalence estimates in certain populations.
All six studies employing GAD-7 as their screening tool for prevalence estimation reported test accuracy in the methods section. Additionally, five out of six studies that used GAD-7 provided explanations for the chosen cut-off score and the sensitivity and specificity of the tool at that cut-off.
Discussion
This scoping review aimed to summarize the methods used in population-level GAD prevalence studies and assess how often researchers reported the accuracy of the tools they used. Among the 48 included studies, a clear methodological trend was observed: while SDIs were commonly used for prevalence estimation, there has been a noticeable shift in recent years towards the use of self-report questionnaires or shorter diagnostic formats.
One of the driving factors behind the popularity of self-report tools, such as the GAD-7 and GAD-2, is their practicality. These tools are quick, low-cost and easy to administer, making them attractive for large-scale studies and especially for use in primary care, where time constraints are common (Salinas et al., 2022). Our findings reflect this shift, showing increased use of self-report questionnaires in more recent studies. However, it is important to emphasize that a positive screen on a self-report tool does not equate to a clinical diagnosis. These tools are designed for screening purposes and should ideally be followed by a structured clinical interview for confirmation. Without this, prevalence estimates based solely on self-reported symptoms risk being either inflated or underestimated.
Test accuracy, particularly sensitivity and specificity, remains a crucial yet inconsistently reported aspect. Although more studies are beginning to mention these metrics, especially when using self-report tools, very few incorporate them analytically. For example, the GAD-7 has a sensitivity of 0.74 at the conventional cut-off of 10 (Plummer et al., 2016), suggesting a significant proportion of true cases may go undetected. Moreover, these tools may also capture symptoms of other mental health conditions, such as depression, PTSD or panic disorder (Craske et al., 2017; O’Connor et al., 2023), further muddying the accuracy of prevalence estimates. Therefore, studies must more clearly report and account for the diagnostic limitations of the tools they use.
The COVID-19 pandemic accelerated the use of self-report questionnaires, likely due to the need for remote data collection and limited in-person interaction. Of the five studies published after 2020, four used the GAD-7 and only one used a structured interview (MINI). While this shift was perhaps necessary in a crisis context, it underscores the ongoing reliance on tools whose diagnostic precision is limited. Notably, none of these recent studies adjusted their analyses for test inaccuracies, despite acknowledging the limitations of their tools.
A recent article by Kohrt et al. (2025) highlights the global challenges of accurately diagnosing mental health conditions, particularly in low- and middle-income countries. They argue that relying heavily on self-report tools, which tend to produce high false-positive rates, can lead to inflated and misleading prevalence figures. Kohrt et al. (2025) recommend combining statistical adjustment of self-report data with structured clinical interviews and adopting contextually relevant ‘good-enough’ diagnostic categories. This approach aligns with our findings. While many of the studies in our review continue to use SDIs, we also observed a growing reliance on self-report questionnaires, often without addressing their diagnostic limitations. These limitations should not only be acknowledged but also incorporated into the analysis, reinforcing the need for more rigorous and methodologically sound approaches in prevalence research.
This need is further supported by methodological work by Fischer et al. (2023), which demonstrates the value of incorporating test accuracy into prevalence estimation using Bayesian latent class models. Applying such methods could help generate more valid and nuanced estimates of mental disorder prevalence. We therefore advocate for both reporting test accuracy and incorporating it into analytical frameworks. Strengthening this aspect of methodology would improve the reliability of findings and provide more actionable evidence for mental health policy and practice.
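As a minimal illustration of what ‘incorporating test accuracy into the analysis’ can mean, the sketch below applies the classical Rogan–Gladen estimator, a simple frequentist analogue of the Bayesian latent class models described by Fischer et al. (2023). The numbers are illustrative, not drawn from any reviewed study:

```python
def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Correct an apparent (screen-positive) prevalence for known
    sensitivity and specificity:

        true_prev = (apparent_prev + specificity - 1)
                    / (sensitivity + specificity - 1)

    The result is clipped to [0, 1], since sampling error can push the
    raw estimate outside the valid range.
    """
    raw = (apparent_prev + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(raw, 0.0), 1.0)

# Example: 19.1% of a sample screens positive on the GAD-7 at cut-off 10
# (sensitivity 0.74, specificity 0.83 per Plummer et al., 2016):
corrected = rogan_gladen(0.191, 0.74, 0.83)
print(f"corrected prevalence: {corrected:.1%}")  # ~3.7%
```

A fully Bayesian treatment would additionally propagate the uncertainty in sensitivity and specificity into the prevalence estimate, which is what makes the latent class approach more informative than this point correction.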
Limitations
Our scoping review has some limitations. Firstly, despite efforts to create a comprehensive search strategy, some relevant studies may have been missed, especially if they were published in non-indexed journals or were unavailable in the selected databases. Additionally, restricting the search to specific databases may have omitted relevant studies indexed elsewhere. Although our search imposed no restrictions on geographical region or publication language, most of the included studies were conducted in middle- to high-income countries, which may also affect the choice of tool used to capture the prevalence of GAD.
Conclusion
This scoping review provides a comprehensive overview of the tools utilized in assessing the prevalence of GAD at a population level. While SDIs are still widely utilized, there is a noticeable trend towards using self-report questionnaires, potentially attributable to their convenience and cost-effectiveness. However, despite the growing number of prevalence studies, few assess or incorporate the test accuracy of the tools into their analyses.
It is imperative to acknowledge that SDIs and self-report questionnaires may yield inaccurate GAD prevalence estimates compared to clinical diagnosis settings and that the agreement between the two types of diagnostic tools varies across different populations. This review underscores the necessity for transparent reporting of test accuracies to enhance the reliability and validity of prevalence estimates. Moreover, where possible, researchers should consider applying methods to incorporate test accuracy into prevalence estimation, particularly when self-report tools are used. Future research should prioritize standardized reporting of test characteristics to improve comparability across studies and inform more accurate public health strategies for GAD prevention and intervention.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S2045796026100432.
Availability of data and materials
All extracted data are presented in the article, and the full search strategy can be accessed in Supplementary material 1.
Acknowledgements
Not applicable.
Author contributions
D.Y., C.I. and T.K. conceptualized the study. D.Y. and P.S.M. conducted the review. D.Y. prepared the first draft of the manuscript, including the tables and figures. All authors reviewed and edited the manuscript.
Financial support
This research did not receive any specific grant from funding agencies in the public, commercial or not-for-profit sectors.
Competing interests
D.Y. reports receiving funds from the German Academic Exchange Service (DAAD). P.S.M. and C.I. have no potential conflict to disclose. Outside of the work of this review, T.K. reports having received research grants from the Gemeinsamer Bundesausschuss (G-BA – Federal Joint Committee, Germany) and the Bundesministerium für Gesundheit (BMG – Federal Ministry of Health, Germany). He has further received personal compensation from Eli Lilly and Company, Novartis, the BMJ and Frontiers.
Ethical standards
This study is a scoping review, which involves synthesizing information from previously published literature and does not include any new data collection or direct involvement of human participants. As such, approval from an Ethics Committee or Institutional Review Board was not required. Therefore, the Human Ethics and Consent to Participate declarations are not applicable.



