
Benchmarking Routine Psychological Services: A Discussion of Challenges and Methods

Published online by Cambridge University Press: 24 October 2012

Jaime Delgadillo*
Leeds Community Healthcare NHS Trust, UK
Dean McMillan
University of York, UK
Chris Leach
South West Yorkshire Partnership NHS Foundation Trust and University of Huddersfield, UK
Mike Lucock
South West Yorkshire Partnership NHS Foundation Trust and University of Huddersfield, UK
Simon Gilbody
University of York, UK
Nick Wood
Leeds Community Healthcare NHS Trust, UK
Reprint requests to Jaime Delgadillo, Leeds Community Healthcare NHS Trust - Primary Care Mental Health, The Reginald Centre, Second Floor, 263 Chapeltown Road, Leeds LS7 3EX, UK. E-mail:


Background: Policy developments in recent years have led to important changes in the level of access to evidence-based psychological treatments. Several methods have been used to investigate the effectiveness of these treatments in routine care, with different approaches to outcome definition and data analysis.

Aims: To present a review of challenges and methods for the evaluation of evidence-based treatments delivered in routine mental healthcare. This is followed by a case example of a benchmarking method applied in primary care.

Method: High, average and poor performance benchmarks were calculated through a meta-analysis of published data from services working under the Improving Access to Psychological Therapies (IAPT) Programme in England. Pre-post treatment effect sizes (ES) and confidence intervals were estimated to illustrate a benchmarking method enabling services to evaluate routine clinical outcomes.

Results: High, average and poor performance ES for routine IAPT services were estimated to be 0.91, 0.73 and 0.46 for depression (using PHQ-9) and 1.02, 0.78 and 0.52 for anxiety (using GAD-7). Data from one specific IAPT service exemplify how to evaluate and contextualize routine clinical performance against these benchmarks.

Conclusions: The main contribution of this report is to summarize key recommendations for the selection of an adequate set of psychometric measures, the operational definition of outcomes, and the statistical evaluation of clinical performance. A benchmarking method is also presented, which may enable a robust evaluation of clinical performance against national benchmarks. Limitations include significant heterogeneity among data sources, and wide variations in ES and data completeness.
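As a rough illustration of the benchmarking idea summarized in the abstract, the Python sketch below computes a pre-post effect size for one service and compares it with the reported PHQ-9 and GAD-7 benchmarks. The abstract does not state the exact ES or confidence interval formulas, so several parts are assumptions and not the authors' method: the ES is taken as mean change divided by the pre-treatment standard deviation, the interval uses a common large-sample approximation with an assumed pre-post correlation `r`, and the banding rule in `classify` is hypothetical. The function names and the sample scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Benchmark effect sizes as reported in the abstract.
BENCHMARKS = {
    "PHQ-9": {"high": 0.91, "average": 0.73, "poor": 0.46},
    "GAD-7": {"high": 1.02, "average": 0.78, "poor": 0.52},
}


def prepost_effect_size(pre, post):
    """Mean pre-post change divided by the pre-treatment sample SD.

    Assumed formula; the abstract does not specify the one used.
    """
    return (mean(pre) - mean(post)) / stdev(pre)


def approx_ci(es, n, r=0.5, z=1.96):
    """Approximate 95% interval for a pre-post ES.

    Uses a large-sample variance approximation; r is the pre-post
    correlation and 0.5 is an assumed placeholder, not a reported value.
    """
    se = sqrt(2 * (1 - r) / n + es ** 2 / (2 * n))
    return es - z * se, es + z * se


def classify(es, measure):
    """Hypothetical banding of a point estimate against the benchmarks."""
    b = BENCHMARKS[measure]
    if es >= b["high"]:
        return "at or above high benchmark"
    if es >= b["average"]:
        return "at or above average benchmark"
    if es > b["poor"]:
        return "below average benchmark"
    return "at or below poor benchmark"


if __name__ == "__main__":
    # Invented PHQ-9 scores for five patients, for illustration only.
    pre_scores = [10, 12, 14, 16, 18]
    post_scores = [4, 6, 8, 10, 12]
    es = prepost_effect_size(pre_scores, post_scores)
    low, high = approx_ci(es, n=len(pre_scores))
    print(f"ES = {es:.2f}, approx. 95% CI [{low:.2f}, {high:.2f}]")
    print(classify(es, "PHQ-9"))
```

In practice a service would likely also check whether the whole interval, not just the point estimate, clears a given benchmark; that decision rule is a design choice this sketch leaves open.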

Research Article
Copyright © British Association for Behavioural and Cognitive Psychotherapies 2012 


