Community interventions for people with complex emotional needs that meet the criteria for personality disorder diagnoses: systematic review of economic evaluations and expert commentary

Background Diagnoses of personality disorder are prevalent among people using community secondary mental health services. Identifying cost-effective community-based interventions is important when working with finite resources. Aims To assess the cost-effectiveness of primary or secondary care community-based interventions for people with complex emotional needs who meet criteria for a diagnosis of personality disorder to inform healthcare policy-making. Method Systematic review (PROSPERO: CRD42020134068) of databases. We included economic evaluations of interventions for adults with complex emotional needs associated with a diagnosis of personality disorder in community mental health settings published before 18 September 2019. Study quality was assessed using the CHEERS statement. Results Eighteen studies were included. The studies mainly evaluated psychotherapeutic interventions. Studies were also identified that evaluated altering the setting in which care was delivered and joint crisis plans. No strong economic evidence to support a single intervention or model of community-based care was identified. Conclusions Robust economic evidence to support a single intervention or model of community-based care for people with complex emotional needs is lacking. The strongest evidence was for dialectical behaviour therapy, with all three identified studies indicating that it is likely to be cost-effective in community settings compared with treatment as usual. More robust evidence is required on the cost-effectiveness of community-based interventions on which decision makers can confidently base guidelines or allocate resources. The evidence should be based on consistent measures of costs and outcomes with sufficient sample sizes to demonstrate impacts on these.

Globally, it is estimated that approximately 8% of the population experience complex emotional needs that meet the diagnostic criteria for personality disorder, which is described broadly as an enduring and pervasive pattern of emotional and cognitive difficulties that affect the way in which a person relates to others or understands themselves. 1 The diagnosis is often associated with high rates of psychiatric comorbidity, 2 high levels of service use 3 and high treatment costs. 4 In European and North American community secondary mental healthcare settings, the prevalence of personality disorder diagnoses is estimated to be above 40%. 5 A range of psychological therapies show some evidence of effectiveness; dialectical behaviour therapy (DBT) and mentalisation-based treatment (MBT) have the most well-established evidence base. 6,7 Although establishing clinical effectiveness of psychological therapies and models of care is crucial, it is also important for decision makers to consider their value for money. Health and social care resources are limited and there are competing demands for scarce resources. It is therefore a growing requirement that assessments of new treatments and therapies include an economic evaluation. 8 Through robust economic evaluation decision makers can consider the opportunity cost of funding one intervention over another, as for every potential gain from a funded intervention, given a limited funding pool, there are potential losses from the next best alternative option that is forgone. 9 Economic evaluation seeks to compare the costs of an intervention against its outcomes. The main approaches are costeffectiveness analysis (CEA), cost-benefit analysis (CBA), costconsequences analysis (CCA) and cost-utility analysis (CUA). 10 CEA compares the incremental cost of an intervention with incremental changes in outcomes usually using a condition-specific measure. For complex emotional needs this may include an assessment of functioning or distress. CBA measures outcomes using monetary units and is uncommon in healthcare evaluations, partly owing to the challenge of valuing health effects in monetary terms. CCA presents costs against a number of outcome measures to support multi-criterion decision-making. Finally, CUA is a type of CEA and it compares the incremental cost of an intervention with changes in a measure of health that allows comparisons across illness areas. The measure used is usually the quality-adjusted life-year (QALY), where one QALY is equivalent to 1 year of life in perfect health. Results can be compared against a willingness to pay (WTP) threshold which indicates how much a society will pay for a QALY. The National Institute for Health and Care Excellence (NICE) uses a threshold of up to £30 000 in England. 11 We are aware of two systematic reviews of economic evaluations of interventions for personality disorder. 3 The scope of these reviews was limited to interventions for people with a diagnosis of borderline personality disorder only and the most recent papers included were published in 2014. The previous reviews included cost comparisons and interventions delivered in non-community settings. The aim of this paper is to assess the cost-effectiveness of community-based interventions for people with complex emotional needs that meet criteria for diagnoses of personality disorder compared with usual care (as defined by each study) or other active interventions and to include more recent evaluations. To do this we systematically review and assess the quality of relevant economic evaluations. The review forms part of a wider programme of work on complex emotional needs funded by the National Institute for Health Research in England. As is standard practice with our programme of work, we have included two independent commentaries on the review from people with lived experience.

Method
This systematic literature review was carried out in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist. 12 A protocol for the search strategy and methods has been registered with PROSPERO (CRD42020134068). We use 'complex emotional needs' as our preferred terminology rather than 'personality disorder'. This choice seeks to recognise that, although some people find the diagnostic term helpful, many find it to be invalidating and stigmatising. However, we do use the term personality disorder when referring to original papers and search strategy inclusion criteria (diagnostic inclusion criterion for the service model described), as appropriate.

Eligibility criteria
Inclusion criteria (detailed in the Appendix near the end of this paper) required that studies: (a) included an economic evaluation where both costs and outcomes are reported and where a formal or informal link between costs and outcomes is made; (b) included adults attending a general or specialist community mental health service with complex emotional needs that meet the diagnostic criteria for personality disorder (other than antisocial personality disorder); (c) evaluated community-based services, treatments and interventions; and (d) compared cost-effectiveness with usual care or another active intervention.
Publication date was limited to the period from 1 January 1990 to 18 September 2019, and language of publication to English. Studies of interventions for adults with diagnosed antisocial personality disorder were excluded, as there are often high levels of forensic service use, which would limit or prevent the implementation of community-based interventions. Cost-minimisation studies were excluded as they require interventions to have an equivalent effect, which would be incredibly unlikely in this setting. Communitybased services, treatments and interventions are defined as any services, treatments and interventions provided outside of an in-patient setting and excluding pharmacological treatments.

Selection process and data collection
The search was first run on 18 September 2019. Three authors (J.B., T.S. and R.S.) independently screened titles and abstracts against the inclusion criteria using the systematic review web application Rayyan QCRI (Rayyan Systems Inc., State of Qatar; see www. rayyan.ai), which has a masking ('blinding') feature. A fourth author (P.M.) independently reviewed and resolved disagreements. Full-text screening was conducted by J.B. and T.S. independently (with P.M. resolving disagreements) using eligibility forms pertaining to each aspect of the inclusion criteria. Reviewers were not masked to authors, institutions or journals. We did not conduct a formal test for interrater reliability.
Data were extracted into standard forms by J.B.. Duplicate extraction was completed by T.S. for 25% of papers and P.M. resolved disagreements. Data were collected on the characteristics, methods and results of each study. The study characteristics data included the country, funder, intervention (including levels of contact), the comparator, study design, economic approach (i.e. whether a CEA or CUA was used), a summary of the inclusion and exclusion criteria, the sample size and sample characteristics (mean age, proportion female and proportion employed).
Data on the study methods included the follow-up period, perspective, costing (approach to recording service use, valuing the intervention, sources of unit costs and discounting) and outcomes (main economic outcome, QALY derivation if relevant and discounting). The perspective is the range of costs included in an analysis. A narrow perspective would only include costs relevant to the service provider. A societal perspective would include costs to other parts of society, such as the criminal justice system, social services, a patient's employer or voluntary/informal care. Inclusion of all impacts would be a perspective that is rarely achieved. In this paper, we generally use the term 'broad' to define a perspective that is beyond the health and social care sector. Resource use measurement methods include reviewing hospital records or conducting patient interviews/surveys. The source of unit costs includes national routine data, information from local providers, and fees or tariffs. The outcome is the measure of effectiveness used in the evaluation. This can be specific to a condition, such as an improvement on a condition-specific scale, or generic, such as QALYs. For both costs and outcomes, data on any discounting has been recorded. Discounting is used to reflect the perceived lower value attached to future cost and outcomes, owing to the widely accepted view of positive time preference. It is conventional to discount only costs and outcomes that occur beyond a 1-year period. 9

Data analysis
Meta-analysis was not undertaken as the economic evaluations are context specific and the review does not focus on one particular outcome. Narrative synthesis was undertaken to describe the main study findings based on the reporting of the data items described above, including outcomes and costs for the intervention and comparator groups, and the cost-utility or cost-effectiveness results. Results were often expressed as incremental costeffectiveness or cost-utility ratios. These ratios are typically derived from mean values and, owing to variation around the mean, consideration should be given to uncertainty in these estimates.

Quality assessment
Study quality was assessed using the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement, which sets out a standard for the reporting of economic evaluations. 13 The statement evaluates the quality of reporting rather than the quality of evidence. Two statements relate to the introduction, sixteen to the methods, five to the results, and there is one statement each for the title, abstract, sources of funding and conflicts of interest. J.B. appraised 50% of papers and A.C. appraised the remaining 50%. T.S. provided an independent appraisal of 25% of papers and any disagreements were resolved by P.M. Each paper is assigned a score, which corresponds to how many of the 28 statements on the CHEERS checklist the paper complies with (0, does not comply; 0.5, partially complies; 1, fully complies). Some of the statements in the checklist relate to the quality and fullness of reporting, which can be influenced by publications (e.g. the structure of the abstract) rather than the quality of the research itself. A breakdown of each paper's score is reported, so scores for methodology and results can be assessed independently. Figure 1 shows the PRISMA flowchart pertaining to the review; 18 papers were included in the review. One paper presented both CEAs and CUAs for each of two separate interventions. 14 We report on these interventions separately although both were compared with the same control group. P.M. resolved 14 disagreements between J.B. and T.S. during title and abstract screening; 2 during full-text screening; 14 over data extraction; and disagreements over 4 papers in the quality assessment. Table 1 and Table 2 show the characteristics of the studies included in the review.

Study characteristics
Studies were conducted in the UK, Europe and Australia, with sample sizes ranging from 34 to 642 (mean 195.2, s.d. = 167.7). The youngest sample had a mean age of 28.4 years and the oldest sample's mean was 39.2 years. The percentage of women in the samples ranged from 25 to 95%. The lowest rate of employment in a sample was 9.5% and the highest rate of employment was 68.5%, though seven studies did not report this.
Five of the studies included patients with any personality disorder diagnosis. 17,21,22,25,30 Three of the evaluations looked at cluster C personality disorders, described as avoidant, dependent or obsessive-compulsive, 14,28 and one study looked at cluster B personality disorders, described as antisocial, borderline, histrionic or narcissistic. 27 Seven studies only recruited participants with a diagnosis of borderline personality disorder. 16,18,19,23,26,29,31 Nine of the studies required a recent episode of self-harm, attendance at an emergency department or in-patient stay. 15,[17][18][19][20][21]23,29 Two studies only included patients with other comorbid severe mental illness 24 and one of the two also required substance dependence. 25 One study included a general sample of which less than half had complex emotional needs that met personality disorder diagnostic criteria. 20 The interventions evaluated in the 18 studies are described in the supplementary Table 1. Psychotherapeutic interventions included dialectical behaviour therapy (n = 3), types of cognitive therapy (n = 3), nidotherapy (n = 2), schema-focused therapy (n = 2), psychoeducation with problem-solving (n = 1) and mentalisation-based therapy delivered in a day-hospital setting (MBT) (n = 1). Other interventions involved altering the setting in which care was delivered (n = 2), adopting a stepped-care approach (n = 3) and developing joint crisis plans (n = 1). Treatment as usual was the comparative intervention in eight of the included papers. Other comparators included psychotherapy provided by out-patient services (n = 4), psychotherapy provided by day-hospital services (n = 2), care provided by assertive outreach services (n = 2), transference-focused psychotherapy (n = 1), and baseline data (n = 1). More than three-quarters of the included papers employed a randomised controlled trial (RCT) design (n = 14), two used Markov models, one a wait-list controlled trial and one a quasi-experimental design.
In total, 33 different outcome measures were used across the 18 studies (Table 3), with inconsistent approaches to discounting outcomes (5 studies applied a discount rate of between 3 and 5% and 13 studies applied no discounting). Over half of the papers employed a CUA (n = 10). Other approaches used were CCAs (n = 5) and CEAs (n = 5).
Outcomes and costs reported by all studies are reported in Table 4.

Dialectical behaviour therapy
Each of the three studies of DBT used a different method of economic evaluation. Murphy et al conducted a quasi-experimental study using a CUA, 15 Pasieczny & Connor adopted a CCA for their waiting-list controlled trial 16  Two of the studies 16,17 reported significant improvements in clinical outcomes. Pasieczny & Connor reported fewer suicide attempts, emergency department visits, admissions and in-patient days. 16 Priebe et al reported a reduction in self-harm events (rate ratio RR = 0.78; 95% CI 0.76-0.80). 17 Pasieczny & Connor and Murphy et al reported DBT to be less costly than the control condition, although neither reported statistical significance. 15 Murphy et al reported an incremental cost-effectiveness ratio (ICER) for DBT of €1965 per QALY, with the sensitivity analysis suggesting a 62% probability that DBT was cost-effective at a threshold (i.e. how much society is considered to value a QALY) of €45 000 EUR per QALY. Priebe et al reported the ICER for DBT to be £36 per percentage point decrease in the incidence of self-harm. 17 Cognitive therapies Only Tyrer et al reported a change in outcome that favoured cognitive therapies: the incidence rate of a first episode of selfharm per person year was 0.584 in the MACT group and 0.71 in the treatment as usual (TAU) group. 20 The frequency of selfharm events remained relatively unchanged (2.84 per year for MACT versus 2.54 for TAU, after outliers had been excluded). Davidson et al and Palmer et al reported the difference in costs between CBT and TAU, although they were not found to be Community interventions for people with personality disorder significant. Palmer et al found CBT to be less costly but also less effective and reported an ICER of £6376 per QALY. When the WTP is below the ICER, CBT has a higher probability of being cost-effective, but when the WTP is above the ICER, TAU has a higher probability of being cost-effective. At the UK threshold of £30 000 per QALY, the probability of CBT being cost-effective was only ∼25%, whereas the probability of TAU being cost-effective was ∼75%, making TAU the more cost-effective choice. Neither  The model of stepped care varied between studies. Grenyer et al considered a brief intervention clinic delivered within 36 h of a crisis presentation, followed by longer-term community-based psychological therapy. A significant reduction in in-patient days was reported. Sinnaeve et al evaluated 3 months of DBT delivered as a residential intervention, followed by 6 months of out-patient DBT, and compared this with 12 months of standard out-patient DBT. Sinnaeve et al reported QALYs with mean (s.d.) utility scores; these were higher for the step-down DBT care group than the comparator group, but this difference was not statistically significant (mean 0.65 (s.d. = 0.33) for stepped-care DBT versus mean 0.62 (s.d. = 0.28) for out-patient DBT). Kvarstein et al evaluated intensive psychotherapy in day hospitals followed by out-patient individual and group therapy. Lower improvements in functioning were reported (measured on the Global Assessment of Functioning scale, GAF) compared with standard out-patient care.
None of the studies reported significant differences in costs. Sinnaeve

Nidotherapy
Two studies reported economic evaluations of nidotherapy using data from a single pilot RCT in a population with difficult-tomanage needs, personality disorder diagnoses and other comorbid severe mental illness. Ranger et al evaluated the pilot using a CEA. 24 Tyrer et al later evaluated nidotherapy in a subgroup of patients with substance misuse problems, employing a CCA using data from the same trial. 25 The studies used a broad perspective and a follow-up period of 12 months. Ranger et al 24 did not observe any significant effects on outcomes, with a trend towards reduced symptoms in the intervention group. Tyrer et al 25 reported a significant reduction in bed-days in secondary subgroup analyses (54 days compared with 139 for the comparator) but found costs not to be significantly different. Ranger et al still found nidotherapy to be dominant (i.e. it resulted in lower costs and better secondary outcomes than the comparator, although these results were not significant) and even at a threshold of £0 per point of improvement on the Brief Psychiatric Rating Scale it had a 60% likelihood of being cost-effective.

Schema therapy
Two studies evaluated schema therapy using RCTs, and both conducted a CUA and CEA from a broad perspective. 14 Both studies found that schema therapy was dominant with regard to cost per recovered patient and cost per QALY. Bamelis et al showed schema therapy to have an 80% probability of being cost-effective at a WTP threshold of 0, for both cost per recovered patient and cost per QALY. van Asselt et al reported that the likelihood of schema therapy being cost-effective, in terms of recovered patients and QALYs, was over 90% when WTP = 0. Both Bamelis et al and van Asselt et al observed that, as the threshold for recovered patients increased, the probability of schema therapy being costeffective also increased; however, as the threshold for QALYs increased, the likelihood of schema therapy being cost-effective decreased.

Interventions defined by setting
Two evaluations conducted by Soeteman et al employed Markov models to evaluate the effect of care being provided in different settings for samples with either cluster B or cluster C 'personality disorder' diagnoses, 27 using data from the SCEPTRE trial. 32,33 Markov models were used in these CUAs comparing out-patient, day-hospital and in-patient care. One of the studies varied the duration of day-hospital and in-patient care between long and short term. 28 Both studies adopted a broad perspective, as well as repeating the models taking a narrower, health service provider perspective.
Soeteman et al reported that for a cluster B personality disorder group, care at a day hospital was associated with the greatest QALY gains, and for cluster C personality disorder group, short-term inpatient care produced the greatest estimated number of QALYs. For cluster B patients, out-patient care was the least costly and also dominated other settings (being less costly and more effective). For cluster C patients, short-term day-hospital treatment was the least costly option and dominated all long-term options. Estimated cost-effectiveness acceptability curves (CEAC) for cluster B showed out-patient care to be 84% likely to be cost-effective when the threshold was €0 per QALY. For cluster C patients the estimated CEAC showed short-term in-patient care to be 60% likely to be cost-effective at a high WTP threshold of €80 000 per QALY compared with short-term day-hospital care.

Other interventions
Four studies were identified which looked at other interventions: joint crisis plans, 29 psychoeducation with problem-solving, 30 clarification-oriented psychotherapy (COP) 14 and mentalisation-based treatment delivered in a day hospital (MBT). 31 All of them used RCT data and employed a CUA, and two of these studies (Blankers et al 31 and Bamelis et al 14 ) also conducted a CEA. All of these studies took a broad perspective.
Borschmann et al 29 did not report a significant difference in costs or outcomes. However, their economic analysis found joint crisis plans to be dominant over usual care, with over 80% probability of being cost-effective when the threshold was £0 per QALY. This is not unusual as even in the absence of a significant clinical effect, there can still be a finding of cost-effectiveness. Cost-effectiveness combines point estimates of cost and outcome differences and in some cases, as here, can show high probabilities of interventions being cost-effective. McMurran et al 30 found that psychoeducation with problem-solving was also dominant, having a 64% chance of being more cost-effective than usual care with a threshold of £20 000-£30 000 per QALY.
Bamelis et al 14 found COP to be inferior to TAU and schemafocused therapy. Blankers et al 31 found that MBT was dominated by TAU in the CUA but also that in the CEA the ICER per patient in remission was €29 314. At a threshold of €45 000 the chance of MBT being cost-effective (when considering 'remission' as an outcome) was only slight (55%).

Quality appraisal of studies
Study quality, based on the CHEERS checklist, is detailed in Table 5. There was substantial variation between studies. Of the 18 studies, 10 met over 80% of checklist items. On the basis of this metric, the highest-quality studies were the two that evaluated schema therapy, with both having 95% of items met. The three that evaluated DBT were more varied, with the Priebe et al study 17 having the highest rating. This was also the case for the three that evaluated CBT, with Palmer et al 19 scoring highest. The stepped-care evaluations scored the lowest in terms of items met. Overall, studies that scored low tended to do so particularly for reporting of methods.
Community interventions for people with personality disorder

Summary of main findings
This paper has reviewed economic evaluations of community-based interventions for people with complex emotional needs meeting criteria for personality disorder diagnoses. A diverse range of interventions were identified, with no strong evidence found for the cost-effectiveness of any single intervention or model of care.
The strongest evidence was for dialectical behaviour therapy (DBT) delivered in community settings: all three identified studies indicated that the intervention is likely to be cost-effective compared with treatment as usual (TAU). However, consideration should be given to the limited pool of economic evidence when interpreting this finding. The review also identified evidence to support the use of schema-focused therapy, joint crisis plans, stepped care, nidotherapy, psychoeducation with problem-solving, and manual-assisted cognitive therapy (MACT). The authors with lived experience (E.B. and T.J.) highlighted that the review did not identify any economic evaluations that considered patient-led interventions or workforce development interventions, such as those that focus on therapeutic alliance.
Across all 18 studies the evidence was weakened by small sample sizes or poor quality of evidence. Of the 12 studies that found evidence to favour the intervention evaluated, only 5 reported a statistically significant effect (however, even non-significant effects when combined with cost differences can indicate cost-effectiveness). Of the five studies with significant effects, there were limitations regarding reliability of evidence in at least three (Grenyer et al, 21 Pasieczny & Connor 16 and Tyrer et al 25 ). Grenyer et al 21 took a narrow health provider perspective to evaluate their stepped-care intervention, limiting relevance for health policy makers. The drivers for observed differences between groups were unclear and although significant differences in bed-days were observed, there was no difference in admission rates overall (however, reduced bed-days alone may be an important effect). The authors also acknowledge that as there were staff transfers between sites in the cluster RCT design, the reliability of evidence may be limited. Pasieczny & Connor 16 used a non-randomised trial design, which can lead to biased estimates of effect. Tyrer et al 25 relied on a subsample of trial participants: unplanned subgroup analysis of trial data can lead to unreliable results by increasing the risk of chance findings.

Quality of evidence
Although DBT, CBT and stepped care have been the most extensively researched, the numbers of economic evaluations for each of these interventions are relatively few and provide insufficient evidence on which decision makers can confidently base guidelines or allocate resources. This contrasts with other areas, such as depression or schizophrenia, where reasonable agreement about interventions exists. The review found that interventions were sometimes poorly described, limiting reproducibility and usefulness for implementation decisions. Several studies used data from the same trials (BOSCOT 18,19 and SCEPTRE 27,28 ) to report subsequent subgroup analysis or longer-term follow-up. This approach weakens evidence as risk of chance findings is increased and bias may be repeated across more than one study.

Comparison with other reviews
Although not specifically a review of economic evaluations, Brazier et al 34 reviewed the literature on treatments for borderline personality disorder and identified six trials from which they then derived cost-effectiveness estimates. In each analysis costs were combined with a parasuicide outcome and in four analyses they also combined costs with QALYs (three of which used QALYs that they mapped from depression scores). A time horizon of 12 months was applied to their models and both a healthcare and broad perspective were taken. Three of the models produced by They concluded that there were mixed results for DBT, that MBT had promising results and that MACT could not be considered to be cost-effective. These analyses were interesting and helpful, but now are relatively dated. A review of both partial (where only costs are compared) and full (costs and outcomes are compared) economic evaluations was conducted by Brettschneider et al. 4 They included the analyses of Brazier et al 34 as well as some of the studies included here. Overall, they came to similar conclusions as us, i.e. that the evidence base is limited and that the strongest indications are for DBT.
In a review by Meuldijk et al, 3 30 studies were included but some of these were cost comparisons and others were evaluations of hospital-based care. As regards the evaluations (rather than the cost comparisons), they conclude that psychological therapies were likely to be cost-effective although they do recognise that the amount of evidence is greatest for DBT.
Meuldijk et al did include a PhD thesis that was not picked up in our search. This was by Heard and evaluated DBT. 35 The evaluation consisted of two studies. In the first, DBT was compared with usual care and after 1 year did not differ significantly in terms of total costs or cost-effectiveness. In the second, DBT was compared with 'stable psychotherapy'. Again, total costs and cost-effectiveness were not significantly different between the groups.

Limitations
Our search strategy included terms related to community and outpatient care and this may have excluded some relevant studies that did not mention the setting in the title or abstract or where care was delivered in a day service. However, there was good agreement with the other reviews that had been conducted. As has been found in the two previous reviews, 3,4 there was significant heterogeneity in the included studies, which prevented meta-analysis of findings. The results are therefore reported in narrative form, making interpretation of findings more complex. In addition, the comparator was most often 'usual care', which is context specific and rarely described in detail, limiting generalisability. The review scope did not intend to include interventions for people with a diagnosis of antisocial personality disorder. However, some of the identified studies considered samples with 'any personality disorder diagnosis', another required that patients had other comorbid severe mental illness, and another sample included individuals with complex emotional needs meeting personality disorder diagnostic criteria as well as those not meeting these criteria. Given the general nature of the samples, and because these studies met all other inclusion criteria, they were included in the review. However, this presents a key limitation for identifying and interpreting findings specific to the population of interest and highlights that diagnostic categorisation, although essential for literature searching, is a limitation of the review scope. Finally, to simplify reporting, we assigned a quality assessment score to each paper, which may risk oversimplifying quality assessment, where it is important to understand the areas of strength and weakness. To address this limitation, we have provided detail on areas of quality assessed (Table 5).

Recommendations for future research
The majority of included studies relied on data from small RCTs. RCTs are often seen as the gold standard of evidence on the efficacy of treatment. Evidence on effectiveness can also be generated through trials but these usually need to be sufficiently large to capture real-world effects. Inclusion of these effects can be informative for decision-making as results are often more generalisable. Economic modelling can also support decision-making and allow for uncertainty in results to be readily examined. Given the scarcity of economic evidence in this area, observational studies alongside the delivery of community services would be valuable. We did not identify such studies in our review but some may exist in the form of reports to specific organisations.
The review found that, although a range of economic approaches were used, studies using CUAs applied the most rigorous and transferable methods. The follow-up periods were longer, the perspectives were broader and resource use was more often collected through a combination of patient report and patient records, improving the accuracy of healthcare utilisation estimates. The main benefit of CUAs is that they enable comparison across disease areas by measuring effect in QALYs. All CUAs in this review used EuroQuol's EQ-5D to derive QALYs (in line with NICE guidelines). The EQ-5D has been criticised for not picking up all important aspects of health, and so for all health conditions consideration must be given to its reliability (does it produce consistent results?), validity (does it measure what it intends to measure?) and sensitivity (does it identify genuine changes in health states?). The EQ-5D has been shown to be moderately responsive to change in symptoms in individuals with personality disorder diagnoses. 36 There is a lack of evidence on its validity in capturing all important aspects of health in this population. 34 Nonetheless, owing to the measure's simplicity and because it can be used to generate QALYs, it has been recommended as appropriate for use in this population. 36 A limitation of CUAs is that QALYs singularly focus on health benefits, which prevents comparisons across sectors (e.g. comparing whether an education intervention may be more cost-effective than a health intervention) and may underestimate benefits where interventions have wider outcomes such as employment. The majority of studies included in this review took a broad perspective in measuring costs and effects. Five studies also used CCAs, presenting costs alongside a number of outcomes. Although this can make interpretation challenging, CCAs may be justified, given the multifaceted nature of the conditions being studied and the potential breadth of effects. A broad perspective and presenting multiple outcomes alongside QALYs may therefore be appropriate where interventions aim to improve outcomes beyond health gain and where cost implications may fall outside of the health and care budget.
Future research should aim to co-produce studies with people with lived experience of diagnoses of personality disorder or complex emotional needs to help ensure that important outcomes, as well as costs, are captured from a broad perspective. Choice of outcome measures should also be informed by previous studies and local guidelines. Consistency in measurement and reporting will support the development of a stronger evidence base for community-based interventions as information can be pooled across multiple studies. Researchers may wish to consult the ICHOM Standard Set for Personality Disorders when selecting measures. 37 Future studies should only evaluate therapies that are well developed and hold face validity with people who are using and delivering services, and they should be based on sound theoretical foundations. High-quality research is also needed on models of care, including the intensity and duration of interventions, with clear description of how services are configured to enable reproducibility. Researchers may wish to refer to a forthcoming paper by Finamore et al 'Systematic scoping of community-based service models for people with personality disorder' (personal communication, 2021; further details are available from P.M. on request) to support a standardised description and understanding of models of care. The logic models articulated in that paper may also support the identification of relevant outcome measures. Finally, the review identified a gap in the types of intervention evaluated, with no patient-led interventions or workforce interventions identified. These two areas should be considered key areas for future economic research.

Lived experience commentary by Eva Broeckelmann
'Having spent years struggling to access suitable treatment for CEN [complex emotional needs], I am pleased to see the lack of robust economic evidence to support a single intervention or model of community-based care. Especially considering the inherently heterogeneous nature of any given sample with a 'PD' [personality disorder] diagnosiswhere e.g. two people with 'BPD' [bipolar personality disorder] may have no more than one highly subjective trait in commonthere simply is no 'one-size-fits-all' approach. Therefore, any study results must be treated with utmost caution before making generic policy decisions that indiscriminately apply to everyone with this label. Despite being recommended by NICE, I do not consider the EQ-5D to be an appropriate outcome measure for this population. CEN symptoms can fluctuate so frequently that any snapshot in time on such a rudimentary measure is virtually meaningless for assessing long-term recovery, and it will be crucial to develop more nuanced alternatives for future studies. Ultimately, the policy aim to prioritise specific interventions based on cost-effectiveness neglects the fact that strong therapeutic relationships with trusted clinicians are considerably more important for treatment success than the particular modality used. Thus, such comparisons are of limited value, whereas it could eventually be much more cost-effective to focus research and resources on improved training for clinicians instead.'

Lived experience commentary by Tamar Jeynes
'It is interesting that in life much interacts with and mirrors itself. This systematic review endeavoured to be robust: 18 economic evaluations in 19.5 years fit the inclusion criteria. Nine different interventions, each with their own access criteria for service users.
Collecting robust economic data involves resource. This luxury is afforded to better funded interventions, which then make a case for future funding. Many service users do not meet criteria for these interventions.
It is interesting to identify what is missing. There are no economic evaluations of survivor-led or co-produced interventions, such as co-facilitation of therapies, survivor-led networks or crisis houses. These are more likely to not have inclusion criteria, reaching people that other interventions cannot. They often do not have the resource to conduct economic evaluations. Many have ceased to exist. Excluded service users only have access to costly emergency services during crisis. Being excluded depletes the internal resources needed to value their worth. Many cease to exist. It is interesting that in life much interacts with and mirrors itself. When an intervention cannot demonstrate its worth, it cannot access funding, which includes resource to measure its worth. When the only interventions that can demonstrate their worth are ones with inclusion criteria, excluded service users remain without a service.
between costs and outcomes is made. This will include trials, observational studies and models (b) The population of interest are adults with complex emotional needs who are in contact with community mental health services. This will include evaluations of services for those with a diagnosis of personality disorder (excluding antisocial) or services focused specifically on trauma-related care (c) Evaluations of community-based services/treatments/ interventions (d) Evaluations with usual care (which may include 'do nothing' or waiting lists) or another active intervention as a comparator Exclusion criteria (a) Studies that are only reported as conference abstracts, letters, protocols, reviews or editorials (b) Evaluations of interventions only provided in in-patient or forensic settings (c) Evaluations of pharmacological interventions (d) Studies that do not report costs (e) Studies that do not report outcomes (f) Studies with no comparator (g) Studies not reported in English