Violent offenders with personality disorders (PDs) can cause immense harm to victims and society. They commit more serious and violent crimes (Blackburn, Reference Blackburn, Cooke, Forth and Hare1998; Johnson et al., Reference Johnson, Cohen, Smailes, Kasen, Oldham, Skodol and Brook2000), more repeat offenses, and receive longer sentences (Hart, Webster, & Menzies, Reference Hart, Webster and Menzies1993) than other offenders. Antisocial, narcissistic, borderline, and paranoid PDs are the most prevalent PDs in correctional settings (Blackburn, Logan, Donnelly, & Renwick, Reference Blackburn, Logan, Donnelly and Renwick2003; Lindsay et al., Reference Lindsay, Hogue, Taylor, Mooney, Steptoe, Johnston and Smith2006; Nijman, Cima, & Merckelbach, Reference Nijman, Cima and Merckelbach2003), and are associated with high recidivism. Offenders with psychopathy, the severest subgroup of antisocial PD, are at particularly high risk for reoffending (Serin, Reference Serin1996). Incarceration is estimated to cost $80 billion per year in the US (Lewis & Lockwood, Reference Lewis and Lockwood2019), where per capita incarceration is nine times higher than in the Netherlands (Walmsley, Reference Walmsley2013). Reducing recidivism in offenders with PDs and aggression could therefore have significant economic as well as societal benefits (Settumba, Chambers, Shanahan, Schofield, & Butler, Reference Settumba, Chambers, Shanahan, Schofield and Butler2018).
Treatment is used increasingly to lower recidivism risk (Larsen, Jalava, & Griffiths, Reference Larsen, Jalava and Griffiths2020; Papalia, Spivak, Daffern, & Ogloff, Reference Papalia, Spivak, Daffern and Ogloff2019; Wilson, Reference Wilson2014). However, standard treatments for offenders with PD and aggression, usually based on cognitive-behavioral therapy (CBT), only have modest effects (Ross, Quayle, Newman, & Tansey, Reference Ross, Quayle, Newman and Tansey2013; Wilson, Reference Wilson2014; Wong, Gordon, Gu, Lewis, & Olver, Reference Wong, Gordon, Gu, Lewis and Olver2012). Moreover, their focus is on controlling aggressive behavior (Timmerman & Emmelkamp, Reference Timmerman and Emmelkamp2005), but not ameliorating PDs associated with (violent) recidivism. In fact, offenders with severe PDs, especially psychopathic ones, are often assumed impossible to treat (Rice, Harris, & Cormier, Reference Rice, Harris and Cormier1992), despite the lack of methodologically sound randomized clinical trials (RCTs) to test this hypothesis (D'Silva, Duggan, & McCarthy, Reference D'Silva, Duggan and McCarthy2004). Recent meta-analyses and reviews suggest that violent offenders (Papalia et al., Reference Papalia, Spivak, Daffern and Ogloff2019), patients with antisocial PD (Wilson, Reference Wilson2014), and patients with psychopathy (Larsen et al., Reference Larsen, Jalava and Griffiths2020), may be treatable. 
A few pilot studies of evidence-based PD treatments, such as mentalization-based therapy (Bateman, Fonagy, & Campbell, 2018), dialectical behavior therapy (Linehan et al., 2006), and schema therapy (ST) (Young, Arntz, & Giesen-Bloo, 2006), have shown promising results in forensic patients (Berzins & Trestman, 2004; Chakhssi, Kersten, de Ruiter, & Bernstein, 2014; Moulden, Mamak, & Chaimowitz, 2020; Ware, Wilson, Tapp, & Moore, 2016). However, rigorous RCTs of these treatments in forensic populations are lacking.
Furthermore, questions have been raised about the effectiveness of mandated (i.e. coerced) forensic treatment. In a meta-analysis of 129 diverse offender treatment studies (Parhar, Wormith, Derkzen, & Beauregard, 2008), the mean treatment effect was small, d = 0.15, with more coercive treatments being less effective. For the highest levels of coercion, where, as in our RCT, institutional treatment was mandated as a condition of offenders' sentences, there was no mean effect of treatment, d = 0 (Parhar et al., 2008).
In the Netherlands, individuals convicted of serious offenses which are (partly) attributable to mental disorders, including PDs, can be sentenced to mandated treatment in high-security hospitals, known as ‘TBS clinics’ (‘Terbeschikkingstelling,’ abbreviated as TBS, means ‘at the discretion of the state’). The TBS sentence protects the public while providing treatment and lowering recidivism risk. This sentence is re-evaluated every 1–2 years and can be indefinite. Patients must show evidence of lowered risk before beginning ‘rehabilitation’, involving gradual re-entry into the community, first supervised leave, and then unsupervised leave. Leave can be withdrawn if patients exhibit high-risk behaviors (e.g. addiction, aggression). TBS patients may refuse or withdraw from therapy; however, this may have consequences (e.g. losing privileges or transfer to another clinic). In non-randomized studies, TBS treatment was superior to incarceration for reducing recidivism (Bregman & Wartna, Reference Bregman and Wartna2010; Hildebrand, Hesper, Spreen, & Nijman, Reference Hildebrand, Hesper, Spreen and Nijman2005).
Schema Therapy (ST; Young, Klosko, and Weishaar, Reference Young, Klosko and Weishaar2003) is an integrative therapy for PDs that has demonstrated effectiveness in non-forensic patients with borderline PD (Giesen-Bloo et al., Reference Giesen-Bloo, Van Dyck, Spinhoven, Van Tilburg, Dirksen, Van Asselt and Arntz2006; Nadort et al., Reference Nadort, Arntz, Smit, Giesen-Bloo, Eikelenboom, Spinhoven and van Dyck2009) and cluster C PDs, narcissistic, histrionic, and paranoid PDs (Bamelis, Evers, Spinhoven, & Arntz, Reference Bamelis, Evers, Spinhoven and Arntz2014). We adapted ST for forensic PD patients (Bernstein, Arntz, & de Vos, Reference Bernstein, Arntz and Vos2007), with the aims of motivating and engaging patients, ameliorating PD symptoms, lowering risks and building strengths (i.e. protective factors), and facilitating re-entry into the community (Bernstein, Clercx, & Keulen-De Vos, Reference Bernstein, Clercx and Keulen-De Vos2019; Chakhssi, Bernstein, & De Ruiter, Reference Chakhssi, Bernstein and De Ruiter2014). ST focuses on changing ingrained patterns of cognition and emotion (early maladaptive schemas; EMS) and maladaptive emotional states (schema modes) (Young et al., Reference Young, Klosko and Weishaar2003), constructs associated with personality disorders in forensic (Keulen-de Vos et al., Reference Keulen-de Vos, Bernstein, Clark, de Vogel, Bogaerts, Slaats and Arntz2017) and non-forensic (Bamelis et al., Reference Bamelis, Evers, Spinhoven and Arntz2014) populations. 
Consistent with the general aggression model (Gilbert & Daffern, Reference Gilbert and Daffern2011), EMS and schema modes involving attachment insecurity, insufficient self-control, and over-compensatory coping are associated with increased risk for aggression and criminal behavior (Keulen-de Vos et al., Reference Keulen-de Vos, Bernstein, Vanstipelen, de Vogel, Lucker, Slaats and Arntz2016), Cluster B personality disorders and recidivism (Keulen-de Vos et al., Reference Keulen-de Vos, Bernstein, Clark, de Vogel, Bogaerts, Slaats and Arntz2017), sex offending (Chakhssi, De Ruiter, & Bernstein, Reference Chakhssi, De Ruiter and Bernstein2013), and psychopathy (Chakhssi et al., Reference Chakhssi, Bernstein and De Ruiter2014; Keulen-de Vos et al., Reference Keulen-de Vos, Bernstein, Vanstipelen, de Vogel, Lucker, Slaats and Arntz2016).
We conducted a 3-year RCT to test the effectiveness of ST v. treatment-as-usual (TAU) for offenders with PD and aggression in 8 TBS clinics. To our knowledge, this is the first RCT of long-term, intensive therapy for violent offenders with PDs. Patients at each clinic were randomly assigned to ST or another form of individual therapy (TAU), typically CBT, eclectic/integrative therapy, or systemic therapy, which constituted usual therapy at the institution. Patients in both treatment conditions received the full range of other therapeutic modalities (e.g. group therapy, creative arts therapies) that they would normally receive. The primary aims of treatment were to reduce PD symptoms and facilitate rehabilitation into the community. We also assessed several secondary outcomes, including recidivism risks, strengths, institutional violence, EMS, and schema modes.
One-hundred-three male TBS patients with DSM-IV (American Psychiatric Association, 2000) antisocial, borderline, narcissistic, or paranoid PD or Cluster B PD-not-otherwise-specified (PD-NOS, meeting 5 or more Cluster B PD criteria) were randomized to receive ST (N = 54) or TAU (N = 49). Eight of the 12 TBS facilities in the Netherlands participated: de Rooyse Wissel, Venray (N = 19), de Rooyse Wissel, Maastricht (N = 6), van der Hoeven (N = 17), Oostvaarders (N = 7), Mesdag (N = 16), Veldzicht (N = 11), Kijvelanden (N = 18), and FPK Assen (N = 9). We only recruited male patients, because they constitute the vast majority in TBS (Van Gemmert, Van Schijndel, Gordeau, & Casanova, Reference Van Gemmert, Van Schijndel, Gordeau and Casanova2015). Exclusion criteria were (a) current psychotic symptoms, (b) schizophrenia or bipolar disorder, (c) current drug or alcohol dependence (but not abuse), (d) full-scale IQ <80, (e) serious neurological impairment, (f) autistic spectrum disorder, and (g) exclusive pedophilia (excluded because these problems suggested the need for other treatment methods).
The characteristics of the patients are given in Table 1. Nearly all were sentenced for physically violent offenses (54%), sexually violent offenses (26.2%), threats/coercion (16.5%), or arson (2.9%). Eighty-three percent (N = 88) were at high risk for (violent) recidivism (Historic-Clinical-Risk Management Scheme-20-Version 2 [HCR-20V2]; Webster, Douglas, Eaves, & Hart, 1997). Their most prevalent PDs were antisocial (60%), narcissistic (21%), and borderline (17.1%) (Structured Interview for DSM-IV Personality Disorders [SIDP-IV]; Pfohl, Blum, & Zimmerman, 1997). Fifty-four percent had significant traits of psychopathy (PCL-R ⩾ 25; Psychopathy Checklist-Revised, Hare & Hart, 1993), and 22% were highly psychopathic (PCL-R ⩾ 30).
This study was approved by Maastricht University's Medical Ethical Committee (MEC 06-3-066). Patients were screened and recruited by research assistants at each hospital between January 2007 and March 2012. Patients were free to refuse the study or to withdraw at any time, without consequences. Dropout was defined as withdrawing from the study at the patient's request, and pushout (De Boer, Boon, De Haan, & Vermeiren, Reference De Boer, Boon, De Haan and Vermeiren2016) as withdrawing at treatment or research staff's request (Fig. 1). No incentives for participation were provided. After giving written informed consent and completing baseline assessments, patients were randomized to ST or TAU by a central research assistant with no access to patient information, using an adapted biased urn algorithm (Schouten, Reference Schouten1995), stratified by site. The algorithm assigned patients at random to the two conditions at each site, while ensuring that the overall proportion between conditions was balanced.
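The allocation procedure described above can be illustrated with a generic urn design. The sketch below is not the adapted Schouten (1995) algorithm used in the trial, whose details are not given here. It is a plain Wei-style urn kept separately per site, and the class name `StratifiedUrn`, the parameters `alpha` and `beta`, and the site labels are all hypothetical. Note also that this sketch balances allocation within each site only, whereas the trial's algorithm also balanced the overall proportion between conditions.

```python
import random
from collections import defaultdict


class StratifiedUrn:
    """Generic urn randomization with one urn per site (stratum).

    Illustrative only: alpha and beta are assumed values, not the
    settings of the trial's adapted biased urn algorithm.
    """

    def __init__(self, arms=("ST", "TAU"), alpha=1, beta=1, seed=None):
        self.arms = arms
        self.alpha = alpha  # balls per arm in a fresh urn
        self.beta = beta    # balls added to the *other* arm after each draw
        self.rng = random.Random(seed)
        # Each site lazily gets its own urn with alpha balls per arm.
        self.urns = defaultdict(lambda: {arm: alpha for arm in arms})

    def assign(self, site):
        """Draw an arm for one patient at `site`, then bias the urn back."""
        urn = self.urns[site]
        weights = [urn[arm] for arm in self.arms]
        arm = self.rng.choices(self.arms, weights=weights)[0]
        # Adding balls for the arm that was NOT drawn makes the
        # under-represented arm more likely on the next draw.
        for other in self.arms:
            if other != arm:
                urn[other] += self.beta
        return arm


if __name__ == "__main__":
    urn = StratifiedUrn(seed=2007)
    allocations = [urn.assign("site_A") for _ in range(10)]
    print(allocations, {a: allocations.count(a) for a in urn.arms})
```

Because each draw shifts the urn toward the arm that was not chosen, long runs of the same condition become increasingly unlikely, yet every single assignment stays random and cannot be predicted with certainty.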
Patients were assessed by trained research assistants or diagnostic specialists at baseline and every 6 months for 3 years. In a 3-year study, we could not keep raters, staff, or patients blind to treatment conditions. However, we measured most outcomes using methods (e.g. decisions of leave advisory boards, informant ratings, risk assessments, incident registers) that were independent of patients' self-reports to mitigate response biases (Schretlen & Arkowitz, Reference Schretlen and Arkowitz1990). At the time that we initiated our study, the Dutch RCT register, which is recognized by the World Health Organization, was new and used mainly for medical RCTs, and not psychotherapy RCTs. When we registered our study on 16 January 2008, we had already enrolled the first 28 patients, for a duration of 1–11 months.
ST was initially given twice per week, the recommended frequency for severe PDs (Young et al., Reference Young, Klosko and Weishaar2003), and found effective in borderline PD (Giesen-Bloo et al., Reference Giesen-Bloo, Van Dyck, Spinhoven, Van Tilburg, Dirksen, Van Asselt and Arntz2006; Nadort et al., Reference Nadort, Arntz, Smit, Giesen-Bloo, Eikelenboom, Spinhoven and van Dyck2009). ST sessions were usually reduced to once per week after patients attained leave, mainly for practical reasons, as they spent more time outside of the hospital. The TAU patients at seven of the eight hospitals received individual therapy once per week as their primary therapy, which is usual practice at most TBS clinics. The TAU patients at one hospital (Kijvelanden) received group therapy as the primary therapy because this was the usual practice at this hospital.
The hours of primary, individual therapy per patient differed between the treatment conditions at each time-point between baseline and 18 months, and overall (ST = 149.37 [s.d. = 73.66], TAU = 101.33 [s.d. = 72.38], U = 770.0, p < 0.001). See online Supplementary Fig. 1 for therapy hours per time-point. However, TAU patients received more hours of auxiliary individual therapies (e.g. creative arts therapies) than ST patients from 12 to 18 months (ST = 3.96 [s.d. = 7.75], TAU = 11.09 [s.d. = 24.66], U = 797.5, p = 0.047). Combining all therapy hours, there were no differences between the treatment conditions at any time-point or overall (ST = 260.51 [s.d. = 133.97], TAU = 230.35 [s.d. = 157.69], U = 1083.5, p = 0.114 n.s.).
There were no treatment condition differences in medication use. Seventy-six percent (N = 78) of patients were given psychotropic medications, with antipsychotics (N = 41, 39.8%) and antidepressants (N = 34, 33.0%) most commonly prescribed. Thirty-three patients received other medications, including anxiolytics, mood stabilizers, libido-reducing medications, and methadone.
Twenty-two ST therapists and 19 TAU therapists participated, with separate therapists per condition. All therapists held Master's degrees in Psychology; 61% (N = 25) were female, and 36 (87.8%) completed a post-masters clinical specialization. Most TAU therapists were cognitive-behavioral (57.9%, N = 11), integrative/eclectic (26.3%, N = 5), or systemic in orientation (10.5%, N = 2). Therapists reported equivalent years of previous clinical experience (TAU, 13.95 [s.d. = 9.92], ST, 11.82 [s.d. = 8.15], t = 0.755, p = 0.455 n.s.), years of experience with PDs (TAU, 10.87 [s.d. = 8.44], ST, 9.80 [s.d. = 7.37], t = 0.434, p = 0.666 n.s.), number of patients treated (TAU, 160.44 [s.d. = 171.24], ST, 71.76 [s.d. = 84.08], U = 138.5, p = 0.154 n.s.), and number of patients with PD treated (TAU, 61.72 [s.d. = 63.52], ST, 60.90 [s.d. = 72.23], U = 178.0, p = 0.953 n.s.).
ST therapists received 6–8 days of training by Bernstein and Kersten; attended biweekly supervision groups for 2 h, and practiced with one or two patients before submitting two randomly selected videotaped sessions to an independent rater. Therapists who received mean scores of four or higher on the six-point Schema Therapy Rating Scale (Young et al., Reference Young, Arntz and Giesen-Bloo2006) were deemed competent to treat ST patients for the study. TAU therapists usually also participated in some form of supervision or peer-supervision group, typically for 1 h weekly or biweekly. To check treatment fidelity, we randomly selected 31 tapes, scored blind to treatment condition by two masters-level psychologists, using the Schema Therapy Integrity Scale (Bernstein, Reference Bernstein2007). The interrater reliability for 15 double-scored tapes was ICC = 0.86. ST therapists scored significantly higher than the TAU therapists (t = 7.036, p < 0.001).
Baseline diagnostic assessment included the Structured Clinical Interview for DSM-IV Axis I (SCID-I, First et al., 1998; Schouten, 1995) and Structured Interview for DSM-IV Personality Disorders (SIDP-IV, Pfohl et al., 1997) to establish DSM-IV-TR Axis I and II disorders, respectively, and the PCL-R (Hare, 2003), to assess psychopathy. Mean inter-rater reliability for the SCID-I diagnoses in 20 randomly selected patients was κ = 0.93, with no diagnosis lower than κ = 0.77. In 32 randomly selected patients, the inter-rater reliabilities (ICCs) for the SIDP-IV main diagnoses were antisocial PD, 0.91; borderline PD, 0.79; narcissistic PD, 0.93; paranoid PD, 0.96; and Cluster B PD-NOS, 0.79. Inter-rater reliability for the PCL-R total scores was ICC = 0.91 in 29 patients.
Primary outcome variables
Rehabilitation was operationalized as obtained permission for supervised or unsupervised leave. By law, patients applying for leave were evaluated with the help of structured risk assessments, usually the HCR-20V2 (Webster et al., 1997) or a similar Dutch instrument (Hildebrand et al., 2005). Leave requests must be approved by a ‘Leave Advisory Board’ within each hospital, with the final decision being made by a committee at the Ministry of Security and Justice, which usually follows the clinic's advice.
PD symptoms were assessed with the patient self-report and informant-report of the Schedule for Nonadaptive and Adaptive Personality (Clark & Vanderbleek, Reference Clark, Vanderbleek and Saklofske2017; Melley, Oltmanns, and Turkheimer, Reference Melley, Oltmanns and Turkheimer2002), adapted and validated for forensic patients (SNAP-FV, Keulen-de-Vos et al., Reference Keulen-de Vos, Bernstein, Clark, Arntz, Lucker and de Spa2011). The SNAP-FV consists of four PD scales (antisocial, narcissistic, borderline, and paranoid PD) and three temperament scales (positive and negative temperament, disinhibition). Patients completed the self-report version of the SNAP-FV. Three staff members per patient completed the informant version at each time point, typically the patients' primary psychotherapists, art therapists, psychiatric nurses, or treatment managers. Cronbach's α ranged from 0.91 to 0.93 for the PD scales and 0.82 to 0.89 for the temperament scales for the self-report SNAP-FV. ICCs ranged from 0.68 to 0.74 for the PD scales and 0.56 to 0.64 for the temperament scales for the informant SNAP-FV.
Secondary outcome variables
Violence risk was assessed with the Historic-Clinical-Risk Management Scheme-20 (HCR-20, 2nd Edition, Webster et al., 1997), completed for risk level outside of the hospital setting (De Vogel & De Ruiter, 2006; Douglas & Reeves, 2010; Douglas, Ogloff, Nicholls, & Grant, 1999), and the Short-Term Assessment of Risk and Treatability (START, Braithwaite, Charette, Crocker, & Reyes, 2010), which measures short-term dynamic vulnerabilities as well as strengths inside the hospital. For this study, we used the HCR-20V2 total quantitative risk rating, and the START total strengths and vulnerabilities scores. Research assistants double-scored 31 patient files, which had been blinded to treatment condition. Inter-rater reliabilities between the original (non-blinded) raters and the blinded raters were satisfactory to good: HCR-20V2 total risk score, ICC = 0.82; START strength score, ICC = 0.71; START risk score, ICC = 0.69.
Institutional incidents were registered according to four incident categories: verbal aggression, threats, physical aggression, and violation of hospital rules (e.g. drug use). We created a total incidents score, which was a weighted sum (by severity) of all incidents occurring in any given 6-month period. Because hospital staff recorded incidents on a daily basis in an electronic database, it was not possible to determine the inter-rater reliabilities for these scores.
Schema modes and EMS were assessed with the Schema Mode Inventory-Revised (SMI-R, Lobbestael, van Vreeswijk, Spinhoven, Schouten, & Arntz, Reference Lobbestael, van Vreeswijk, Spinhoven, Schouten and Arntz2010) and the Young Schema Questionnaire-Short version (YSQ-S, Young, 1998), respectively. We used the total scores for maladaptive modes and healthy modes on the SMI-R, which had reliabilities of α = 0.94 and 0.96, respectively, and the YSQ-S total score, with α of 0.97, in our sample.
General syndrome psychopathology was assessed with the total score of the Symptom Checklist-90-Revised (Derogatis & Unger, 2010), which had a reliability of α = 0.96.
A-priori power analysis estimated an N of 114, for the power of 0.80 with a two-tailed α = 0.05 and medium effect size (based on previous clinical trials of ST: Giesen-Bloo et al., Reference Giesen-Bloo, Van Dyck, Spinhoven, Van Tilburg, Dirksen, Van Asselt and Arntz2006; Nadort et al., Reference Nadort, Arntz, Smit, Giesen-Bloo, Eikelenboom, Spinhoven and van Dyck2009). Due to a reduced rate of referrals to TBS clinics during the last 3 years of recruitment, the attained N was slightly less. Outcomes were analyzed with (generalized) linear mixed models [(G)LMM; also called multilevel analysis or mixed regression] with the site as random intercept (if this did not create estimation problems) and the appropriate distributional model and link dependent on the dependent variable: binomial with complementary log-log link for survival analysis, negative binomial with log-link for counts (with over-dispersion), gamma with log-link for skewed dimensional variables, and normal with identity link for normally distributed variables. Intention-to-treat analyses were performed for all outcomes, applying (G)LMM on all available data. There was very little missing data for rehabilitation, START, HCR-20, and incidents, because these were independent of patients' self-reports, and staff scoring continued when patients left the study.
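For readers unfamiliar with the complementary log-log link named above, the following sketch shows the link, its inverse, and how per-interval event probabilities from a discrete-time survival model accumulate into cumulative proportions. It is an illustration of the standard definitions only, not the authors' analysis code. The function names and the example hazards are hypothetical.

```python
import math


def cloglog(p):
    """Complementary log-log link: probability in (0, 1) -> real line."""
    return math.log(-math.log(1.0 - p))


def inv_cloglog(eta):
    """Inverse link: linear predictor -> probability in (0, 1)."""
    return 1.0 - math.exp(-math.exp(eta))


def cumulative_from_hazards(hazards):
    """Cumulative event probability after each interval.

    hazards[i] is the probability of the event (e.g. attaining leave)
    in interval i, given that it has not yet occurred.
    """
    still_waiting = 1.0
    cumulative = []
    for h in hazards:
        still_waiting *= 1.0 - h
        cumulative.append(1.0 - still_waiting)
    return cumulative


if __name__ == "__main__":
    # Hypothetical per-interval hazards, chosen only for illustration.
    example_hazards = [0.5, 0.5]
    print(cumulative_from_hazards(example_hazards))  # [0.5, 0.75]
    print(inv_cloglog(cloglog(0.3)))
```

The inverse link always returns a value between 0 and 1, which is why model estimates can be reported back on the original probability scale.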
Baseline characteristics of the patients are presented in Table 1. There were very few significant differences at baseline following randomization, except for a trend in differences between total baseline SCL-90 score, with the ST group scoring slightly higher on total symptoms than the TAU group (t = 1.687, p = 0.09).
*Notes with Table ST2.
Time was coded 0, 1, 2, 3, 4, 5, 6 thus 1 unit = 6 months. For GLMM survival analysis time was a factor with 6 months intervals (supervised and unsupervised leave) or 1-year intervals (treatment retention). Treatment was coded −0.5 (TAU) and 0.5 (ST). In case of log-link transformed scales, results are in transformed scale. Unless otherwise indicated, LMMs based on a normal distribution were used. Significance levels <0.05 of treatment or treatment × time effects are indicated in bold. Where the main effect of treatment is not reported, this is because it reflects the difference between treatments at time = 0 (i.e. at baseline) which is not relevant for the hypothesis and not of interest because participants were randomized over treatments.
a GLMM survival analysis with a complementary log-log link. Estimated chances are in the original scale (range 0–1).
b Effect size Cohen's d = change over 3 years (unless otherwise indicated) from fixed part divided by s.d. based on residual and random intercept variance from GLMM.
c LMM piecewise regression with the knot (the point where there is a change in the slope) at 1.5 years.
d GLMM gamma regression with log link. Results are in transformed scale.
e GLMM negative binomial regression for counts with log link. Results are in transformed scale.
f Effect size r = √(dfn*F/(dfn*F + dfd)).
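The effect size formula in note f can be checked with a few lines of code. The sketch below applies it to the treatment-by-time interaction reported for unsupervised leave, reading the reported degrees of freedom as 5 and 472 (F = 3.45, reported r = 0.19). The function name `effect_size_r` is an assumption made for illustration.

```python
import math


def effect_size_r(f_value, df_num, df_den):
    """Effect size r = sqrt(dfn * F / (dfn * F + dfd)), as in table note f."""
    return math.sqrt(df_num * f_value / (df_num * f_value + df_den))


if __name__ == "__main__":
    # Unsupervised leave, treatment-by-time interaction: F = 3.45, df = 5 and 472.
    r = effect_size_r(3.45, 5, 472)
    print(round(r, 2))  # 0.19
```

For a single-degree-of-freedom effect (dfn = 1, where F = t²), the formula reduces to the familiar r = √(t²/(t² + dfd)).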
For supervised leave, ST was significantly superior to TAU (treatment condition F = 3.90, p = 0.048; treatment-by-time F = 9.40, p < 0.001, r = 0.36). The significant treatment-by-time interaction was related to a particularly strong difference during the first 6 months in year 3 (p = 0.004). Cumulative proportions of attained supervised leave permission were higher in ST than in TAU (year 1, ST 56.9%, TAU 51.0%; year 2, ST 85.0%, TAU 76.5%; year 3, ST 96.7%, TAU 90.5%).
For unsupervised leave, the significant treatment-by-time interaction, F(5, 472) = 3.45, p = 0.004, r = 0.19, was related to a significantly higher chance of obtaining permission in ST than in TAU in the first half-year, t(472) = 3.84, p < 0.001. Cumulative proportions of attained unsupervised leave permission were higher in ST than in TAU (year 1, ST 15.2%, TAU 6.9%; year 2, ST 42.7%, TAU 37.8%; year 3, ST 67.4%, TAU 59.4%).
SNAP-FV personality disorder scores (Keulen-de-Vos et al., 2011)
ST showed a significantly steeper decrease in SNAP-FV PD scores over time, compared to TAU, t(1387) = −2.85, p = 0.005. The within-group effect sizes were d = 0.56 (TAU) and d = 0.78 (ST); the difference between treatments was d = 0.22. Patients reported significantly lower SNAP-FV PD scores than informants, but the change over time was parallel over patients and informants, without interaction with treatment. Post-hoc tests showed that paranoid PD scores were significantly higher than the other three PD scales (main effect of PD scale). Moreover, borderline and paranoid PD scores showed a significantly steeper decrease over time than narcissistic and antisocial PD scores (PD scale by time interaction), with no interaction with treatment.
SNAP-FV temperament scales (Keulen-de-Vos et al., 2011)
The treatment-by-time interaction showed a steeper decrease in SNAP-FV temperament scores in ST than in TAU, t(1014) = 2.60, p = 0.01. The within effect sizes were d = 0.38 (TAU) and d = 0.63 (ST). Post-hoc tests showed that patients reported lower (i.e. less negative) temperament scores than informants, but the change over time was parallel over raters, without interaction with treatment. Reversed positive emotions scores were generally higher than for the other temperament scales, and showed a smaller change over time, whereas negative emotions showed the largest change.
HCR-20V2 total risk score (Webster et al., 1997)
The piecewise LMM analysis showed reduced scores over time, with a trend towards faster HCR-20V2 reduction in the first 1.5 years in ST than in TAU, treatment-by-time t(446) = −1.91, p = 0.057; d = 0.40 (ST) v. 0.20 (TAU) at 1.5 years.
START vulnerabilities and strength scores (Braithwaite et al., 2010)
The analysis showed significant treatment-by-time and treatment-by-time-squared interactions, with ST showing steeper and earlier improvements, compared to TAU. However, at 3 years the TAU patients had ‘caught up’ and there was no longer a difference between treatments. The within treatment effect sizes at 1 year were d = 0.23 (TAU) and d = 0.41 (ST), differential d = 0.18; at 1.5 years, d = 0.37 (TAU) and d = 0.56 (ST), differential d = 0.19. At 3 years, the change compared to baseline was d = 0.85.
The effect of time on weighted incidents was highly significant with no interaction of time with treatment condition, indicating that both treatments showed a comparable reduction. At 3 years change compared to baseline was large, d = 0.97.
Early maladaptive schemas (Young & Brown, 1998)
The effect of time on the EMS total score was highly significant, and there was a significant treatment-by-time interaction, indicating a stronger reduction in ST than in TAU, d = 0.28. The within treatment effect sizes were d = 0.33 (TAU) and d = 0.61 (ST).
Schema modes (Lobbestael et al., 2010)
The effect of time on healthy and maladaptive schema mode scores was highly significant, as were the treatment-by-time, and treatment-by-time-squared, interactions. At 1 year, the change in healthy modes in ST was d = 0.40; in TAU, d = 0.06; at 1.5 years, in ST, d = 0.50, in TAU, d = 0.15. At 1 year, the change in maladaptive modes in ST was d = 0.48; in TAU, d = 0.14; at 1.5 years, in ST, d = 0.64; in TAU, d = 0.28. Differences disappeared at 3 years.
SCL-90 total score (Derogatis & Unger, 2010)
The effect of time on SCL-90 total scores was highly significant, but there was no time-by-treatment interaction. At 3 years, the change from baseline was d = 0.44 in ST and d = 0.30 in TAU (differential d = 0.14).
Retention was high in both conditions and dropouts were rare (Fig. 1) with a significant interaction of time-by-treatment condition (online Supplementary Fig. 2). Follow-up analyses showed a significant difference in year 1 with higher retention in ST (93%) than in TAU (80%), t(264) = 2.066, p = 0.004. At 3 years, 75% of ST patients and 68% of TAU patients were retained.
Our RCT is the first to demonstrate the effectiveness of long-term, intensive psychotherapy for rehabilitating violent offenders with PDs. Both ST and TAU were effective, producing moderate to large improvements in outcomes from baseline to 36 months, and contradicting the usually pessimistic views of the treatability of these patients. The vast majority in both treatment conditions eventually attained supervised leave (ST = 96.7%, TAU = 90.5%) and about two-thirds attained unsupervised leave (ST = 67.4%, TAU = 59.4%). These results are particularly impressive given this high-risk sample. These findings are considerably better than in most studies of mandated treatments, which often failed to show effectiveness compared to no-treatment or wait-list control groups (Parhar et al., 2008). Consistent with the ‘risk-need-responsivity’ principles of forensic treatment (Andrews et al., 1990; Andrews, Bonta, & Wormith, 2006), our findings suggest that PD offenders need high-intensity, long-term treatment that matches their severity.
ST showed significantly faster improvements than TAU on both of the primary outcomes – rehabilitation (i.e. supervised/unsupervised leave) and PD symptoms – and six of nine secondary outcomes. Differential (between-condition) effect sizes were small to medium. ST also retained significantly more patients than TAU, although retention in both conditions was high. ST moved patients through rehabilitation more rapidly than TAU, with medium effects for supervised leave (r = 0.36), and small to medium for unsupervised leave (r = 0.19), with significant differences in unsupervised leave only in the first 6 months. Thus, ST patients had more opportunity to practice coping with risky situations, and building strengths, in the community. ST was quicker than TAU to lower vulnerabilities and promote strengths on the START, which may have facilitated rehabilitation. Studies are needed to examine if ST also leads to greater cost-effectiveness, relative to TAU.
ST patients showed large improvements from baseline to 36 months in PD symptoms (d = 0.78), while TAU patients showed moderate ones (d = 0.56). A previous study of standard CBT for PD patients in TBS showed less personality improvement relative to baseline (d = 0.23–0.44) over a similar duration (Timmerman & Emmelkamp, Reference Timmerman and Emmelkamp2005). Both ST and TAU groups showed reductions in all SNAP-FV PD and temperament scales, with the strongest reductions for borderline and paranoid PD, and negative temperament. ST had modest advantages over TAU (d = 0.22) in improving traits such as self-control and self-regulation, which are protective factors against recidivism (DeLisi, Hochstetler, Higgins, Beaver, & Graeve, Reference DeLisi, Hochstetler, Higgins, Beaver and Graeve2008; Malouf et al., Reference Malouf, Schaefer, Witt, Moore, Stuewig and Tangney2014). Both ST and TAU were highly effective in reducing incidents, with no significant differences between them.
ST's effectiveness in our study was lower than in previous RCTs of ST in (mostly female) outpatients with borderline (Giesen-Bloo et al., 2006; Nadort et al., 2009) or Cluster C and other PDs (Bamelis et al., 2014). This could be due to the institutional, mandated treatments that our patients received (Parhar et al., 2008), and the severity of emotion regulation and externalizing behavior problems in male, antisocial patients (Berke, Reidy, & Zeichner, 2018). Moreover, the greatest effects of ST were in the first 2 years, perhaps due to ST's ability to facilitate a therapeutic bond (Giesen-Bloo et al., 2006). ST patients showed rapid, curvilinear improvements in START vulnerabilities and strengths scores and schema modes in the first 2 years, after which their scores plateaued, while the TAU group ‘caught up’. A possible explanation is that, because the ST patients attained leave more quickly, they experienced more setbacks, as they were exposed to more risks (e.g. drugs and alcohol, antisocial peers, family and work stress) outside the hospital. Furthermore, ST patients initially received twice-weekly individual sessions, which may have boosted ST's early effectiveness.
We went to considerable lengths to ensure high methodological quality. However, there were some limitations. First, as is the case in nearly all studies of long-term psychotherapy, we cannot rule out halo effects because of the lack of blinding to treatment allocation. Although we used objective measures and multiple raters whenever possible, the lack of blinding could have biased the findings in favor of our hypotheses. For example, the Leave Advisory Boards were not blinded to patients' treatment status and occasionally included study personnel. On the other hand, the results were consistent across many different outcome variables, measured in different ways, and the study was implemented across many centers, patients, and therapists.
Next, we chose not to equate the intensity of the individual therapies in this study, because our goal was to compare ST to usual TBS treatment, where individual therapy is typically given once per week. However, while ST patients received more individual therapy hours than TAU patients, there were no significant differences in the total therapy hours received, combining individual and auxiliary therapies, at any time point, or overall. Nevertheless, we cannot rule out that differences in effectiveness were due to the intensity of the individual therapy. The first step in validating ST was to compare it to high quality, usual TBS practice; the next step is to compare it to another specific, evidence-based treatment for PDs (e.g. Berzins & Trestman, 2004; Ware et al., 2016) while equating for individual therapy intensity. We will also augment individual ST with milieu-based ST approaches (van Wijk-Herbrink, Arntz, Broers, Roelofs, & Bernstein, 2019), to see whether this enhances ST's effectiveness. Further, we did not compare the two conditions on recidivism, because arrests were uncommon during the study. We will examine recidivism at post-treatment follow-up. In addition, for ethical reasons, we did not use a no-treatment control group; thus, some of the improvements in both conditions could have been unrelated to treatment.
Finally, our findings can be most conservatively generalized to forensic hospital settings. We do not know if treatment retention and other outcomes would have differed in outpatient settings or in patients selected for specific diagnoses or offenses. Implementation studies are needed in other settings (e.g. outpatient clinics and prisons); populations (e.g. female offenders, youth offenders); and levels of coercion (e.g. voluntary treatments). Such studies would help in developing indications for treatment and adapting ST to patients with specific characteristics.
Our findings contradict typical notions regarding the treatability of offenders with PD and aggression. Intensive psychotherapy promoted rehabilitation and reduced PD symptoms in these hospitalized patients, with both ST and TAU showing evidence of effectiveness. ST produced more rapid improvements than TAU, with modest but consistent advantages across most outcomes. These findings need replication in studies that equate for individual treatment intensity. Recidivism, the ultimate criterion in forensic populations, must also be investigated before any definite conclusions can be drawn.
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291721001161.
The authors wish to express their appreciation to the directors, staff, therapists, and patients of the participating forensic hospitals – Forensic Psychiatric Centers de Rooyse Wissel (Venray and Maastricht locations), van der Hoeven, Oostvaarders, Mesdag, Veldzicht, Kijvelanden, and Forensic Psychiatric Clinic Assen – whose dedication and financial and other support made this project possible. We thank past project coordinators and database managers funded by the EFP – Annette Löbbes, Lieke H.C. Bouts, Lotte C. de Geus, and Jacomina Gerbrandij – and former research assistants Eva de Spa, Ellen de Jonge, Merel van Vliet, Antoine Sint Fiet, Thijs Kanters, and Marloes Hartkoorn.
We gratefully acknowledge the financial and other support of the Expertise Center for Forensic Psychiatry (EFP), and its past and current directors and staff; the program ‘Quality in Forensic Care’ (‘Kwaliteit Forensische Zorg’; KFZ); and grant support (D. Bernstein, Principal Investigator) from the Netherlands Ministry of Security and Justice, the ‘Scientific Research and Documentation Center’ (‘Wetenschappelijk Onderzoek en Documentatiecentrum’; WODC), and Maastricht University's Faculty of Psychology and Neuroscience.
Financial and other conflict of interest disclosure
The authors D. Bernstein, G. Kersten, and A. Arntz provide training and supervision in Schema Therapy, for which they often receive financial compensation. For A. Arntz, the compensation goes to the university to support research. They have no other financial or other conflicts of interest. None of the other authors have financial or other conflicts of interest. In conducting this study, we have adhered to all pertinent ethical standards for maintaining anonymity and confidentiality, and respecting the rights of our patients.