Development of competence in cognitive behavioural therapy and the role of metacognition among clinical psychology and psychotherapy students

Abstract Background: There is a paucity of research on therapist competence development following extensive training in cognitive behavioural therapy (CBT). In addition, metacognitive ability (the knowledge and regulation of one’s cognitive processes) has been associated with learning in various domains, but its role in learning CBT is unknown. Aims: To investigate to what extent psychology and psychotherapy students acquired competence in CBT following extensive training, and the role of metacognition. Method: CBT competence and metacognitive activity were assessed in 73 psychology and psychotherapy students before and after 1.5 years of CBT training, using role-plays with a standardised patient. Results: Using linear mixed modelling, we found large improvements in CBT competence from pre- to post-assessment. At post-assessment, 72% performed above the competence threshold (36 points on the Cognitive Therapy Scale-Revised). Higher competence was correlated with lower accuracy in self-assessment, a measure of metacognitive ability. The more competent therapists tended to under-estimate their performance, while less competent therapists made more accurate self-assessments. Metacognitive activity did not predict CBT competence development. Participant characteristics (e.g. age, clinical experience) did not moderate competence development. Conclusions: Competence improved over time and most students performed above the threshold at post-assessment. The more competent therapists tended to under-rate their competence. In contrast to what has been found in other learning domains, metacognitive ability was not associated with competence development in our study. Hence, metacognition and competence may be unrelated in CBT, or perhaps other methods are required to measure metacognition.


Introduction
While cognitive behavioural therapy (CBT) has substantial research support, the training of CBT therapists has received limited attention in research (Becker and Stirman, 2011; Fairburn and Cooper, 2011; Rakovshik and McManus, 2010; Shafran et al., 2009), especially regarding the effects of extensive training programmes (i.e. longer programmes in higher education). Metacognition (MC) refers to the knowledge and regulation of one's own cognitive processes, including planning, monitoring and evaluation (Schraw, 1998; Schraw and Moshman, 1995). CBT therapists continuously plan, monitor and evaluate cognitive processes as well as their own performance, for example in self-reflection (Bennett-Levy, 2006). However, self-assessments are prone to bias. Kruger and Dunning (1999) suggested that incompetence is related not only to poor skills, but also to an inability to self-assess those skills, resulting in overconfidence among poor performers. The Dunning-Kruger effect has been observed in various areas (e.g. Ehrlinger et al., 2008). Previous findings are inconclusive regarding therapists' ability to self-evaluate accurately. Brosan et al. (2008) found that cognitive therapists, especially those less competent, over-rated their competence compared with observers, in line with the Dunning-Kruger effect. In contrast, McManus et al. (2012) found that more competent therapists under-rated their competence compared with supervisors, a tendency referred to as undue modesty (Dunning et al., 2003).
Metacognition is associated with learning and performance in various domains (e.g. mathematics, science and reading; Ohtani and Hisasaka, 2018; Perry et al., 2019). Training in psychological treatments may therefore be improved by targeting therapists' MC skills, as suggested by Fauth et al. (2007). Yet, to our knowledge, no studies have explored the role of metacognition in learning psychological treatments, or, for that matter, treatment of any kind.
The purpose of the present study was to investigate to what extent clinical psychology and psychotherapy students acquire competence in CBT following an extensive training programme, and what role metacognition plays in the development of CBT skills.

Design and setting
Using a longitudinal observational design, participants were assessed for CBT competence and MC activity before receiving CBT training and after their third semester of CBT training (M = 1.39 years between assessments). The CBT training corresponds to a total of 38.5-52.5 European credits, provided within master-level programmes in psychology or psychotherapy. The psychology programme in Sweden spans five years full-time (300 credits), with CBT training introduced after 3.5 years. The psychotherapy programme spans three years half-time (90 credits); all of its students have a previous master-level degree in a health care profession and at least two years of clinical experience, and are required to simultaneously work clinically (i.e. providing psychotherapy) at least half-time. The CBT training of both programmes consists of lectures, tests, a thesis, workshops, and clinical practice under close supervision. Two universities took part in the study, Karolinska Institutet (KI) and Stockholm University (SU) in Stockholm, Sweden.
Students received verbal and written information and written informed consent was obtained. Participation was voluntary and could be retracted at any time without the need to state a reason. All data were pseudonymised and stored securely, to which only the first and last authors had access. The study was conducted in accordance with the World Medical Association Declaration of Helsinki.

Participants
Participants were students at the clinical psychology (n = 50) and psychotherapy (n = 23) programmes at KI and SU, starting their CBT training from August 2016 to September 2017. Among 208 eligible students, 74 enrolled and provided written informed consent. No exclusion criteria were applied. One participant dropped out due to a family crisis before providing any data and was therefore not included in the analysis. Among the 73 participants who completed the pre-assessment, 64 completed the post-assessment. See Fig. 1 for participant flow through the study, including reasons for attrition. The mean age of participants was 33 years at pre-assessment, 68% were female (33 of 50 psychology students and 17 of 23 psychotherapy students) and 76% studied at KI (35 psychology students and 20 psychotherapy students). See Table 1 for participant characteristics.

Assessment and procedure
At pre- and post-assessment, participants engaged in a CBT role-play session, immediately followed by an MC think-aloud session. The sessions included a standardised patient, that is, an actor playing a patient with a specific set of presenting problems, in this case indicative of a principal diagnosis of social anxiety disorder with co-morbid depression. The patient was a 32-year-old female called 'Anna', with mild social anxiety (e.g. being shy and self-critical), which had escalated during her parental leave from a high-achieving position. She was now on long-term sick leave, socially isolated, and in a strained marriage. Actors were psychology students in their first or second year, who had not met the participants previously. All assessments were video recorded. However, two MC recordings were lost, and another two were partially lost, due to technical malfunction of the recording during the pre-assessment.

Role-play of a cognitive behavioural therapy session
Before the role-play, participants were provided with written background information on the patient and the treatment up to this point, including the home assignment for the session. The participant then conducted a complete 45-minute session of CBT, designed to constitute the ninth treatment session. Standardised patients had previously received three hours of training, and the first author monitored adherence during their first role-plays. The actors followed detailed written instructions, including scripted responses to various potential therapist behaviours. To ensure stability of the actors' behaviours, they were instructed to re-read the script before each role-play and were monitored regularly throughout the study.
Competence in cognitive behavioural therapy
CBT competence was assessed using the Cognitive Therapy Scale-Revised (CTS-R; James et al., 2001), which is a standard tool for the assessment of CBT competence (Muse and McManus, 2013). It measures skills general to psychotherapy (e.g. feedback, collaboration and pacing) and specific to CBT (e.g. eliciting key cognitions, guided discovery, and homework setting). The CTS-R contains 12 items rated on a scale from 0 to 6 points, corresponding to six levels of competence (incompetent, novice, advanced beginner, competent, knowledgeable, and expert). The total score ranges from 0 to 72 points. Commonly, a score of 36 points is used as a cut-off for competence, giving an average score of 3 points per item. The CTS-R has demonstrated excellent internal consistency (Cronbach's α = .92-.97) and adequate inter-rater reliability (intra-class correlation coefficient [ICC] = .86 for pairs of raters; Blackburn et al., 2001).
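The scoring arithmetic described above can be sketched as follows; note that this is an illustrative helper for the totals and the cut-off, not part of the CTS-R itself, and the function names are ours.

```python
def ctsr_total(item_scores):
    """Sum the 12 CTS-R item ratings (each 0-6) into a 0-72 total score."""
    if len(item_scores) != 12 or not all(0 <= s <= 6 for s in item_scores):
        raise ValueError("CTS-R requires 12 item ratings in the 0-6 range")
    return sum(item_scores)

def meets_competence_cutoff(total_score, cutoff=36):
    """The commonly used cut-off of 36 equals an average of 3 points per item."""
    return total_score >= cutoff
```

A uniform rating of 3 ('competent') on every item yields exactly the 36-point threshold, which is the rationale behind the cut-off.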
As part of the present study and prior to the pre-assessment, two CBT experts participated in a two-day workshop on the use of the CTS-R, followed by repeated calibration sessions during the two-year rating period to minimise rater drift. Regular inter-rater reliability checks (six times during the rating period) showed adequate to excellent inter-rater reliability throughout the study (ICC = .64-.95). One rater was a psychologist, the other a nurse; both were licensed psychotherapists and CBT supervisors with extensive experience in CBT training and clinical practice. Raters were independent of the study, and blinded to participants, training programmes, and assessment points.
Metacognitive self-assessment of performance
MC ability to monitor and self-evaluate one's performance was assessed using a participant survey administered after the role-play. Participants rated their CBT competence in the role-play (i.e. their perceived performance as CBT therapists) on a visual analogue scale. These self-assessment scores were rescaled to range from 0 to 72 points, allowing for comparison with observer ratings of CBT competence based on CTS-R total scores.
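As a sketch of how these quantities relate: the rescaling and the accuracy difference score can be written as below. The physical length of the VAS is not reported here, so the 0-100 scale is an assumption for illustration.

```python
def rescale_vas_to_ctsr(vas_mark, vas_max=100.0):
    """Rescale a visual analogue scale mark to the CTS-R's 0-72 range.
    The 0-100 VAS length is an assumption, not taken from the study."""
    return vas_mark / vas_max * 72.0

def self_assessment_accuracy(self_assessed, observer_assessed):
    """Difference score used as the accuracy measure:
    positive values indicate over-estimation, negative values under-estimation."""
    return self_assessed - observer_assessed
```

For example, a therapist who marks the midpoint of the VAS but receives a CTS-R total of 46 would have an accuracy score of 36 - 46 = -10, i.e. a 10-point under-estimation.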

Metacognitive task
Metacognition was assessed using a think-aloud methodology, by which the participant is asked to say whatever comes to mind, that is, think aloud, while performing a task. The expressed chain of thoughts is then transcribed and coded as a source of information on MC activity. The standard procedure was employed (Ericsson and Simon, 1993). An independent research assistant gave the participant instructions on how to think aloud, followed by a brief exercise to make sure the participant understood. The MC task included an enactment of six common clinical situations, designed to subject the participant to a clinical challenge and presumably mobilise MC activity (e.g. implied suicide risk, therapy doubt, questioning of in-session exercises). The procedure was as follows: a research assistant handed over a written instruction, such as 'Find out how Anna is doing'. The participant acted accordingly, for example by asking 'How are you today, Anna?', to which the patient ('Anna') gave a standardised reply conveying several pieces of information: 'Not well. The kids have been ill. I just want to disappear. What if therapy doesn't work?'. The patient then turned silent, allowing the research assistant, if necessary, to prompt the participant to 'Please think aloud'. The participant would then say whatever came to mind, reflecting on what the patient had just said. The verbal reports of the MC task were recorded, transcribed verbatim, and coded according to an MC taxonomy.

Metacognitive taxonomy
Because we could not identify any measures of metacognition within a psychotherapeutic or other clinical setting, we created a taxonomy for MC regulation within CBT, inspired by a validated taxonomy for coding of MC in various non-clinical learning domains (e.g. science, history) by Meijer et al. (2006). As suggested by Flavell (1979) and Meijer et al. (2006), we included three types of MC regulation, that is, planning, monitoring and evaluating, each including several categories. A pilot version of the role-play was tested with three CBT therapists, resulting in minor adjustments to ensure both role-play and taxonomy usability. The final version of the MC taxonomy included six categories to be coded, with no hierarchy of importance or advancement between them. See Fig. 2 for an overview of the taxonomy.
In Fig. 2, Organisation refers to the handling of available information, e.g. summarising or interpreting information, such as 'Anna might have suicidal thoughts'. Planning entails reflecting on possible future activities and choosing between strategies, e.g. 'First, a suicide risk assessment'. Information monitoring means checking whether one has the information needed in the situation, e.g. 'I'm confused, I don't know if …'. Structural monitoring means checking the therapeutic framing regarding time or content, e.g. 'We're getting off track'. Evaluation of strategy refers to assessing the results of one's activities, such as 'She did not like my suggestion, I should have …'. Evaluation of difficulty entails the assessment of difficulty, such as 'This is a tough case'.
A coding manual was created, with CBT-relevant examples of each category. Consistent with Meijer et al. (2006), all verbal reports were coded as manifestations of underlying MC regulatory activity. The reports were divided into units, which were then coded. A unit was defined to start when a thought was introduced and end when this thought was fully expressed, or when a new thought was introduced. Coding was conducted by the first author, who was blinded to participants, training programmes and assessment points. Intra-rater reliability was strong (McHugh, 2012), with Cohen's kappa = .84, p < .001, when 20% (n = 27) of the reports were re-coded three weeks later.
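Intra-rater agreement of the kind reported above can be computed with Cohen's kappa, which corrects observed agreement for the agreement expected by chance. A minimal pure-Python sketch (the category labels below are hypothetical, not the study's data):

```python
from collections import Counter

def cohens_kappa(coding_1, coding_2):
    """Cohen's kappa for two codings of the same units:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(coding_1)
    observed = sum(a == b for a, b in zip(coding_1, coding_2)) / n
    c1, c2 = Counter(coding_1), Counter(coding_2)
    # Chance agreement from each coder's marginal category frequencies.
    expected = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical re-coding of four units three weeks apart:
first_pass = ["planning", "organisation", "planning", "evaluation"]
second_pass = ["planning", "organisation", "planning", "evaluation"]
```

Identical codings give kappa = 1.0; agreement no better than chance gives kappa = 0, which is why kappa is preferred over raw percentage agreement for coding reliability.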

Statistical analysis
SPSS (version 26, SPSS Inc., Chicago, IL, USA) was used for the analyses. Frequencies of MC regulation categories were calculated. Linear mixed models (LMM) were used to estimate the effect of time on CBT competence and to investigate whether effects were moderated by group (psychology or psychotherapy students) and metacognition (frequency of MC categories). Repeated measures of the outcome (i.e. CTS-R total scores at pre- and post-assessment) were nested within individuals. We chose LMM as it is recommended for nested data with repeated measures and handles missing data appropriately (e.g. Gueorguieva and Krystal, 2004). The maximum likelihood method was used to estimate model parameters. We started with a basic model including a fixed intercept. Then we successively added random parameters (intercept and slope), and finally a time by group interaction term. Each model's fit to the observed data was evaluated using the likelihood ratio test, with significance set at .05. A model with a significantly better fit than the previous model was retained. The standardised effect size for between- and within-group effects at post-assessment was calculated as Cohen's d for LMM based on the formula recommended by Feingold (2015; Equation 1), using the pre-assessment pooled standard deviation for the entire sample, and for each subsample, respectively. For model-based d, 95% confidence intervals (CI) were calculated using the formulas provided in Feingold (2015; Equations 7 and 8). Accuracy of self-assessed CBT competence, as a measure of MC ability, was calculated as the difference between self-assessed and observer-assessed competence using the CTS-R. A Pearson correlation test was used to examine whether CBT competence (i.e. CTS-R total scores) was associated with self-assessment accuracy (i.e. self-assessed CBT competence relative to CTS-R total scores).
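Feingold's model-based effect size standardises the model-estimated change (slope multiplied by study duration) by the raw pre-assessment standard deviation, and the CI follows by scaling the slope's CI the same way. A sketch under that reading (the numeric values in the usage note are illustrative, not the study's estimates):

```python
def feingold_d(slope, duration, sd_pre):
    """Model-based Cohen's d (in the spirit of Feingold, 2015, Eq. 1):
    estimated change over the study (slope x duration),
    standardised by the pre-assessment raw standard deviation."""
    return slope * duration / sd_pre

def feingold_d_ci(slope, se_slope, duration, sd_pre, z=1.96):
    """Approximate 95% CI for model-based d, obtained by transforming
    the slope's confidence limits (cf. Feingold, 2015, Eqs. 7-8)."""
    lower = (slope - z * se_slope) * duration / sd_pre
    upper = (slope + z * se_slope) * duration / sd_pre
    return lower, upper
```

For instance, a fixed-effect slope of 10 points per study period with a pre-assessment SD of 5 gives d = 2.0, i.e. a two-SD improvement.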
In addition, to compare self-assessment accuracy in groups with different levels of CBT competence (i.e. bottom, second, third and top quartiles of CTS-R total scores), we used the non-parametric Kruskal-Wallis test, because the assumption of normality was not met for the quartiles.
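The Kruskal-Wallis test compares mean ranks across groups rather than raw means, which is why it tolerates non-normal quartile distributions. A minimal sketch of the H statistic, without the tie correction that statistical packages usually apply (a simplification):

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H without tie correction (a simplification):
    pool all observations, rank them (tied values get the mean rank),
    then compare each group's rank sum with its expectation."""
    pooled = sorted((value, g) for g, grp in enumerate(groups) for value in grp)
    n = len(pooled)
    ranks = {g: [] for g in range(len(groups))}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1  # span of tied values occupies ranks i+1 .. j
        mean_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[pooled[k][1]].append(mean_rank)
        i = j
    return 12 / (n * (n + 1)) * sum(
        sum(r) ** 2 / len(r) for r in ranks.values() if r
    ) - 3 * (n + 1)
```

Large H values indicate that at least one group's ranks differ from the others, prompting post hoc pairwise comparisons as in the Results below.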

Development of competence in cognitive behavioural therapy
Competence in CBT at pre-and post-assessment is presented in Table 2. Overall, CBT competence (i.e. CTS-R total score ≥36 points) was achieved by 12 participants (18.8%) at pre-assessment and by 46 participants (71.9%) at post-assessment. None of the initially competent participants deteriorated below the cut-off, which means that 34 participants (53.1%) improved above the competence threshold. Among the psychology students, CBT competence was achieved by two participants (4.3%) at pre-assessment and 31 participants (67.4%) at post-assessment. Among psychotherapy students, CBT competence was achieved by 10 participants (55.6%) at pre-assessment and 15 participants (83.3%) at post-assessment.
A model including a random intercept, fixed effect of time, fixed effect of group (psychology or psychotherapy students), and a time by group interaction term provided the best fit. There was a statistically significant main effect of time, F(1, 63.98) = 95.64, p < .001, d = 1.94, 95% CI [1.64, 2.25], indicating large improvements in CBT competence for the whole sample from pre- to post-assessment. There was also a significant main effect of group, F(1, 122.38) = 60.63, p < .001, d = 1.76, 95% CI [1.31, 2.21], indicating large differences in competence between psychology and psychotherapy students, with the latter being more competent at both pre- and post-assessment. There was a significant time by group interaction, F(1, 63.98) = 16.27, p < .001, d = -1.13, 95% CI [-1.70, -0.57], suggesting a larger CBT competence improvement in psychology students, who initially had lower competence scores and therefore more room for change. In models estimated separately for each group, the main effect of time on CBT competence was significant for both psychology and psychotherapy students.

Accuracy in self-assessment of CBT competence
We assessed the relationship between observer-assessed CBT competence (i.e. CTS-R total scores) and accuracy of self-assessed CBT competence (i.e. a difference score). A Pearson correlation test showed a negative correlation between the variables at both pre-assessment, r(71) = -.47, p < .001, and post-assessment, r(62) = -.50, p < .001. This correlation was also significant among psychology and psychotherapy students respectively, r between -.60 and -.49, p < .05, across assessment points. Thus, higher CBT competence was correlated with lower accuracy in self-assessment.
To compare quartiles of CBT competence, a Kruskal-Wallis test was conducted. There was a significant difference in self-assessment accuracy between quartiles at pre-assessment, H(3) = 15.80, p = .001, and post-assessment, H(3) = 12.86, p = .005. Post hoc pairwise comparisons showed significant differences between Q1-Q4, Q2-Q4 and Q1-Q3 at both time points, and between Q2-Q3 at pre-assessment (p = .001-.047). On average, the 25% most competent students under-estimated their competence by 11.22 points (SD = 10.10) at pre-assessment and by 13.43 points (SD = 9.67) at post-assessment; see Fig. 3 for observer- and self-assessed CBT competence. Meanwhile, the 25% least competent were rather accurate in their self-assessments and only slightly over-rated their competence at pre-assessment, M = 2.41, SD = 11.70, and at post-assessment, M = 0.81, SD = 13.11.
The role of metacognition in CBT competence development
MC activity, both in total and by category, at pre- and post-assessment is presented in Table 2. To investigate the role of MC in CBT competence development, we added the MC total score to the previous model (i.e. an LMM with a random intercept, fixed effects of time, group and MC total score, and interaction terms of time by group, time by MC total score, and time by group by MC total score). Model fit significantly improved, but neither main nor interaction effects for MC total score were statistically significant. For exploratory purposes, we conducted the same analysis separately for each of the six MC categories (i.e. replaced the MC total score with an MC category score). Again, each MC variable significantly improved model fit, but neither main nor interaction effects were significant, except for a main effect of evaluation of difficulty, F(1, 133.15) = 6.57, p = .011, and an interaction effect of evaluation of difficulty by group, F(1, 133.15) = 6.17, p = .014. However, MC evaluation of difficulty did not predict competence development over time. Thus, MC activity did not predict CBT competence development in the participants.

Discussion
The purpose of the present study was to investigate to what extent psychology and psychotherapy students acquire competence in CBT following an extensive training programme and the role of metacognition for competence development.
Most students had achieved CBT competence by post-assessment. CBT competence was achieved by 83.3% of the psychotherapy students at post-assessment, which is on par with previous findings where 80-82% of practising therapists were found competent after extensive CBT training (Liness et al., 2019). The finding that three psychotherapy students were below the threshold can be considered cause for concern; however, it should be noted that they needed only an additional 0.5 to 2 points to reach the competence level and still have 1.5 years of CBT training as part of their programme.
CBT competence was achieved by 67.4% of the psychology students. Students without clinical experience are targeted in most extensive training programmes, yet we have found no studies evaluating their competence development. No psychology students had any previous experience of CBT or clinical practice, most treated only 1-3 patients during training, and the programme is followed by a year of supervised clinical practice. Thus, there is room for improvement, but the results are encouraging, and it is likely that more students will reach the competence threshold later. Overall, we found CBT competence had improved for psychology and psychotherapy students at post-assessment. We also found a significant time by group interaction, suggesting larger improvement for psychology students, who had lower initial competence ratings and thus more room for change. As expected, psychotherapy students were older, with more clinical experience and weekly CBT practice. However, therapist variables such as age, previous clinical experience, and weekly CBT practice did not moderate competence development, unlike results in some previous studies (e.g. McManus et al., 2010).
Explorative analyses showed that more competent therapists were less accurate in their self-assessment of CBT competence. There were significant differences in self-assessment accuracy between groups of high and low CBT competence at both pre- and post-assessment, where the top 25% under-rated their competence, while the bottom 25% were accurate or slightly over-rated their competence. Our results thus support the undue modesty among top performers observed in other areas (Dunning et al., 2003), but not the over-confidence among the less competent predicted by the Dunning-Kruger effect. So far, the findings concerning CBT have been inconclusive; for example, Brosan et al. (2008) found that especially less competent therapists over-rated their competence, whereas McManus et al. (2012) found more competent trainees to under-estimate their competence.
While MC ability has been related to learning in other fields (e.g. Ohtani and Hisasaka, 2018; Perry et al., 2019), it was not related to competence development in our study. If metacognition plays a role in CBT competence development, we were not able to detect it. Psychotherapy may differ from other fields of learning in involving social interaction and being less straightforward, without one clinically 'correct' response for each situation. Moreover, students of CBT are already trained in (self-)reflection and higher-order thinking, and may therefore already possess a high degree of MC ability, in contrast to the often younger, more novice participants in other studies of metacognition and learning. Thus, students of CBT may differ in the quantity as well as the quality of MC activity. Another possible explanation is that our MC taxonomy for CBT, although based on previous research in other areas, has yet to be validated.

Limitations of the study
First, although it is likely that the increased competence was the result of training, as the present study did not have an experimental design (e.g. a randomised controlled design), no inferences about causality can be drawn. Thus, improvements from pre- to post-assessment could be due to factors other than training, such as the passage of time or raters' expectations of improvement. However, rater expectations were mitigated, as raters were blinded to participants, programmes and assessment points. While an experimental design could demonstrate causality, it was not feasible in the context of the present study.
Second, one may question the validity and reliability of an artificial simulation to assess CBT competence (Muse and McManus, 2013), and competence has been suggested to vary over sessions (Webb et al., 2010). Therefore, we made efforts to ensure that the role-play resembled a session with an actual patient and that the actors' performance was stable across participants and time points, through thorough training of standardised patients, pilot testing, and provision of background information. Standardised patients can be designed to promote a range of skills in a single session, be replicated at pre- and post-assessment, and have other practical advantages over real patient sessions (Muse and McManus, 2013). Moreover, role-plays with standardised patients seem to be a feasible, valid and reliable method to assess clinical competence (Edwards et al., 2016; Goodie et al., 2021; Hodges et al., 2014). One concern has been the perceived authenticity of standardised patients (Edwards et al., 2016; Hodges et al., 2014), which we did not monitor and which may impact external validity. Yet, in another study, therapists reported their role-play performance to resemble their clinical performance (Cooper et al., 2017b), and standardised role-plays have been used successfully to measure competence development following CBT training (Cooper et al., 2017b; Harned et al., 2013; Kobak et al., 2017; Puspitasari et al., 2017).
Third, the CTS-R is considered the gold standard for assessing CBT competence, but expert ratings are resource-consuming and inter-rater reliability remains an issue (Muse and McManus, 2013); in the present study it was adequate to excellent (ICC = .64-.95). The competence threshold has not been validated for the CTS-R, only for the original version of the scale (Muse and McManus, 2013); hence, any conclusions about participants passing the threshold should be interpreted with caution.
Fourth, we used a think-aloud methodology, which relies on the participant's ability to verbalise their MC activities. Thinking aloud seems to have a minimal impact on the participant's MC activity (Ericsson and Simon, 1993); however, a recent review reported that some prompts may have a positive impact (Double and Birney, 2019), which we tried to avoid by keeping prompts general (i.e. 'please think aloud') and to a minimum. Retrospective self-reports could be used instead, but are less valid and poorly associated with observed performance on MC tasks (Craig et al., 2020), which is why observational methods, such as a think-aloud protocol, are recommended.
Finally, we wanted to investigate metacognition as originally defined by Flavell (1979), and were inspired by a taxonomy validated in other fields (Meijer et al., 2006). Perhaps certain MC abilities are more relevant to CBT competence than the broad abilities captured by this conceptual framework. Furthermore, our participants frequently engaged in organisation and planning, but less in the monitoring and evaluating categories (see Table 2). While this may reflect a true tendency, it is plausible that the MC task failed to mobilise certain MC skills, or that the participants found these particularly difficult to verbalise. Perhaps a more specific taxonomy is needed, targeting more clinically relevant MC activities.

Conclusions
Properly trained and qualified health care professionals are essential in the dissemination of evidence-based psychological treatments. Our study found that CBT competence improved and that most students had achieved competence after three semesters of CBT training. Thus, competence improved among practising clinicians as well as novice students without prior clinical experience and with limited CBT training practice. However, some students had not achieved CBT competence at post-assessment, which demonstrates the need for their upcoming supervised clinical practice and subsequent assessments. Routine assessments with standardised instruments may be integrated into educational programmes, and the results used to improve training, e.g. by targeting specific skill areas.
Higher CBT competence was correlated with lower accuracy in self-assessment, where the more competent therapists under-estimated their competence. These results, along with large within-group variation, indicate that self-assessments of CBT competence are unreliable and prone to bias, as previously suggested.
To our knowledge, this is the first study to investigate MC in the context of learning CBT, and, indeed, treatment of any kind. We did not find that MC activity predicted CBT competence, and hence it may not need additional attention in the learning of CBT. However, we think it is plausible that we were simply unable to detect such an effect, and suggest that our taxonomy may need further revision to capture differences in the quality of MC ability among psychology and psychotherapy students.
Data availability statement. The data that support the findings of this study are available on request from the last author, B.B. The data are not publicly available due to ethical/privacy restrictions.