
A core component of psychological therapy causes adaptive changes in computational learning mechanisms

Published online by Cambridge University Press:  08 June 2023

Quentin Dercon*
Affiliation:
MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK; UCL Institute of Mental Health, University College London, London, UK
Sara Z. Mehrhof
Affiliation:
MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
Timothy R. Sandhu
Affiliation:
MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK; Department of Psychology, University of Cambridge, Cambridge, UK
Caitlin Hitchcock
Affiliation:
MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK; Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Australia
Rebecca P. Lawson
Affiliation:
MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK; Department of Psychology, University of Cambridge, Cambridge, UK
Diego A. Pizzagalli
Affiliation:
Department of Psychiatry, Harvard Medical School, Boston, MA, USA; Center for Depression, Anxiety, and Stress Research, McLean Hospital, Belmont, MA, USA
Tim Dalgleish
Affiliation:
MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK; Cambridgeshire and Peterborough NHS Foundation Trust, Cambridgeshire, UK
Camilla L. Nord
Affiliation:
MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
* Corresponding author: Quentin Dercon; Email: quentin.dercon.22@ucl.ac.uk

Abstract

Background

Cognitive distancing is an emotion regulation strategy commonly used in psychological treatment of various mental health disorders, but its therapeutic mechanisms are unknown.

Methods

935 participants completed an online reinforcement learning task involving choices between pairs of symbols with differing reward contingencies. Half (49.1%) of the sample was randomised to a cognitive self-distancing intervention and trained to regulate, or ‘take a step back’ from, their emotional response to feedback throughout. Established computational (Q-learning) models were then fit to individuals' choices to derive reinforcement learning parameters capturing clarity of choice values (inverse temperature) and sensitivity to positive and negative feedback (learning rates).
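To make the two kinds of parameter concrete, the following is a minimal sketch of a Q-learning agent with a softmax choice rule and separate learning rates for positive and negative feedback. It is an illustration only, not the authors' fitting code: the function names, the initial values of zero, and the rule that the learning rate is selected by feedback type are all assumptions made for this example.

```python
import math
import random


def choice_prob(q_chosen, q_other, beta):
    """Softmax probability of picking one option over the other.

    beta is the inverse temperature: larger values make choices follow
    the difference in learned values more deterministically.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))


def update_value(q, reward, alpha_reward, alpha_loss):
    """Move q towards the outcome using a feedback-specific learning rate."""
    # Assumption: positive feedback (reward == 1) uses alpha_reward,
    # negative feedback (reward == 0) uses alpha_loss.
    alpha = alpha_reward if reward == 1 else alpha_loss
    return q + alpha * (reward - q)


def simulate_pair(p_better, n_trials, alpha_reward, alpha_loss, beta, seed=0):
    """Simulate one symbol pair; return the fraction of better-symbol choices."""
    rng = random.Random(seed)
    q = [0.0, 0.0]  # index 0 = better symbol; starting at 0 is an assumption
    p_correct = [p_better, 1.0 - p_better]
    n_better = 0
    for _ in range(n_trials):
        choice = 0 if rng.random() < choice_prob(q[0], q[1], beta) else 1
        reward = 1 if rng.random() < p_correct[choice] else 0
        q[choice] = update_value(q[choice], reward, alpha_reward, alpha_loss)
        n_better += choice == 0
    return n_better / n_trials
```

In this sketch, raising `beta` sharpens the mapping from value differences to choices, while raising `alpha_loss` relative to `alpha_reward` makes the agent update more strongly after negative feedback.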

Results

Cognitive distancing improved task performance, including when participants were later tested on novel combinations of symbols without feedback. Group differences in computational model-derived parameters revealed that cognitive distancing resulted in clearer representations of option values (estimated 0.17 higher inverse temperatures). Simultaneously, distancing caused increased sensitivity to negative feedback (estimated 19% higher loss learning rates). Exploratory analyses suggested this resulted from an evolving shift in strategy by distanced participants: initially, choices were more determined by expected value differences between symbols, but as the task progressed, they became more sensitive to negative feedback, with evidence for a difference strongest by the end of training.

Conclusions

Adaptive effects on the computations that underlie learning from reward and loss may explain the therapeutic benefits of cognitive distancing. Over time and with practice, cognitive distancing may improve symptoms of mental health disorders by promoting more effective engagement with negative information.

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

Figure 1. Reinforcement learning task design, transdiagnostic factor derivation, and comparison of factor scores to previous samples. a. The probabilistic selection task (PST) in the present study consisted of six blocks of sixty trials (training phase) where participants were instructed to choose between Hiragana characters presented as three pairs (AB, CD, EF; twenty of each per block), and given feedback (‘Correct!’ or ‘Incorrect.’). One character in each of the pairs was consistently more likely to be correct (reward probabilities of 0.8/0.2, 0.7/0.3, and 0.6/0.4 for A/B, C/D, and E/F respectively). 497 participants (49.9%) were randomised to a self-distancing intervention, and additionally received a prompt to ‘Distance yourself…’ with the fixation cross at the start of each training trial. The training phase was followed by a sixty-trial test phase without feedback where the twelve other possible character combinations were added (i.e. four of each of the fifteen pairs). All participants saw the same six characters, but the pairs themselves were randomised for each participant, and the order of the pairs was counterbalanced across trials. One of three affect questions was asked after each trial, with unlimited time to answer. b. Multi-target lasso regression with five-fold cross-validation was used to predict the three transdiagnostic dimension factor scores from different subsets of the 209 questions in the original dataset (Gillan et al., 2016, study 2). 78 questions were found to predict the three factor scores with high predictive accuracy (colours correspond to heatmap labels). c. Heatmap of five-fold cross-validated multi-target lasso regression coefficient weights for each of the included 78 questions across eight psychiatric questionnaires used to predict the three transdiagnostic symptom dimensions (i.e. weights for all other questions fixed at zero; see online Supplementary Methods for full details). d. Comparison between the factor score distributions for the n = 935 non-excluded participants in the present study (darker colours), and those previously obtained by Gillan et al. (2016) in n = 1413 participants (lighter colours). Inset plots show the predictive validity of the subset of 78 questions in predicting these scores in the original dataset.


Figure 2. Model comparison, parameter distributions, and raw training and test phase performance. a. Difference in numerical fit metrics between the dual and single learning rate models fit to training data alone, or training-plus-test, by distancing group. ELPD is the expected log posterior density (presented here as the difference between the dual and single learning rate models, where positive differences indicate a better model), and LOOIC is the leave-one-out information criterion (lower indicates a better model). b–c. Distributions of individual-level posterior means for learning parameters from the fits to training (b) and test (c) data. d. Raw training performance (cumulative probability of choosing the higher-probability symbol A/C/E), by group and stimulus pair, lagged by twenty trials (i.e. block-lagged, as each pair is presented twenty times per block). e. Raw test phase performance (% correct, where ‘correct’ is choosing the option in each pair which was most often correct during training) by test type, plus test phase performance on individual training pairs and novel pairs including symbols C or E.
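The two fit metrics in panel a point in opposite directions because LOOIC is conventionally reported on the deviance scale, as -2 times the leave-one-out ELPD. The short sketch below assumes that convention applies here; the function names and the numbers in the usage note are hypothetical and are not results from the study.

```python
def looic_from_elpd(elpd_loo):
    """Convert a leave-one-out ELPD to LOOIC (deviance scale)."""
    return -2.0 * elpd_loo


def compare_models(elpd_dual, elpd_single):
    """Differences (dual minus single) on both scales."""
    return {
        # Positive values favour the dual learning rate model.
        "elpd_diff": elpd_dual - elpd_single,
        # Negative values favour the dual learning rate model.
        "looic_diff": looic_from_elpd(elpd_dual) - looic_from_elpd(elpd_single),
    }
```

For example, with made-up values, `compare_models(-100.0, -110.0)` gives an ELPD difference of 10.0 and a LOOIC difference of -20.0, both favouring the dual learning rate model.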


Table 1. Demographic characteristics of the sample, by group.


Figure 3. Associations between reinforcement learning parameters and transdiagnostic psychiatric symptom dimensions. a–d. Coefficient posterior distributions from Bayesian GLMs (adjusted for age, gender, digit span and distancing status) reflecting the estimated percentage change in the learning rate parameter or the estimated mean change in the inverse temperature for a unit increase in anxiety/depression (a), social withdrawal (b), and compulsive behaviour (c–d) transdiagnostic factor scores. Parameters were derived from single (only compulsive behaviour presented here, c) or dual learning rate (a–b and d) Q-learning models fit to training alone or training-plus-test (parameters from fits to training-plus-test denoted with prime). In all plots, boxplot boxes denote 95% HDIs, and lines denote 99% HDIs.


Figure 4. Model-derived comparisons between distanced and non-distanced participants. a–b. Coefficient posterior distributions from Bayesian GLMs (adjusted for age, gender, and digit span) reflecting the estimated percentage change in the learning rate parameter α (a) or αreward/αloss (b), and the estimated mean change in the inverse temperature β, comparing distanced and non-distanced participants. Parameters were estimated from Q-learning models with single (a) or dual (b) learning rates, fit to training alone or training-plus-test (parameters from fits to training-plus-test denoted with prime). c. Individual-level posterior mean parameter estimates for αreward, αloss, and β, for models including increasing numbers of training blocks. d. Parameter differences between distanced and non-distanced participants, estimated from dual learning rate models fit to increasing numbers of training blocks (sixty trials per block). In all plots, boxplot boxes denote 95% HDIs, and lines denote 99% HDIs.

Supplementary material: File

Dercon et al. supplementary material
