
Reinforcement learning in women remitted from anorexia nervosa: Preliminary examination with a hybrid reinforcement learning/drift diffusion model

Published online by Cambridge University Press:  03 February 2025

Christina E. Wierenga*
Affiliation:
Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Amanda Bischoff-Grethe
Affiliation:
Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Carina S. Brown
Affiliation:
San Diego State University/University of California San Diego Joint Doctoral Program in Clinical Psychology, San Diego, CA, USA
Gregory G. Brown
Affiliation:
Department of Psychiatry, University of California San Diego, San Diego, CA, USA
*Corresponding author: Christina E. Wierenga; Email: cwierenga@ucsd.edu

Abstract

Objective:

Altered reinforcement learning (RL) and decision-making have been implicated in the pathophysiology of anorexia nervosa. To determine whether deficits observed in symptomatic anorexia nervosa are also present in remission, we investigated RL in women remitted from anorexia nervosa (rAN).

Methods:

Participants performed a probabilistic associative learning task that involved learning from rewarding or punishing outcomes across consecutive sets of stimuli to examine generalization of learning to new stimuli over extended task exposure. We fit a hybrid RL and drift diffusion model of associative learning to model learning and decision-making processes in 24 rAN and 20 female community controls (cCN).
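As an illustration of what a hybrid RL/drift diffusion model generally involves, the following is a minimal generic sketch and not the authors' fitted model. It assumes a delta-rule value update with separate learning rates for positive and negative prediction errors, and a drift rate proportional to the difference in learned values. All function names, parameter names, and default values (non-decision time, noise, time step) are hypothetical; only the 4 sec response window comes from the task description.

```python
import math
import random


def update_value(q, outcome, alpha_pos, alpha_neg):
    """Delta-rule update with a valence-specific learning rate (assumed form)."""
    delta = outcome - q                      # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return q + alpha * delta


def simulate_choice(drift, boundary, non_decision=0.3, dt=0.001,
                    noise=1.0, max_t=4.0, rng=random):
    """Simulate one diffusion trial with Euler-Maruyama steps.

    Evidence starts midway between 0 and `boundary`. Returns (choice, rt),
    where choice is 1 (upper boundary), 0 (lower boundary), or None if no
    boundary is reached within the response window `max_t`.
    """
    x = boundary / 2.0
    t = 0.0
    sd = noise * math.sqrt(dt)
    while t < max_t - non_decision:
        x += drift * dt + rng.gauss(0.0, sd)
        t += dt
        if x >= boundary:
            return 1, t + non_decision
        if x <= 0.0:
            return 0, t + non_decision
    return None, max_t


def run_trial(q_optimal, q_other, drift_scale, boundary, rng=random):
    """Link the two parts: the learned value difference sets the drift rate."""
    drift = drift_scale * (q_optimal - q_other)
    return simulate_choice(drift, boundary, rng=rng)
```

In a sketch of this kind, a larger boundary separation yields slower but more accurate choices, and a larger drift rate corresponds to faster extraction of information from the stimulus.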

Results:

rAN showed better learning from negative outcomes than cCN, and this advantage was greater over extended task exposure (p < .001, ηp² = .30). From set 1 to set 2, rAN demonstrated a reduction in accuracy of optimal choices (p = .007, ηp² = .16) and in the rate of information extraction on reward trials (p = .012, ηp² = .14), and a larger reduction of response threshold separation than cCN (p = .036, ηp² = .10).

Conclusions:

rAN extracted less information from rewarding stimuli and their learning became increasingly sensitive to negative outcomes over learning trials. This suggests rAN shifted attention to learning from negative feedback while slowing down extraction of information from rewarding stimuli. Better learning from negative over positive feedback in rAN might reflect a marker of recovery.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of International Neuropsychological Society
Figure 1. Probabilistic associative learning task (copied with permission from Mattfeld et al., 2011). Note: On each trial, participants indicated with a button response whether a fractal image was associated with category “A” or category “B” (Bodi et al., 2009; Mattfeld et al., 2011). Four images were used for each set, with two images randomly assigned to be “reward” stimuli and two images assigned to be “punishment” stimuli. Reward-learning trials and punishment-learning trials were intermixed within the task. The participant’s running point tally was displayed at the bottom of the screen on each trial and was set to 500 points at the start of the experiment. For reward trials, selection of the optimal category typically produced feedback (a smiley face) and a gain of 25 points, whereas selection of the non-optimal category typically produced no feedback and no gain of points. For punishment trials, selection of the optimal category typically produced no feedback and no loss of points, whereas selection of the non-optimal category typically produced feedback (a frowning face) and a loss of 25 points. Thus, the task involves gaining 25 points when choosing the optimal response on reward trials (coded 1), but losing 25 points when choosing the non-optimal response on punishment trials (coded -1; Bodi et al., 2009; Mattfeld et al., 2011; Myers et al., 2013). Other outcomes (non-optimal response on reward trials, optimal response on punishment trials) involved no change in points (coded 0). The task was administered using two stimulus sets containing different stimuli to examine differences in learning from rewarding or aversive feedback with additional exposure to these types of feedback. The order of stimulus sets (A or B) was counterbalanced across participants; set 1 refers to the first set presented (A or B) and set 2 refers to the second set presented (A or B).
Each set involved 160 trials, divided into four 40-trial blocks. Within each block, each of the four stimuli (two “reward” stimuli, two “punishment” stimuli) appeared 10 times. To facilitate probabilistic learning, the optimal response was associated with the more favorable outcome on 8 of these presentations (i.e., 80% of trials), whereas the non-optimal response was associated with the more favorable outcome on the other 2 (i.e., 20% of trials). Trial order was randomized within a block for each participant. Participants responded with the right hand only, using the index and middle fingers, and responses were captured using a Current Designs four-button response box. If the participant did not respond, no feedback was provided. Trials lasted until the participant responded, or 4 sec if the participant did not respond, and were separated by a variable inter-trial interval (4 sec), during which time the screen was blank. On each trial, the computer recorded whether the participant made the optimal response, in addition to the actual outcome and response time on that trial. The task took about 32 minutes to complete, was programmed in presentation software, and administered on a Dell Inspiron PC.
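The block structure and outcome coding described in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the original task code: the stimulus labels (R1, R2, P1, P2) and all function names are hypothetical, while the counts (10 presentations per stimulus per block, 8 of which favor the optimal response), the 25-point steps, and the 1/0/-1 coding follow the description above.

```python
import random

# Hypothetical labels for the four fractal images in one stimulus set.
STIMULI = {"R1": "reward", "R2": "reward", "P1": "punishment", "P2": "punishment"}


def make_block(rng):
    """Build one 40-trial block as (stimulus, trial_type, optimal_is_favored)."""
    trials = []
    for stim, trial_type in STIMULI.items():
        trials += [(stim, trial_type, True)] * 8    # 80%: optimal response favored
        trials += [(stim, trial_type, False)] * 2   # 20%: non-optimal response favored
    rng.shuffle(trials)                             # randomize order within the block
    return trials


def make_set(rng, n_blocks=4):
    """Build one 160-trial set from four independently shuffled blocks."""
    return [trial for _ in range(n_blocks) for trial in make_block(rng)]


def coded_outcome(trial_type, chose_optimal, optimal_is_favored):
    """Return 1 (gain 25 points), -1 (lose 25 points), or 0 (no change)."""
    got_favorable = chose_optimal == optimal_is_favored
    if trial_type == "reward":
        return 1 if got_favorable else 0
    return 0 if got_favorable else -1


def update_points(points, code):
    """Apply a coded outcome to the running tally (which starts at 500)."""
    return points + 25 * code
```

For example, `coded_outcome("punishment", False, True)` gives -1, the typical result of a non-optimal choice on a punishment trial, and `update_points(500, -1)` gives 475.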

Table 1. Demographic and clinical characteristics of the final sample

Figure 2. Observed and modeled response times by stimulus set and trial type. Note: Significance levels for simple main effects tests.

Figure 3. Observed and modeled accuracy by diagnostic group, stimulus set, and trial type. Note: Accuracy is the proportion of trials on which an optimal choice was selected. Significance levels for simple main effects tests.

Table 2. Model parameters: meaning and summary of results

Figure 4. Learning rates by diagnostic group, stimulus set and prediction error valence.

Figure 5. Mean drift rate by diagnostic group, stimulus set and trial type.

Figure 6. Base boundary separation by diagnostic group and stimulus set.

Supplementary material: File

Wierenga et al. supplementary material
