Altered Reinforcement Learning from Reward and Punishment in Anorexia Nervosa: Evidence from Computational Modeling

Published online by Cambridge University Press: 29 November 2021

Christina E. Wierenga*
Affiliation:
University of California, San Diego, CA, USA
Erin Reilly
Affiliation:
Hofstra University, Hempstead, NY, USA
Amanda Bischoff-Grethe
Affiliation:
University of California, San Diego, CA, USA
Walter H. Kaye
Affiliation:
University of California, San Diego, CA, USA
Gregory G. Brown
Affiliation:
University of California, San Diego, CA, USA
*Correspondence and reprint requests to: Christina E. Wierenga, Ph.D., Professor of Psychiatry, UCSD Eating Disorder Research and Treatment Program UCSD Department of Psychiatry, University of California, Chancellor Park, 4510 Executive Dr., Suite 315, San Diego, CA, 92121, USA. E-mail: cwierenga@ucsd.edu

Abstract

Objectives:

Anorexia nervosa (AN) is associated with altered sensitivity to reward and punishment. Few studies have investigated whether this results in aberrant learning. The ability to learn from rewarding and aversive experiences is essential for flexibly adapting to changing environments, yet individuals with AN tend to demonstrate cognitive inflexibility, difficulty set-shifting and altered decision-making. Deficient reinforcement learning may contribute to repeated engagement in maladaptive behavior.

Methods:

This study investigated learning in AN using a probabilistic associative learning task that separated learning of stimuli via reward from learning via punishment. Forty-two individuals with Diagnostic and Statistical Manual of Mental Disorders (DSM)-5 restricting-type AN were compared to 38 healthy controls (HCs). We applied computational models of reinforcement learning to assess group differences in learning, thought to be driven by violations in expectations, or prediction errors (PEs). Linear regression analyses examined whether learning parameters predicted BMI at discharge.

Results:

AN had lower learning rates than HC following both positive and negative PE (p < .02), and were less likely to exploit what they had learned. Negative PE on punishment trials predicted lower discharge BMI (p < .001), suggesting individuals with more negative expectancies about avoiding punishment had the poorest outcome.

Conclusions:

This is the first study to show lower rates of learning in AN following both positive and negative outcomes, with worse punishment learning predicting less weight gain. An inability to modify expectations about avoiding punishment might explain persistence of restricted eating despite negative consequences, and suggests that treatments that modify negative expectancy might be effective in reducing food avoidance in AN.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © INS. Published by Cambridge University Press, 2021

Fig. 1. (A) In the No Bias model, all expectancy values, V, are set to zero on the first trial on which a stimulus, sj, is presented. In the First Choice Bias model, they are instead set either to a bias value, bias(sj), or to zero. The bias(sj) values are sampled from a normal distribution with mean zero, indicating no bias, and a precision of 10, where precision = 1/variance. If the sampled bias value for stimulus sj is positive, the choice that would yield the optimal long-term outcome is favored: its expectancy value for trial 1, V1(cOpt|sj), is set to the sampled bias value, bias(sj), whereas the expectancy value for the nonoptimal response, V1(cNonOpt|sj), is set to zero. If the sampled bias value is negative, the nonoptimal choice is favored: the expectancy value for the nonoptimal choice is set to the absolute value of the bias, whereas the expectancy value for the optimal choice is set to zero. For the First Choice Bias (Singlet) model, the bias parameter for each stimulus is set to the same estimated value, bias(s.). (B) The expectancy value for trial t + 1 associated with the choice ci made to stimulus sj on trial t, Vt+1(ci|sj), is the expectancy value on trial t updated by the product of a learning rate and the prediction error. Different learning rates, ηp|n, are estimated for positive and negative prediction errors, PEp|n. Learning rates are sampled from a beta distribution using the values of the α and β parameters listed in Table 2 (also see Supplement). A logistic equation maps the difference between the expectancy value of the choice made on trial t, Vt(ci|sj), and the value of the choice not made, Vt(c̄i|sj), to the probability Pt(ci|sj) of making the chosen response ci given that stimulus sj was presented on trial t. The logistic regression weight β is sampled from a gamma distribution using the values of the shape and rate parameters presented in Table 2 (also see Supplement).
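The three steps described in the Fig. 1 caption (first-trial initialization, value update, logistic choice rule) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' estimation code: the function names are hypothetical, the prediction error is assumed to be the outcome minus the current expectancy value (the caption does not give its formula), and the Bayesian sampling of learning rates and β from their priors is omitted.

```python
import math
import random


def sample_bias(rng):
    """Draw a first-choice bias from Normal(mean=0, precision=10)."""
    sd = math.sqrt(1.0 / 10.0)  # precision = 1/variance, so variance = 0.1
    return rng.gauss(0.0, sd)


def initial_values(bias):
    """Return (V1 for the optimal choice, V1 for the nonoptimal choice)."""
    if bias > 0:
        return bias, 0.0      # positive bias favors the optimal choice
    return 0.0, abs(bias)     # negative bias favors the nonoptimal choice


def choice_probability(v_chosen, v_not_chosen, beta):
    """Logistic map from the value difference to P(chosen response)."""
    return 1.0 / (1.0 + math.exp(-beta * (v_chosen - v_not_chosen)))


def update_value(v, outcome, eta_pos, eta_neg):
    """Update an expectancy value with a PE-sign-specific learning rate."""
    pe = outcome - v  # ASSUMPTION: PE defined as outcome minus expectancy
    eta = eta_pos if pe >= 0 else eta_neg
    return v + eta * pe


if __name__ == "__main__":
    rng = random.Random(0)
    v_opt, v_non = initial_values(sample_bias(rng))
    print("P(optimal) on trial 1:", choice_probability(v_opt, v_non, beta=2.0))
    print("V after a reward:", update_value(v_opt, 1.0, eta_pos=0.4, eta_neg=0.2))
```

Using separate `eta_pos` and `eta_neg` arguments mirrors the caption's distinction between learning rates for positive and negative prediction errors; the parameter values in the usage lines are arbitrary.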

Table 1. Demographic and clinical characteristics of the sample

Fig. 2. Probabilistic associative learning task (copied with permission from (Mattfeld et al., 2011)).

Table 2. Parameters estimated for each of the four models and their prior distributions

Fig. 3. Plots of the observed and predicted mean probability of selecting the optimal choice for AN and HC groups across the four blocks by trial type (reward, punishment) and picture set. For each participant, we calculated the predicted block means for reward and punishment trials from that participant’s full First Choice Bias model parameter estimates; the averages of these means for the AN and HC groups on the two picture sets are shown as black squares. In every instance the model-derived means are within the 95% confidence interval of the observed means, and most cover the data means, supporting the prediction model. (A) For observed data, on reward trials, results indicate improved performance over time across all participants, consistent with learning [main effect of Block, F(3,225) = 41.482, p < .001, η2p = .356], and the HC group had a greater learning rate overall than the AN group [Group × Block interaction, F(3,225) = 5.771, p = .001, η2p = .071]. However, AN performed better than HC on Set 1 and worse than HC on Set 2 [Group × Set interaction, F(1,75) = 5.556, p = .021, η2p = .069]. No other main effects or interactions were significant for reward trials, ps > .3. (B) On punishment trials, performance improved over time across all participants [main effect of Block, F(3,225) = 3.711, p = .012, η2p = .047], and HC performed better than AN [main effect of Group, F(1,75) = 6.833, p = .011, η2p = .083]. No other main effects or interactions were significant for punishment trials, ps > .1.

Table 3. Reinforcement learning model generated parameters by group and set

Fig. 4. (A) Plot of the mean learning rate by prediction error type and group, collapsed across set, demonstrating the main effect of Group resulting from the Group × Set × PE type ANOVA. The main effect of Group indicated that AN learn more slowly than HC following both positive PEs and negative PEs. A main effect of PE type revealed faster learning rates following positive PEs compared to negative PEs across the entire sample. Neither the main effect of Set nor any interactions were significant (all η2p < .039). (B) Plot of explore-exploit values by group and set showing a main effect of Group. AN had lower β values than HC. Smaller values imply individuals are still exploring stimulus-response-outcome hypotheses and are less certain about exploiting learned rules. The main effect of Set was not significant, nor was the Group × Set interaction (all η2p < .030). (C) Plot of the change in BMI from admission to discharge against the size of negative PE on punishment trials of Set 1. Error bars represent standard error of the mean; *p < .05, **p < .01, ***p < .001.
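To illustrate why smaller β values in panel B are read as more exploration, the short Python sketch below evaluates a logistic choice rule at a fixed difference in expectancy values for several β values. The function name, the value difference of 0.5, and the β values are hypothetical choices for illustration and are not taken from the study's estimates.

```python
import math


def p_higher_valued(value_difference, beta):
    """Probability of picking the higher-valued choice under a logistic rule."""
    return 1.0 / (1.0 + math.exp(-beta * value_difference))


# Same learned value difference, different explore-exploit weights.
for beta in (0.5, 2.0, 8.0):
    p = p_higher_valued(0.5, beta)
    print(f"beta = {beta}: P(higher-valued choice) = {p:.3f}")
```

With β near zero the probability stays close to 0.5, so choices remain nearly random even when one option has a higher learned value; as β grows, the same value difference is exploited more consistently.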

Supplementary material: File

Wierenga et al. supplementary material
