Reinforcement learning in women remitted from anorexia nervosa: Preliminary examination with a hybrid reinforcement learning/drift diffusion model

Christina E. Wierenga; Amanda Bischoff-Grethe; Carina S. Brown; Gregory G. Brown

doi:10.1017/S1355617725000013

Published online by Cambridge University Press: 03 February 2025

Christina E. Wierenga*: Affiliation:
Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Amanda Bischoff-Grethe: Affiliation:
Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Carina S. Brown: Affiliation:
San Diego State University/University of California San Diego Joint Doctoral Program in Clinical Psychology, San Diego, CA, USA
Gregory G. Brown: Affiliation:
Department of Psychiatry, University of California San Diego, San Diego, CA, USA
*: Corresponding author: Christina E. Wierenga; Email: cwierenga@ucsd.edu

Abstract

Objective:

Altered reinforcement learning (RL) and decision-making have been implicated in the pathophysiology of anorexia nervosa. To determine whether deficits observed in symptomatic anorexia nervosa are also present in remission, we investigated RL in women remitted from anorexia nervosa (rAN).

Methods:

Participants performed a probabilistic associative learning task that involved learning from rewarding or punishing outcomes across consecutive sets of stimuli to examine generalization of learning to new stimuli over extended task exposure. We fit a hybrid RL and drift diffusion model of associative learning to model learning and decision-making processes in 24 rAN and 20 female community controls (cCN).

Results:

rAN showed better learning from negative outcomes than cCN and this was greater over extended task exposure (p < .001, η_p² = .30). rAN demonstrated a reduction in accuracy of optimal choices (p = .007, η_p² = .16) and rate of information extraction on reward trials from set 1 to set 2 (p = .012, η_p² = .14), and a larger reduction of response threshold separation from set 1 to set 2 than cCN (p = .036, η_p² = .10).

Conclusions:

rAN extracted less information from rewarding stimuli and their learning became increasingly sensitive to negative outcomes over learning trials. This suggests rAN shifted attention to learning from negative feedback while slowing down extraction of information from rewarding stimuli. Better learning from negative over positive feedback in rAN might reflect a marker of recovery.

Keywords

Anorexia nervosa; prediction error; reinforcement learning; decision-making; drift diffusion model; probabilistic associative learning

Information

Type: Research Article
Information: Journal of the International Neuropsychological Society, Volume 31, Issue 2, February 2025, pp. 127-137

DOI: https://doi.org/10.1017/S1355617725000013
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2025. Published by Cambridge University Press on behalf of International Neuropsychological Society

Introduction

Anorexia nervosa (AN) is a medically dangerous eating disorder characterized by extreme dietary restriction, an intense fear of weight gain, and disturbed body-related experience, resulting in severe weight loss and sustained low body weight (American Psychiatric Association, 2013). A growing body of research supports altered decision-making (Guillaume et al., 2015; Wu et al., 2016) and reinforcement learning (Pike et al., 2023; Ritschel et al., 2017) in AN, which are thought to underlie core cognitive and behavioral symptoms, including the persistence of dietary restriction and compensatory weight loss behaviors (e.g., purging, excessive exercise) despite negative consequences. However, less is known about how decision-making and reinforcement learning may interact in AN in ill and remitted states.

The ability to flexibly adapt to changing environments is based on both learning to maximize rewarding outcomes and to avoid aversive outcomes and requires outcome valuation and cognitive flexibility to update information and arbitrate between options (Dayan & Daw, 2008; Frank et al., 2004). Reinforcement learning (RL) models are based on the notion that the rate of learning is driven by violations of expectations, or prediction errors (PE), which reflect the difference between received and expected outcome (Pearce & Hall, 1980; Rescorla & Wagner, 1972). Learning from experience occurs through updating expectations about the outcome in proportion to PE, so that the expected outcome converges to the actual outcome. Deficits in reward processing in ill and remitted AN have been frequently observed (Haynos et al., 2020; Kaye et al., 2020; O’Hara et al., 2015; Wierenga et al., 2015), with emerging evidence of altered aversive processing and increased punishment sensitivity in ill states (Bernardoni et al., 2018; Bischoff-Grethe et al., 2013; Harrison et al., 2010; Jonker et al., 2022; Jonker et al., 2020; Monteleone et al., 2017), thought to interfere with the ability to learn from experience. Consistent with this view, studies of RL for reward in AN tend to demonstrate decreased learning accuracy in acutely ill and weight-restored individuals (Foerde et al., 2021; Foerde & Steinglass, 2017; Wierenga et al., 2022). In contrast, increased rates of punishment learning during RL tasks that involved changing rules (e.g., reversal learning) or outcome probabilities (e.g., adaptive learning tasks) have been reported in adolescents acutely ill with AN (Bernardoni et al., 2018) and adults remitted from AN (Bernardoni et al., 2021; Filoteo et al., 2014; Pike et al., 2023), although one study reported learning decreased following a rule change (Filoteo et al., 2014). Model simulations conducted to help explain this observed decrease in learning showed that while increasing a punishment sensitivity parameter explained observed accelerated initial category learning in remitted AN, observed deficits in set shifting were explained by altering parameters representing changes in rule selection and flexibility, highlighting that both punishment sensitivity and cognitive flexibility may impact associative learning in AN (Filoteo et al., 2014), consistent with other findings of worse cognitive control during RL in women remitted from AN (Ritschel et al., 2017).
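
As a compact restatement of the prediction-error account described in this paragraph, a generic delta-rule update can be written as follows. The symbols V_t (expected outcome on trial t), r_t (received outcome), δ_t (prediction error), and α (learning rate) are illustrative notation introduced here and are not taken from the cited studies.

```latex
\begin{align}
\delta_t &= r_t - V_t \\
V_{t+1} &= V_t + \alpha\,\delta_t, \qquad 0 < \alpha \le 1
\end{align}
```

Under this sketch, if the outcome is a constant r, then r - V_{t+1} = (1 - α)(r - V_t), so the gap between expected and actual outcome shrinks by a factor of (1 - α) on every trial, which is the sense in which the expectation converges to the actual outcome.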

We previously used a well-studied two-choice, feedback-based, probabilistic associative learning task (PALT; Bodi et al., 2009; Mattfeld et al., 2011) to investigate the influence of rewarding and punishing outcomes on instrumental RL over extended task exposure in individuals acutely ill with AN (Wierenga et al., 2022). Performance on the PALT paradigm has provided insights into parkinsonism (Bodi et al., 2009), posttraumatic stress disorder (Myers et al., 2013), major depressive disorder (Herzallah et al., 2013), and aging (Sojitra et al., 2018), supporting the clinical validity of the PALT.

The PALT relies on the contingency between a participant’s response and outcome (i.e., whether or not they won or lost points) to facilitate learning (i.e., to select the optimal reward-based stimuli and avoid the non-optimal punishment-based stimuli), and unlike reversal learning tasks or adaptive learning tasks, outcome contingencies and learning rules do not change over the course of the task (Bodi et al., 2009; Mattfeld et al., 2011; Myers et al., 2013). Using a value-based computational model of RL, we found that individuals with acute AN had reduced learning rates when either positive (better than expected) or negative (worse than expected) PE occurred. Individuals with AN were also less likely than healthy controls to exploit what they had learned, suggesting they may make choices less decisively (Wierenga et al., 2022). A larger magnitude of negative PE was associated with worse treatment outcome, suggesting that poorer loss-related learning may be a mechanism of AN persistence.

Typically, PE models of RL only account for accuracy data (Pedersen et al., 2017) and use a choice rule that involves a single parameter, which encompasses all of the decision processes that led to a choice. However, because many cognitive processes other than PE learning, such as decision-making, underlie performance on RL tasks, computational models that incorporate both response time and accuracy data from these tasks, such as a hybrid RL and drift diffusion model (RLDDM; Pedersen & Frank, 2020; Pedersen et al., 2017), may better determine what cognitive processes account for abnormal functioning. Notably, drift diffusion models (DDM) track learning as changes in accuracy, response time, or a tradeoff of both. The RLDDM includes the PE learning rule in its architecture (Rescorla & Wagner, 1972), but models choice probability using the DDM, which accounts for the time to complete non-decision processes, such as stimulus encoding and response execution, and time to complete decisions. The DDM assumes that during the decision time information is sequentially sampled from a noisy stimulus until one of two thresholds is reached. The choice associated with the exceeded threshold is then made. Decisions are determined by such processes as choice bias, information sampling rate, and the spread of the response thresholds for the two choices. Thus, the RLDDM provides a more thoroughly articulated account of cognitive processes underlying decision-making than does the single-parameter choice rule (Pedersen & Frank, 2020; Pedersen et al., 2017). Pedersen et al. (2017) found model parameters were strongly to very strongly related to medication effects in attention deficit hyperactivity disorder, supporting the clinical validity of RLDDM parameters.
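
The sequential-sampling idea described above can be illustrated with a minimal simulation of a single two-choice decision. This is a sketch under stated assumptions, not the model fitted in the study: the function name `simulate_ddm_trial`, the discretized (Euler) approximation, the time step, the unit noise scale, and all parameter values are hypothetical choices made for illustration.

```python
import random


def simulate_ddm_trial(drift, boundary, start=0.5, non_decision=0.3,
                       dt=0.001, noise_sd=1.0, max_time=4.0, rng=None):
    """Simulate one decision as a discretized diffusion between two thresholds.

    Evidence starts at start * boundary and accumulates noisily until it
    reaches the upper threshold (boundary) or the lower threshold (0).
    Returns (choice, rt): choice is 1 (upper) or 0 (lower); both values are
    None if neither threshold is reached within max_time seconds.
    """
    rng = rng or random.Random()
    evidence = start * boundary
    elapsed = 0.0
    step_sd = noise_sd * dt ** 0.5      # noise scales with sqrt of the time step
    while elapsed < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        elapsed += dt
        if evidence >= boundary:
            return 1, non_decision + elapsed   # RT = non-decision + decision time
        if evidence <= 0.0:
            return 0, non_decision + elapsed
    return None, None


# Illustrative use: a positive drift favors the upper threshold.
rng = random.Random(0)
trials = [simulate_ddm_trial(drift=2.0, boundary=1.5, rng=rng) for _ in range(500)]
finished = [t for t in trials if t[0] is not None]
upper_rate = sum(choice for choice, _ in finished) / len(finished)
```

In this sketch, a larger drift makes upper-threshold choices faster and more frequent, a wider boundary trades speed for accuracy, and a start value other than 0.5 biases the process toward one threshold.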

The current study aimed to clarify two key issues. First, to examine whether our previous findings of poorer learning following both positive and negative prediction errors reflect state-related correlates of the illness or are also present in remitted AN, we employed the same reinforcement learning paradigm in a sample of women remitted from AN. Second, we sought to better characterize cognitive processes related to decision-making that might impact RL by modeling both PALT accuracy and response time data using a hybrid RLDDM. Moreover, the hierarchical RLDDM explored whether psychological processes underlying RL and decision-making were altered by greater task exposure as reflected in parameter changes across consecutively presented sets of stimuli. Lastly, we explored whether the learning rate and decision-making model parameters were associated with clinical variables for the remitted AN group.

Methods

Participants

Twenty-five women remitted from DSM-5 anorexia nervosa (rAN; 16 pure restricting type; 9 binge eating/purging type, with regular purging but no binge-eating behavior) and 22 community control women (cCN) participated. Consistent with prior work, remission from AN was defined as maintaining above 85% of ideal body weight, regular menstrual cycles, and the absence of binge eating, purging (including excessive exercise), and restrictive eating for at least 1 year prior to the study, with no current psychological symptoms of AN (e.g., body dissatisfaction) (Wagner et al., 2006). Women who met criteria for a current Axis I diagnosis were excluded from both groups, and controls with any eating disorder history were excluded. The study was approved by the University of California San Diego Institutional Review Board and conducted in compliance with the Helsinki Declaration of 1975, as revised in 2008. All participants provided written informed consent (see Supplemental Materials for additional details regarding assessment tools and exclusion criteria).

Probabilistic associative learning task (PALT)

A probabilistic associative learning task (Figure 1) was used to assess trial-by-trial response-outcome instrumental learning to reward (wins) and punishment (losses). The task was administered using two stimulus sets containing different stimuli to examine generalization of learning and evaluate whether rAN demonstrate altered learning from rewarding or aversive feedback with additional exposure to these types of feedback, as seen in ill AN (Wierenga et al., 2022). The order of stimulus sets (A or B) was counterbalanced across participants; set 1 refers to the first set presented (A or B) and set 2 refers to the second set presented (A or B) (Bodi et al., 2009; Mattfeld et al., 2011; Wierenga et al., 2022).

Figure 1.

Probabilistic associative learning task (copied with permission from Mattfeld et al., 2011). Note: The task required participants to determine on a trial-by-trial basis whether fractal images are associated with one of two categories “A” or “B” indicated with a button response (Bodi et al., 2009; Mattfeld et al., 2011). Four images were used for each set, with two images randomly assigned to be “reward” stimuli, and two images assigned to be “punishment” stimuli. Reward-learning trials and punishment-learning trials were intermixed within the task. The participant’s running point tally was displayed at the bottom of the screen on each trial and was set to 500 points at the start of the experiment. For reward trials, the selection of the optimal category typically produced feedback (a smiley face) and a gain of 25 points, whereas selection of the non-optimal category typically produced no feedback and no gain of points. For punishment trials, the selection of the optimal category typically produced no feedback and no loss of points whereas selection of the non-optimal category typically produced feedback (frowning face) and a loss of 25 points. Thus, the task involves gaining 25 points when choosing the optimal response on reward trials (coded 1), but losing 25 points when choosing the non-optimal response on punishment trials (coded -1; Bodi et al., 2009; Mattfeld et al., 2011; Myers et al., 2013). Other outcomes (non-optimal response on reward trials, optimal response on punishment trials) involved no change in points (coded 0).
The task was administered using two stimulus sets containing different stimuli to examine differences in learning from rewarding or aversive feedback with additional exposure to these types of feedback. The order of stimulus sets (A or B) was counterbalanced across participants; set 1 refers to the first set presented (A or B) and set 2 refers to the second set presented (A or B). Each set involved 160 trials, divided into four 40-trial blocks. Within each block, each of the four stimuli (two “reward” stimuli, two “punishment” stimuli) appeared 10 times; 8 times the optimal response was associated with the more favorable outcome (i.e., 80% of trials), whereas 2 times the non-optimal response was associated with the more favorable outcome (i.e., 20% of trials) to facilitate probabilistic learning. Trial order was randomized within a block for each participant. Participants responded with the right hand only, using the index and middle fingers, and responses were captured using a Current Designs four button response box. If the participant did not respond, no feedback was provided. Trials lasted until the participant responded, or 4 sec if the participant did not respond, and were separated by a variable inter-trial interval (4 sec), during which time the screen was blank. On each trial, the computer recorded whether the participant made the optimal response, in addition to the actual outcome and response time on that trial. The task took about 32 minutes to complete, was programmed in presentation software, and administered on a Dell Inspiron PC.
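
The trial structure described above (four 40-trial blocks per set, each stimulus shown 10 times per block with an 8/2 split of feedback contingencies, outcomes coded 1, 0, or -1) can be sketched in code. This is an illustrative reconstruction, not the task's actual program: the stimulus labels, the function names, and the reading that the 20% of trials simply reverse which response is favored are assumptions made here.

```python
import random

# Hypothetical stimulus labels: two "reward" and two "punishment" images per set.
STIMULI = {"R1": "reward", "R2": "reward", "P1": "punishment", "P2": "punishment"}


def build_block(rng):
    """Return one 40-trial block: each stimulus 10 times, 8 with the optimal
    response favored and 2 with the non-optimal response favored."""
    trials = []
    for stimulus, trial_type in STIMULI.items():
        for optimal_favored in [True] * 8 + [False] * 2:
            trials.append({"stimulus": stimulus, "type": trial_type,
                           "optimal_favored": optimal_favored})
    rng.shuffle(trials)                 # trial order randomized within a block
    return trials


def build_set(rng, n_blocks=4):
    """Return the 160 trials of one stimulus set (four 40-trial blocks)."""
    return [trial for _ in range(n_blocks) for trial in build_block(rng)]


def outcome(trial, chose_optimal):
    """Code feedback as 1 (gain 25 points), -1 (lose 25 points), or 0 (no change)."""
    chose_favored = (chose_optimal == trial["optimal_favored"])
    if trial["type"] == "reward":
        return 1 if chose_favored else 0
    return 0 if chose_favored else -1


trials = build_set(random.Random(1))
```

Because the favorable outcome follows the optimal response on only 80% of presentations, a participant cannot learn the mapping from a single trial and must integrate feedback over repetitions.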

Reinforcement learning drift diffusion model (RLDDM)

The hybrid RLDDM provides information about prediction errors while giving additional information about the decision process. We used a slight modification of Pedersen and colleagues’ (2017) optimal fitting model to analyze our data (Supplemental Materials). Like Pedersen et al. (2017), we modeled learning using a delta rule and modeled decision-making using a drift diffusion (Wiener) process (Ratcliff et al., 2016; Ratcliff & Tuerlinckx, 2002). Pedersen et al.’s (2017) hierarchical Bayesian model estimated positive and negative prediction error learning rates (η_p and η_n), the scale parameter (m), a base boundary separation parameter (a), an exponent of a power function (i), and a non-decision parameter (t_er), which represents time needed to encode a stimulus and to prepare and execute responses. Larger values of the learning parameters indicate greater adaptation of outcome expectancies when the reward is greater than expected or punishment less than expected (η_p) or when the outcome is more punishing or less rewarding than expected (η_n). Parameter m reflects how sensitive participants are to differences in outcome values associated with the two choices (Pedersen et al., 2017). For some participants, differences in choice expectancies might be less important than for other participants. The boundary separation parameter a, which reflects the difference between the two response thresholds, is often interpreted as response caution with higher values favoring accuracy over speed (Myers et al., 2022). Dynamic changes in boundary separation over trials are assumed to follow a power function of trial number with the parameter i reflecting the rate of change (Pedersen et al., 2017). Values of i near zero represent little dynamic change, whereas larger values might represent less response caution with experience across trials. The psychological meaning of both boundary separation and the dynamic parameter i should be interpreted alongside drift rate v, which is derived from the difference in the two response expectancies multiplied by the sensitivity parameter m (Pedersen et al., 2017). The drift rate represents the average speed of information extraction from a stimulus (Ratcliff et al., 2016). The information accumulation process is noisy and is influenced by task difficulty, sensory discriminability, attention, and speed of cognitive processes among other neurobehavioral processes (Myers et al., 2022; Ratcliff et al., 2016). The association of narrower boundary separation with higher drift rates can lead to fast and accurate responding, implying learning rather than caution. We also added a starting point parameter z to the model (see Supplemental Materials). Values of z greater than 0.5 bias the participant towards making faster and more frequent optimal choices (Myers et al., 2022). The RLDDM was fit to all valid trials of all participants within each group, modeling both between-block and within-block learning. The model hierarchy involved trial set as a higher-level variable under which trials were embedded. See Table 2 for a summary of the parameters estimated and their descriptions and Supplemental Materials for more details.
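
To make the roles of the parameters described above concrete, the following sketch shows how they could combine on a single trial. It is a hedged illustration only: the function names are hypothetical, the exact parameterization used in the study is given in its Supplemental Materials, and the power-function form a × repetition^i is an assumed reading of "a power function of trial number".

```python
def update_expectancy(q, outcome, eta_pos, eta_neg):
    """Delta rule with separate learning rates for positive and negative PE."""
    prediction_error = outcome - q
    rate = eta_pos if prediction_error >= 0 else eta_neg
    return q + rate * prediction_error


def drift_rate(q_optimal, q_nonoptimal, m):
    """Drift rate: difference in the two response expectancies scaled by m."""
    return m * (q_optimal - q_nonoptimal)


def boundary_separation(a, i, repetition):
    """Assumed power-function form: base separation a times repetition ** i
    (repetition counts from 1, so the first presentation uses a itself)."""
    return a * repetition ** i


# Illustrative values only; they are not estimates from the study.
q_after_gain = update_expectancy(q=0.0, outcome=1, eta_pos=0.3, eta_neg=0.1)
q_after_miss = update_expectancy(q=0.5, outcome=0, eta_pos=0.3, eta_neg=0.1)
v = drift_rate(q_optimal=0.6, q_nonoptimal=0.1, m=2.0)
a_tenth = boundary_separation(a=2.0, i=-0.1, repetition=10)
```

Under these assumptions a negative i narrows the thresholds as a stimulus is repeated, while growing expectancy differences raise the drift rate, so responses can become both faster and more accurate with learning.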

Data analysis

The observed dependent variables were the median response time (RT) and mean proportion of optimal choices made (accuracy), calculated over reward and punishment trials separately for stimulus sets 1 and 2. For observed data and modeled data involving trial type or prediction error (learning rate and drift rate), we used the general linear model with repeated measures (SPSS Version 28.0.1.0) to perform a diagnostic group (cCN vs. rAN) × stimulus-set order (set 1 vs. set 2) × trial type (reward vs. punishment) analysis of variance (ANOVA). To analyze model parameters that did not involve trial type (starting point bias, base boundary separation, boundary power, non-decision time, and scale), we performed a diagnostic group × set ANOVA. In order to explore interactions, we used SPSS multivariate simple main effect tests of marginal means (Garofalo et al., 2022). These tests provide linearly independent tests of pairwise comparisons among estimates of the marginal means.
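
The two observed dependent variables can be computed per participant from trial-level records roughly as follows. This is a minimal sketch: the record layout (keys `set`, `type`, `rt`, `optimal`) and the function name `summarize` are hypothetical and do not describe the study's actual data files.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(trials):
    """Return median RT and proportion of optimal choices for each
    (set, trial type) cell of one participant's trial records."""
    cells = defaultdict(list)
    for trial in trials:
        cells[(trial["set"], trial["type"])].append(trial)
    return {
        cell: {
            "median_rt": median([t["rt"] for t in rows]),
            "accuracy": mean([1.0 if t["optimal"] else 0.0 for t in rows]),
        }
        for cell, rows in cells.items()
    }


example = [
    {"set": 1, "type": "reward", "rt": 0.8, "optimal": True},
    {"set": 1, "type": "reward", "rt": 1.0, "optimal": False},
    {"set": 1, "type": "reward", "rt": 1.4, "optimal": True},
    {"set": 2, "type": "punishment", "rt": 0.9, "optimal": True},
]
summary = summarize(example)
```

The median is used for RT because response-time distributions are typically right-skewed, whereas accuracy is a simple proportion.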

Data and model parameter estimates from three participants were excluded from analysis. One cCN participant had a large number of missing values on punishment trials and appeared to game the program. Two participants (1 cCN, 1 rAN) with poor convergence of some parameter estimates were also excluded. Various levels of power given different effect sizes are presented in Supplemental Figure 6. Throughout the paper the 95% CI is reported.

Exploratory clinical associations

Separate Pearson correlational analyses within the rAN group examined relationships between the two RL model values (learning rate for positive PE (η_p) and learning rate for negative PE (η_n)) and five AN clinical measures (current BMI, lowest BMI, age of AN onset, duration of illness, duration of remission) separately for each set, and between three DDM model values (base boundary separation (a), drift rate on reward trials (v_r), drift rate on punishment trials (v_p)) and the five AN clinical measures. Bonferroni correction for multiple comparisons, based on a family-wise α of .05, yielded a corrected significance threshold of p = .005 for the two RL values and the five clinical measures, and of p = .003 for the three DDM values and the five clinical measures. Correlational analyses were repeated for mood/personality measures (BDI, STAI, TCI Harm Avoidance, TCI Reward Dependence) and age.
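
The corrected thresholds quoted above follow from dividing the family-wise level by the number of correlations in each family. The helper name below is hypothetical; the arithmetic is the standard Bonferroni adjustment.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that holds the family-wise level at alpha."""
    return alpha / n_tests


# 2 RL values x 5 clinical measures = 10 tests; 3 DDM values x 5 measures = 15 tests.
rl_threshold = bonferroni_threshold(0.05, 2 * 5)
ddm_threshold = bonferroni_threshold(0.05, 3 * 5)
```

These evaluate to .005 and approximately .0033, which rounds to the reported .003.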

Results

Demographic variables

The rAN and cCN groups did not differ in education, current BMI, anxiety or depressive symptoms (Table 1). The rAN group was slightly older and had significantly lower historical BMI (p < .001). There were no correlations between parameter values and age by group and within stimulus set, suggesting age had little if any impact on parameter estimates.

Table 1.

Demographic and clinical characteristics of the final sample

Note: Student’s t-tests were used to assess statistical significance for between-group differences in continuous variables. Despite the age difference between groups, there were no correlations between parameter values and age by group and within stimulus set, suggesting age had little if any impact on parameter estimates. BDI = Beck Depression Inventory-Second Edition (BDI-2); BMI = body mass index; EDI-3 = Eating Disorder Inventory; STAI = Spielberger State-Trait Anxiety Inventory; TCI = Temperament and Character Inventory.

Observed task values

A priori tests of linear trends of response time and optimal choice over blocks of trials confirmed task learning for both groups (see Supplemental Materials for ANOVA results and plots).

Response time

The set × trial-type effect was significant, F(1,42) = 12.82, p < .001, η_p² = .23 (Figure 2 top row). Simple main effect tests on marginal means revealed that whereas the RT difference between set 1 and set 2 on reward trials was not significant (p = .102, η_p² = .06), RT was significantly slower on set 1 than set 2 on punishment trials (p < .001, η_p² = .46), generating a large effect. RTs for set 1 trials were significantly longer than those for set 2, F(1,42) = 26.05, p < .001, η_p² = .38. Participants responded to trials on which stimuli were associated with punishing outcomes more slowly than on trials where stimuli were associated with rewarding outcomes, F(1,42) = 89.47, p < .001, η_p² = .68. The main effect of diagnostic group and its interactions were not significant (all η_p² < .07).
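
For readers who want to relate the reported F statistics to the accompanying effect sizes, partial eta squared can be recovered from F and its degrees of freedom using the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The function name below is hypothetical; the inputs are the F values reported in the paragraph above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


set_by_type = partial_eta_squared(12.82, 1, 42)   # set x trial-type interaction
set_effect = partial_eta_squared(26.05, 1, 42)    # main effect of set
type_effect = partial_eta_squared(89.47, 1, 42)   # main effect of trial type
```

Rounded to two decimals, these give .23, .38, and .68, matching the values reported for the corresponding effects.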

Figure 2.

Observed and modeled response times by stimulus set and trial type.

Note: Significance levels for simple main effects tests.

Proportion of optimal choices

There was a significant diagnostic group × set × trial-type interaction, F(1,42) = 4.78, p = .034, η_p² = .10 (Figure 3 top row). Simple main effects tests on marginal means revealed a significantly lower proportion of optimal choices on set 2 than set 1 only among rAN participants and only on reward trials (p = .007, η_p² = .16). Participants responded on trials associated with punishing outcomes more accurately than when responding on reward trials, F(1,42) = 32.13, p < .001, η_p² = .43. The main effect of diagnostic group and its interaction with trial type were non-significant (both η_p² < .02).

Figure 3.

Observed and modeled accuracy by diagnostic group, stimulus set, and trial type.

Note: Accuracy is the proportion of trials an optimal choice was selected. Significance levels for simple main effects tests.

Parameter values

See Table 2 for summary of results.

Table 2.

Model parameters: meaning and summary of results

Note: g refers to the trial grouping variable, either set 1 or set 2; s is the subject number; t is the trial number; PE is prediction error. Some parameters vary across trials within stimulus set and diagnostic group, whereas others are fixed across trials.

PE learning rates

A significant two-factor interaction of learning rate valence with diagnostic group, F(1,42) = 12.28, p = .001, η_p² = .23, indicated a greater learning rate for negative PE in the rAN group compared to cCN (p < .001, η_p² = .25). Additionally, the difference between positive and negative learning rate valence was greater for set 1 trials than for set 2 trials, F(1,42) = 4.59, p = .038, η_p² = .10 (Figure 4). These two-way interactions varied across a third study condition, producing a three-factor learning rate × diagnostic group × set interaction, F(1,42) = 8.09, p = .007, η_p² = .16. Simple main effects tests on marginal means indicated a significant increase in negative PE learning rate from set 1 to set 2 trials for rAN participants, p < .001, η_p² = .27. No other simple effect tests within the three-factor interaction, including the apparent set difference on positive PE learning rate for rAN, were significant. Learning rate for positive PE (η_p) was greater than for negative PE (η_n), F(1,42) = 65.85, p < .001, η_p² = .61. Neither the main effect of diagnostic group nor its interaction with stimulus set was significant.

Figure 4.

Learning rates by diagnostic group, stimulus set and prediction error valence.

Drift rate

The diagnostic group × set interaction was significant, F(1,42) = 5.79, p = .021, η_p² = .12, driven by a reduction of mean drift rate from set 1 to set 2 in the rAN group (p = .012, η_p² = .14). Moreover, even though the diagnostic group × trial-type interaction was not significant, the diagnostic group × set × trial-type interaction was, F(1,42) = 4.42, p = .042, η_p² = .10. Simple main effect tests on marginal means revealed a significantly smaller drift rate on set 2 compared with set 1 reward trials within the rAN group, p = .012, η_p² = .14 (Figure 5), clarifying the group × set interaction. No other simple effects were significant, with all other η_p² < .03. There were no significant main effects of diagnostic group, trial type, or set on drift rate.

Figure 5.

Mean drift rate by diagnostic group, stimulus set and trial type.

Scale

There were no significant effects of group, set, or their interaction (all η_p² < .02) for the scaling parameter. The grand mean (M = 2.75, [2.40, 3.12]) was significantly greater than 1.0, a value of the difference scale that would have had no impact on the drift rate, t(43) = 9.32, p < .001, d = 1.40.

Boundary parameters

Base boundary separation

Stimulus set significantly interacted with diagnostic group, F(1,42) = 4.69, p = .036, η _p² = .10 (Figure 6). Although simple main effects tests on marginal means revealed that the base boundary separation estimated from set 1 trials was significantly greater than from set 2 for both cCN (p = .023) and rAN (p < .001), the effect size of the boundary separation difference across sets for rAN was greater (η _p² = .44) than for cCN (η _p² = .12). The base boundary separation estimated from set 1 trials was significantly greater than the separation estimated from set 2, F(1,42) = 32.06, p < .001, η _p² = .43. The main effect of diagnostic group was not significant, p = .285, η _p² = .027.

Figure 6.

Base boundary separation by diagnostic group and stimulus set.

Boundary power separation

Values were negative for all participants when estimated from set 2 stimuli and for all participants but two on set 1, causing smaller boundary separation widths as a stimulus was repeated. Mean boundary power value was significantly smaller (F(1,42) = 4.77, p = .035, η _p² = .10) for set 1 stimuli (M = −.09, [−.11, −.07]) than for set 2 (M = −.07, [−.07, −.06]). However, the mean differences across diagnostic group and the interaction of group with set were not significant (both η _p² < .01).
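To illustrate why a negative boundary power narrows the decision boundaries with repetition, the sketch below assumes a simple power-function form, separation = base × k^power, where k is the number of times a stimulus has been presented. This functional form, the base value of 2.0, and the function name `boundary_separation` are assumptions made for illustration only; the parameterization actually used in the RLDDM is the one given in the Methods. Only the two mean power values (−.09 and −.07) are taken from the text.

```python
def boundary_separation(base: float, power: float, presentation: int) -> float:
    """Assumed power-function boundary: base * presentation ** power."""
    return base * presentation ** power


BASE = 2.0  # hypothetical base separation, chosen only for illustration

for label, power in [("set 1 mean power", -0.09), ("set 2 mean power", -0.07)]:
    widths = [boundary_separation(BASE, power, k) for k in (1, 10, 40)]
    formatted = ", ".join(f"{w:.2f}" for w in widths)
    print(f"{label} ({power}): separation at presentations 1, 10, 40 = {formatted}")
```

Under this assumed form, any negative power makes the separation shrink as presentations accumulate, and the more negative set 1 value shrinks it faster than the set 2 value.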

Starting point bias

The grand mean for starting point bias (M = .48, [.47, .49]) was significantly lower than the unbiased value 0.50, t (43) = −4.15, p < .001, d = −.63, indicating a bias toward the non-optimal choice (lower boundary) at the start of a trial. There were no significant effects of group, set or their interaction on starting point bias (all η _p² < .02).

Non-decision time

There were no significant effects of diagnostic group, set or their interaction on the non-decision time parameter.

Integrating parameter findings

Model-estimated non-decision time did not differ across the study conditions, implying that time to encode stimuli and prepare responses did not significantly vary between groups. Rather, study conditions affected reinforcement learning and decision processes (below).

The relative increase in negative PE rate from set 1 to set 2 among rAN implies that with greater task exposure rAN more readily learned when receiving less reward or greater punishment than expected compared with cCN.
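The role of a separate negative PE learning rate can be made concrete with a minimal delta-rule sketch in which the expectancy for a stimulus is updated by the prediction error, scaled by one learning rate when the PE is positive and by another when it is negative. This is a generic dual-learning-rate update written for illustration, not the authors' fitted model; the function names, learning rate values, and outcome sequence are all hypothetical.

```python
def update_expectancy(expectancy, outcome, lr_positive, lr_negative):
    """One delta-rule update with valence-specific learning rates."""
    prediction_error = outcome - expectancy
    learning_rate = lr_positive if prediction_error >= 0 else lr_negative
    return expectancy + learning_rate * prediction_error


def run(outcomes, lr_positive, lr_negative, start=0.0):
    """Apply the update across a sequence of outcomes and return the trace."""
    expectancy = start
    trace = []
    for outcome in outcomes:
        expectancy = update_expectancy(expectancy, outcome, lr_positive, lr_negative)
        trace.append(expectancy)
    return trace


# Hypothetical reward-trial outcomes: 1 = points gained, 0 = no change.
outcomes = [1, 1, 1, 0, 0]
low_negative = run(outcomes, lr_positive=0.30, lr_negative=0.05)
high_negative = run(outcomes, lr_positive=0.30, lr_negative=0.20)

print("lower negative PE learning rate: ", [round(q, 3) for q in low_negative])
print("higher negative PE learning rate:", [round(q, 3) for q in high_negative])
```

The two traces are identical while outcomes are better than expected and diverge only once outcomes fall short of expectation, where the higher negative PE learning rate lowers the expectancy more quickly.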

Findings involving decision processes included effects involving the separation of optimal and non-optimal choice boundaries (Base Boundary Separation) and the mean rate of information extraction from the stimulus. Although the boundary separation was less on set 2 than set 1 for both groups, the difference was larger for rAN. When drift rate is fixed, less boundary separation causes the extracted information to reach a response boundary more quickly. In rAN, however, drift rate was also slower on set 2 than on set 1 reward trials. Together, tighter boundary separation and slower drift rate could counterbalance one another and account for the absence of a significant interaction of group and set on RT. At the same time, these effects could explain the reduced accuracy seen in rAN on set 2 because responses would be based on less extracted information than on set 1.
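The opposing effects of boundary separation and drift rate on response time can be illustrated with the standard closed-form results for a simplified diffusion process with an unbiased starting point, fixed boundaries, and no trial-to-trial variability: P(optimal) = 1 / (1 + exp(−va/s²)) and mean decision time = (a / 2v) · tanh(va / 2s²), where v is drift rate, a is boundary separation, and s is the diffusion coefficient. The sketch below is a simplification of the RLDDM described in this paper (which estimates a starting point bias and a boundary that changes with repetition), and all parameter values are hypothetical.

```python
import math


def ddm_accuracy(drift: float, separation: float, noise: float = 1.0) -> float:
    """Probability of reaching the upper (optimal) boundary, unbiased start."""
    return 1.0 / (1.0 + math.exp(-drift * separation / noise**2))


def ddm_mean_decision_time(drift: float, separation: float, noise: float = 1.0) -> float:
    """Mean decision time (excluding non-decision time), unbiased start."""
    return (separation / (2.0 * drift)) * math.tanh(drift * separation / (2.0 * noise**2))


# Hypothetical parameter values chosen only to show the direction of each effect.
conditions = {
    "wide boundary, faster drift": (1.0, 2.0),
    "narrow boundary, faster drift": (1.0, 1.5),
    "narrow boundary, slower drift": (0.6, 1.5),
}

for label, (drift, separation) in conditions.items():
    accuracy = ddm_accuracy(drift, separation)
    decision_time = ddm_mean_decision_time(drift, separation)
    print(f"{label}: accuracy = {accuracy:.3f}, mean decision time = {decision_time:.3f}")
```

In this simplified setting, narrowing the boundary at a fixed drift rate shortens decision time, slowing the drift rate at a fixed boundary lengthens it, and both changes lower accuracy. That pattern matches the direction of the account given above, although whether the two RT effects cancel depends on the particular parameter values.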

The absence of significant main effects or interactions on the scale parameter suggests the two groups might not have differed in their sensitivity to the two learning rates. Finally, our results did not support a group difference in global choice bias.

Analyzing posterior predictions of response time and proportion optimal choices

The modeled values appear to mirror the key interactions from the observed data quite well (compare the top and bottom rows of Figures 2 and 3). ANOVAs on the modeled RT and accuracy values supported their similarity to the observed data (Supplemental Materials).

Exploratory clinical associations

No associations between age, mood or personality variables and RLDDM parameters were detected in rAN (uncorrected p < .05). For illness-related variables, the only association to meet statistical significance after controlling for multiple comparisons was the association between age of AN onset and learning rate for negative PE on set 2 (r = .57, p = .004) and across both sets (r = .52, p = .009), demonstrating that women with a later age of onset had greater learning rates following negative PE.

Discussion

To better understand the latent processes involved in choice behavior during reinforcement-based decision-making in rAN, this study incorporated the drift diffusion model of decision-making as the choice function, instead of using only a single choice parameter. Three key findings emerged: 1) Participants had greater learning rates for positive PE than negative PE with no significant group differences observed for positive PE; 2) the rAN group had greater learning rates for negative PE than cCN. Moreover, exploration of a three-way interaction indicated a significant increase in negative PE learning rate from set 1 to set 2 trials for rAN participants. Negative PE learning rate on set 2 trials was associated with later age of AN onset; 3) Within a three-way interaction, the rAN group had a lower drift rate for reward trials on set 2, suggesting that their poorer optimal choice accuracy for set 2 reward trials was explained by less information uptake.

Findings that rAN do not differ from cCN in learning rate from positive PE, and have greater learning rate than cCN from negative PE, are in contrast to our previous findings in women with acute AN, suggesting the impairment in the acute illness might be related to state-specific factors, such as psychological symptom severity and/or malnutrition. The current finding of greater learning rate for negative PE in rAN is consistent with studies of reinforcement learning under changing learning rules (Bernardoni et al., 2018; Bernardoni et al., 2021; Filoteo et al., 2014) and outcome contingencies (Pike et al., 2023) demonstrating increased punishment learning in both ill and remitted states, and suggests increased punishment learning in more stable learning contexts as well.

AN is also characterized by cognitive inflexibility, both in ill and remitted states (Miles et al., 2020; Roberts et al., 2007; Wu et al., 2014). However, few studies have examined the degree to which cognitive inflexibility and difficulty with set shifting in AN contribute to altered reinforcement learning. We previously found that women ill with AN had smaller explore-exploit parameter values during RL, suggesting that they were less decisive about exploiting what they had learned (Wierenga et al., 2022). However, since the single explore-exploit parameter encompasses all of the decision processes leading to a choice, the current study aimed to increase precision by examining whether DDM provides a more fine-grained characterization of altered decision processes in rAN during instrumental learning. An examination of the DDM parameters related to choice behavior revealed no differences between cCN and rAN in starting point bias (both groups preferred the non-optimal choice at the beginning of each set before decision evidence was available), boundary power (groups did not differ in being less cautious when making a choice with greater practice), or scale (no group differences in the importance of expectancy differences). rAN and cCN groups also did not differ on non-decision time, suggesting no differences in the time spent encoding the stimuli and engaging in motor processes. The decision parameters that affected accuracy findings, especially on set 2 for rAN, were the slower drift rate and the narrower base boundary separation. These findings suggest the hypothesis that differences in the single decision parameter we previously reported might have been driven by slower information extraction rather than by inflexible or perseverative responding.

Like other investigators, we chose to use symbolic feedback in order to investigate reinforcement learning processes generally (Chan et al., 2014; Filoteo et al., 2014; Foerde & Steinglass, 2017). Assuming the present results and those from our previous study would generalize from symbolic to food feedback, some additional hypotheses based on our findings are possible. We previously found that the magnitude of negative PE when punishment was possible was most strongly associated with treatment outcome in ill AN; larger negative PEs predicted less weight gain over the course of treatment (Wierenga et al., 2022). Deficits in learning from punishment have been hypothesized to help explain the rigid persistence of disordered eating behaviors despite their negative consequences in AN. In the current study, greater learning rate for negative PE in rAN was associated with later age of AN onset, suggesting that worse negative PE learning may be an indicator of early neurodevelopmental disruption. The PE learning rate findings from the present study suggest that with increasing exposure to rewarding and punishing outcomes, rAN shift attention to learning from negative PE while slowing down extraction of information from rewarding stimuli due to limited attentional resources. The increased emphasis on negative over positive prediction error learning in rAN that was not observed in ill AN might be a marker of recovery (Wierenga et al., 2022). Hypotheses suggested by the current findings require testing in studies using food as feedback.

A key strength of this study is the innovative incorporation of the DDM as the choice mechanism into a PE reinforcement learning model to provide a richer characterization of theoretically distinct aspects of decision-making to reinforcement learning in rAN. Other strengths include refinements to the PE model by modeling separate trial-specific positive and negative PE learning rates, and using hierarchical Bayesian analysis to simultaneously estimate individual and group parameters to ensure reliable and mutually constrained parameter estimates for complex models (Gelman et al., 2013; Kruschke, 2010). As shown in Supplemental Materials, our models demonstrated good fit to the behavioral data. Moreover, similarity of learning rate values among community controls estimated from the RLDDM in the current study to those from the RL model we previously published (Wierenga et al., 2022) supports the assumption that adding the DDM apparatus to the RL architecture does not greatly alter the reinforcement learning mechanism.

Despite these strengths, this study was limited in its cross-sectional design. Longitudinal studies are needed to determine the causal role of learning rate, particularly for negative PE, in clinical outcome and to determine how enduring our findings are. Similarly, retest data are needed to determine reliability of task measures and model parameter values. The exclusion of individuals with a current Axis I diagnosis from the rAN group may limit the external validity of the study findings, though this is a strength with respect to internal validity. Small sample sizes precluded AN subtype analyses (e.g., restricting-only vs purging). The rAN group was also older and less racially representative; sensitivity analyses suggest age did not contribute to group performance differences. Groups did not differ on reaction time nor did they differ on non-decision time, suggesting the rAN group did not have slowed processing speed indicative of residual cognitive symptoms. Thus, it is unlikely that reduced accuracy on reward trials over time in rAN is reflective of broader cognitive sequelae. Moreover, our stringent remission criteria mitigated potential confounding influences of malnutrition, comorbid psychopathology and medication effects on performance. Finally, direct comparisons of ill and remitted AN within the same study are needed to test hypotheses about the state versus trait status of findings and potential markers of recovery.

Conclusions

This is the first study to evaluate the contribution of theoretically distinct aspects of decision-making to reinforcement learning in remitted AN by integrating computational models of reinforcement learning with a drift diffusion model of decision-making. Using an instrumental probabilistic associative learning task that included positive and negative outcomes, we observed a greater negative PE learning rate with extended stimulus exposure, and slower information uptake with less cautious responding over time on reward trials in rAN, suggesting that with increasing exposure to rewarding and punishing outcomes, rAN shift attention to learning from negative PE while slowing down extraction of information from rewarding stimuli. Simultaneous estimation of RL and DDM parameters provides a fine-grained analysis of the cognitive processes underlying speeded binary decisions during reinforcement learning. The increased emphasis on negative over positive feedback during trial-by-trial learning in rAN, which was not previously observed in ill AN, might reflect a marker of recovery and a potential target of treatment.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S1355617725000013.

Competing interests

There are no competing interests.

Funding statement

This research received no specific grant from any funding agency, commercial or not-for-profit sectors.

References

American Psychiatric Association. (2013). Diagnostic and Statistical Manual of Mental Disorders (DSM-V) (5th edn). American Psychiatric Association. https://doi.org/10.1176/appi.books.9780890425596

Bernardoni, F., Geisler, D., King, J. A., Javadi, A.-H., Ritschel, F., Murr, J., Reiter, A. M. F., Rössner, V., Smolka, M. N., Kiebel, S., & Ehrlich, S. (2018). Altered medial frontal feedback learning signals in anorexia nervosa. Biological Psychiatry, 83, 235–243.

Bernardoni, F., King, J. A., Geisler, D., Ritschel, F., Schwoebel, S., Reiter, A. M. F., Endrass, T., Rössner, V., Smolka, M. N., & Ehrlich, S. (2021). More by stick than by carrot: A reinforcement learning style rooted in the medial frontal cortex in anorexia nervosa. Journal of Abnormal Psychology, 130, 736–747.

Bischoff-Grethe, A., McCurdy, D., Grenesko-Stevens, E., (Zoe) Irvine, L. E., Wagner, A., Wendy Yau, W.-Y., Fennema-Notestine, C., Wierenga, C. E., Fudge, J. L., Delgado, M. R., & Kaye, W. H. (2013). Altered brain response to reward and punishment in adolescents with anorexia nervosa. Psychiatry Research, 214, 331–340.

Bodi, N., Keri, S., Nagy, H., Moustafa, A., Myers, C. E., Daw, N., Dibo, G., Takats, A., Bereczki, D., & Gluck, M. A. (2009). Reward-learning and the novelty-seeking personality: A between- and within-subjects study of the effects of dopamine agonists on young Parkinson’s patients. Brain, 132, 2385–2395.

Chan, T. W. S., Ahn, W‐Young, Bates, J. E., Busemeyer, J. R., Guillaume, S., Redgrave, G. W., Danner, U. N., & Courtet, P. (2014). Differential impairments underlying decision making in anorexia nervosa and bulimia nervosa: A cognitive modeling analysis. International Journal of Eating Disorders, 47, 157–267.

Dayan, P., & Daw, N. D. (2008). Decision theory, reinforcement learning, and the brain. Cognitive, Affective, and Behavioral Neuroscience, 8, 429–453.

Filoteo, J. V., Paul, E. J., Ashby, F. G., Frank, G. K. W., Helie, S., Rockwell, R., Bischoff-Grethe, A., Wierenga, C., & Kaye, W. H. (2014). Simulating category learning and set shifting deficits in patients weight-restored from anorexia nervosa. Neuropsychology, 28, 741–751.

Foerde, K., Daw, N., Rufin, T., Walsh, B., Shohamy, D., & Steinglass, J. (2021). Deficient goal-directed control in a population characterized by extreme goal pursuit. Journal of Cognitive Neuroscience, 33, 463–481.

Foerde, K., & Steinglass, J. (2017). Decreased feedback learning in anorexia nervosa persists after weight restoration. International Journal of Eating Disorders, 50, 415–423.

Frank, M. J., Seeberger, L. C., & O’Reilly, R. C. (2004). By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science, 306, 1940–1943.

Garofalo, S., Giovagnoli, S., Orsoni, M., Starita, F., Benassi, M., & Evans, R. (2022). Interaction effect: Are you doing the right thing? PLoS ONE, 17, e0271668.

Gelman, A., Carline, J. G., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd edn). CRC Press.

Guillaume, S., Gorwood, P., Jollant, F., Van den Eynde, F., Courtet, P., & Richard-Devantoy, S. (2015). Impaired decision-making in symptomatic anorexia and bulimia nervosa patients: A meta-analysis. Psychological Medicine, 45, 3377–3391.

Harrison, A., O’Brien, N., Lopez, C., & Treasure, J. (2010). Sensitivity to reward and punishment in eating disorders. Psychiatry Research, 177, 1–11.

Haynos, A. F., Lavender, J. M., Nelson, J., Crow, S. J., & Peterson, C. B. (2020). Moving towards specificity: A systematic review of cue features associated with reward and punishment in anorexia nervosa. Clinical Psychology Review, 79, 101872. https://doi.org/10.1016/j.cpr.2020.101872

Herzallah, M. M., Moustafa, A. A., Natsheh, J. Y., Abdellatif, S. M., Taha, M. B., Tayem, Y. I., Sehwail, M. A., Amleh, I., Petrides, G., Myers, C. E., & Gluck, M. A. (2013). Learning from negative feedback in patients with major depressive disorder is attenuated by SSRI antidepressants. Frontiers in Integrative Neuroscience, 7, 67.

Jonker, N. C., Glashouwer, K. A., & de Jong, P. J. (2022). Punishment sensitivity and the persistence of anorexia nervosa: High punishment sensitivity is related to a less favorable course of anorexia nervosa. International Journal of Eating Disorders, 55, 697–702.

Jonker, N. C., Glashouwer, K. A., Hoekzema, A., Ostafin, B. D., de Jong, P. J., & Hadjikhani, N. (2020). Heightened self-reported punishment sensitivity, but no differential attention to cues signaling punishment or reward in anorexia nervosa. PLoS ONE, 15, e0229742.

Kaye, W. H., Wierenga, C. E., Bischoff-Grethe, A., Berner, L. A., Ely, A. V., Bailer, U. F., Paulus, M. P., & Fudge, J. L. (2020). Neural insensitivity to the effects of hunger in women remitted from anorexia nervosa. American Journal of Psychiatry, 177, 601–610.

Kruschke, J. K. (2010). Bayesian data analysis. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 658–676.

Mattfeld, A., Gluck, M., & Stark, C. (2011). Functional specialization within the striatum along both the dorsal/ventral and anterior/posterior axes during associative learning via reward and punishment. Learning & Memory, 18, 703–711.

Miles, S., Gnatt, I., Phillipou, A., & Nedeljkovic, M. (2020). Cognitive flexibility in acute anorexia nervosa and after recovery: A systematic review. Clinical Psychology Review, 81, 101905.

Monteleone, A. M., Monteleone, P., Esposito, F., Prinster, A., Volpe, U., Cantone, E., Pellegrino, F., Canna, A., Milano, W., Aiello, M., Di Salle, F., & Maj, M. (2017). Altered processing of rewarding and aversive basic taste stimuli in symptomatic women with anorexia nervosa and bulimia nervosa: An fMRI study. Journal of Psychiatric Research, 90, 94–101.

Myers, C. E., Interian, A., & Moustafa, A. A. (2022). A practical introduction to using the drift diffusion model of decision-making in cognitive psychology, neuroscience, and health sciences. Frontiers in Psychology, 13, 1039172. https://doi.org/10.3389/fpsyg.2022.1039172

Myers, C. E., Moustafa, A. A., Sheynin, J., VanMeenen, K. M., Gilbertson, M. W., Orr, S. P., Beck, K. D., Pang, K. C. H., Servatius, R. J., & Boraud, T. (2013). Learning to obtain reward, but not avoid punishment, is affected by presence of PTSD symptoms in male veterans: Empirical data and computational model. PLoS ONE, 8, e72508.

O’Hara, C., Schmidt, U., & Campbell, I. (2015). A reward-centered model of anorexia nervosa: A focussed narrative review of the neurological and psychophysiological literature. Neuroscience and Biobehavioral Reviews, 52, 131–152.

Pearce, J., & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532–552.

Pedersen, M. L., & Frank, M. J. (2020). Simultaneous hierarchical Bayesian parameter estimation for reinforcement learning and drift diffusion models: A tutorial and links to neural data. Computational Brain & Behavior, 3, 458–471.

Pedersen, M. L., Frank, M. J., & Biele, G. (2017). The drift diffusion model as the choice rule in reinforcement learning. Psychonomic Bulletin & Review, 24, 1234–1251.

Pike, A. C., Sharpley, A. L., Park, R. J., Cowen, P. J., Browning, M., & Pulcu, E. (2023). Adaptive learning from outcome contingencies in eating-disorder risk groups. Translational Psychiatry, 13, 340.

Ratcliff, R., Smith, P. L., Brown, S. D., & McKoon, G. (2016). Diffusion decision model: Current issues and history. Trends in Cognitive Sciences, 20, 260–281.

Ratcliff, R., & Tuerlinckx, F. (2002). Estimating parameters of the diffusion model: Approaches to dealing with contaminant reaction times and parameter variability. Psychonomic Bulletin & Review, 9, 438–481.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and non-reinforcement. Appleton Century Crofts.

Ritschel, F., Geisler, D., King, J. A., Bernardoni, F., Seidel, M., Boehm, I., Vettermann, R., Biemann, R., Roessner, V., Smolka, M. N., & Ehrlich, S. (2017). Neural correlates of altered feedback learning in women recovered from anorexia nervosa. Scientific Reports, 7, 5421.

Roberts, M., Tchanturia, K., Stahl, D., Southgate, L., & Treasure, J. (2007). A systematic review and meta-analysis of set-shifting ability in eating disorders. Psychological Medicine, 37, 1075–1084.

Sojitra, R., Lerner, I., Petok, J., & Gluck, M. (2018). Age affects reinforcement learning through dopamine-based learning imbalance and high decision noise-not through parkinsonian mechanisms. Neurobiology of Aging, 68, 102–113.

Wagner, A., Barbarich‐Marsteller, N. C., Frank, G. K., Bailer, U. F., Wonderlich, S. A., Crosby, R. D., Henry, S. E., Vogel, V., Plotnicov, K., McConaha, C., & Kaye, W. H. (2006). Personality traits after recovery from eating disorders: Do subtypes differ? International Journal of Eating Disorders, 39, 276–284.

Wierenga, C. E., Bischoff-Grethe, A., Melrose, A. J., Irvine, Z., Torres, L., Bailer, U. F., Simmons, A., Fudge, J. L., McClure, S. M., Ely, A., & Kaye, W. H. (2015). Hunger does not motivate reward in women remitted from anorexia nervosa. Biological Psychiatry, 77, 642–652.

Wierenga, C. E., Reilly, E., Bischoff-Grethe, A., Kaye, W. H., & Brown, G. G. (2022). Altered reinforcement learning from reward and punishment in anorexia nervosa: Evidence from computational modeling. Journal of the International Neuropsychological Society, 28, 1003–1015.

Wu, M., Brockmeyer, T., Hartmann, M., Skunde, M., Herzog, W., & Friederich, H. (2014). Set-shifting ability across the spectrum of eating disorders and in overweight and obesity: A systematic review and meta-analysis. Psychological Medicine, 44, 3365–3385.

Wu, M., Brockmeyer, T., Hartmann, M., Skunde, M., Herzog, W., & Friederich, H. C. (2016). Reward-related decision making in eating and weight disorders: A systematic review and meta-analysis of the evidence from neuropsychological studies. Neuroscience and Biobehavioral Reviews, 61, 177–196.

Figure 1. Probabilistic associative learning task (copied with permission from Mattfeld et al., 2011). Note: The task required participants to determine on a trial-by-trial basis whether fractal images were associated with one of two categories, “A” or “B”, indicated with a button response (Bodi et al., 2009; Mattfeld et al., 2011). Four images were used for each set, with two images randomly assigned to be “reward” stimuli, and two images assigned to be “punishment” stimuli. Reward-learning trials and punishment-learning trials were intermixed within the task. The participant’s running point tally was displayed at the bottom of the screen on each trial and was set to 500 points at the start of the experiment. For reward trials, the selection of the optimal category typically produced feedback (a smiley face) and a gain of 25 points, whereas selection of the non-optimal category typically produced no feedback and no gain of points. For punishment trials, the selection of the optimal category typically produced no feedback and no loss of points, whereas selection of the non-optimal category typically produced feedback (a frowning face) and a loss of 25 points. Thus, the task involves gaining 25 points when choosing the optimal response on reward trials (coded 1), but losing 25 points when choosing the non-optimal response on punishment trials (coded −1; Bodi et al., 2009; Mattfeld et al., 2011; Myers et al., 2013). Other outcomes (non-optimal response on reward trials, optimal response on punishment trials) involved no change in points (coded 0). The task was administered using two stimulus sets containing different stimuli to examine differences in learning from rewarding or aversive feedback with additional exposure to these types of feedback. The order of stimulus sets (A or B) was counterbalanced across participants; set 1 refers to the first set presented (A or B) and set 2 refers to the second set presented (A or B).
Each set involved 160 trials, divided into four 40-trial blocks. Within each block, each of the four stimuli (two “reward” stimuli, two “punishment” stimuli) appeared 10 times; 8 times the optimal response was associated with the more favorable outcome (i.e., 80% of trials), whereas 2 times the non-optimal response was associated with the more favorable outcome (i.e., 20% of trials) to facilitate probabilistic learning. Trial order was randomized within a block for each participant. Participants responded with the right hand only, using the index and middle fingers, and responses were captured using a Current Designs four button response box. If the participant did not respond, no feedback was provided. Trials lasted until the participant responded, or 4 sec if the participant did not respond, and were separated by a variable inter-trial interval (4 sec), during which time the screen was blank. On each trial, the computer recorded whether the participant made the optimal response, in addition to the actual outcome and response time on that trial. The task took about 32 minutes to complete, was programmed in presentation software, and administered on a Dell Inspiron PC.

Table 1. Demographic and clinical characteristics of the final sample

Figure 2. Observed and modeled response times by stimulus set and trial type. Note: Significance levels for simple main effects tests.

Figure 3. Observed and modeled accuracy by diagnostic group, stimulus set, and trial type. Note: Accuracy is the proportion of trials an optimal choice was selected. Significance levels for simple main effects tests.

Table 2. Model parameters: meaning and summary of results

