Effect of lysergic acid diethylamide (LSD) on reinforcement learning in humans

Jonathan W. Kanen; Qiang Luo; Mojtaba Rostami Kandroodi; Rudolf N. Cardinal; Trevor W. Robbins; David J. Nutt; Robin L. Carhart-Harris; Hanneke E. M. den Ouden

doi:10.1017/S0033291722002963

Published online by Cambridge University Press: 22 November 2022

Jonathan W. Kanen*: Affiliation:
Department of Psychology, University of Cambridge, Cambridge, UK; Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, UK
Qiang Luo: Affiliation:
National Clinical Research Center for Aging and Medicine at Huashan Hospital, State Key Laboratory of Medical Neurobiology and Ministry of Education Frontiers Center for Brain Science, Institutes of Brain Science and Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, 200433, China; Center for Computational Psychiatry, Ministry of Education-Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, Human Phenome Institute, Fudan University, Shanghai, 200032, China; Shanghai Key Laboratory of Mental Health and Psychological Crisis Intervention, School of Psychology and Cognitive Science, East China Normal University, Shanghai, 200241, China
Mojtaba Rostami Kandroodi: Affiliation:
Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
Rudolf N. Cardinal: Affiliation:
Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, UK; Department of Psychiatry, University of Cambridge, Cambridge, UK; Cambridgeshire and Peterborough NHS Foundation Trust, Cambridge, UK
Trevor W. Robbins: Affiliation:
Department of Psychology, University of Cambridge, Cambridge, UK; Behavioural and Clinical Neuroscience Institute, University of Cambridge, Cambridge, UK
David J. Nutt: Affiliation:
Department of Brain Sciences, Centre for Psychedelic Research, Imperial College London, London, UK
Robin L. Carhart-Harris: Affiliation:
Neuroscape Psychedelics Division, University of California San Francisco, San Francisco, California, USA
Hanneke E. M. den Ouden: Affiliation:
Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
*: Author for correspondence: Jonathan W. Kanen, E-mail: jonathan.kanen@gmail.com

Abstract

Background

The non-selective serotonin 2A (5-HT2A) receptor agonist lysergic acid diethylamide (LSD) holds promise as a treatment for some psychiatric disorders. Psychedelic drugs such as LSD have been suggested to have therapeutic actions through their effects on learning. The behavioural effects of LSD in humans, however, remain incompletely understood. Here we examined how LSD affects probabilistic reversal learning (PRL) in healthy humans.

Methods

Healthy volunteers received intravenous LSD (75 μg in 10 mL saline) or placebo (10 mL saline) in a within-subjects design and completed a PRL task. Participants had to learn through trial and error which of three stimuli was rewarded most of the time, and these contingencies switched in a reversal phase. Computational models of reinforcement learning (RL) were fitted to the behavioural data to assess how LSD affected the updating (‘learning rates’) and deployment of value representations (‘reinforcement sensitivity’) during choice, as well as ‘stimulus stickiness’ (choice repetition irrespective of reinforcement history).

Results

Raw data measures assessing sensitivity to immediate feedback (‘win-stay’ and ‘lose-shift’ probabilities) were unaffected, whereas LSD increased the impact of the strength of initial learning on perseveration. Computational modelling revealed that the most pronounced effect of LSD was the enhancement of the reward learning rate. The punishment learning rate was also elevated. Stimulus stickiness was decreased by LSD, reflecting heightened exploration. Reinforcement sensitivity differed by phase.

Conclusions

Increased RL rates suggest LSD induced a state of heightened plasticity. These results indicate a potential mechanism through which revision of maladaptive associations could occur in the clinical application of LSD.

Keywords

5-HT2A; cognitive flexibility; computational modeling; LSD; probabilistic reversal learning; psychedelics; reinforcement learning; serotonin

Information

Type: Original Article
Information: Psychological Medicine, Volume 53, Issue 14, October 2023, pp. 6434–6445

DOI: https://doi.org/10.1017/S0033291722002963
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: Copyright © The Author(s), 2022. Published by Cambridge University Press

Introduction

Research into lysergic acid diethylamide (LSD) as a potential therapeutic agent in psychiatry has been revitalised in recent years (Nutt & Carhart-Harris, 2020; Vollenweider & Preller, 2020). Theories on the putative beneficial effects of LSD on mental health centre on its effects on learning and plasticity (Carhart-Harris & Nutt, 2017), yet a limited number of human studies have examined its effect on instrumental learning and behavioural or cognitive flexibility (Hutten et al., 2020; Pokorny, Duerler, Seifritz, Vollenweider, & Preller, 2019). LSD acts principally but not exclusively as an agonist at the serotonin (5-HT; 5-hydroxytryptamine) 2A (5-HT2A) receptor (Marona-Lewicka & Nichols, 2007; Marona-Lewicka, Thisted, & Nichols, 2005; Nichols, 2016). Indeed, blocking 5-HT2A receptors inhibits the psychedelic effects of LSD (Nichols, 2016). The 5-HT2A receptor is involved in plasticity (Barre et al., 2016; Vaidya, Marek, Aghajanian, & Duman, 1997) and its modulation represents a putative neurobiological mechanism through which LSD could facilitate the revision of maladaptive associations (Carhart-Harris & Nutt, 2017). Indeed, LSD and 5-HT2A agonists have been shown to improve associative learning in non-human animals (Harvey, 2003; Harvey, Gormezano, Cool-Hauser, & Schindler, 1988; Romano et al., 2010; Schindler, Gormezano, & Harvey, 1986).

Serotonin is critically involved in adapting behaviour flexibly as environmental circumstances change (Barlow et al., 2015; Brigman et al., 2010; Clarke, Dalley, Crofts, Robbins, & Roberts, 2004; Furr, Danet Lapiz-Bluhm, & Morilak, 2012; Matias, Lottem, Dugué, & Mainen, 2017; Lapiz-Bluhm et al., 2009; Rygula et al., 2015), as well as processing aversive outcomes (Bari et al., 2010; Chamberlain et al., 2006; Cools, Roberts, & Robbins, 2008; Crockett, Clark, & Robbins, 2009; Dayan & Huys, 2009; Deakin, 2013; den Ouden et al., 2013; Geurts, Huys, den Ouden, & Cools, 2013). Both can be modelled in a laboratory setting using PRL paradigms. In these, individuals learn by trial and error the most adaptive action, in an ‘acquisition’ stage, and this rule eventually changes in a ‘reversal’ phase (Lawrence, Sahakian, Rogers, Hodges, & Robbins, 1999). Profound neurotoxin-induced depletion of serotonin from the marmoset orbitofrontal cortex (OFC) causes perseverative, stimulus-bound behaviour (Walker, Robbins, & Roberts, 2009) – an impaired ability to update action upon reversal (Clarke et al., 2004). At the same time, repeated dosing of a selective serotonin reuptake inhibitor (SSRI) improved reversal learning in rats (Bari et al., 2010). Acute administration of SSRIs, meanwhile, has resulted in an increased sensitivity to negative feedback (referred to as ‘lose-shift’ behaviour) in healthy humans (Chamberlain et al., 2006; Skandali et al., 2018) and rats (Bari et al., 2010). Indeed, the latter effect may be attributed to findings that acute SSRI administration can paradoxically lower serotonin concentration in projection areas in monkeys and healthy humans (Nord, Finnema, Halldin, & Farde, 2013), highlighting the complexity of some serotonergic effects.

More specifically, a number of studies have implicated 5-HT2A receptor function in reversal learning. Furr et al. (2012) showed 5-HT2A receptors in the rat OFC contributed to improved reversal learning following chronic SSRI administration. Barlow et al. (2015) reported that highly perseverative rats during reversal learning had reduced 5-HT2A receptors in the OFC. Boulougouris, Glennon, and Robbins (2008) demonstrated that systemic 5-HT2A antagonism impaired reversal learning in rats. At the same time, antagonism of 5-HT2A in the mouse OFC enhanced perseveration during reversal learning whereas 5-HT2A antagonism in the dorsomedial striatum improved reversal learning (Amodeo, Rivera, Cook, Sweeney, & Ragozzino, 2017). These anatomical functional differences may inform the reconciliation of other rodent studies on 5-HT2A and reversal learning that have employed systemic drug administration (Amodeo, Jones, Sweeney, & Ragozzino, 2014; Amodeo, Hassan, Klein, Halberstadt, & Powell, 2020; Baker, Thompson, Sweeney, & Ragozzino, 2011; Odland, Kristensen, & Andreasen, 2021).

In addition to affecting the serotonin system, LSD has dopamine type 2 (D₂) receptor agonist properties (Marona-Lewicka et al., 2005; Marona-Lewicka & Nichols, 2007; Nichols, 2004). Dopamine is particularly well known to play a fundamental role in learning from feedback (Schultz, 2019; Schultz, Dayan, & Montague, 1997), putatively mediating plasticity changes during associative learning (Shen, Flajolet, Greengard, & Surmeier, 2008; Yin & Knowlton, 2006). Meanwhile, dopamine depletion of the marmoset caudate nucleus, like serotonergic OFC depletion, also induced perseveration (Clarke, Hill, Robbins, & Roberts, 2011). Additionally, there is a body of evidence, across species, that D₂-modulating agents affect instrumental reversal learning (Boulougouris, Castañé, & Robbins, 2009; Kanen, Ersche, Fineberg, Robbins, & Cardinal, 2019; Lee, Groman, London, & Jentsch, 2007).

Human studies of LSD have employed a variety of behavioural measures including facial emotion recognition, empathy, and social behaviour (Dolder, Schmid, Müller, Borgwardt, & Liechti, 2016); response inhibition (Schmidt et al., 2018); prepulse inhibition (Schmid et al., 2015); working memory and risk-based decision-making (Family et al., 2020; Pokorny et al., 2019); processing social influence (Duerler, Schilbach, Stämpfli, Vollenweider, & Preller, 2020); semantic processing (Family et al., 2016); attention, information processing, and cognitive control (Family et al., 2020; Hutten et al., 2020); time perception (Yanakieva et al., 2019); paired associates learning and memory, balance, and proprioception (Family et al., 2020). The effects of psilocybin and 3,4-methylenedioxymethamphetamine (MDMA), which are also non-selective 5-HT2A agonists, in humans have also been studied in relation to episodic memory (Barrett, Carbonaro, Hurwitz, Johnson, & Griffiths, 2018; Doss, Weafer, Gallo, & De Wit, 2018).

Higher-order cognitive flexibility, on a set-shifting task, was impaired by acute intoxication with LSD in healthy humans (Pokorny et al., 2019). Meanwhile, psilocybin increased higher-order cognitive flexibility (set shifting), subsequent to drug treatment, in individuals with major depressive disorder (Doss et al., 2021). Ayahuasca, another psychedelic non-selective 5-HT2A agonist, and psilocybin have been shown to increase creative thinking during and after drug administration, which was interpreted as increased psychological flexibility (Kuypers et al., 2016; Mason, Mischler, Uthaug, & Kuypers, 2019). Meanwhile, healthy human behaviour on an outcome devaluation task, used to parse habitual v. goal-directed action, was not impaired by LSD (Hutten et al., 2020).

Here, we studied healthy human volunteers to examine the effects of LSD on a widely used translational measure of instrumental conditioning and behavioural/cognitive flexibility: probabilistic reversal learning (PRL). In contrast to the set-shifting and outcome devaluation tasks used previously, PRL models fundamental aspects of choice behaviour under uncertainty (probabilistic reinforcement) and when flexibility is required. We explored how LSD altered not only overt choice behaviour during PRL (using classical statistics) but also the underlying learning mechanisms, using computational models of reinforcement learning (RL, using Bayesian statistics), which have not been employed in previous studies. Utilising PRL in a placebo-controlled study of healthy human volunteers, the aim of the current experiment was to inform the psychological mechanisms by which LSD could have salubrious effects on mental health.

Based on raw data measures, we predicted LSD would modulate either sensitivity to negative feedback or the impact of learned values on subsequent perseverative behaviour (den Ouden et al., 2013). Measuring ‘staying’ (repeating a choice) or ‘shifting’ (choosing another stimulus) after wins or losses assesses sensitivity to immediate reinforcement but does not account for the integration of feedback history across multiple experiences to influence behaviour (Daw, 2011). To this end, we applied computational models of RL. The expected value of choice options, for example, increases or decreases dynamically based on reward or punishment prediction errors (experienced better or worse than expected outcomes). A key objective of this study was to evaluate the effects of LSD on the rate at which value is updated (‘learning rates’) – in essence, does LSD affect how quickly expectations change following reinforcement? Another question of interest was whether LSD modulates exploratory behaviour. We tested two varieties of exploration. First, we addressed whether LSD impacts the extent to which behaviour is guided by exploiting the more highly valued choice or, conversely, an exploratory pattern that is less guided by value (termed high or low ‘reinforcement sensitivity,’ respectively). The second variety of exploration (low ‘stimulus stickiness’) was value-free rather than value-based in that it represents a tendency to explore (rather than repeat) different choices (stimuli) to what has been chosen previously, regardless of the action's outcome (irrespective of value representations).

Materials and methods

Subjects and drug administration

Nineteen healthy volunteers (mean age 30.6; 15 males), over the age of 21, attended two sessions at least two weeks apart where they received either intravenous LSD (75 μg in 10 mL saline) or placebo (10 mL saline), in a single-blind within-subjects balanced-order design. Whereas 20 participants were included in the original study (Carhart-Harris et al., 2016b), one participant did not complete the PRL task; therefore, 19 participants are reported here. Demographic information is provided in online Supplementary Table S1. All participants provided written informed consent after briefing on the study and screening. Participants had no personal history of diagnosed psychiatric disorder, or immediate family history of a psychotic disorder. Other inclusion criteria were a normal electrocardiogram (ECG), normal screening blood tests, negative urine tests for pregnancy and recent recreational drug use, a negative breathalyser test for recent alcohol use, alcohol use limited to less than 40 UK units per week, and absence of a significant medical condition. Participants had previous experience with a classic psychedelic drug [e.g. LSD, mescaline, psilocybin/magic mushrooms, or dimethyltryptamine (DMT)/ayahuasca] without an adverse reaction, and had not used these within six weeks of the study. Screening was conducted at the Imperial College London Clinical Research Facility (ICRF) at the Hammersmith Hospital campus, and the study was carried out at the Cardiff University Brain Research Imaging Centre (CUBRIC). Participants were blinded to the condition but the experimenters were not. A cannula was inserted and secured in the antecubital fossa and injection was performed over the course of two minutes. Participants reported noticing subjective effects of LSD five to 15 min after dosing. The PRL task was administered approximately five hours after injection. Once the subjective drug effects subsided, a psychiatrist assessed suitability for discharge. This experiment was part of a larger study, the data from which are published elsewhere (e.g. Carhart-Harris et al., 2016b). Additional information can be found in Carhart-Harris et al. (2016b).

Probabilistic reversal learning task

A schematic of the task is shown in Fig. 1a. On every trial, participants could choose from three visual stimuli, presented at three of four randomised locations on a computer screen. In the first half of the task (40 trials), choosing one of the stimuli resulted in positive feedback in the form of a green smiling face on 75% of trials. A second stimulus resulted in positive feedback 50% of the time, whilst the third stimulus yielded positive feedback on only 25% of trials. Negative feedback was provided in the form of a red frowning face. The first stimulus selected was defined as the initially rewarded stimulus; the choice on trial 1 always resulted in reward. The second stimulus that was selected was defined as the mostly punished stimulus, and by definition the third stimulus was then the ‘neutral’ stimulus. After 40 trials, the most and least optimal stimuli reversed, such that the stimulus that initially was correct 75% of the time was then only correct 25% of the time, and likewise the 25% correct stimulus then resulted in positive feedback on 75% of trials. There were 40 trials in the reversal phase. This is a recently developed version (Rostami Kandroodi et al., 2021) of a widely used PRL task (den Ouden et al., 2013; Lawrence et al., 1999) – novel due to the addition of a 50% ‘neutral’ stimulus in order to distinguish learning to select the mostly rewarding stimulus from learning to avoid the mostly punishing stimulus.
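The reinforcement schedule described above can be sketched in a few lines of Python. This is only an illustration of the stated contingencies (75%/50%/25% for 40 trials, then the best and worst stimuli swap). The function names and the labels `R-P`, `N-N` and `P-R` (borrowed from the Fig. 1 caption) are choices made for this sketch, and it deliberately omits the rule that roles are assigned by the order in which stimuli are first selected and that the trial 1 choice is always rewarded.

```python
import random


def feedback_probabilities(trial, n_acquisition=40):
    """Return P(positive feedback) per stimulus on a 1-indexed trial.

    Labels follow acquisition-phase roles: "R-P" is mostly rewarded first,
    "N-N" is neutral throughout, "P-R" is mostly punished first.
    """
    if trial <= n_acquisition:
        return {"R-P": 0.75, "N-N": 0.50, "P-R": 0.25}
    # Reversal phase: best and worst swap; the neutral stimulus is unchanged.
    return {"R-P": 0.25, "N-N": 0.50, "P-R": 0.75}


def sample_feedback(stimulus, trial, rng):
    """Draw 1 (positive) or 0 (negative) feedback for the chosen stimulus."""
    p_positive = feedback_probabilities(trial)[stimulus]
    return 1 if rng.random() < p_positive else 0


# Example: feedback for always choosing "R-P" across all 80 trials.
rng = random.Random(0)
outcomes = [sample_feedback("R-P", t, rng) for t in range(1, 81)]
```

Passing an explicit `random.Random` instance keeps the simulated feedback reproducible.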

Fig. 1.

(a) Schematic of the PRL task. Subjects chose one of three stimuli. The timeline of a trial is depicted: stimuli appear, a choice is made, the outcome is shown, a fixation cross is presented during the intertrial interval, stimuli appear for the next trial (etc.) (RT, reaction time). One stimulus delivered positive feedback (green smiling face) with a 75% probability, one with 50%, and one with 25%. The probabilistic alternative was negative feedback (red sad face). Midway through the task, the contingencies for the best and worst stimuli swapped. s, seconds. (b) Better initial learning was predictive of more perseveration on LSD and not on placebo. Shading indicates ± 1 standard error of the mean (s.e.). (c) Trial-by-trial average probability of choosing each stimulus, averaged over subjects during the placebo session. A sliding 5-trial window was used for smoothing. The vertical dotted line indicates the reversal of contingencies. R-P indicates mostly rewarded stimulus, later mostly punished. N-N indicates neutral stimulus during both acquisition and reversal. P-R indicates mostly punished stimulus, later mostly rewarded stimulus. Shading indicates ± 1 s.e. (d) Trial-by-trial average probability of choosing each stimulus, averaged over subjects during the LSD session. A sliding 5-trial window was used for smoothing. The vertical dotted line indicates the reversal of contingencies. R-P indicates mostly rewarded stimulus, later mostly punished. N-N indicates neutral stimulus during both acquisition and reversal. P-R indicates mostly punished stimulus, later mostly rewarded stimulus. Shading indicates ± 1 s.e. (e) Distributions depicting the average per-subject probability (scattered dots) of choosing each stimulus while under placebo (shown in dark blue) and LSD (light blue). 
The mean value for each distribution is illustrated with a single dot at the base of each distribution, and the mean values for the probability of choosing different stimuli in each condition are connected by a line. Black error bars around the mean value show ± 1 s.e. Horizontal dotted line indicates chance-level ‘stay’ behaviour (33%). The global probability of choosing each stimulus did not differ between the placebo and LSD conditions. (f) Raw data measures of feedback sensitivity were unaffected by LSD. Distributions depicting the average per-subject probability (scattered dots) of repeating a choice (staying) after receiving positive or negative feedback under placebo (dark blue) and LSD (light blue). The horizontal dotted line indicates chance-level ‘stay’ behaviour (33%).

Raw data measures of behaviour

We examined whether LSD impaired participants' basic overall ability to perform the task by analysing the number of responses made to each stimulus during the acquisition and reversal phases. We measured feedback sensitivity by determining whether participants stayed with the same choice following positive or negative feedback (win-stay or lose-stay). The win-stay probability was defined as the number of times an individual repeated a choice after a win, divided by the number of trials on which positive feedback occurred (opportunities to stay after a win). Lose-stay probability was calculated in the same manner: the number of times a choice was repeated following a loss, divided by the total losses experienced. Note that in previous studies with a choice between only two stimuli (or responses), this metric is usually referred to as ‘win-stay/lose-shift’, which also captures the tendency to repeat (rather than switch) responses following a win, and the tendency to switch (rather than repeat) choices following a loss. Random choice would result in 50% win-stay and 50% lose-shift; however, in the current paradigm with 3 stimuli, this base rate is 33% (win-)stay and 67% (lose-)shift. We therefore encode both variables with respect to the stay (rather than shift) rate, but they are still conceptually identical to earlier studies. Perseveration was defined according to den Ouden et al. (2013) and was assessed based on responses in the reversal phase. A perseverative error occurred when two or more (now incorrect) responses were made to the previously correct stimulus, and these errors could occur at any point in the reversal phase. The first trial in the reversal phase (trial 41 of 80) was excluded from the perseveration analysis, however, as at that point behaviour cannot yet have been shaped by the new feedback structure. Note again that this metric is not entirely identical to the previous studies cited employing two stimuli, as the base-rate choice for each stimulus is now 1/3, so the ‘chance’ level of perseverative errors is lower. Null hypothesis significance tests used α = 0.05.
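As an illustration of the win-stay and lose-stay definitions above, the following Python sketch computes both probabilities from one session's choices and outcomes. The function name and input format are hypothetical. One assumption is made explicit in the code: the final trial is not counted as an opportunity to stay, because no choice follows it.

```python
def stay_probabilities(choices, outcomes):
    """Return (win_stay, lose_stay) probabilities for one session.

    choices  -- sequence of chosen stimulus labels, one per trial
    outcomes -- sequence of 1 (positive feedback) or 0 (negative feedback)
    """
    win_stay = win_total = lose_stay = lose_total = 0
    # Assumption: the last trial has no following choice, so it is skipped.
    for t in range(len(choices) - 1):
        stayed = choices[t + 1] == choices[t]
        if outcomes[t] == 1:
            win_total += 1
            win_stay += stayed
        else:
            lose_total += 1
            lose_stay += stayed
    p_win_stay = win_stay / win_total if win_total else float("nan")
    p_lose_stay = lose_stay / lose_total if lose_total else float("nan")
    return p_win_stay, p_lose_stay


# Example: stays after both wins, shifts after both losses.
example = stay_probabilities(["A", "A", "B", "B", "C"], [1, 0, 1, 0, 1])
```

With three stimuli, a participant choosing at random would be expected to score about 1/3 on both measures, matching the base rate given in the text.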

Computational modelling of behaviour

Model fitting, comparison, and interpretation

These methods are based on our previous work (Kanen et al., 2019). We fitted three RL models to the behavioural data using a hierarchical Bayesian method, via Hamiltonian Markov chain Monte Carlo sampling implemented in Stan 2.17.2 (Carpenter et al., 2017). Convergence was checked according to $\hat{R}$, the potential scale reduction factor measure (Brooks & Gelman, 1998; Gelman, Hill, & Yajima, 2012), which approaches 1 for perfect convergence. Values below 1.2 are typically used as a guideline for determining model convergence (Brooks & Gelman, 1998). We assumed the three models had the same prior probability (0.33). Models were compared via a bridge sampling estimate of the marginal likelihood (Gronau et al., 2017a), using the ‘bridgesampling’ package in R (Gronau, Singmann, & Wagenmakers, 2017b). Bridge sampling directly estimates the marginal likelihood, and therefore the posterior probability of each model given the data and the prior model probabilities, under the assumption that the models considered represent the entire set of candidates. Posterior distributions were interpreted using the 95% highest posterior density interval (HDI), which is the Bayesian ‘credible interval.’ Parameter recovery for this modelling approach has been confirmed in a previous study (Kanen et al., 2019) and is demonstrated in the online Supplementary material.
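To make the step from marginal likelihoods to posterior model probabilities concrete, here is a minimal Python sketch of Bayes' rule over a closed set of models. It is not the ‘bridgesampling’ package and does not estimate marginal likelihoods; it assumes the log marginal likelihoods are already available. The function name and the example values are hypothetical.

```python
import math


def posterior_model_probabilities(log_marginal_likelihoods, priors=None):
    """Posterior probability of each model within a closed candidate set.

    log_marginal_likelihoods -- one log marginal likelihood per model
    priors -- prior model probabilities; equal priors if omitted
    """
    n_models = len(log_marginal_likelihoods)
    if priors is None:
        priors = [1.0 / n_models] * n_models
    log_joint = [
        lml + math.log(prior)
        for lml, prior in zip(log_marginal_likelihoods, priors)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(log_joint)
    weights = [math.exp(value - largest) for value in log_joint]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical log marginal likelihoods for three models with equal priors.
probabilities = posterior_model_probabilities([-1502.0, -1498.5, -1500.0])
```

Normalising over the listed models is what encodes the assumption that they are the only candidates under consideration.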

The Bayesian hierarchy consisted of ‘drug condition’ at the highest level, and ‘subject’ at the level below. For each parameter, each drug condition (e.g. LSD) had its own mean (with a prior that was the same across conditions, i.e. with priors that were unbiased with respect to LSD v. placebo). This was then merged with the intersubject variability (assumed to be normally distributed; mean 0 by definition, standard deviation determined by a further prior). The priors used for each parameter are shown in Table 1. For instance, the learning rate for a given subject under LSD was taken as: the group mean LSD value for learning rate, plus the subject-specific component of learning rate. The learning rate for a given subject under placebo was taken as: the group mean placebo value for learning rate, plus the subject-specific component of the learning rate for the same subject. This method accounts for the within-subjects structure of the study design. This was done similarly (and separately) for all other model parameters.
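The composition described above can be written compactly. The symbols below are introduced here for illustration only and are not the authors' notation: $\mu^{\rm LSD}_{\alpha}$ and $\mu^{\rm placebo}_{\alpha}$ denote the group means for a learning rate, and $\delta_{\alpha,s}$ the subject-specific component for subject $s$ with standard deviation $\sigma_{\alpha}$. Any transformation used to keep parameters within their permitted range is not shown.

$$\alpha^{\rm LSD}_{s} = \mu^{\rm LSD}_{\alpha} + \delta_{\alpha,s}, \qquad \alpha^{\rm placebo}_{s} = \mu^{\rm placebo}_{\alpha} + \delta_{\alpha,s}, \qquad \delta_{\alpha,s} \sim {\cal N}(0, \sigma_{\alpha}^2)$$

Because the same $\delta_{\alpha,s}$ appears in both conditions, the subject-level component cancels in a within-subject contrast, which is how the within-subjects design is respected.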

Table 1.

Prior distributions for model parameters

rew, reward; pun, punishment; reinf, reinforcement; stim, stimulus.

To determine the change (LSD – placebo) in parameters, we calculated [group mean LSD learning rate] – [group mean placebo learning rate] for each of the ~8000 simulation runs and tested them against zero via the HDI. This approach also removes distributional assumptions and provides an automatic multiple comparisons correction (Gelman et al., 2012; Gelman & Tuerlinckx, 2000; Kruschke, 2011).
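The contrast described above can be sketched in Python as follows. This is an assumption-laden illustration rather than the authors' code: it takes paired posterior draws of the two group means (one pair per simulation run), forms the LSD minus placebo difference for each draw, and finds the narrowest interval containing 95% of the differences. The function names are hypothetical.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    n_inside = max(1, math.ceil(mass * n))
    # Slide a window of n_inside consecutive samples; keep the narrowest.
    start = min(
        range(n - n_inside + 1),
        key=lambda i: ordered[i + n_inside - 1] - ordered[i],
    )
    return ordered[start], ordered[start + n_inside - 1]


def difference_excludes_zero(lsd_draws, placebo_draws, mass=0.95):
    """Return the HDI of (LSD - placebo) and whether it excludes zero."""
    differences = [l - p for l, p in zip(lsd_draws, placebo_draws)]
    low, high = hdi(differences, mass)
    return (low, high), not (low <= 0.0 <= high)
```

A difference is treated as credible when the returned flag is true, that is, when zero lies outside the interval.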

Models

The parameters contained in each model are summarised in Tables 1 and 2. With Model 1, we tested the hypothesis that positive v. negative feedback guides behaviour differentially, and that LSD affects this. We augmented a basic RL model (Rescorla & Wagner, Reference Rescorla, Wagner, Black and Prokasy1972) with separate learning rates for reward, α^rew, and punishment, α^pun. Positive feedback led to an increase in the value V_i of the stimulus i that was chosen, at a speed governed by the reward learning rate, α^rew, via V_i,t ₊₁ ← V_i,t + α^rew(R_t – V_i,t). R_t represents the outcome on trial t (defined as 1 on trials where positive feedback occurred), and (R_t – V_i,t) the prediction error. On trials where negative feedback occurred, R_t = 0, which led to a decrease in value of V_i at a speed governed by the punishment learning rate, α^pun, according to V_i,t ₊₁ ← V_i,t + α^pun(R_t – V_i,t). Stimulus value was incorporated into the final quantity controlling choice according to Q^reinf_t = τ^reinfV_t. The additional parameter τ^reinf, termed reinforcement sensitivity, governs the degree to which behaviour is driven by reinforcement history. The quantities Q associated with the three available choices, for a given trial, were then fed into a standard softmax choice function to compute the probability of each choice:

$$p({\rm action}_a) = {\rm softmax}^a(Q_1 \ldots Q_n) = \frac{e^{Q_a}}{\sum_{k=1}^{n} e^{Q_k}}$$

for $n = 3$ choice options. The probability values for each trial emerging from the softmax function (the probability of choosing stimulus 1) were fitted to the subject's actual choices (did the subject choose stimulus 1?). No further softmax inverse temperature was applied ($\beta = 1$; see below), and as a result the reinforcement sensitivity parameter ($\tau^{\rm reinf}$) directly represented the weight given to the exponents in the softmax function.
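A minimal sketch of Model 1 as described above: a value update with separate reward and punishment learning rates, followed by a softmax over the reinforcement-weighted values. The function names, the initial value `v0`, and the example trial sequence are assumptions made for illustration; the initial stimulus values are not stated in this section.

```python
import numpy as np

def softmax(q):
    """Softmax over choice quantities Q, shifted by the maximum for stability."""
    q = np.asarray(q, dtype=float)
    z = np.exp(q - q.max())
    return z / z.sum()

def model1_choice_probs(choices, rewards, alpha_rew, alpha_pun, tau_reinf,
                        n_options=3, v0=0.0):
    """Choice probabilities on each trial under Model 1.

    choices: index of the chosen stimulus on each trial (0, 1 or 2)
    rewards: 1 for positive feedback, 0 for negative feedback
    v0: assumed initial value of every stimulus (hypothetical choice)
    """
    values = np.full(n_options, v0, dtype=float)
    probs = np.empty((len(choices), n_options))
    for t, (c, r) in enumerate(zip(choices, rewards)):
        probs[t] = softmax(tau_reinf * values)        # Q_reinf = tau_reinf * V
        alpha = alpha_rew if r == 1 else alpha_pun    # valence-specific rate
        values[c] += alpha * (r - values[c])          # prediction-error update
    return probs

# Hypothetical trial sequence, for illustration only.
example = model1_choice_probs(choices=[0, 0, 1], rewards=[1, 0, 1],
                              alpha_rew=0.5, alpha_pun=0.2, tau_reinf=2.0)
print(example)
```

Only the chosen stimulus is updated on each trial, and the probabilities on a trial are computed before that trial's outcome is applied.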

Table 2.

Model comparison

rew, reward; pun, punishment; reinf, reinforcement; stim, stimulus; log posterior probabilities are rounded to two decimal places.

Model 2 again augmented a simple RL model, but now also described the tendency to repeat a response, irrespective of the outcome that followed it (in other words, the tendency to ‘stay’ regardless of outcome). With Model 2 we tested the hypothesis that LSD affects this basic perseverative tendency. This was implemented using a ‘stimulus stickiness’ parameter, $\tau^{\rm stim}$. The stimulus stickiness effect was modelled as $Q^{\rm stim}_t = \tau^{\rm stim} s_{t-1}$, where $s_{t-1}$ was 1 for the stimulus that was chosen on the previous trial and was 0 for the other two stimuli. In this model, we used only a single RL rate, $\alpha^{\rm reinf}$. Positive reinforcement led to an increase in the value $V_i$ of the stimulus $i$ that was chosen, at a speed controlled by the learning rate, $\alpha^{\rm reinf}$, via $V_{i,t+1} \leftarrow V_{i,t} + \alpha^{\rm reinf}(R_t - V_{i,t})$. The final quantity controlling choice incorporated the additional stickiness parameter as $Q_t = Q^{\rm reinf}_t + Q^{\rm stim}_t$. Quantities $Q$, corresponding to the three choice options on a given trial, were then fed into the softmax function as above. It should be noted that if $\tau^{\rm stim}$ is not in the model (or is zero), then $\tau^{\rm reinf}$ is mathematically identical to the notion of softmax inverse temperature typically implemented as $\beta$. The notation $\tau^{\rm reinf}$ is used, however, because it contributes to $Q^{\rm reinf}_t$ but not to $Q^{\rm stim}_t$. A standard implementation of $\beta$, by contrast, would govern the effects of both $Q^{\rm reinf}_t$ and $Q^{\rm stim}_t$ by weighting the sum of the two ($Q_t$).

Model 3 was the full model that incorporated separate reward and punishment learning rates as well as the stimulus stickiness parameter. With Model 3, we tested the hypothesis that LSD affects both how positive v. negative feedback differentially guides behaviour and a basic perseverative tendency. Again, the final quantity controlling choice was determined by $Q_t = Q^{\rm reinf}_t + Q^{\rm stim}_t$.
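The full model can be sketched as a log-likelihood function in which the choice quantity is the sum of a value-based term and a value-free stickiness term. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name, the initial value `v0`, and the convention that no stimulus is "sticky" on the first trial are hypothetical.

```python
import numpy as np

def model3_log_likelihood(choices, rewards, alpha_rew, alpha_pun,
                          tau_reinf, tau_stim, n_options=3, v0=0.0):
    """Log-likelihood of a choice sequence under the four-parameter model.

    choices: index of the chosen stimulus on each trial (0, 1 or 2)
    rewards: 1 for positive feedback, 0 for negative feedback
    v0: assumed initial value of every stimulus (hypothetical choice)
    """
    values = np.full(n_options, v0, dtype=float)
    prev_choice = np.zeros(n_options)       # s_{t-1}; all zero before trial 1 (assumption)
    log_lik = 0.0
    for c, r in zip(choices, rewards):
        q = tau_reinf * values + tau_stim * prev_choice   # Q = Q_reinf + Q_stim
        q = q - q.max()                                   # numerical stability
        log_p = q - np.log(np.exp(q).sum())               # log softmax
        log_lik += log_p[c]
        alpha = alpha_rew if r == 1 else alpha_pun        # valence-specific rate
        values[c] += alpha * (r - values[c])              # prediction-error update
        prev_choice = np.zeros(n_options)
        prev_choice[c] = 1.0                              # mark the stimulus just chosen
    return log_lik

# Hypothetical data, for illustration only.
print(model3_log_likelihood([0, 0, 1], [1, 0, 1],
                            alpha_rew=0.5, alpha_pun=0.2,
                            tau_reinf=2.0, tau_stim=0.3))
```

Setting `tau_stim` to zero removes the stickiness term and leaves Model 1; setting the two learning rates equal leaves Model 2.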

Results

Learning and perseveration

First, we examined whether LSD altered participants' overall ability to choose the stimulus that led to reward most of the time. Behavioural performance is depicted in Figs 1 and 2. To examine whether LSD affected the number of times each stimulus was chosen, repeated-measures analysis of variance (ANOVA) was conducted with drug (LSD, placebo), phase (acquisition, reversal), and stimulus type (75, 50, or 25% rewarded) as within-subjects factors. This revealed a main effect of stimulus (F_1,23 = 30.66, p = 3 × 10⁻⁶, η_p² = 0.63), a stimulus × phase interaction (F = 28.62, p = 2 × 10⁻⁶, η_p² = 0.61), and no interaction of LSD with stimulus or phase (F < 1.5, p > 0.24, η_p² < 0.08, for terms involving LSD). The number of correct responses did not differ between placebo and LSD during the acquisition (paired-sample t test, t₁₈ = 0.84, p = 0.4, d = 0.19) or reversal phases (t₁₈ = 0.23, p = 0.8, d = 0.05).

We then examined the relationship between initial learning and perseveration, following den Ouden et al. (2013) (Fig. 1b). LSD enhanced the relationship between the number of correct responses during the acquisition phase and the number of perseverative errors made during the subsequent reversal stage [acquisition correct responses (LSD minus placebo) v. reversal perseverative errors (LSD minus placebo): linear regression coefficient β = 0.56, p = 0.002]. Confirming this, making fewer errors during the acquisition phase predicted more perseverative errors when on LSD (β = 0.44, p = 0.003) but not when under placebo (β = 0.04, p = 0.8). Perseverative errors, a subset of all reversal errors, alone did not differ between conditions (t₁₈ = 0.03, p = 0.98, d = 0.01).

Feedback sensitivity

We next assessed whether LSD influenced individuals' responses on trials immediately after positive v. negative feedback – whether participants stayed with the same choice after a win or a loss (win-stay/lose-stay; Fig. 1f). Repeated-measures ANOVA with drug (LSD, placebo) and valence (win, loss) as within-subjects factors revealed a main effect of valence – participants ‘stayed’ more after wins than losses (F_1,18 = 37.76, p = 8.0 × 10⁻⁶, η_p² = 0.68) – and no main effect of LSD (F_1,18 = 0.20, p = 0.66, η_p² = 0.01). There was also no interaction of valence × LSD (F_1,18 = 0.63, p = 0.44, η_p² = 0.03).
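The win-stay and lose-stay measures can be computed from raw trial data roughly as follows. This is a sketch assuming that "staying" means repeating the previous trial's choice and that the first trial is excluded because it has no predecessor; the function name and example data are hypothetical.

```python
def stay_probabilities(choices, rewards):
    """Return (win_stay, lose_stay): the proportion of trials on which the
    previous choice was repeated, split by the previous trial's outcome.

    choices: index of the chosen stimulus on each trial
    rewards: 1 for positive feedback (win), 0 for negative feedback (loss)
    """
    after_win, after_loss = [], []
    for t in range(1, len(choices)):
        stayed = choices[t] == choices[t - 1]
        if rewards[t - 1] == 1:
            after_win.append(stayed)
        else:
            after_loss.append(stayed)
    # NaN signals that no trial of that type occurred.
    win_stay = sum(after_win) / len(after_win) if after_win else float("nan")
    lose_stay = sum(after_loss) / len(after_loss) if after_loss else float("nan")
    return win_stay, lose_stay

# Hypothetical sequence: the participant stays after each win and shifts after each loss.
print(stay_probabilities([0, 0, 1, 1, 2], [1, 0, 1, 0, 1]))
```

Computed per participant and per drug condition, these two proportions would form the cells of the drug × valence ANOVA described above.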

Choice of reinforcement learning model

The core modelling results are displayed in Fig. 2. We fitted and compared three RL models. Convergence was good with all three models having $\hat{R}$ < 1.2. Behaviour was best characterised by an RL model with four parameters (Table 2). The four parameters in the winning model were: (1) reward learning rate, which reflects the degree to which the chosen stimulus value is increased following a positive outcome; (2) punishment learning rate, the degree to which the chosen stimulus value is decreased following a negative outcome; (3) reinforcement sensitivity, the degree to which the values learned through reinforcement contribute to final choice; and (4) ‘stimulus stickiness’, which quantifies the tendency to get ‘stuck’ to a stimulus and choose it because it was chosen on the previous trial, irrespective of the outcome. The last two parameters resemble the explore/exploit trade-off: low values of stickiness or reinforcement sensitivity characterise two different types of exploratory behaviour.
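For readers unfamiliar with the convergence statistic, the sketch below computes the classic Gelman-Rubin potential scale reduction factor for one parameter from several chains. It is an assumption that this form matches what the fitting software reported; modern samplers often use split-chain or rank-normalised variants, and the simulated chains here are hypothetical.

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic (non-split) potential scale reduction factor.

    chains: array of shape (n_chains, n_draws) for a single parameter
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)    # B: scaled variance of chain means
    var_hat = (n - 1) / n * within + between / n     # pooled variance estimate
    return float(np.sqrt(var_hat / within))

# Hypothetical chains: four that agree, and four that sit at different locations.
rng = np.random.default_rng(2)
mixed = rng.normal(0.0, 1.0, size=(4, 2000))
stuck = mixed + 5.0 * np.arange(4)[:, None]

print(gelman_rubin_rhat(mixed), gelman_rubin_rhat(stuck))
```

Values near 1 indicate that the chains agree with one another; chains exploring different regions inflate the between-chain term and push the statistic upward.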

Fig. 2.

Effects of LSD relative to placebo on model parameters. Contrasts with the posterior 95% (or greater) HDI of the difference between means excluding zero (0 ∉ 95% HDI) are shown in red. Yellow signifies 0 ∉ 90% HDI. (a) Acquisition and reversal phases (all trials) modelled together. The third row represents a difference-of-differences score: $(\alpha^{\rm rew}_{\rm LSD} - \alpha^{\rm pun}_{\rm LSD}) - (\alpha^{\rm rew}_{\rm placebo} - \alpha^{\rm pun}_{\rm placebo})$. (b) Isolating the acquisition phase. (c) Isolating the reversal phase.

Reward and punishment learning rates

First, we modelled all 80 trials in the task (both acquisition and reversal phases) and these results are depicted in Fig. 2a. The reward learning rate was significantly elevated on LSD (mean 0.87) compared to placebo (mean 0.28) [with the posterior 99.9% HDI of the difference between these means excluding zero; 0 ∉ 99.9% HDI]. There was also an increased punishment learning rate under LSD (mean 0.48) relative to placebo (mean 0.39) (drug difference, 0 ∉ 99% HDI; Figure 2a 99% HDIs not shown graphically). LSD increased the reward learning rate to a greater extent than the punishment learning rate [(α^rew,LSD – α^rew,placebo) – (α^pun,LSD – α^pun,placebo) > 0; drug difference, 0 ∉ 99% HDI].
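As a point-estimate illustration of the difference-of-differences contrast, the rounded means reported above can be combined by hand. This is only arithmetic on the reported means, not the posterior contrast itself, which is evaluated draw by draw and summarised with the HDI. The contrast as written in the text and the form given in the Fig. 2 caption are algebraically identical:

$$\begin{aligned}
(\alpha^{\rm rew}_{\rm LSD} - \alpha^{\rm rew}_{\rm placebo}) - (\alpha^{\rm pun}_{\rm LSD} - \alpha^{\rm pun}_{\rm placebo}) &= (0.87 - 0.28) - (0.48 - 0.39) = 0.59 - 0.09 = 0.50 \\
(\alpha^{\rm rew}_{\rm LSD} - \alpha^{\rm pun}_{\rm LSD}) - (\alpha^{\rm rew}_{\rm placebo} - \alpha^{\rm pun}_{\rm placebo}) &= (0.87 - 0.48) - (0.28 - 0.39) = 0.39 + 0.11 = 0.50
\end{aligned}$$

Both orderings rearrange the same four terms, so they give the same value.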

To better understand how LSD affected the dynamics of flexible choice behaviour, we then modelled the acquisition and reversal phases separately (40 trials each). During acquisition (Fig. 2b), the reward learning rate was elevated under LSD (mean 0.72) compared to placebo (mean 0.17) (drug difference, 0 ∉ 99% HDI). The punishment learning rate during acquisition, meanwhile, was not significantly elevated under LSD (mean 0.34) compared to placebo (mean 0.47) (no drug difference, 0 ∈ 90% HDI). LSD increased the reward learning rate more than the punishment learning rate [(α^rew,LSD – α^rew,placebo) – (α^pun,LSD – α^pun,placebo) > 0; drug difference, 0 ∉ 99.9% HDI].

During the reversal phase (Fig. 2c), the reward learning rate was elevated under LSD (mean 0.96) compared to placebo (mean 0.77) (drug difference, 0 ∉ 90% HDI) as was the punishment learning rate (LSD mean 0.42; placebo mean 0.31; drug difference, 0 ∉ 90% HDI). During reversal, there was no difference between the effect of LSD on the reward learning rate v. on the punishment learning rate [(α^rew,LSD – α^rew,placebo) – (α^pun,LSD – α^pun,placebo) drug difference, 0 ∈ 99.9% HDI].

Stimulus stickiness and reinforcement sensitivity

Modelling both acquisition and reversal contiguously, stimulus stickiness was lowered by LSD (mean 0.23) relative to placebo (mean 0.43) (drug difference, 0 ∉ 90% HDI; Figure 2a), which is a manifestation of increased exploratory behaviour. Reinforcement sensitivity was not modulated by LSD (LSD mean 4.70, placebo mean 5.57; no drug difference, 0 ∈ 95% HDI). This is in line with the absence of an effect of LSD on the tendency to ‘stay’ following reward or punishment (see analysis of raw data measures above).

When modelling the acquisition phase alone (Fig. 2b), stimulus stickiness was diminished under LSD (mean 0.09) compared to placebo (mean 0.46) (drug difference, 0 ∉ 90% HDI) as was reinforcement sensitivity (LSD mean 4.92; placebo mean 6.54; drug difference, 0 ∉ 90% HDI). In other words, during acquisition, behaviour under LSD was more exploratory as assessed by two metrics – one value-based (reinforcement sensitivity) and one value-free (stimulus stickiness).

When modelling the reversal phase alone (Fig. 2c), stimulus stickiness remained decreased under LSD (mean 0.36) compared to placebo (mean 0.58) (drug difference, 0 ∉ 90% HDI), as during acquisition. Reinforcement sensitivity, however, which had been decreased under LSD during acquisition, was instead increased under LSD during the reversal phase (LSD mean 3.64; placebo mean 2.47; drug difference, 0 ∉ 90% HDI).

Relationship between model parameters and raw data behavioural measures

We next analysed the relationship between the computational and raw data measures. Given the initial finding on the relationship between better acquisition learning and perseveration, the first question addressed was whether the elevated reward learning rate under LSD during acquisition, from the computational model, was predictive of the raw data measure of perseveration from den Ouden et al. (2013). Simple linear regression showed that under LSD, a higher reward learning rate during acquisition predicted significantly more perseverative errors (β = 26.94, p = 0.02), whereas no such relationship was present when the same participants were under placebo (β = 9.59, p = 0.40). Next, we examined the relationship between the stimulus stickiness parameter from the computational model and the raw data measure of perseveration. Stimulus stickiness during reversal was not significantly correlated with the raw data measure of perseveration, in either the placebo (β = 4.13, p = 0.50) or LSD (β = 11.60, p = 0.09) condition. Further exploratory analyses are reported in the online Supplementary material.

Discussion

There has been a recent surge of interest in the potential therapeutic effects of psychedelics, including LSD. Theorising on the mechanisms of such effects centres on their role in enhancing learning and plasticity. In the current study, we tested these postulated effects of LSD on flexible learning in humans and found that LSD increased learning rates, exploratory behaviour, and the impact of previously learnt values on subsequent perseverative behaviour. Specifically, LSD increased the speed at which value representations were updated following prediction error (the mismatch between expectations and experience). Whilst LSD enhanced the impact of both positive and negative feedback, overall it augmented learning from reward significantly more than it augmented learning from punishment.

The observation that LSD enhanced learning rates may be particularly important for understanding the mechanisms through which LSD might be therapeutically useful. Psychedelic drugs have been hypothesised to destabilise pre-existing beliefs (relax prior beliefs or ‘priors’), making them amenable to revision (Carhart-Harris & Friston, 2019). The notion of relaxed priors is directly compatible with increased RL rates: in our study, LSD rendered subjects more sensitive to prediction errors, which naturally implies downweighting of prior beliefs (Carhart-Harris & Friston, 2019). That LSD affected a fundamental belief-updating process is notable given that psychedelics are under investigation trans-diagnostically for diverse clinical disorders including depression (Carhart-Harris et al., 2016a, 2018, 2021; Goldberg et al., 2020; Ross et al., 2016), anxiety (Goldberg et al., 2020; Griffiths et al., 2016; Grob et al., 2011), alcohol (Bogenschutz et al., 2015) and nicotine abuse (Johnson, Garcia-Romeu, Cosimano, & Griffiths, 2014), obsessive–compulsive disorder (OCD) (Moreno, Wiegand, Taitano, & Delgado, 2006), and eating disorders (Lafrance et al., 2017). A unifying feature of these conditions is intransigent maladaptive associations in need of revision.

Behaviour was more exploratory overall under LSD, as assessed computationally in two ways, consistent with theoretical accounts of psychedelic effects which have predicted increased exploratory tendencies (Carhart-Harris & Friston, 2019). First, LSD decreased stimulus stickiness, which indicates a diminished tendency to repeat previously chosen options, irrespective of reinforcement history (value-free). This effect on stickiness was significant in all phases of the experiment – when considering the entire experiment as a whole (acquisition and reversal), when examining initial learning only (acquisition), and when isolating the reversal phase. In other words, regardless of LSD-induced changes in value-guided choice strategies (elaborated upon below), LSD promoted an overall latent tendency to explore in the form of shifting between choices, irrespective of feedback and value, which was maintained during both stable and changing circumstances. That LSD lowered stimulus stickiness may also be clinically relevant: stimulus stickiness was recently shown to be abnormally high in cocaine and amphetamine use disorders (Kanen et al., 2019).

LSD also modulated value-based exploratory tendencies (indexed by the reinforcement sensitivity parameter), which, by contrast, differed by phase. When looking at the experiment as a whole, there was no effect of LSD on reinforcement sensitivity, but this null result masked opposing effects in the two phases. When examining initial learning only, reinforcement sensitivity was substantially diminished under LSD, indicating a tendency for increased exploration away from the more highly valued choice option. During the reversal phase, meanwhile, reinforcement sensitivity was increased, indicative of a heightened tendency to exploit the choice option that was computed to be more highly valued trial-by-trial, which can be seen as adaptive when circumstances change and rapid reorienting of actions is required.

A shift in the computations underlying choice was also observed in relation to RL rates, during learning to maximise reward and minimise punishment in an initial situation and when adapting actions following contingency reversal. Whereas overall, LSD enhanced both the reward and punishment rates (especially for rewards), the increase in punishment learning rate appeared during the reversal phase only. The reward learning rate was elevated in both the acquisition and reversal phases. Together, these learning rate findings suggest that LSD accelerates the updating of value, in a way that is (overall) especially reward-driven, and LSD speeds up learning from negative feedback that is encountered when circumstances change.

Under LSD, better initial learning led to more perseverative responding. The implication is that when a behaviour is newly and more strongly learned through positive reinforcement (i.e. the acquisition phase) under LSD, it may persist more strongly even when that action is no longer relevant (i.e. the reversal phase). These measures of overt performance defined based on feedback are orthogonal to an overall latent tendency towards exploration irrespective of reinforcement history (low stimulus stickiness). Importantly, perseveration (den Ouden et al., 2013) itself, as assessed in the analysis of raw data measures, was not elevated by LSD, nor did it correlate with stimulus stickiness (online Supplementary Table S3).

Given the broad effect of LSD on a range of neurotransmitter systems (Nichols, 2004, 2016), it is not possible to determine the specific neurochemical mechanism underlying the observed LSD effects on learning. Nonetheless, obvious possibilities involve the serotonin and dopamine systems, in particular 5-HT_2A and D₂ receptors (Marona-Lewicka et al., 2005; Marona-Lewicka & Nichols, 2007; Nichols, 2004, 2016). Specifically, the psychological plasticity purportedly promoted by psychedelics is believed to be mediated through action at 5-HT_2A receptors (Carhart-Harris & Nutt, 2017) via downstream enhancement of glutamatergic activity (Barre et al., 2016) and brain-derived neurotrophic factor (BDNF) expression (Hutten et al., 2021; Vaidya et al., 1997). The hypothesis that the present results regarding RL rates are driven by the serotonergic effects of LSD is supported by two recent studies in mice. Optogenetically stimulating dorsal raphé serotonin neurons enhanced RL rates (Iigaya, Fonseca, Murakami, Mainen, & Dayan, 2018), whilst activation of these neurons tracked both reward and punishment prediction errors during reversal learning (Matias et al., 2017). Neurotoxic manipulation of serotonin in marmoset monkeys during PRL, meanwhile, altered stimulus stickiness (Rygula et al., 2015): this implicates a serotonergic mechanism underlying increased exploratory behaviour following LSD administration in the present study.

In addition to affecting the serotonin system, however, LSD also acts at dopamine receptors (Nichols, 2004, 2016), albeit with a far lower direct affinity for dopamine receptors than for 5-HT receptors. Dopamine has long been known to play a crucial role in belief updating following reward (Schultz et al., 1997), and more recent evidence shows that dopaminergic manipulations may alter learning rates (Kanen et al., 2019; Schultz, 2019; Swart et al., 2017). A dopaminergic effect would be in line with our previous study where genetic variation in the dopamine transporter, but not in the serotonin transporter, was associated with the same enhanced relationship between acquisition and perseveration as reported here under LSD (den Ouden et al., 2013).

Serotonin–dopamine interactions represent another candidate mechanism that could underlie the present findings. For example, stimulation of 5-HT_2A receptors in the prefrontal cortex of the rat enhanced ventral tegmental area dopaminergic activity (Bortolozzi, Díaz-Mataix, Scorza, Celada, & Artigas, 2005). Indeed, the initial action of LSD at 5-HT_2A receptors has been proposed to sensitise dopamine neuron firing (Nichols, 2016). LSD action at D₂ receptors, albeit with a low binding affinity, may be more pronounced in a late phase of LSD's effects (Marona-Lewicka et al., 2005; Marona-Lewicka & Nichols, 2007), which may be relevant given the relatively long delay between LSD administration and performance of the current task (see Methods). However, arguing against a late dopaminergic effect is a previous study in rodents where the effects of LSD on reversal learning were consistent across four different time lags between drug administration and behavioural testing (King, Martin, & Melville, 1974).

The enhanced coupling of acquisition learning and perseverative responding under LSD is in line with a recent study showing that LSD induced higher-order cognitive inflexibility in a set-shifting paradigm (Pokorny et al., 2019). Importantly, these effects were blocked by co-administration of the 5-HT_2A antagonist ketanserin (Pokorny et al., 2019), showing that the LSD-induced impairments were mediated by 5-HT_2A agonism, consistent with a 5-HT_2A mechanism underlying the present results.

Taken together, LSD's tendency to increase acquisition-perseveration coupling and to worsen set-shifting (Pokorny et al., 2019) suggests that what is newly or recently learnt through reinforcement under LSD is more ‘stamped in’, and thus may subsequently be harder to update. Whilst these findings are ostensibly at odds with the observation that LSD enhanced plasticity (through enhanced learning rates), they can be reconciled by considering the timing of drug administration with respect to initial learning and tests of cognitive flexibility. In both the present experiment and the previous set-shifting study (Pokorny et al., 2019), all phases of learning (acquisition and reversal) were conducted after LSD administration. In contrast, when acquisition learning was conducted prior to LSD administration, LSD resulted in improved reversal learning (using a reversal paradigm in rats; King et al., 1974). Likewise, when acquisition learning was conducted prior to the administration of a 5-HT_2A antagonist, reversal learning was impaired (Boulougouris et al., 2008; also see Furr et al., 2012). Collectively, these findings suggest that whether a prior belief is down- or up-weighted under LSD may depend on whether the prior is formed before or during drug administration, respectively. This observation is of great relevance for a putative therapeutic setting, where maladaptive beliefs will have been formed before treatment.

Another important consideration for reconciling the effects of 5-HT_2A receptor modulation on behavioural/cognitive flexibility is that 5-HT_2A antagonism can produce opposite effects depending on whether the OFC or striatum is targeted (Amodeo et al., 2017), complicating the interpretation of studies employing systemic administration (Amodeo et al., 2014, 2020; Baker et al., 2011; Odland et al., 2021). Species, strain, dose, compound, route of administration, task specifications (and engagement of cortical and subcortical structures), and reinforcement schedule must also be considered. The application of computational modelling may also help unify effects across studies and species.

While we observed an effect of LSD on acquisition-perseveration coupling, reminiscent of a previous similar observation as a function of genetic variability in the dopamine transporter (den Ouden et al., 2013), we unexpectedly did not observe effects of LSD on acquisition performance or perseveration directly, or on lose-stay and win-stay behaviour. In fact, more broadly, the effects of LSD observed here differ from the effects of neurochemically more specific influences such as acute serotonin reuptake inhibition (Bari et al., 2010; Skandali et al., 2018), or neurotoxic serotonin depletion (Bari et al., 2010; Rygula et al., 2015). More in line with this, previous work with LSD administration, examining perseveration with an outcome devaluation paradigm, found no effect of LSD (Hutten et al., 2020), nor did a study on visual memory during paired associates learning (Family et al., 2020).

Here, our computational modelling approach was more sensitive to the effects of LSD than the raw data measures. It may be possible to reconcile these robust computational effects with the minimal overt behavioural performance effects via the following speculation. Subtle differences in states of underlying plasticity may not translate to overt differences in instrumental or Pavlovian responses, even if the long-term expression of these learned responses would differ. For example, in the memory reconsolidation literature, a previously learned associative memory is believed to become susceptible to disruption (e.g. pharmacologically or behaviourally) following cued reactivation or recall for a period of several hours known as the ‘reconsolidation window’ (Lee, Nader, & Schiller, 2017). There is evidence that conducting extinction training (learning) during the reconsolidation window – when mechanisms of plasticity differ – does not alter the overt success or failure of extinction within the session, yet there are long-term effects; extinction learning during the reconsolidation window can be more enduring than extinction learned outside of this window (Schiller, Kanen, LeDoux, Monfils, & Phelps, 2013; Steinfurth et al., 2014). These Pavlovian extinction learning data, showing no difference during extinction itself, may parallel the instrumental conditioning data in the present study, in that we report no observable effect of LSD on most raw data measures (e.g. number of correct responses), yet latent learning processes that relate to purported mechanisms of plasticity, namely learning rate, were affected. Future studies would need to determine whether and how to harness this apparent window of heightened plasticity for therapeutic benefit.

Limitations of this study include the following. We have made a case for the critical involvement of the 5-HT_2A receptor; however, we cannot be sure which particular receptor interaction(s) the current findings are caused by. LSD, in addition to binding with high affinity to 5-HT_2A receptors, acts at numerous other receptors including D₁, D₂, 5-HT_1A/1B/1D, 5-HT_2C, 5-HT_5A, 5-HT₆, and 5-HT₇ (Nichols, 2004). Indeed, 5-HT_2C receptors can counter 5-HT_2A effects on reversal learning (Boulougouris et al., 2008). A future study co-administering LSD with a 5-HT_2A antagonist would help discern the putative 5-HT_2A-mediated effects. Additionally, the subjective effects and plasma levels of LSD were not measured at the time of task administration. Furthermore, even though our parameter recovery analysis was successful (see online Supplementary material), we were unable to reproduce in the simulated data the initial learning-perseveration effect observed in the behavioural data.

In summary, the core result of this study was that LSD enhanced the rate at which humans updated their beliefs based on feedback. RL was most enhanced by LSD following reward, and to a lesser extent following punishment. LSD also increased exploratory behaviour. These findings have implications for understanding the mechanisms through which LSD might be therapeutically useful for revising deleterious associations.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0033291722002963

Financial support

D.J.N. has advisory roles for the following companies working in the psychedelic space: Awaknlifesciences, Neural therapeutics, Alvarius, Psyched Wellness and COMPASS Pathways. T.W.R. discloses consultancy with Cambridge Cognition, Greenfields Bioventures and Unilever; he receives research grants from Shionogi & Co and GlaxoSmithKline and royalties for CANTAB from Cambridge Cognition and editorial honoraria from Springer Verlag and Elsevier. R.N.C. consults for Campden Instruments and receives royalties from Cambridge Enterprise, Routledge, and Cambridge University Press. H.E.M.d.O has consulted on task design and data analysis for Eleusis Benefit Corp but does not own stocks or shares. J.W.K., R.L.C-H, Q.L., and M.R.K. declare no conflicts of interest. This study was funded by the Walacea.com crowdfunding campaign and the Beckley Foundation, awarded to R.L.C-H. J.W.K. was supported by a Gates Cambridge Scholarship and an Angharad Dodds John Bursary in Mental Health and Neuropsychiatry, T.W.R. by a Wellcome Trust Senior Investigator Grant 104631/Z/14/Z, and H.E.M.d.O. by the Netherlands Organisation for Scientific Research, NWO. R.N.C.'s research is funded by the UK Medical Research Council (MC_PC_17213, MR/W014386/1). Q.L. was partially supported by grants from the National Key Research and Development Program of China (No. 2019YFA0709502), the National Natural Science Foundation of China (No. 81873909), the Science and Technology Commission of Shanghai Municipality (No.s 20ZR1404900 and 20DZ2260300), the Shanghai Municipal Science and Technology Major Project (No.s 2018SHZDZX01 and 2021SHZDZX0103), and the Fundamental Research Funds for the Central Universities. During the preparation of this manuscript, Q.L. was a Visiting Fellow at Clare Hall, University of Cambridge, Cambridge, UK. 
This research was supported in part by the UK National Health Service (NHS) National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre (BRC-1215-20014); the views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care.

Footnotes

Joint first authorship.

References

Amodeo, D. A., Hassan, O., Klein, L., Halberstadt, A. L., & Powell, S. B. (2020). Acute serotonin 2A receptor activation impairs behavioral flexibility in mice. Behavioural Brain Research, 395(April), 1–5. https://doi.org/10.1016/j.bbr.2020.112861.CrossRef Google Scholar PubMed

Amodeo, D. A., Jones, J. H., Sweeney, J. A., & Ragozzino, M. E. (2014). Risperidone and the 5-HT2A receptor antagonist M100907 improve probabilistic reversal learning in BTBR T + tf/J mice. Autism Research, 7(5), 555–567. https://doi.org/10.1002/aur.1395.CrossRef Google Scholar PubMed

Amodeo, D. A., Rivera, E., Cook, E. H., Sweeney, J. A., & Ragozzino, M. E. (2017). 5HT2A Receptor blockade in dorsomedial striatum reduces repetitive behaviors in BTBR mice. Genes, Brain and Behavior, 16(3), 342–351. https://doi.org/10.1111/gbb.12343.CrossRef Google Scholar PubMed

Baker, P. M., Thompson, J. L., Sweeney, J. A., & Ragozzino, M. E. (2011). Differential effects of 5-HT2A and 5-HT2C receptor blockade on strategy-switching. Behavioural Brain Research, 219(1), 123–131. https://doi.org/10.1016/j.bbr.2010.12.031.CrossRef Google Scholar PubMed

Bari, A., Theobald, D. E., Caprioli, D., Mar, A. C., Aidoo-Micah, A., Dalley, J. W., & Robbins, T. W. (2010). Serotonin modulates sensitivity to reward and negative feedback in a probabilistic reversal learning task in rats. Neuropsychopharmacology, 35(6), 1290–1301. https://doi.org/10.1038/npp.2009.233.CrossRef Google Scholar

Barlow, R. L., Alsiö, J., Jupp, B., Rabinovich, R., Shrestha, S., Roberts, A. C., … Dalley, J. W. (2015). Markers of serotonergic function in the orbitofrontal cortex and dorsal raphé nucleus predict individual variation in spatial-discrimination serial reversal learning. Neuropsychopharmacology, 40(7), 1619–1630. https://doi.org/10.1038/npp.2014.335.CrossRef Google Scholar PubMed

Barre, A., Berthoux, C., De Bundel, D., Valjent, E., Bockaert, J., Marin, P., & Bécamel, C. (2016). Presynaptic serotonin 2A receptors modulate thalamocortical plasticity and associative learning. Proceedings of the National Academy of Sciences of the United States of America, 113(10), E1382–E1391. https://doi.org/10.1073/pnas.1525586113.Google Scholar PubMed

Barrett, F. S., Carbonaro, T. M., Hurwitz, E., Johnson, M. W., & Griffiths, R. R. (2018). Double-blind comparison of the two hallucinogens psilocybin and dextromethorphan: Effects on cognition. Psychopharmacology, 235, 2915–2927.CrossRef Google Scholar PubMed

Bogenschutz, M. P., Forcehimes, A. A., Pommy, J. A., Wilcox, C. E., Barbosa, P., & Strassman, R. J. (2015). Psilocybin-assisted treatment for alcohol dependence: A proof-of-concept study. Journal of Psychopharmacology, 29(3), 289–299. https://doi.org/10.1177/0269881114565144.CrossRef Google Scholar PubMed

Bortolozzi, A., Díaz-Mataix, L., Scorza, M. C., Celada, P., & Artigas, F. (2005). The activation of 5-HT2A receptors in prefrontal cortex enhances dopaminergic activity. Journal of Neurochemistry, 95(6), 1597–1607. https://doi.org/10.1111/j.1471-4159.2005.03485.x.CrossRef Google Scholar PubMed

Boulougouris, V., Castañé, A., & Robbins, T. W. (2009). Dopamine D2/D3 receptor agonist quinpirole impairs spatial reversal learning in rats: Investigation of D3 receptor involvement in persistent behavior. Psychopharmacology, 202, 611–620. https://doi.org/10.1007/s00213-008-1341-2.CrossRef Google Scholar PubMed

Boulougouris, V., Glennon, J. C., & Robbins, T. W. (2008). Dissociable effects of selective 5-HT2A and 5-HT2C receptor antagonists on serial spatial reversal learning in rats. Neuropsychopharmacology, 33(8), 2007–2019. https://doi.org/10.1038/sj.npp.1301584.CrossRef Google Scholar PubMed

Brigman, J. L., Mathur, P., Harvey-White, J., Izquierdo, A., Saksida, L. M., Bussey, T. J., … Holmes, A. (2010). Pharmacological or genetic inactivation of the serotonin transporter improves reversal learning in mice. Cerebral Cortex, 20(8), 1955–1963. https://doi.org/10.1093/cercor/bhp266.CrossRef Google Scholar PubMed

Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7(4), 434–455. https://doi.org/10.1080/10618600.1998.10474787.Google Scholar

Carhart-Harris, R., Giribaldi, B., Watts, R., Baker-Jones, M., Murphy-Beiner, A., Murphy, R., … Nutt, D. J. (2021). Trial of psilocybin versus escitalopram for depression. New England Journal of Medicine, 384(15), 1402–1411. https://doi.org/10.1056/nejmoa2032994.CrossRef Google Scholar PubMed

Carhart-Harris, R. L., Bolstridge, M., Day, C. M. J., Rucker, J., Watts, R., Erritzoe, D. E., … Nutt, D. J. (2018). Psilocybin with psychological support for treatment-resistant depression: Six-month follow-up. Psychopharmacology, 235, 399–408.CrossRef Google Scholar PubMed

Carhart-Harris, R. L., Bolstridge, M., Rucker, J., Day, C. M. J., Erritzoe, D., Kaelen, M., … Nutt, D. J. (2016a). Psilocybin with psychological support for treatment-resistant depression: An open-label feasibility study. The Lancet Psychiatry, 3(7), 619–627. https://doi.org/10.1016/S2215-0366(16)30065-7.CrossRef Google Scholar PubMed

Carhart-Harris, R. L., & Friston, K. J. (2019). REBUS and the anarchic brain: Toward a unified model of the brain action of psychedelics. Pharmacological Reviews, 71(3), 316–344. https://doi.org/10.1124/pr.118.017160.CrossRef Google Scholar

Carhart-Harris, R. L., Muthukumaraswamy, S., Roseman, L., Kaelen, M., Droog, W., Murphy, K., … Nutt, D. J. (2016b). Neural correlates of the LSD experience revealed by multimodal neuroimaging. Proceedings of the National Academy of Sciences of the United States of America, 113(17), 4853–4858. https://doi.org/10.1073/pnas.1518377113.CrossRef Google Scholar PubMed

Carhart-Harris, R. L., & Nutt, D. J. (2017). Serotonin and brain function: A tale of two receptors. Journal of Psychopharmacology, 31(9), 1091–1120. https://doi.org/10.1177/0269881117725915.CrossRef Google Scholar PubMed

Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., … Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32. https://doi.org/10.18637/jss.v076.i01.CrossRef Google Scholar

Chamberlain, S. R., Müller, U., Blackwell, A. D., Clark, L., Robbins, T. W., & Sahakian, B. J. (2006). Neurochemical modulation of response inhibition and probabilistic learning in humans. Science (New York, N.Y.), 311(5762), 861–863. https://doi.org/10.1126/science.1121218.CrossRef Google Scholar PubMed

Christakou, A., Gershman, S., Niv, Y., Simmons, A., Brammer, M., & Rubia, K. (2013). Neural and psychological maturation of decision-making in adolescence and young adulthood. Journal of Cognitive Neuroscience, 25(11), 1807–1823. https://doi.org/10.1162/jocn_a_00447.CrossRef Google Scholar PubMed

Clarke, H. F., Dalley, J. W., Crofts, H. S., Robbins, T. W., & Roberts, A. C. (2004). Cognitive inflexibility after prefrontal serotonin depletion. Science (New York, N.Y.), 304(5672), 878–880. https://doi.org/https://doi.org/10.1126/science.1094987.CrossRef Google Scholar PubMed

Clarke, H. F., Hill, G. J., Robbins, T. W., & Roberts, A. C. (2011). Dopamine, but not serotonin, regulates reversal learning in the marmoset caudate nucleus. Journal of Neuroscience, 31(11), 4290–4297. https://doi.org/10.1523/JNEUROSCI.5066-10.2011.CrossRef Google Scholar

Cools, R., Roberts, A. C., & Robbins, T. W. (2008). Serotoninergic regulation of emotional and behavioural control processes. Trends in Cognitive Sciences, 12(1), 31–40. https://doi.org/10.1016/j.tics.2007.10.011.CrossRef Google Scholar PubMed

Crockett, M. J., Clark, L., & Robbins, T. W. (2009). Reconciling the role of serotonin in behavioral inhibition and aversion: Acute tryptophan depletion abolishes punishment-induced inhibition in humans. The Journal of Neuroscience, 29(38), 11993–11999. https://doi.org/10.1523/JNEUROSCI.2513-09.2009.CrossRef Google Scholar PubMed

Daw, N. D. (2011). Trial-by-trial data analysis using computational models. In Delgado, M. R., Phelps, E. A., & Robbins, T. W. (Eds.), Decision making, affect, and learning: Attention and performance XXIII (pp. 1–26). Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199600434.003.0001.Google Scholar

Dayan, P., & Huys, Q. J. M. (2009). Serotonin in affective control. Annual Review of Neuroscience, 32(1), 95–126. https://doi.org/10.1146/annurev.neuro.051508.135607.CrossRef Google Scholar PubMed

Deakin, J. F. W. (2013). The origins of “5-HT and mechanisms of defence” by Deakin and Graeff: A personal perspective. Journal of Psychopharmacology, 27(12), 1084–1089. https://doi.org/10.1177/0269881113503508.CrossRef Google Scholar

den Ouden, H., Daw, N. D., Fernandez, G., Elshout, J. A., Rijpkema, M., Hoogman, M., … Cools, R. (2013). Dissociable effects of dopamine and serotonin on reversal learning. Neuron, 80(4), 1090–1100. https://doi.org/10.1016/j.neuron.2013.08.030.CrossRef Google Scholar PubMed

Dolder, P. C., Schmid, Y., Müller, F., Borgwardt, S., & Liechti, M. E. (2016). LSD Acutely impairs fear recognition and enhances emotional empathy and sociality. Neuropsychopharmacology, 41(11), 2638–2646. https://doi.org/10.1038/npp.2016.82.CrossRef Google Scholar PubMed

Doss, M. K., Smith, G. S., Pova, M., Rosenberg, M. D., Sepeda, N. D., Davis, A. K., … Barrett, F. S. (2021). Psilocybin therapy increases cognitive and neural flexibility in patients with major depressive disorder. Translational Psychiatry, 11(June), 574.CrossRef Google Scholar PubMed

Doss, M. K., Weafer, J., Gallo, D. A., & De Wit, H. (2018). MDMA impairs both the encoding and retrieval of emotional recollections. Neuropsychopharmacology, 43(4), 791–800. https://doi.org/10.1038/npp.2017.171.CrossRef Google Scholar PubMed

Duerler, P., Schilbach, L., Stämpfli, P., Vollenweider, F. X., & Preller, K. H. (2020). LSD-induced increases in social adaptation to opinions similar to one's own are associated with stimulation of serotonin receptors. Scientific Reports, 10(1), 1–11. https://doi.org/10.1038/s41598-020-68899-y.CrossRef Google Scholar PubMed

Family, N., Maillet, E. L., Williams, L. T. J., Krediet, E., Carhart-Harris, R. L., Williams, T. M., … Raz, S. (2020). Safety, tolerability, pharmacokinetics, and pharmacodynamics of low dose lysergic acid diethylamide (LSD) in healthy older volunteers. Psychopharmacology, 237(3), 841–853. https://doi.org/10.1007/s00213-019-05417-7.CrossRef Google Scholar PubMed

Family, N., Vinson, D., Vigliocco, G., Kaelen, M., Bolstridge, M., Nutt, D. J., & Carhart-Harris, R. L. (2016). Semantic activation in LSD: Evidence from picture naming. Language, Cognition and Neuroscience, 31(10), 1320–1327. https://doi.org/10.1080/23273798.2016.1217030.CrossRef Google Scholar

Furr, A., Danet Lapiz-Bluhm, M., & Morilak, D. A. (2012). 5-HT2A Receptors in the orbitofrontal cortex facilitate reversal learning and contribute to the beneficial cognitive effects of chronic citalopram treatment in rats. International Journal of Neuropsychopharmacology, 15(9), 1295–1305. https://doi.org/10.1017/S1461145711001441.CrossRef Google Scholar

Gelman, A., Hill, J., & Yajima, M. (2012). Why we (usually) don't have to worry about multiple comparisons. Journal of Research on Educational Effectiveness, 5, 189–211. https://doi.org/https://doi.org/10.1080/19345747.2011.618213.CrossRef Google Scholar

Gelman, A., & Tuerlinckx, F. (2000). Type S error rates for classical and Bayesian single and multiple comparison procedures 1 Introduction. Computational Statistics, 15, 373–390. Retrieved from https://doi.org/10.1007/s001800000040.CrossRef Google Scholar

Gershman, S. J. (2016). Empirical priors for reinforcement learning models. Journal of Mathematical Psychology, 71, 1–6. https://doi.org/10.1016/j.jmp.2016.01.006.CrossRef Google Scholar

Geurts, D. E. M., Huys, Q. J. M., den Ouden, H. E. M., & Cools, R. (2013). Serotonin and aversive Pavlovian control of instrumental behavior in humans. Journal of Neuroscience, 33(48), 18932–18939. https://doi.org/10.1523/JNEUROSCI.2749-13.2013.CrossRef Google Scholar PubMed

Goldberg, S. B., Pace, B. T., Nicholas, C. R., Raison, C. L., & Hutson, P. R. (2020). The experimental effects of psilocybin on symptoms of anxiety and depression: A meta-analysis. Psychiatry Research, 284(April 2019), 1–4. https://doi.org/10.1016/j.psychres.2020.112749.CrossRef Google Scholar PubMed

Griffiths, R. R., Johnson, M. W., Carducci, M. A., Umbricht, A., Richards, W. A., Richards, B. D., … Klinedinst, M. A. (2016). Psilocybin produces substantial and sustained decreases in depression and anxiety in patients with life-threatening cancer: A randomized double-blind trial. Journal of Psychopharmacology, 30(12), 1181–1197. https://doi.org/10.1177/0269881116675513.CrossRef Google Scholar

Grob, C. S., Danforth, A. L., Chopra, G. S., Hagerty, M., McKay, C. R., Halberstad, A. L., & Greer, G. R. (2011). Pilot study of psilocybin treatment for anxiety in patients with advanced-stage cancer. Archives of General Psychiatry, 68(1), 71–78. https://doi.org/10.1001/archgenpsychiatry.2010.116.CrossRef Google Scholar PubMed

Gronau, Q. F., Sarafoglou, A., Matzke, D., Ly, A., Boehm, U., Marsman, M., … Steingroever, H. (2017a). A tutorial on bridge sampling. Journal of Mathematical Psychology, 81, 80–97. https://doi.org/10.1016/j.jmp.2017.09.005.CrossRef Google Scholar PubMed

Gronau, Q. F., Singmann, H., & Wagenmakers, E.-J. (2017b). bridgesampling: An R Package for Estimating Normalizing Constants. ArXiv, 1710.08162. Retrieved from http://arxiv.org/abs/1710.08162.CrossRef Google Scholar

Harvey, J. A. (2003). Role of the serotonin 5-HT2A receptor in learning. Learning and Memory, 10, 355–362. https://doi.org/10.1101/lm.60803.CrossRef Google Scholar PubMed

Harvey, J. A., Gormezano, I., Cool-Hauser, V. A., & Schindler, C. W. (1988). Effects of LSD on classical conditioning as a function of CS-UCS interval: Relationship to reflex facilitation. Pharmacology, Biochemistry and Behavior, 30(2), 433–441. https://doi.org/10.1016/0091-3057(88)90477-7.CrossRef Google Scholar PubMed

Hutten, N. R. P. W., Mason, N. L., Dolder, P. C., Theunissen, E. L., Holze, F., Liechti, M. E., … Kuypers, K. P. C. (2020). Mood and cognition after administration of low LSD doses in healthy volunteers: A placebo-controlled dose-effect finding study. European Neuropsychopharmacology, 41, 81–91. https://doi.org/10.1016/j.euroneuro.2020.10.002.CrossRef Google Scholar PubMed

Hutten, N. R. P. W., Mason, N. L., Dolder, P. C., Theunissen, E. L., Holze, F., Liechti, M. E., … Kuypers, K. P. C. (2021). Low doses of LSD acutely increase BDNF blood plasma levels in healthy volunteers. ACS Pharmacology & Translational Science, 4(2), 461–466. https://doi.org/10.1021/acsptsci.0c00099.CrossRef Google Scholar PubMed

Iigaya, K., Fonseca, M. S., Murakami, M., Mainen, Z. F., & Dayan, P. (2018). An effect of serotonergic stimulation on learning rates for rewards apparent after long intertrial intervals. Nature Communications, 9(1), 10–12. https://doi.org/10.1038/s41467-018-04840-2.CrossRef Google Scholar PubMed

Johnson, M. W., Garcia-Romeu, A., Cosimano, M. P., & Griffiths, R. R. (2014). Pilot study of the 5-HT2AR agonist psilocybin in the treatment of tobacco addiction. Journal of Psychopharmacology, 28(11), 983–992. https://doi.org/10.1177/0269881114548296.CrossRef Google Scholar PubMed

Kanen, J. W., Ersche, K. D., Fineberg, N. A., Robbins, T. W., & Cardinal, R. N. (2019). Computational modelling reveals contrasting effects on reinforcement learning and cognitive flexibility in stimulant use disorder and obsessive-compulsive disorder: Remediating effects of dopaminergic D2/3 receptor agents. Psychopharmacology, 236(8), 2337–2358. https://doi.org/10.1007/s00213-019-05325-w.CrossRef Google Scholar PubMed

King, A. R., Martin, I. L., & Melville, K. A. (1974). Reversal learning enhanced by lysergic acid diethylamide (LSD): Concomitant rise in brain 5-hydroxytryptamine levels. British Journal of Pharmacology, 52(3), 419–426. https://doi.org/10.1111/j.1476-5381.1974.tb08611.x.CrossRef Google Scholar PubMed

Kruschke, J. K. (2011). Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psychological Science, 6, 299–312. https://doi.org/10.1177/1745691611406925.CrossRef Google Scholar PubMed

Kuypers, K. P. C., Riba, J., de la Fuente Revenga, M., Barker, S., Theunissen, E. L., & Ramaekers, J. G. (2016). Ayahuasca enhances creative divergent thinking while decreasing conventional convergent thinking. Psychopharmacology, 233(18), 3395–3403. https://doi.org/10.1007/s00213-016-4377-8.CrossRef Google Scholar PubMed

Lafrance, A., Loizaga-Velder, A., Fletcher, J., Renelli, M., Files, N., & Tupper, K. W. (2017). Nourishing the spirit: Exploratory research on ayahuasca experiences along the continuum of recovery from eating disorders. Journal of Psychoactive Drugs, 49(5), 427–435. https://doi.org/10.1080/02791072.2017.1361559.CrossRef Google Scholar PubMed

Lapiz-Bluhm, M. D. S., Soto-Piña, A. E., Hensler, J. G., & Morilak, D. A. (2009). Chronic intermittent cold stress and serotonin depletion induce deficits of reversal learning in an attentional set-shifting test in rats. Psychopharmacology, 202(1–3), 329–341. https://doi.org/10.1007/s00213-008-1224-6.CrossRef Google Scholar

Lawrence, A. D., Sahakian, B. J., Rogers, R. D., Hodges, J. R., & Robbins, T. W. (1999). Discrimination, reversal, and shift learning in Huntington's disease: Mechanisms of impaired response selection. Neuropsychologia, 37(12), 1359–1374. https://doi.org/10.1016/S0028-3932(99)00035-4.CrossRef Google Scholar PubMed

Lee, B., Groman, S., London, E. D., & Jentsch, J. D. (2007). Dopamine D2/D3 receptors play a specific role in the reversal of a learned visual discrimination in monkeys. Neuropsychopharmacology, 32(10), 2125–2134. https://doi.org/10.1038/sj.npp.1301337.CrossRef Google Scholar PubMed

Lee, J. L. C., Nader, K., & Schiller, D. (2017). An update on memory reconsolidation updating. Trends in Cognitive Sciences, 21(7), 531–545. https://doi.org/10.1016/j.tics.2017.04.006.CrossRef Google Scholar PubMed

Marona-Lewicka, D., & Nichols, D. E. (2007). Further evidence that the delayed temporal dopaminergic effects of LSD are mediated by a mechanism different than the first temporal phase of action. Pharmacology Biochemistry and Behavior, 87(4), 453–461. https://doi.org/10.1016/j.pbb.2007.06.001.CrossRef Google Scholar

Marona-Lewicka, D., Thisted, R. A., & Nichols, D. E. (2005). Distinct temporal phases in the behavioral pharmacology of LSD: Dopamine D2 receptor-mediated effects in the rat and implications for psychosis. Psychopharmacology, 180(3), 427–435. https://doi.org/10.1007/s00213-005-2183-9.CrossRef Google Scholar PubMed

Mason, N. L., Mischler, E., Uthaug, M. V., & Kuypers, K. P. C. (2019). Sub-acute effects of psilocybin on empathy, creative thinking, and subjective well-being. Journal of Psychoactive Drugs, 51(2), 123–134. https://doi.org/10.1080/02791072.2019.1580804.CrossRef Google Scholar PubMed

Matias, S., Lottem, E., Dugué, G. P., & Mainen, Z. F. (2017). Activity patterns of serotonin neurons underlying cognitive flexibility. ELife, 6, 1–24. https://doi.org/10.7554/eLife.20552.CrossRef Google Scholar PubMed

Moreno, F. A., Wiegand, C. B., Taitano, E. K., & Delgado, P. L. (2006). Safety, tolerability, and efficacy of psilocybin in 9 patients with obsessive-compulsive disorder. Journal of Clinical Psychiatry, 67(11), 1735–1740. https://doi.org/10.4088/JCP.v67n1110.CrossRef Google Scholar PubMed

Nichols, D. E. (2004). Hallucinogens. Pharmacology Therapeutics, 101, 131–181. https://doi.org/10.1016/j.pharmthera.2003.11.002.CrossRef Google Scholar PubMed

Nichols, D. E. (2016). Psychedelics. Pharmacological Reviews, 68, 264–355.CrossRef Google Scholar PubMed

Nord, M., Finnema, S. J., Halldin, C., & Farde, L. (2013). Effect of a single dose of escitalopram on serotonin concentration in the non-human and human primate brain. International Journal of Neuropsychopharmacology, 16(7), 1577–1586. https://doi.org/10.1017/S1461145712001617.CrossRef Google Scholar PubMed

Nutt, D. J., & Carhart-Harris, R. L. (2020). The current status of psychedelics in psychiatry. JAMA Psychiatry, 78(2), 121–122.CrossRef Google Scholar

Odland, A., Kristensen, J., & Andreasen, J. (2021). The selective 5-HT2A receptor agonist 25CN-NBOH does not affect reversal learning in mice. Behavioural Pharmacology, 32(5), 448–452.CrossRef Google Scholar

Pokorny, T., Duerler, P., Seifritz, E., Vollenweider, F. X., & Preller, K. H. (2019). LSD Acutely impairs working memory, executive functions, and cognitive flexibility, but not risk-based decision-making. Psychological Medicine, 50(13), 2255–2264. https://doi.org/10.1017/s0033291719002393.CrossRef Google Scholar

Rescorla, R. A., & Wagner, A. R. (1972). A theory of classical conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Black, A. H. & Prokasy, W. F. (Eds.), Classical conditioning II current research and theory (Vol. 21, pp. 64–99). New York: Appleton-Century-Crofts.Google Scholar

Romano, A. G., Quinn, J. L., Li, L., Dave, K. D., Schindler, E. A., Aloyo, V. J., & Harvey, J. A. (2010). Intrahippocampal LSD accelerates learning and desensitizes the 5-HT2A receptor in the rabbit. Psychopharmacology, 212(3), 441–448. https://doi.org/10.1007/s00213-010-2004-7.CrossRef Google Scholar PubMed

Ross, S., Bossis, A., Guss, J., Agin-Liebes, G., Malone, T., Cohen, B., … Schmidt, B. L. (2016). Rapid and sustained symptom reduction following psilocybin treatment for anxiety and depression in patients with life-threatening cancer: A randomized controlled trial. Journal of Psychopharmacology, 30(12), 1165–1180. https://doi.org/10.1177/0269881116675512.CrossRef Google Scholar PubMed

Rostami Kandroodi, M., Cook, J. L., Swart, J. C., Froböse, M. I., Geurts, D. E. M., Vahabie, A. H., … den Ouden, H. E. M. (2021). Effects of methylphenidate on reinforcement learning depend on working memory capacity. Psychopharmacology, 238, 3569–3584. https://doi.org/10.1007/s00213-021-05974-w.CrossRef Google Scholar PubMed

Rygula, R., Clarke, H. F., Cardinal, R. N., Cockcroft, G. J., Xia, J., Dalley, J. W., … Roberts, A. C. (2015). Role of central serotonin in anticipation of rewarding and punishing outcomes: Effects of selective amygdala or orbitofrontal 5-HT depletion. Cerebral Cortex, 25(9), 3064–3076. https://doi.org/10.1093/cercor/bhu102.CrossRef Google Scholar PubMed

Schiller, D., Kanen, J. W., LeDoux, J. E., Monfils, M.-H., & Phelps, E. A. (2013). Extinction during reconsolidation of threat memory diminishes prefrontal cortex involvement. Proceedings of the National Academy of Sciences of the United States of America, 110(50), 20040–20045. https://doi.org/10.1073/pnas.1320322110.CrossRef Google Scholar PubMed

Schindler, C. W., Gormezano, I., & Harvey, J. A. (1986). Effect of LSD on acquisition, maintenance, extinction and differentiation of conditioned responses. Pharmacology, Biochemistry and Behavior, 24(5), 1293–1300. https://doi.org/10.1016/0091-3057(86)90187-5.CrossRef Google Scholar PubMed

Schmid, Y., Enzler, F., Gasser, P., Grouzmann, E., Preller, K. H., Vollenweider, F. X., … Liechti, M. E. (2015). Acute effects of lysergic acid diethylamide in healthy subjects. Biological Psychiatry, 78(8), 544–553. https://doi.org/10.1016/j.biopsych.2014.11.015.CrossRef Google Scholar PubMed

Schmidt, A., Müller, F., Lenz, C., Dolder, P. C., Schmid, Y., Zanchi, D., … Borgwardt, S. (2018). Acute LSD effects on response inhibition neural networks. Psychological Medicine, 48(9), 1464–1473. https://doi.org/10.1017/S0033291717002914.CrossRef Google Scholar PubMed

Schultz, W. (2019). Recent advances in understanding the role of phasic dopamine activity [version 1; peer review: 3 approved]. F1000Research, 8, 1–12. https://doi.org/10.12688/f1000research.19793.1.CrossRef Google Scholar PubMed

Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science (New York, N.Y.), 275, 1593–1599. https://doi.org/10.1126/science.275.5306.1593.CrossRef Google Scholar PubMed

Shen, W., Flajolet, M., Greengard, P., & Surmeier, D. J. (2008). Dichotomous dopaminergic control of striatal synaptic plasticity. Science (New York, N.Y.), 321(5890), 848–851. https://doi.org/10.1126/science.1160575.CrossRef Google Scholar PubMed

Skandali, N., Rowe, J. B., Voon, V., Deakin, J. B., Cardinal, R. N., Cormack, F., … Sahakian, B. J. (2018). Dissociable effects of acute SSRI (escitalopram) on executive, learning and emotional functions in healthy humans. Neuropsychopharmacology, 43(13), 2645–2651. https://doi.org/10.1038/s41386-018-0229-z.CrossRef Google Scholar PubMed

Steinfurth, E. C. K., Kanen, J. W., Raio, C. M., Clem, R. L., Huganir, R. L., & Phelps, E. A. (2014). Young and old Pavlovian fear memories can be modified with extinction training during reconsolidation in humans. Learning & Memory, 21(7), 338–341. https://doi.org/10.1101/lm.033589.113.CrossRef Google Scholar

Swart, J. C., Froböse, M. I., Cook, J. L., Geurts, D. E. M., Frank, M. J., Cools, R., & den Ouden, H. E. M. (2017). Catecholaminergic challenge uncovers distinct Pavlovian and instrumental mechanisms of motivated (in)action. ELife, 6, 1–36. https://doi.org/10.7554/eLife.22169.CrossRef Google Scholar PubMed

Vaidya, V. A., Marek, G. J., Aghajanian, G. K., & Duman, R. S. (1997). 5-HT2A receptor-mediated regulation of brain-derived neurotrophic factor mRNA in the hippocampus and the neocortex. Journal of Neuroscience, 17(8), 2785–2795. https://doi.org/10.1523/jneurosci.17-08-02785.1997.CrossRef Google Scholar PubMed

Vollenweider, F. X., & Preller, K. H. (2020). Psychedelic drugs: Neurobiology and potential for treatment of psychiatric disorders. Nature Reviews Neuroscience, 21, 611–624. https://doi.org/10.1038/s41583-020-0367-2.CrossRef Google Scholar PubMed

Walker, S. C., Robbins, T. W., & Roberts, A. C. (2009). Differential contributions of dopamine and serotonin to orbitofrontal cortex function in the marmoset. Cerebral Cortex, 19, 889–898. https://doi.org/10.1093/cercor/bhn136.CrossRef Google Scholar PubMed

Yanakieva, S., Polychroni, N., Family, N., Williams, L. T. J., Luke, D. P., & Terhune, D. B. (2019). The effects of microdose LSD on time perception: A randomised, double-blind, placebo-controlled trial. Psychopharmacology, 236(4), 1159–1170. https://doi.org/10.1007/s00213-018-5119-x.CrossRef Google Scholar PubMed

Yin, H. H., & Knowlton, B. J. (2006). The role of the basal ganglia in habit formation. Nature Reviews Neuroscience, 7(6), 464–476. https://doi.org/10.1038/nrn1919.CrossRef Google Scholar PubMed

Fig. 1. (a) Schematic of the PRL task. Subjects chose one of three stimuli. The timeline of a trial is depicted: stimuli appear, a choice is made, the outcome is shown, a fixation cross is presented during the intertrial interval, stimuli appear for the next trial (etc.) (RT, reaction time). One stimulus delivered positive feedback (green smiling face) with a 75% probability, one with 50%, and one with 25%. The probabilistic alternative was negative feedback (red sad face). Midway through the task, the contingencies for the best and worst stimuli swapped. s, seconds. (b) Better initial learning was predictive of more perseveration on LSD and not on placebo. Shading indicates ± 1 standard error of the mean (s.e.). (c) Trial-by-trial average probability of choosing each stimulus, averaged over subjects during the placebo session. A sliding 5-trial window was used for smoothing. The vertical dotted line indicates the reversal of contingencies. R-P indicates mostly rewarded stimulus, later mostly punished. N-N indicates neutral stimulus during both acquisition and reversal. P-R indicates mostly punished stimulus, later mostly rewarded stimulus. Shading indicates ± 1 s.e. (d) Trial-by-trial average probability of choosing each stimulus, averaged over subjects during the LSD session. A sliding 5-trial window was used for smoothing. The vertical dotted line indicates the reversal of contingencies. R-P indicates mostly rewarded stimulus, later mostly punished. N-N indicates neutral stimulus during both acquisition and reversal. P-R indicates mostly punished stimulus, later mostly rewarded stimulus. Shading indicates ± 1 s.e. (e) Distributions depicting the average per-subject probability (scattered dots) of choosing each stimulus while under placebo (shown in dark blue) and LSD (light blue). 
The mean value for each distribution is illustrated with a single dot at the base of each distribution, and the mean values for the probability of choosing different stimuli in each condition are connected by a line. Black error bars around the mean value show ± 1 s.e. Horizontal dotted line indicates chance-level ‘stay’ behaviour (33%). The global probability of choosing each stimulus did not differ between the placebo and LSD conditions. (f) Raw data measures of feedback sensitivity were unaffected by LSD. Distributions depicting the average per-subject probability (scattered dots) of repeating a choice (staying) after receiving positive or negative feedback under placebo (dark blue) and LSD (light blue). The horizontal dotted line indicates chance-level ‘stay’ behaviour (33%).
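The feedback schedule described in the Fig. 1 caption can be illustrated with a minimal Python sketch. It uses only the probabilities and the midway reversal stated in the caption; the task length `n_trials`, the function names, and the trailing alignment of the 5-trial smoothing window are assumptions made for illustration and are not taken from the article.

```python
import random

# P(positive feedback) during acquisition, as stated in the caption.
# Labels follow the caption: R-P, N-N, P-R.
ACQUISITION = {"R-P": 0.75, "N-N": 0.50, "P-R": 0.25}


def reward_probabilities(trial, n_trials):
    """Return P(positive feedback) per stimulus for a 0-indexed trial.

    n_trials is a hypothetical task length; the caption only says that
    the reversal happens midway through the task.
    """
    if trial < n_trials // 2:
        return dict(ACQUISITION)
    # After the reversal, the best and worst stimuli swap contingencies.
    return {
        "R-P": ACQUISITION["P-R"],
        "N-N": ACQUISITION["N-N"],
        "P-R": ACQUISITION["R-P"],
    }


def draw_feedback(choice, trial, n_trials, rng):
    """Sample one outcome: True is positive feedback, False is negative."""
    return rng.random() < reward_probabilities(trial, n_trials)[choice]


def sliding_mean(values, window=5):
    """Trailing moving average (assumed alignment); early trials use
    however many samples are available so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n_trials = 80  # hypothetical task length
    outcomes = [draw_feedback("R-P", t, n_trials, rng) for t in range(n_trials)]
    print(sliding_mean([float(o) for o in outcomes])[:5])
```

Keeping the schedule as a lookup that depends on the trial index makes the reversal a single swap of the 75% and 25% entries, which matches the caption's description of the best and worst stimuli exchanging contingencies.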

Table 1. Prior distributions for model parameters

Table 2. Model comparison

Fig. 2. Effects of LSD relative to placebo on model parameters. Contrasts with the posterior 95% (or greater) HDI of the difference between means excluding zero (0 ∉ 95% HDI) are shown in red. Yellow signifies 0 ∉ 90% HDI. (a) Acquisition and reversal phases (all trials) modelled together. The third row represents a difference-of-differences score: $(\alpha_{\text{rew}}^{\text{LSD}} - \alpha_{\text{pun}}^{\text{LSD}}) - (\alpha_{\text{rew}}^{\text{placebo}} - \alpha_{\text{pun}}^{\text{placebo}})$. (b) Isolating the acquisition phase. (c) Isolating the reversal phase.
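As a rough illustration of how a contrast like the one in the Fig. 2 caption can be checked against zero, the sketch below computes a difference-of-differences score from posterior draws and finds the narrowest interval holding a given share of them. It is not the authors' analysis code: the function names are hypothetical, the draws in the usage example are synthetic, and it assumes the four input lists are paired draw by draw.

```python
import random


def hdi(samples, mass=0.95):
    """Narrowest interval containing roughly `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, int(round(mass * n)))  # number of samples inside the interval
    best_lo, best_hi = ordered[0], ordered[k - 1]
    for i in range(1, n - k + 1):
        lo, hi = ordered[i], ordered[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


def difference_of_differences(rew_lsd, pun_lsd, rew_pla, pun_pla):
    """(rew_LSD - pun_LSD) - (rew_placebo - pun_placebo), per paired draw."""
    return [
        (a - b) - (c - d)
        for a, b, c, d in zip(rew_lsd, pun_lsd, rew_pla, pun_pla)
    ]


def excludes_zero(interval):
    """True when zero lies outside the interval."""
    lo, hi = interval
    return lo > 0 or hi < 0


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic draws for illustration only; not values from the study.
    draws = [[rng.gauss(mu, 0.1) for _ in range(2000)]
             for mu in (0.6, 0.4, 0.4, 0.35)]
    contrast = difference_of_differences(*draws)
    interval = hdi(contrast, 0.95)
    print(interval, excludes_zero(interval))
```

Sorting the draws and sliding a fixed-count window over them is a common way to approximate a highest density interval for a unimodal posterior; a multimodal posterior would need a different approach.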