Effect of lysergic acid diethylamide (LSD) on reinforcement learning in humans

Background The non-selective serotonin 2A (5-HT2A) receptor agonist lysergic acid diethylamide (LSD) holds promise as a treatment for some psychiatric disorders. Psychedelic drugs such as LSD have been suggested to have therapeutic actions through their effects on learning. The behavioural effects of LSD in humans, however, remain incompletely understood. Here we examined how LSD affects probabilistic reversal learning (PRL) in healthy humans. Methods Healthy volunteers received intravenous LSD (75 μg in 10 mL saline) or placebo (10 mL saline) in a within-subjects design and completed a PRL task. Participants had to learn through trial and error which of three stimuli was rewarded most of the time, and these contingencies switched in a reversal phase. Computational models of reinforcement learning (RL) were fitted to the behavioural data to assess how LSD affected the updating (‘learning rates’) and deployment of value representations (‘reinforcement sensitivity’) during choice, as well as ‘stimulus stickiness’ (choice repetition irrespective of reinforcement history). Results Raw data measures assessing sensitivity to immediate feedback (‘win-stay’ and ‘lose-shift’ probabilities) were unaffected, whereas LSD increased the impact of the strength of initial learning on perseveration. Computational modelling revealed that the most pronounced effect of LSD was the enhancement of the reward learning rate. The punishment learning rate was also elevated. Stimulus stickiness was decreased by LSD, reflecting heightened exploration. Reinforcement sensitivity differed by phase. Conclusions Increased RL rates suggest LSD induced a state of heightened plasticity. These results indicate a potential mechanism through which revision of maladaptive associations could occur in the clinical application of LSD.

Higher-order cognitive flexibility, on a set-shifting task, was impaired by acute intoxication with LSD in healthy humans (Pokorny et al., 2019). Meanwhile, psilocybin increased higher-order cognitive flexibility (set shifting), subsequent to drug treatment, in individuals with major depressive disorder (Doss et al., 2021). Ayahuasca, another psychedelic non-selective 5-HT2A agonist, and psilocybin have been shown to increase creative thinking during and after drug administration, which was interpreted as increased psychological flexibility (Kuypers et al., 2016; Mason, Mischler, Uthaug, & Kuypers, 2019). Meanwhile, healthy human behaviour on an outcome devaluation task, used to parse habitual v. goal-directed action, was not impaired by LSD (Hutten et al., 2020).
Here, we studied healthy human volunteers to examine the effects of LSD on a widely used translational measure of instrumental conditioning and behavioural/cognitive flexibility: probabilistic reversal learning (PRL). In contrast to the set-shifting and outcome devaluation tasks used previously, PRL models fundamental aspects of choice behaviour under uncertainty (probabilistic reinforcement) and when flexibility is required. We explored how LSD altered not only overt choice behaviour during PRL (using classical statistics) but also the underlying learning mechanisms, using computational models of reinforcement learning (RL, using Bayesian statistics), which have not been employed in previous studies. Utilising PRL in a placebo-controlled study of healthy human volunteers, the aim of the current experiment was to inform the psychological mechanisms by which LSD could have salubrious effects on mental health.
Based on raw data measures, we predicted LSD would modulate either sensitivity to negative feedback or the impact of learned values on subsequent perseverative behaviour (den Ouden et al., 2013). Measuring 'staying' (repeating a choice) or 'shifting' (choosing another stimulus) after wins or losses assesses sensitivity to immediate reinforcement but does not account for the integration of feedback history across multiple experiences to influence behaviour (Daw, 2011). To this end, we applied computational models of RL. The expected value of choice options, for example, increases or decreases dynamically based on reward or punishment prediction errors (outcomes experienced as better or worse than expected). A key objective of this study was to evaluate the effects of LSD on the rate at which value is updated ('learning rates'): in essence, does LSD affect how quickly expectations change following reinforcement? Another question of interest was whether LSD modulates exploratory behaviour. We tested two varieties of exploration. First, we addressed whether LSD impacts the extent to which behaviour is guided by exploiting the more highly valued choice or, conversely, an exploratory pattern that is less guided by value (termed high or low 'reinforcement sensitivity', respectively). The second variety of exploration (low 'stimulus stickiness') was value-free rather than value-based, in that it represents a tendency to explore (rather than repeat) stimuli different to what has been chosen previously, regardless of the action's outcome (irrespective of value representations).

Subjects and drug administration
Nineteen healthy volunteers (mean age 30.6; 15 males), over the age of 21, attended two sessions at least two weeks apart where they received either intravenous LSD (75 μg in 10 mL saline) or placebo (10 mL saline), in a single-blind within-subjects balanced-order design. Whereas 20 participants were included in the original study (Carhart-Harris et al., 2016b), one participant did not complete the PRL task; therefore, 19 participants are reported here. Demographic information is provided in online Supplementary Table S1. All participants provided written informed consent after briefing on the study and screening. Participants had no personal history of diagnosed psychiatric disorder, or immediate family history of a psychotic disorder. Other inclusion criteria were a normal electrocardiogram (ECG), normal screening blood tests, negative urine tests for pregnancy and recent recreational drug use, a negative breathalyser test for recent alcohol use, alcohol use limited to less than 40 UK units per week, and absence of a significant medical condition. Participants had previous experience with a classic psychedelic drug [e.g. LSD, mescaline, psilocybin/magic mushrooms, or dimethyltryptamine (DMT)/ayahuasca] without an adverse reaction, and had not used these within six weeks of the study. Screening was conducted at the Imperial College London Clinical Research Facility (ICRF) at the Hammersmith Hospital campus, and the study was carried out at the Cardiff University Brain Research Imaging Centre (CUBRIC). Participants were blinded to the condition but the experimenters were not. A cannula was inserted and secured in the antecubital fossa and injection was performed over the course of two minutes. Participants reported noticing subjective effects of LSD five to 15 min after dosing. The PRL task was administered approximately five hours after injection. Once the subjective drug effects subsided, a psychiatrist assessed suitability for discharge. This experiment was part of a larger
study, the data from which are published elsewhere (e.g. Carhart-Harris et al., 2016b). Additional information can be found in Carhart-Harris et al. (2016b).

Probabilistic reversal learning task
A schematic of the task is shown in Fig. 1a. On every trial, participants could choose from three visual stimuli, presented at three of four randomised locations on a computer screen. In the first half of the task (40 trials), choosing one of the stimuli resulted in positive feedback in the form of a green smiling face on 75% of trials. A second stimulus resulted in positive feedback 50% of the time, whilst the third stimulus yielded positive feedback on only 25% of trials. Negative feedback was provided in the form of a red frowning face. The first stimulus selected was defined as the initially rewarded stimulus; the choice on trial 1 always resulted in reward. The second stimulus that was selected was defined as the mostly punished stimulus, and by definition the third stimulus was then the 'neutral' stimulus. After 40 trials, the most and least optimal stimuli reversed, such that the stimulus that initially was correct 75% of the time was then only correct 25% of the time, and likewise the 25% correct stimulus then resulted in positive feedback on 75% of trials. There were 40 trials in the reversal phase. This is a recently developed version (Rostami Kandroodi et al., 2021) of a widely used PRL task (den Ouden et al., 2013; Lawrence et al., 1999), novel due to the addition of a 50% 'neutral' stimulus in order to distinguish learning to select the mostly rewarding stimulus from learning to avoid the mostly punishing stimulus.
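The feedback schedule described above can be sketched in a few lines. This is a minimal illustrative simulation, not the authors' task code; the stimulus indexing and the handling of the guaranteed trial-1 reward are simplifying assumptions:

```python
import random

def prl_feedback(choice, trial, rng=random, reversal_trial=40):
    """Return True (green smiling face) or False (red frowning face).
    Reward probabilities per stimulus: index 0 = 75%, 1 = 50% ('neutral'),
    2 = 25%. After the reversal, the 75% and 25% stimuli swap
    probabilities; the 50% stimulus is unchanged. `trial` is 0-based."""
    p = [0.75, 0.50, 0.25]
    if trial >= reversal_trial:   # reversal phase (trials 41-80)
        p[0], p[2] = p[2], p[0]
    return rng.random() < p[choice]
```

Sampling this function over 80 trials reproduces the 40-trial acquisition phase and the 40-trial reversal phase described above.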

Raw data measures of behaviour
We examined whether LSD impaired participants' basic overall ability to perform the task by analysing the number of responses made to each stimulus during the acquisition and reversal phases. We measured feedback sensitivity by determining whether participants stayed with the same choice following positive or negative feedback (win-stay or lose-stay). The win-stay probability was defined as the number of times an individual repeated a choice after a win, divided by the number of trials on which positive feedback occurred (opportunities to stay after a win). Lose-stay probability was calculated in the same manner: the number of times a choice was repeated following a loss, divided by the total losses experienced. Note that in previous studies with a choice between only two stimuli (or responses), this metric is usually referred to as 'win-stay/lose-shift', which also captures the tendency to repeat (rather than switch) responses following a win, and the tendency to switch (rather than repeat) choices following a loss. Random choice would result in 50% win-stay and 50% lose-shift; however, in the current paradigm with three stimuli, these base rates are 33% (win-)stay and 67% (lose-)shift. We therefore encode both variables with respect to the stay (rather than shift) rate, but they are still conceptually identical to earlier studies. Perseveration was defined according to den Ouden et al. (2013) and was assessed based on responses in the reversal phase. A perseverative error occurred when two or more (now incorrect) responses were made to the previously correct stimulus, and these errors could occur at any point in the reversal phase. The first trial in the reversal phase (trial 41 of 80) was excluded from the perseveration analysis, however, as at that point behaviour cannot yet have been shaped by the new feedback structure. Note again that this metric is not entirely identical to the previous studies cited employing two stimuli, as the base-rate choice for each stimulus is now 1/3, so the 'chance' level of perseverative errors is lower. Null hypothesis significance tests used α = 0.05.
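The stay-probability definitions above can be made concrete with a short sketch. This is an illustrative toy implementation of the stated definitions, not the analysis code used in the study:

```python
def stay_probabilities(choices, outcomes):
    """Compute (win-stay, lose-stay) probabilities from trial-by-trial
    data. `choices` holds stimulus indices; `outcomes` holds True (win)
    or False (loss). A 'stay' on trial t means choices[t] == choices[t-1];
    each stay/shift is conditioned on the *previous* trial's outcome."""
    win_stay = win_n = lose_stay = lose_n = 0
    for t in range(1, len(choices)):
        stayed = choices[t] == choices[t - 1]
        if outcomes[t - 1]:          # previous trial was a win
            win_n += 1
            win_stay += stayed
        else:                        # previous trial was a loss
            lose_n += 1
            lose_stay += stayed
    return (win_stay / win_n if win_n else float('nan'),
            lose_stay / lose_n if lose_n else float('nan'))
```

With three stimuli, random responding would give stay probabilities near 1/3 for both measures, matching the base rates noted above.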

Model fitting, comparison, and interpretation
These methods are based on our previous work (Kanen et al., 2019). We fitted three RL models to the behavioural data using a hierarchical Bayesian method, via Hamiltonian Markov chain Monte Carlo sampling implemented in Stan 2.17.2 (Carpenter et al., 2017). Convergence was checked according to R̂, the potential scale reduction factor (Brooks & Gelman, 1998; Gelman, Hill, & Yajima, 2012), which approaches 1 for perfect convergence. Values below 1.2 are typically used as a guideline for determining model convergence (Brooks & Gelman, 1998). We assumed the three models had the same prior probability (0.33). Models were compared via a bridge sampling estimate of the marginal likelihood (Gronau et al., 2017a), using the 'bridgesampling' package in R (Gronau, Singmann, & Wagenmakers, 2017b). Bridge sampling directly estimates the marginal likelihood, and therefore the posterior probability of each model given the data (and prior model probabilities), under the assumption that the models compared represent the entire set of candidates to be considered. Posterior distributions were interpreted using the 95% highest posterior density interval (HDI), which is the Bayesian 'credible interval'. Parameter recovery for this modelling approach has been confirmed in a previous study (Kanen et al., 2019) and is demonstrated in the online Supplementary material.
The Bayesian hierarchy consisted of 'drug condition' at the highest level, and 'subject' at the level below. For each parameter, each drug condition (e.g. LSD) had its own mean (with a prior that was the same across conditions, i.e. with priors that were unbiased with respect to LSD v. placebo). This was then combined with the intersubject variability (assumed to be normally distributed; mean 0 by definition, standard deviation determined by a further prior). The priors used for each parameter are shown in Table 1. For instance, the learning rate for a given subject under LSD was taken as the group mean LSD value for the learning rate, plus the subject-specific component of the learning rate. The learning rate for the same subject under placebo was taken as the group mean placebo value for the learning rate, plus that subject's subject-specific component of the learning rate. This method accounts for the within-subjects structure of the study design. This was done similarly (and separately) for all other model parameters.
To determine the change (LSD − placebo) in parameters, we calculated [group mean LSD learning rate] − [group mean placebo learning rate] for each of the ∼8000 simulation runs and tested the resulting distribution against zero via the HDI. This approach also removes distributional assumptions and provides an automatic multiple comparisons correction (Gelman et al., 2012; Gelman & Tuerlinckx, 2000; Kruschke, 2011).
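The HDI-based difference test can be illustrated with a generic sketch. The study worked with posterior draws from Stan; the `hdi` function below is a standard narrowest-interval estimator applied to any array of samples, included here only as an assumption-laden illustration:

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Highest posterior density interval: the narrowest interval that
    contains `mass` of the posterior samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.ceil(mass * n))          # number of samples inside the interval
    widths = x[k - 1:] - x[:n - k + 1]  # width of every candidate interval
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])

def credible_difference(lsd_draws, placebo_draws, mass=0.95):
    """Form the per-draw LSD - placebo difference and test it against
    zero: the effect is 'credible' if zero lies outside the HDI."""
    diff = np.asarray(lsd_draws) - np.asarray(placebo_draws)
    lo, hi = hdi(diff, mass)
    return not (lo <= 0.0 <= hi)
```

Because the test operates directly on the empirical difference distribution, no normality assumption is required, in line with the point made above.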

Models
The parameters contained in each model are summarised in Tables 1 and 2. With Model 1, we tested the hypothesis that positive v. negative feedback guides behaviour differentially, and that LSD affects this. We augmented a basic RL model (Rescorla & Wagner, 1972) with separate learning rates for reward, α_rew, and punishment, α_pun. Positive feedback led to an increase in the value V_i of the stimulus i that was chosen, at a speed governed by the reward learning rate, α_rew, via V_{i,t+1} ← V_{i,t} + α_rew(R_t − V_{i,t}), where R_t represents the outcome on trial t (defined as 1 on trials where positive feedback occurred), and (R_t − V_{i,t}) the prediction error. On trials where negative feedback occurred, R_t = 0, which led to a decrease in the value of V_i at a speed governed by the punishment learning rate, α_pun, according to V_{i,t+1} ← V_{i,t} + α_pun(R_t − V_{i,t}). Stimulus value was incorporated into the final quantity controlling choice according to Q^reinf_t = τ_reinf V_t. The additional parameter τ_reinf, termed reinforcement sensitivity, governs the degree to which behaviour is driven by reinforcement history. The quantities Q associated with the three available choices, for a given trial, were then fed into a standard softmax choice function to compute the probability of each choice: P(choice = i) = exp(Q_i) / Σ_j exp(Q_j), for n = 3 choice options. The probability values for each trial emerging from the softmax function (e.g. the probability of choosing stimulus 1) were fitted to the subject's actual choices (did the subject choose stimulus 1?). No further softmax inverse temperature was applied (β = 1; see below), and as a result the reinforcement sensitivity parameter (τ_reinf) directly represented the weight given to the exponents in the softmax function.
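Model 1's update and choice rules can be sketched as follows. This is an illustrative Python rendering of the equations above, not the hierarchical Bayesian Stan implementation actually fitted:

```python
import math

def rw_update(V, chosen, R, alpha_rew, alpha_pun):
    """Rescorla-Wagner update with separate learning rates:
    V_{i,t+1} <- V_{i,t} + alpha * (R_t - V_{i,t}), where R is 1 (positive
    feedback) or 0 (negative feedback) and alpha depends on the valence."""
    alpha = alpha_rew if R == 1 else alpha_pun
    V = list(V)                       # update only the chosen stimulus
    V[chosen] += alpha * (R - V[chosen])
    return V

def softmax_probs(V, tau_reinf):
    """Model 1 choice rule: Q_i = tau_reinf * V_i, then a standard
    softmax over the three Q values (beta fixed at 1)."""
    Q = [tau_reinf * v for v in V]
    m = max(Q)                        # subtract max for numerical stability
    z = [math.exp(q - m) for q in Q]
    s = sum(z)
    return [zi / s for zi in z]
```

Note how τ_reinf scales the value differences before the softmax: with β fixed at 1, it plays the role an inverse temperature would otherwise play, exactly as stated above.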
Model 2 again augmented a simple RL model, but now also described the tendency to repeat a response, irrespective of the outcome that followed it (in other words, the tendency to 'stay' regardless of outcome). With Model 2 we tested the hypothesis that LSD affects this basic perseverative tendency. This was implemented using a 'stimulus stickiness' parameter, τ_stim. The stimulus stickiness effect was modelled as Q^stim_t = τ_stim s_{t−1}, where s_{t−1} was 1 for the stimulus that was chosen on the previous trial and 0 for the other two stimuli. In this model, we used only a single RL rate, α_reinf. Positive reinforcement led to an increase in the value V_i of the stimulus i that was chosen, at a speed controlled by the learning rate, α_reinf, via V_{i,t+1} ← V_{i,t} + α_reinf(R_t − V_{i,t}). The final quantity controlling choice incorporated the additional stickiness parameter as Q_t = Q^reinf_t + Q^stim_t = τ_reinf V_t + τ_stim s_{t−1}. Quantities Q, corresponding to the three choice options on a given trial, were then fed into the softmax function as above. It should be noted that if τ_stim is not in the model (or is zero), then τ_reinf is mathematically identical to the notion of softmax inverse temperature typically implemented as β. The notation τ_reinf is used, however, because it contributes to Q^reinf_t but not to Q^stim_t. A standard implementation of β, by contrast, would govern the effects of both Q^reinf_t and Q^stim_t by weighting the sum of the two (Q_t). Model 3 was the full model that incorporated separate reward and punishment learning rates as well as the stimulus stickiness parameter. With Model 3, we tested the hypotheses that LSD affects how positive v. negative feedback guides behaviour differentially, and that LSD affects a basic perseverative tendency. Again, the final quantity controlling choice was determined by Q_t = τ_reinf V_t + τ_stim s_{t−1}.
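The full model's choice rule, combining value and stickiness into Q_t as above, can likewise be sketched. Again this is an illustrative rendering of the stated equations, not the Stan code:

```python
import math

def sticky_softmax_probs(V, prev_choice, tau_reinf, tau_stim):
    """Models 2/3 choice rule: Q_i = tau_reinf * V_i + tau_stim * s_i,
    where s_i = 1 only for the stimulus chosen on the previous trial.
    `prev_choice` is None on the first trial (no stickiness bonus)."""
    Q = [tau_reinf * V[i] + (tau_stim if i == prev_choice else 0.0)
         for i in range(len(V))]
    m = max(Q)                        # subtract max for numerical stability
    z = [math.exp(q - m) for q in Q]
    s = sum(z)
    return [zi / s for zi in z]
```

With τ_stim = 0 (or prev_choice = None) this reduces exactly to the Model 1 softmax, illustrating the point above that τ_reinf then coincides with the usual inverse temperature β.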
We then examined the relationship between initial learning and perseveration, following den Ouden et al. (2013) (Fig. 1b). LSD enhanced the relationship between the number of correct responses during the acquisition phase and the number of perseverative errors made during the subsequent reversal stage [acquisition correct responses (LSD minus placebo) v. reversal perseverative errors (LSD minus placebo): linear regression coefficient β = 0.56, p = 0.002]. Confirming this, making fewer errors during the acquisition phase predicted more perseverative errors when on LSD (β = 0.44, p = 0.003) but not when under placebo (β = 0.04, p = 0.8). Perseverative errors, a subset of all reversal errors, alone did not differ between conditions (t(18) = 0.03, p = 0.98, d = 0.01).

Choice of reinforcement learning model
The core modelling results are displayed in Fig. 2. We fitted and compared three RL models. Convergence was good, with all three models having R̂ < 1.2. Behaviour was best characterised by a RL model with four parameters (Table 2). The four parameters in the winning model were: (1) reward learning rate, which reflects the degree to which the chosen stimulus value is increased following a positive outcome; (2) punishment learning rate, the degree to which the chosen stimulus value is decreased following a negative outcome; (3) reinforcement sensitivity, the degree to which the values learned through reinforcement contribute to the final choice; and (4) 'stimulus stickiness', which quantifies the tendency to get 'stuck' to a stimulus and choose it because it was chosen on the previous trial, irrespective of the outcome. The last two parameters resemble the explore/exploit trade-off: low values of stickiness or reinforcement sensitivity characterise two different types of exploratory behaviour.

Reward and punishment learning rates
First, we modelled all 80 trials in the task (both acquisition and reversal phases); these results are depicted in Fig. 2a. The reward learning rate was significantly elevated on LSD (mean 0.87) compared to placebo (mean 0.28) [with the posterior 99.9% HDI of the difference between these means excluding zero; 0 ∉ 99.9% HDI]. There was also an increased punishment learning rate under LSD (mean 0.48) relative to placebo (mean 0.39) (drug difference, 0 ∉ 99% HDI; Fig. 2a; 99% HDIs not shown graphically). LSD increased the reward learning rate to a greater extent than the punishment learning rate [(α_rew,LSD − α_rew,placebo) − (α_pun,LSD − α_pun,placebo) > 0; drug difference, 0 ∉ 99% HDI].

Stimulus stickiness and reinforcement sensitivity
Modelling both acquisition and reversal contiguously, stimulus stickiness was lowered by LSD (mean 0.23) relative to placebo (mean 0.43) (drug difference, 0 ∉ 90% HDI; Fig. 2a), a manifestation of increased exploratory behaviour. Reinforcement sensitivity was not modulated by LSD (LSD mean 4.70, placebo mean 5.57; no drug difference, 0 ∈ 95% HDI). This is in line with the absence of an effect of LSD on the tendency to 'stay' following reward or punishment (see analysis of raw data measures above).

Relationship between model parameters and raw data behavioural measures
Analyses were conducted to understand the relationship between computational and raw data measures. Given the initial finding on the relationship between better acquisition learning and perseveration, the first question addressed was whether the elevated reward learning rate under LSD during acquisition, from the computational model, was predictive of the raw data measure of perseveration from den Ouden et al. (2013). Simple linear regression showed that under LSD, a higher reward learning rate during acquisition predicted significantly more perseverative errors (β = 26.94, p = 0.02), whereas no such relationship was present when the same participants were under placebo (β = 9.59, p = 0.40). Next, we examined the relationship between the stimulus stickiness parameter from the computational model and the raw data measure of perseveration. Stimulus stickiness during reversal was not significantly correlated with the raw data measure of perseveration, in either the placebo (β = 4.13, p = 0.50) or LSD (β = 11.60, p = 0.09) condition. Further exploratory analyses are reported in the online Supplementary material.

Discussion
There has been a recent surge of interest in the potential therapeutic effects of psychedelics, including LSD. Theorising on the mechanisms of such effects centres on their role in enhancing learning and plasticity. In the current study, we tested these postulated effects of LSD on flexible learning in humans and found that LSD increased learning rates, exploratory behaviour, and the impact of previously learnt values on subsequent perseverative behaviour. Specifically, LSD increased the speed at which value representations were updated following prediction error (the mismatch between expectations and experience). Whilst LSD enhanced the impact of both positive and negative feedback, overall it augmented learning from reward significantly more than learning from punishment.
Behaviour was more exploratory overall under LSD, as assessed computationally in two ways, consistent with theoretical accounts of psychedelic effects which have predicted increased exploratory tendencies (Carhart-Harris & Friston, 2019). First, LSD decreased stimulus stickiness, which indicates a diminished tendency to repeat previously chosen options, irrespective of reinforcement history (value-free). This effect on stickiness was significant in all phases of the experiment: when considering the experiment as a whole (acquisition and reversal), when examining initial learning only (acquisition), and when isolating the reversal phase. In other words, regardless of LSD-induced changes in value-guided choice strategies (elaborated upon below), LSD promoted an overall latent tendency to explore in the form of shifting between choices, irrespective of feedback and value, which was maintained during both stable and changing circumstances. That LSD lowered stimulus stickiness may also be clinically relevant: stimulus stickiness was recently shown to be abnormally high in cocaine and amphetamine use disorders (Kanen et al., 2019).
LSD also modulated value-based exploratory tendencies (indexed by the reinforcement sensitivity parameter), which, by contrast, differed by phase. When considering the experiment as a whole, there was no effect of LSD on reinforcement sensitivity; this apparent null effect, however, masked opposing phase-specific patterns. When examining initial learning only, reinforcement sensitivity was substantially diminished under LSD, indicating a tendency for increased exploration away from the more highly valued choice option. During the reversal phase, meanwhile, reinforcement sensitivity was increased, indicative of a heightened tendency to exploit the choice option computed to be more highly valued trial-by-trial, which can be seen as adaptive when circumstances change and rapid reorienting of actions is required.
A shift in the computations underlying choice was also observed in relation to RL rates, during learning to maximise reward and minimise punishment in an initial situation and when adapting actions following contingency reversal. Whereas overall LSD enhanced both the reward and punishment learning rates (especially for rewards), the increase in the punishment learning rate appeared during the reversal phase only. The reward learning rate was elevated in both the acquisition and reversal phases. Together, these learning rate findings suggest that LSD accelerates the updating of value in a way that is (overall) especially reward-driven, and that LSD speeds up learning from negative feedback encountered when circumstances change.
Under LSD, better initial learning led to more perseverative responding. The implication is that when a behaviour is newly and more strongly learned through positive reinforcement (i.e. the acquisition phase) under LSD, it may persist more strongly even when that action is no longer relevant (i.e. the reversal phase). These measures of overt performance defined based on feedback are orthogonal to an overall latent tendency towards exploration irrespective of reinforcement history (low stimulus stickiness). Importantly, perseveration (den Ouden et al., 2013) itself, as assessed in the analysis of raw data measures, was not elevated by LSD, nor did it correlate with stimulus stickiness (online Supplementary Table S3).
Given the broad effect of LSD on a range of neurotransmitter systems (Nichols, 2004, 2016), it is not possible to determine the specific neurochemical mechanism underlying the observed LSD effects on learning. Nonetheless, obvious possibilities involve the serotonin and dopamine systems, in particular 5-HT2A and D2 receptors (Marona-Lewicka et al., 2005; Marona-Lewicka & Nichols, 2007; Nichols, 2004, 2016). Specifically, the psychological plasticity purportedly promoted by psychedelics is believed to be mediated through action at 5-HT2A receptors (Carhart-Harris & Nutt, 2017) via downstream enhancement of glutamatergic activity (Barre et al., 2016) and brain-derived neurotrophic factor (BDNF) expression (Hutten et al., 2021; Vaidya et al., 1997). The hypothesis that the present results regarding RL rates are driven by the serotonergic effects of LSD is supported by two recent studies in mice. Optogenetically stimulating dorsal raphé serotonin neurons enhanced RL rates (Iigaya, Fonseca, Murakami, Mainen, & Dayan, 2018), whilst activation of these neurons tracked both reward and punishment prediction errors during reversal learning (Matias et al., 2017). Neurotoxic manipulation of serotonin in marmoset monkeys during PRL, meanwhile, altered stimulus stickiness (Rygula et al., 2015), implicating a serotonergic mechanism in the increased exploratory behaviour following LSD administration in the present study.
In addition to affecting the serotonin system, however, LSD also acts at dopamine receptors (Nichols, 2004, 2016), albeit with a far lower direct affinity for dopamine receptors than for 5-HT receptors. Dopamine has long been known to play a crucial role in belief updating following reward (Schultz et al., 1997), and more recent evidence shows that dopaminergic manipulations may alter learning rates (Kanen et al., 2019; Schultz, 2019; Swart et al., 2017). A dopaminergic effect would be in line with our previous study, in which genetic variation in the dopamine, but not serotonin, transporter polymorphism was associated with the same enhanced relationship between acquisition and perseveration as reported here under LSD (den Ouden et al., 2013).
Serotonin-dopamine interactions represent another candidate mechanism that could underlie the present findings. For example, stimulation of 5-HT2A receptors in the prefrontal cortex of the rat enhanced ventral tegmental area dopaminergic activity (Bortolozzi, Díaz-Mataix, Scorza, Celada, & Artigas, 2005). Indeed, the initial action of LSD at 5-HT2A receptors has been proposed to sensitise dopamine neuron firing (Nichols, 2016). LSD action at D2 receptors, albeit with a low binding affinity, may be more pronounced in a late phase of LSD's effects (Marona-Lewicka et al., 2005; Marona-Lewicka & Nichols, 2007), which may be relevant given the relatively long delay between LSD administration and performance of the current task (see Methods). However, arguing against a late dopaminergic effect is a previous study in rodents in which the effects of LSD on reversal learning were consistent across four different time lags between drug administration and behavioural testing (King, Martin, & Melville, 1974).
The enhanced coupling of acquisition learning and perseverative responding under LSD is in line with a recent study showing that LSD induced higher-order cognitive inflexibility in a set-shifting paradigm (Pokorny et al., 2019). Importantly, those effects were blocked by co-administration of the 5-HT2A antagonist ketanserin (Pokorny et al., 2019), showing that the LSD-induced impairments were mediated by 5-HT2A agonism, consistent with a 5-HT2A mechanism underlying the present results.
LSD's effects of increasing acquisition-perseveration coupling and worsening set-shifting (Pokorny et al., 2019) together suggest that what is newly or recently learnt through reinforcement under LSD is more 'stamped in', and thus may subsequently be harder to update. Whilst these findings are ostensibly at odds with the observation that LSD enhanced plasticity (through enhanced learning rates), they can be reconciled by considering the timing of drug administration with respect to initial learning and tests of cognitive flexibility. In both the present experiment and the previous set-shifting study (Pokorny et al., 2019), all phases of learning (acquisition and reversal) were conducted after LSD administration. In contrast, when acquisition learning was conducted prior to LSD administration, LSD resulted in improved reversal learning (using a reversal paradigm in rats; King et al., 1974). Likewise, when acquisition learning was conducted prior to the administration of a 5-HT2A antagonist, reversal learning was impaired (Boulougouris et al., 2008; see also Furr et al., 2012). Collectively, these findings suggest that whether a prior belief is down- or up-weighted under LSD may depend on whether the prior is formed before or during drug administration, respectively. This observation is of great relevance for a putative therapeutic setting, where maladaptive beliefs will have been formed before treatment.
Another important consideration for reconciling the effects of 5-HT2A receptor modulation on behavioural/cognitive flexibility is that 5-HT2A antagonism can produce opposite effects depending on whether the OFC or striatum is targeted (Amodeo et al., 2017), complicating the interpretation of studies employing systemic administration (Amodeo et al., 2014, 2020; Baker et al., 2011; Odland et al., 2021). Species, strain, dose, compound, route of administration, task specifications (and engagement of cortical and subcortical structures), and reinforcement schedule must also be considered. The application of computational modelling may also help unify effects across studies and species.
While we observed an effect of LSD on acquisition-perseveration coupling, reminiscent of a previous similar observation as a function of genetic variability in the dopamine transporter (den Ouden et al., 2013), we unexpectedly did not observe effects of LSD on acquisition performance or perseveration directly, or on lose-stay and win-stay behaviour. In fact, more broadly, the effects of LSD observed here differ from the effects of neurochemically more specific influences such as acute serotonin reuptake inhibition (Bari et al., 2010; Skandali et al., 2018) or neurotoxic serotonin depletion (Bari et al., 2010; Rygula et al., 2015). In line with this, previous LSD administration studies found no effect of LSD on an outcome devaluation paradigm (Hutten et al., 2020), nor on visual memory during paired associates learning (Family et al., 2020).
Our computational modelling approach was more sensitive to detecting the effects of LSD. It may be possible to reconcile these robust computational effects with the minimal overt behavioural performance effects via the following speculation. Subtle differences in states of underlying plasticity may not translate to overt differences in instrumental or Pavlovian responses, even if the long-term expression of these learned responses would differ. For example, in the memory reconsolidation literature, a previously learned associative memory is believed to become susceptible to disruption (e.g. pharmacologically or behaviourally) following cued reactivation or recall, for a period of several hours known as the 'reconsolidation window' (Lee, Nader, & Schiller, 2017). There is evidence that conducting extinction training (learning) during the reconsolidation window, when mechanisms of plasticity differ, does not alter the overt success or failure of extinction within the session, yet there are long-term effects: extinction learning during the reconsolidation window can be more enduring than extinction learned outside of this window (Schiller, Kanen, LeDoux, Monfils, & Phelps, 2013; Steinfurth et al., 2014). These Pavlovian extinction learning data, showing no difference during extinction itself, may parallel the instrumental conditioning data in the present study, in that we report no observable effect of LSD on most raw data measures (e.g. number of correct responses), yet latent learning processes that relate to purported mechanisms of plasticity, namely learning rate, were affected. Future studies would need to determine whether and how to harness this apparent window of heightened plasticity for therapeutic benefit.
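The latent parameters discussed throughout (separate reward and punishment learning rates, reinforcement sensitivity, and stimulus stickiness) correspond to a standard delta-rule/softmax formulation of reinforcement learning. For illustration only, a minimal sketch of such a model is given below; the function name, parameter values, and the ±1 feedback coding are expository assumptions, not the fitted model from this study.

```python
import numpy as np

def simulate_prl_choice_probs(outcomes, alpha_rew=0.4, alpha_pun=0.2,
                              beta=5.0, kappa=1.0, n_stimuli=3):
    """Trial-by-trial value updating and choice for a 3-stimulus PRL task.

    outcomes: sequence of (chosen_stimulus, feedback) pairs, feedback in {+1, -1}.
    alpha_rew / alpha_pun: separate learning rates for reward and punishment.
    beta: reinforcement sensitivity (softmax inverse temperature).
    kappa: stimulus stickiness weight (tendency to repeat the last choice
           irrespective of reinforcement history).
    Returns the model-predicted choice probabilities before each trial.
    """
    q = np.zeros(n_stimuli)       # learned stimulus values
    stick = np.zeros(n_stimuli)   # indicator of the previously chosen stimulus
    probs = []
    for choice, feedback in outcomes:
        # softmax over value plus a stickiness bonus for the last choice
        logits = beta * q + kappa * stick
        p = np.exp(logits - logits.max())
        p /= p.sum()
        probs.append(p.copy())
        # prediction-error update with an outcome-dependent learning rate
        delta = feedback - q[choice]
        alpha = alpha_rew if feedback > 0 else alpha_pun
        q[choice] += alpha * delta
        stick[:] = 0.0
        stick[choice] = 1.0
    return np.array(probs)
```

On this formulation, the paper's key findings map onto higher alpha_rew (and, less markedly, alpha_pun) and lower kappa under LSD, with beta differing by task phase.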
Limitations of this study include the following. We have made a case for the critical involvement of the 5-HT 2A receptor; however, we cannot be sure which particular receptor interaction(s) caused the current findings. LSD, in addition to binding with high affinity to 5-HT 2A receptors, acts at numerous other receptors including D 1 , D 2 , 5-HT 1A/1B/1D , 5-HT 2C , 5-HT 5A , 5-HT 6 , and 5-HT 7 (Nichols, 2004). Indeed, 5-HT 2C receptors can counter 5-HT 2A effects on reversal learning (Boulougouris et al., 2008). A future study co-administering LSD with a 5-HT 2A antagonist would help discern the putative 5-HT 2A -mediated effects. Additionally, the subjective effects and plasma levels of LSD were not measured at the time of task administration. Furthermore, even though our parameter recovery analysis was successful (see online Supplementary material), we were unable to reproduce in the simulated data the initial learning-perseveration effect observed in the behavioural data.
In summary, the core result of this study was that LSD enhanced the rate at which humans updated their beliefs based on feedback. RL was most enhanced by LSD following reward, and to a lesser extent following punishment. LSD also increased exploratory behaviour. These findings have implications for understanding the mechanisms through which LSD might be therapeutically useful for revising deleterious associations.
Supplementary material. The supplementary material for this article can be found at https://doi.org/10.1017/S0033291722002963

Fig. 1. (a) Schematic of the PRL task. Subjects chose one of three stimuli. The timeline of a trial is depicted: stimuli appear, a choice is made, the outcome is shown, a fixation cross is presented during the intertrial interval, and stimuli appear for the next trial (RT, reaction time). One stimulus delivered positive feedback (green smiling face) with a 75% probability, one with 50%, and one with 25%. The probabilistic alternative was negative feedback (red sad face). Midway through the task, the contingencies for the best and worst stimuli swapped. s, seconds. (b) Better initial learning was predictive of more perseveration on LSD but not on placebo. Shading indicates ± 1 standard error of the mean (S.E.). (c) Trial-by-trial average probability of choosing each stimulus, averaged over subjects during the placebo session. A sliding 5-trial window was used for smoothing. The vertical dotted line indicates the reversal of contingencies. R-P indicates the mostly rewarded stimulus, later mostly punished. N-N indicates the neutral stimulus during both acquisition and reversal. P-R indicates the mostly punished stimulus, later mostly rewarded. Shading indicates ± 1 S.E. (d) Trial-by-trial average probability of choosing each stimulus, averaged over subjects during the LSD session; conventions as in (c). Shading indicates ± 1 S.E. (e) Distributions depicting the average per-subject probability (scattered dots) of choosing each stimulus while under placebo (shown in dark blue) and LSD (light blue). The mean value for each distribution is illustrated with a single dot at the base of each distribution, and the mean values for the probability of choosing different stimuli in each condition are connected by a line. Black error bars around the mean value show ± 1 S.E. The horizontal dotted line indicates chance-level 'stay' behaviour (33%). The global probability of choosing each stimulus did not differ between the placebo and LSD conditions. (f) Raw data measures of feedback sensitivity were unaffected by LSD. Distributions depicting the average per-subject probability (scattered dots) of repeating a choice (staying) after receiving positive or negative feedback under placebo (dark blue) and LSD (light blue). The horizontal dotted line indicates chance-level 'stay' behaviour (33%).