Aberrant reward learning, but not negative reinforcement learning, is related to depressive symptoms: an attentional perspective

Background. Aberrant reward functioning is implicated in depression. While attention precedes behavior and guides higher-order cognitive processes, reward learning from an attentional perspective – the effects of prior reward-learning on subsequent attention allocation – has been mainly overlooked. Methods. The present study explored the effects of reward-based attentional learning in depression using two separate, yet complementary, studies. In study 1, participants with high (HD) and low (LD) levels of depression symptoms were trained to divert their gaze toward one type

An important field of research that has been mainly overlooked in this renewed view of anhedonia is the study of reward learning from an attentional perspective: the effects of prior reward learning on subsequent attention allocation, also known as reward-based selection history or experience-based attention selection (Awh, Belopolsky, & Theeuwes, 2012). As attention precedes behavior and guides thought and higher-order cognitive processes, such as working memory and decision-making (Desimone & Duncan, 1995; Feldmann-Wüstefeld, Busch, & Schubö, 2019), reward-related attentional allocation seems vital for subsequent stages of reward processing. Put differently, exploring how one's learning of the (rewarding) value of specific stimuli affects the way attention is later allocated to those stimuli, when encountered, may shed much-needed light on ensuing anhedonia-related processes. Indeed, much research among healthy individuals has repeatedly shown that stimuli imbued with a rewarding value can later guide visuospatial attention allocation, even without conscious intent (Anderson, 2016, 2017; Failing & Theeuwes, 2018; Gaspelin, Gaspar, & Luck, 2019; Schwark, Dolgov, Sandry, & Volkman, 2013).
Conversely, only a few studies to date have explored the effects of reward-based selection history in depression, showing that while depressed individuals exhibit an intact ability to learn stimulus-reward associations, these associations fail to produce the subsequent changes in attention processes that are characteristic of non-depressed individuals (Anderson, 2017; Anderson, Leal, Hall, Yassa, & Yantis, 2014; Brailean, Koster, Hoorelbeke, & De Raedt, 2014). While providing initial evidence for aberrant reward-based selection history in depression, these studies entail several limitations curbing our understanding of this important phenomenon. Three limitations are related to the quantification of attention allocation via reaction-time (RT)-based measures. First, as RT-based measures are derived from keypresses occurring at the very end of the information processing sequence, different attentional components taking place earlier in the process can only be indirectly inferred from facilitated/impaired performance, providing no information about the course and dynamics of attention deployment before or after the moment of measurement (Lazarov et al., 2019; Lee & Lee, 2014; Thomas, Goegan, Newman, Arndt, & Sears, 2013). Second, RT-based tasks exhibit poor psychometric properties, including low internal consistency and test-retest reliability (Brown et al., 2014; Draheim, Mashburn, Martin, & Engle, 2019; Rodebaugh et al., 2016; Schmukle, 2005; Staugaard, 2009; Waechter & Stolz, 2015), which are vital for trusting emergent results. Third, keypresses give rise to potential confounding elements related to the execution of the required motor responses (Hadwin & Field, 2010; Kimble, Fleming, Bandy, Kim, & Zambetti, 2010; Krajbich, Bartling, Hare, & Fehr, 2015), which is particularly relevant in depression due to psychomotor retardation (Caligiuri & Ellwanger, 2000).†¹
Two additional shortcomings are related to the nature of rewards used during training/learning, as prior research has exclusively used monetary rewards. First, monetary reward, considered a secondary rather than a primary reinforcer, is less of a motivational driving force for depressed individuals, who tend to exhibit a priori disinterest in maximizing monetary gain (e.g. Godara, Sanchez-Lopez, & De Raedt, 2019; Maddox, Gorlick, Worthy, & Beevers, 2012; Pizzagalli et al., 2008). Indeed, primary and secondary rewards are associated with different neurological pathways (e.g. Blood & Zatorre, 2001; Menon & Levitin, 2005; Sescousse, Caldú, Segura, & Dreher, 2013; Thut et al., 1997). Second, mirroring the RT-based nature of tasks used, rewards were delivered via reaction-based feedback for single trials, following a short time interval between the response and reward delivery, rather than in a continuous 'online' manner that better reflects the dynamic nature of attention allocation. Continuous delivery is imperative for examining the influence of ongoing reward conditioning on continuous attentional allocation (Brailean et al., 2014).
The first aim of the study was to examine reward learning from an attentional perspective while addressing extant limitations of selection history research in depression (Anderson, 2017; Anderson et al., 2014; Brailean et al., 2014). Hence, here, reward-based selection history was examined using an eye-tracking-based gaze-contingent music reward procedure (Lazarov, Pine, & Bar-Haim, 2017b; Shamai-Leshem, Lazarov, Pine, & Bar-Haim, 2021), in which ongoing musical reward feedback was provided for attention allocation to one type of stimuli over another (i.e. two types of shapes; rounded over angular), creating an association between the (rewarded) stimulus type and the (rewarding) music. Attention allocation was assessed pre- and post-training using a reliable free-viewing eye-tracking attention allocation task (Lazarov, Abend, & Bar-Haim, 2016; Lazarov, Ben-Zion, Shamai, Pine, & Bar-Haim, 2018; Lazarov et al., 2021a), presenting similar stimuli to those used in training, but without gaze-contingent music. Based on past research, we predicted a differential change pattern in attention allocation from pre- to post-training (i.e. near-transfer effects), such that this change would be greater among individuals with low levels of depression symptoms, compared with individuals with high levels of depression symptoms. Potential group differences in reward learning during training (i.e. online training) were also explored, although prior research shows no group differences on this measure.
While encouraging a specific behavior with rewards (i.e. positive reinforcement) is clearly relevant to anhedonia and depression (Carvalho & Hopko, 2011; Manos, Kanter, & Busch, 2010), the same behavior can also be strengthened by the removal of an aversive or negative stimulus for performing the desired behavior. This process is known as negative reinforcement (Abreu & Santos, 2008; Reinen et al., 2021): the removal of an aversive stimulus to increase the probability of a (desired) behavior being repeated (Gordan & Amutan, 2014). Thus, negative and positive reinforcement are similar in that both can be used to attain the same result, an increase in a (desired) behavior, but via different reinforcing cues/stimuli. In depression and anhedonia, research on learning processes shows that negative reinforcement can facilitate learning processes better than positive reinforcement (Beevers et al., 2013; Chiu & Deldin, 2007; Eshel & Roiser, 2010; Hevey, Thomas, Laureano-Schelten, Looney, & Booth, 2017; Maddox et al., 2012; Reinen et al., 2021; Santesso et al., 2008). Relatedly, attention research shows depression to be associated with an attentional preference for aversive/dysphoric stimuli, over neutral or positive ones (Gotlib, Krasnoperova, Yue, & Joormann, 2004; Hamilton & Gotlib, 2008; Johnston et al., 2015; Rudich-Strassler, Hertz-Palmor, & Lazarov, 2022; Suslow, Husslack, Kersting, & Bodenschatz, 2020). One intriguing question in the present context is whether implementing the same gaze-contingent procedure while substituting the appetitive music reward (i.e. positive reinforcement) with the removal of an aversive sound (i.e. negative reinforcement) for performing the desired behavior (i.e. gazing at rounded shapes), would yield similar learning patterns, within and following training. This constituted the study's second aim. Hence, we replicated the above-described procedure, using a new cohort of participants, with one pivotal change: during training, gazing at rounded shapes stopped an aversive white noise that would otherwise play.
† The notes appear after the main text.

Experimental tasks
Attention allocation assessment task. Attention allocation was assessed using a well-established eye-tracking-based free-viewing task (Lazarov et al., 2016, 2018, 2021a, 2021b) adapted for the present study (see online Supplementary Material for a full description). Briefly, participants freely viewed 30 4-by-4 shape matrices (i.e. 16 shapes per matrix), presented for 6000 ms each, with half of the shapes being without sharp angles (i.e. rounded shapes) and half having sharp angles (i.e. angular shapes; see Fig. 1, right panel, for an example). Attention allocation was quantified as dwell time percent (DT%) on rounded shapes (see below).
Gaze-contingent training task. The training task was a modified version of the assessment task, designed to divert participants' attention toward rounded over angular shapes via music reward. Specifically, before each training block, participants chose a 12-minute music track (from an extensive music menu) to which they wanted to listen during the task. During each block, 30 successive shape matrices were presented, each for 24 s, with no inter-trial intervals. Importantly, the music played only when participants fixated one of the rounded shapes. Fixating one of the angular shapes stopped the music. Here, too, attention allocation was quantified as DT% on rounded shapes. See online Supplementary Material for a full task description.
Attention allocation (DT%). For each matrix, in both tasks, two areas of interest (AOI) were defined: the target AOI comprised the eight (rewarded) rounded shapes, and the non-target AOI comprised the eight (non-rewarded) angular shapes (see Fig. 1, left panel). Total dwell time (in milliseconds) on each AOI in each matrix (i.e. aggregating dwell time across the eight single shapes comprising the AOI) was calculated, and the proportion of dwell time (DT%) on the target AOI, relative to the total dwell time on both AOIs, was computed, reflecting attention allocation to rounded shapes on the matrix. DT% was then averaged across the presented matrices in a block (30 matrices).
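The DT% computation described above can be sketched in a few lines. This is only an illustration in Python (the reported analyses were run in R); the fixation record layout, the handling of matrices with no dwell time on either AOI, and the function names are assumptions rather than the authors' implementation.

```python
from statistics import mean

def dwell_time_percent(fixations):
    """DT% on the target AOI for one matrix, as a proportion between 0 and 1.

    `fixations` is a list of (aoi, duration_ms) pairs, where aoi is
    "target" (rounded shapes), "non_target" (angular shapes), or None
    for gaze outside both AOIs (hypothetical layout).
    """
    target = sum(d for aoi, d in fixations if aoi == "target")
    non_target = sum(d for aoi, d in fixations if aoi == "non_target")
    total = target + non_target
    # Assumption: a matrix with no dwell time on either AOI yields no score.
    return None if total == 0 else target / total

def block_dt_percent(matrices):
    """Average DT% across the matrices of one block, skipping unscored ones."""
    scores = [s for s in map(dwell_time_percent, matrices) if s is not None]
    return mean(scores) if scores else None
```

Gaze falling outside both AOIs is ignored here because the text defines DT% relative to the total dwell time on the two AOIs only.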

General procedure
The general procedure is fully described in the online Supplementary Material (see also Fig. 2). Briefly, during day 1, participants completed the assessment task (i.e. pre-training assessment), followed by two training blocks (B1, B2), and then completed the self-report measures. During day 2, participants first completed two additional training blocks (B3, B4), followed by the post-training assessment task, and were then questioned for explicit rule learning.

Data analysis
Main analysis. Independent-samples t tests compared groups on descriptive characteristics (e.g. age, PHQ-9, BDI-II, STAI-T, SHAPS and BMRQ), with a χ² test comparing groups on gender ratio. An independent-samples t test was also used to compare groups on attention allocation (DT%) at pre-training assessment.
Attention allocation during training, termed online learning, was analyzed using a repeated-measures analysis of variance (ANOVA) for DT% on rounded shapes, with group (HD/LD) as a between-subject variable, and training block (B1-to-B4) as a within-subject variable. A χ² test was used to compare groups on explicit rule learning.
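As a worked illustration of the χ² comparison on explicit rule learning, the sketch below computes a 2 × 2 Pearson χ² with Yates' continuity correction. It is written in Python for illustration only, and the function name is hypothetical. The counts are those reported for study 2 in the Results (HD: 15 learners; LD: 13 learners); the group sizes of 28 and 30 are inferred from the reported percentages and are therefore an assumption, as is the use of the continuity correction.

```python
import math

def chi2_2x2_yates(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] with Yates'
    continuity correction; returns (statistic, p_value) for df = 1."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            # Yates: shrink |O - E| by 0.5, but never below zero.
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            stat += diff ** 2 / expected
    # For df = 1 the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Learners vs. non-learners: HD 15 of 28, LD 13 of 30 (group sizes assumed).
stat, p = chi2_2x2_yates(15, 13, 13, 17)
print(round(stat, 2), round(p, 2))
```

Under these assumptions the result rounds to χ²(1) = 0.27, p = 0.61, consistent with the values reported for study 2.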
To examine learning generalization from pre- to post-training, termed near-transfer effects, a repeated-measures ANOVA for DT% on rounded shapes was used, with group (HD/LD) as a between-subject variable, and time (pre-training/post-training) as a within-subject variable. Follow-up analyses included separate paired-samples t tests to compare DT% on rounded shapes between pre- and post-training within groups, and an independent-samples t test was used to examine between-group differences at post-training assessment. To address within-trial changes in attention allocation during the assessment task, we also conducted a time-course analysis of attention allocation by entering Epoch as another within-subject variable to the above-described ANOVA. Following extant eye-tracking-based attentional research exploring within-trial changes in attention allocation (Armstrong & Olatunji, 2012; Felmingham, Rennie, Manor, & Bryant, 2011; Kimble et al., 2010), each 6 s trial was divided into three 2 s time epochs (i.e. Epochs 1-3).
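The division of each 6 s trial into three 2 s epochs can be illustrated as follows. This Python sketch is not the authors' code: the fixation layout is hypothetical, and the text does not say how fixations spanning an epoch boundary were treated, so splitting them at the boundary is an assumption.

```python
EPOCHS_MS = [(0, 2000), (2000, 4000), (4000, 6000)]  # Epochs 1-3 of a 6 s trial

def epoch_dt_percent(fixations, epochs=EPOCHS_MS):
    """DT% on the target AOI within each epoch of one trial.

    `fixations` is a list of (aoi, start_ms, end_ms) tuples with aoi in
    {"target", "non_target"} (hypothetical layout). Assumption: a fixation
    that straddles an epoch boundary is split at the boundary, and each
    part is counted in its own epoch.
    """
    results = []
    for lo, hi in epochs:
        dwell = {"target": 0, "non_target": 0}
        for aoi, start, end in fixations:
            overlap = min(end, hi) - max(start, lo)
            if aoi in dwell and overlap > 0:
                dwell[aoi] += overlap
        total = dwell["target"] + dwell["non_target"]
        # Epochs with no dwell time on either AOI are left unscored.
        results.append(dwell["target"] / total if total else None)
    return results
```

The per-epoch scores can then be averaged across matrices, yielding one value per epoch for the Epoch factor.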
While no study to date has specifically explored differences between anxious and non-anxious individuals on attention learning/training, prior research has implicated anhedonia and deficient reward learning in anxiety disorders (e.g. Pike & Robinson, 2022; Taylor, Hoffman, & Khan, 2022). Hence, to rule out anxiety levels as a possible alternative explanation for significant between-groups results, we also conducted a repeated-measures analysis of covariance, controlling for anxiety scores, for significant findings.
All analyses were two-sided, using α of 0.05. Effect sizes are reported as partial eta squared (ηp²) for ANOVAs and Cohen's d for t tests. Analyses were carried out with the 'stats' package in R, and visualized using the 'ggplot2' package (Wickham, 2011).
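Partial eta squared can be recovered from a reported F statistic and its degrees of freedom, since SS_effect / SS_error = F × df_effect / df_error. The short Python sketch below (illustrative only; the function name is hypothetical) applies this identity to two F values reported in the Results.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of block in study 2, reported as F(3,168) = 17.5.
print(round(partial_eta_squared(17.5, 3, 168), 2))
# Group-by-time interaction controlling for anxiety, reported as F(1,56) = 7.82.
print(round(partial_eta_squared(7.82, 1, 56), 2))
```

Both values agree with the effect sizes reported alongside these F statistics (0.24 and 0.12, respectively).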
Sensitivity analysis. Each main analysis was followed by a sensitivity analysis to ensure that emergent null findings did not stem from lack of power (i.e. type II errors). As opposed to the main analysis in which DT% (on rounded shapes) was averaged across the 30 matrices per block, yielding a single index per block, here each single matrix (i.e. each single trial) was treated as a separate observation. Specifically, we conducted a mixed-effects linear regression with DT% on rounded shapes as the dependent variable, and introduced each matrix to the model as a separate observation, instead of collapsing the 30 matrices of each assessment/training block into a single observation.² Thus, for training, instead of having four observations per participant (i.e. 4 training blocks), each participant now provided 120 observations (i.e. 4 training blocks × 30 matrices per block), resulting in a substantially more powerful model. Similarly, in the assessment-phase model, each participant provided 60 observations (2 assessment blocks × 30 matrices per block) instead of only two. Participants were modelled as random factors to account for within-subject variance.
As in the main analysis, to rule out anxiety levels as a possible alternative explanation for significant between-groups results, we introduced anxiety scores to the model as a covariate. While groups did not differ on musical anhedonia, we decided to introduce BMRQ scores to the model to ascertain that musical anhedonia was unrelated to performance on the tasks.
We controlled the false discovery rate (FDR) with the Benjamini and Hochberg correction for multiple comparisons (Benjamini & Hochberg, 1995). Effect sizes in the sensitivity analysis are reported as standardized β. Analyses were carried out with the 'lmerTest' package in R (Kuznetsova, Brockhoff, & Christensen, 2017).
Secondary analysis. As HD participants were not formally assessed per the diagnostic criteria for depression, and to account for within-group heterogeneity in PHQ-9 scores, we repeated our analysis using PHQ-9 scores as a continuous predictor in a mixed-effects linear model. As noted for the sensitivity analysis, each single matrix was treated as a separate observation while modeling participants as random effects. To address the potential effects of anhedonia (SHAPS scores) on emergent findings (i.e. the two groups also differed on SHAPS scores, also showing within-group heterogeneity), while accounting for its multicollinearity with PHQ-9 scores, we replicated this analysis in separate models with SHAPS scores, rather than PHQ-9 scores, as the predictor.
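The Benjamini and Hochberg correction applied in the sensitivity analysis is a step-up procedure: the p values are ranked, and every hypothesis up to the largest rank k satisfying p(k) ≤ (k/m) × α is rejected. A minimal Python sketch of that rule follows; it is illustrative only, the function name is hypothetical, and the reported analyses were run in R.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Step-up rule: reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p value that misses its own threshold is still rejected when a larger p value further down the ranking meets its threshold.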

Results
Data and code for all analyses are openly available on the Open Science Framework.

Sample characteristics
Demographic and clinical characteristics by group are described in Table 1 (left panel). The HD group scored significantly higher on depression (PHQ-9; BDI-II), trait anxiety (STAI-T), and anhedonia (SHAPS) (p < 0.001). No group differences emerged for age, gender ratio, or musical anhedonia (BMRQ).
Online learning (DT% during training)
DT% on rounded shapes by group and training block is shown in  Table S1). Results were replicated with PHQ-9/SHAPS scores as a continuous predictor in mixed-effects linear models (see online Supplementary Table S9/S10, respectively).

Psychological Medicine 5
% on rounded shapes at post-training was significant, t(56) = −2.15, p = 0.036, Cohen's d = −0.57. The group-by-time interaction effect remained significant after controlling for anxiety levels, F(1,56) = 7.82, p = 0.007, ηp² = 0.12. Additional analyses showed that Epoch had no effect on DT% on rounded shapes and did not significantly interact with either time or group (see online Supplementary

Measures
The same measures as in study 1 were administered in study 2. Yet, as white noise, not music, was used as the reinforcer, rather than assessing musical anhedonia, we assessed resilience to adverse events via the Connor-Davidson Resilience Scale (CD-RISC; Campbell-Sills & Stein, 2007) and noise annoyance using a single question developed and recommended by the International Commission on Biological Effects of Noise (ICBEN; Fields et al., 2001). See online Supplementary Material for a detailed description of these measures.

Procedure, tasks, and measures
The procedure was identical to that of study 1, but with one crucial change: rather than music playing when fixating one of the rounded shapes, here gazing at one of these shapes (the target AOI) stopped an aversive white noise that would otherwise play. See online Supplementary Material for additional information on the procedure, tasks, and measures.
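The gaze contingency used in the two studies can be summarized as a single rule that differs only in the reinforcer. The Python sketch below is illustrative only: the function name and AOI labels are hypothetical, and treating gaze on neither AOI the same as gaze on an angular shape is an assumption based on the wording "only when fixating" and "would otherwise play".

```python
def reinforcer_is_playing(gaze_aoi, reinforcer):
    """Whether the sound should play for the current gaze sample.

    gaze_aoi: "target" (rounded shapes), "non_target" (angular shapes),
              or None when gaze is on neither AOI.
    reinforcer: "music" (study 1) or "white_noise" (study 2).
    """
    on_target = gaze_aoi == "target"
    if reinforcer == "music":
        # Positive reinforcement: music plays only while fixating a rounded shape.
        return on_target
    if reinforcer == "white_noise":
        # Negative reinforcement: noise plays unless a rounded shape is fixated.
        return not on_target
    raise ValueError(f"unknown reinforcer: {reinforcer!r}")
```

In both studies the same behavior (fixating rounded shapes) is reinforced, either by the onset of music or by the offset of noise.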

Data analysis
The statistical approach was similar to that of study 1.

Results
Data and code for all analyses are openly available on the Open Science Framework.

Sample characteristics
Demographic and clinical characteristics by group are described in Table 1 (right panel). As in study 1, the HD group scored  Nimrod Hertz-Palmor et al.
Online learning
DT% on rounded shapes by group and block is shown in Fig. 3a [right panel; see Fig. 3b (right panel) for individual trajectories]. Akin to study 1, only a main effect of block emerged, F(3,168) = 17.5, p < 0.001, ηp² = 0.24, reflecting the intended increase in attention allocation (toward rounded stimuli) during training in both groups. The sensitivity analysis confirmed these results (online Supplementary Table S3). Like study 1, results were replicated with PHQ-9/SHAPS scores as a continuous predictor (see online Supplementary Table S13/S14, respectively).
Explicit rule learning
No significant group difference was noted for explicit rule learning [HD: 15 (53.6%) learners; LD: 13 (43.3%) learners], χ²(1) = 0.27, p = 0.61.
Akin to study 1, additional analyses showed that Epoch had no effect on DT% on rounded shapes and did not significantly interact with either time or group (see online Supplementary

Additional analyses
To further explore the emergent data across both studies, the below-described additional analyses were conducted. For a complete description of all data analyses and results, see online Supplementary Material. All analysis code is openly available on the Open Science Framework.
First, to better elucidate the discrepancy between the two studies in the group-by-time interaction effect of learning generalization (i.e. near-transfer effect), we conducted an integrated group-level analysis using a unified model consisting of all participants from both studies (N = 116). Results confirmed that the discrepancy between the two studies was statistically significant [i.e. a group (HD/LD)-by-time (pre-post)-by-reinforcer (music/white noise) interaction; see online Supplementary Table S5].
Second, to better understand the emergent learning processes, individual eye-tracking gaze data were analyzed at the individual level, taking a within-person approach. Individual-level analyses included exploration of: (1) learning magnitude (for both online learning and near-transfer effects); (2) predicting learning (exploring whether specific changes in DT% between subsequent training steps could predict online learning and near-transfer effects); (3) (online) learning speed; and (4) (online) learning patterns (i.e. cluster analysis).
For learning magnitude, results replicated those of the group-level analyses: no group differences in online learning with either reinforcer, with a significant group difference on near-transfer, but only when reinforced with music (see Fig. 4 for descriptive individual trajectories of online learning; 4A for music and 4B for white noise). The pattern of associations between the three learning indices (online learning, near-transfer, explicit rule learning) further supported this result (see Fig. 5).
For predicting learning, results showed that matrices 6-10 (i.e. change in DT% from matrices 1-5 to matrices 6-10) were the only assemblage that consistently predicted online learning in both groups under both reinforcers. For near-transfer effects this assemblage was predictive among LD participants under both types of reinforcers, while among HD participants it was predictive only under white noise (see online Supplementary Table S6).
Zooming in on the above-emergent 'hot-spot' (matrices 1-10), learning speed results showed no group differences in speed under the music reinforcer. Conversely, HD participants showed faster learning compared with LD participants when reinforced with white noise.
Finally, the cluster analysis yielded three learning patterns (quick learners, slow learners, and non-learners), which differed significantly on their respective learning trajectories (see Fig. 6). Cluster distribution did not differ between reinforcer types, and was also independent of group under both music and white noise. Conversely, cluster was associated with explicit rule learning under both reinforcers. Comparing clusters on near-transfer effects (i.e. learning magnitude) showed that both learner types (quick, slow) showed significantly higher learning than non-learners, under both reinforcers, with the two learner types not differing under either music or white noise. See online Supplementary Tables S_CA1-CA15.

Discussion
The present study examined reward learning in depression from an attentional perspective. In study 1, individuals with high (HD) and low (LD) levels of depressive symptoms underwent a novel gaze-contingent music reward learning procedure while their attention allocation to rewarded and non-rewarded stimuli was examined, during and following training. While no group differences in learning emerged during training, groups differed significantly in their attention allocation at post-training: unlike LD participants, HD participants showed no learning-related changes post-training. In study 2, a similar procedure with negative (i.e. white noise), rather than positive (i.e. music), reinforcement yielded no group differences, with both groups showing the intended change in attention allocation post-training. Results of both studies were maintained when controlling for anxiety, and were replicated when using a more powerful sensitivity analysis and when treating depression scores as a continuous variable/predictor.

The impaired near-transfer effect following reward learning in HD participants concurs with past research showing blunted reward responsiveness and impaired reward learning in depression and anhedonia, from both a neuroscience (Borsini, Wallis, Zunszain, Pariante, & Kempton, 2020; Eshel & Roiser, 2010; Keren et al., 2018; Luking, Pagliaccio, Luby, & Barch, 2016; Pizzagalli, 2022; Whitton, Treadway, & Pizzagalli, 2015) and a behavioral perspective (Eshel & Roiser, 2010; Halahakoon et al., 2020), while extending extant knowledge to the realm of attention. This lack of near-transfer effects is also in line with the few early RT-based studies of selection history in depression, which also showed less attentional capture post-training by previously rewarded stimuli in individuals with high depressive symptoms (Anderson et al., 2014). Yet, elaborating on these earlier studies, here, eye-tracking methodology was used to assess attention allocation following training, rather than RT-based attentional measures derived from manual keypresses, which enabled the exploration of the time course and dynamics of attention deployment (Lazarov et al., 2019). Relatedly, eye tracking was also used in the reward training/learning procedure itself: reward (i.e. the music) was delivered in a continuous gaze-contingent 'online' manner, rather than following a short time interval after the manual response (i.e. the keypress), better corresponding with the dynamic nature of ongoing attention allocation (Brailean et al., 2014). Finally, music reward, considered a primary reinforcer, was used, rather than monetary reward, considered a secondary reinforcer which is less motivating for depressed individuals (e.g. Godara et al., 2019; Maddox et al., 2012; Pizzagalli et al., 2008). The fact that groups did not differ on musical anhedonia, also found to be unrelated to performance on the tasks, suggests that current findings cannot be attributed to group differences on the rewarding value of the music (i.e. the liking aspect of anhedonia).
Unlike group differences in near-transfer following reward learning (study 1), no corresponding group differences were noted when using a similar negative-reinforcement procedure (study 2), echoing previous (non-attentional) research showing enhanced sensitivity to negative outcomes among depressed individuals (Baek et al., 2017; Beevers et al., 2013; Chandrasekhar Pammi et al., 2015; Hevey et al., 2017; Johnston et al., 2015; Maddox et al., 2012; Reinen et al., 2021; Santesso et al., 2008; Smoski et al., 2008; Trew, 2011), while expanding extant knowledge to the realm of attention. Our integrated and exploratory analyses further support the difference between the effects of positive and negative reinforcement among HD and LD individuals. Specifically, results showed that while positive and negative reinforcements yielded similar online learning in both groups, near-transfer effects (the intended shift in attention post-training) were noted under both reinforcements only among LD participants, while HD participants presented near-transfer effects exclusively under negative reinforcement. This suggests that for HD individuals, aversive reinforcers may yield better learning-based shifts in attention, compared with positive reinforcers, reflecting a specific aberration in reward-related selection history in depression.
This is further supported by the emergent correlations between the three learning indices (i.e. online learning, explicit rule learning, and near-transfer effects), which were positively associated among LD participants regardless of reinforcer type. Conversely, association with near-transfer effects among HD participants emerged only when using negative, but not positive, reinforcement. Predicting learning based on the first 10 training matrices echoed these findings, as these matrices predicted online learning in both groups under both reinforcer types, but were predictive of near-transfer effects under both reinforcer types only among LD participants. For HD participants, prediction emerged only under the white noise reinforcer. Taken together, these results echo the 'inverse functionality' effect, that is, hypo- and hyper-striatal activity among depressed individuals in response to reward and punishment, respectively (Groenewold, Opmeer, de Jonge, Aleman, & Costafreda, 2013; Johnston et al., 2015; Scheuerecker et al., 2010; Ubl et al., 2015), highlighting the potency of negative reinforcers in the facilitation of learning and attention modulation among depressed individuals. While this phenomenon is well-established in neuroscience, it has been relatively neglected in behavioral research, including attention.
Unlike the divergent results of the two studies/reinforcers when exploring near-transfer effects, examining performance during training (i.e. online learning) yielded similar results in both studies: both HD and LD participants showed the intended increase in attention allocation toward rounded shapes, echoing previous research on learning in depression, using both positive (e.g. Anderson, 2017; Anderson et al., 2014) and negative reinforcement (e.g. Maddox et al., 2012; Reinen et al., 2021). Our cluster analysis of gaze data during training further supports the notion that HD and LD participants do not differ in online learning, regardless of reinforcer type. Specifically, while three significant clusters emerged (i.e. quick learners, slow learners, and non-learners), cluster distribution did not differ between the two reinforcer types (positive, negative) or between groups (HD, LD) under both the music and white noise conditions. Surprisingly, exploring learning speed during the first 10 training matrices showed a higher learning speed in the HD group, compared to the LD group, when reinforced with white noise, but not when reinforced with music, which may represent a possible 'compensation' mechanism enabling generalization of learning under negative reinforcement. Put differently, heightened experienced averseness of negative outcomes among HD individuals may better motivate or enable generalization of learning, which is absent when encountering rewards. Thus, 'escaping' aversive stimuli might be better embedded and reflected in subsequent selection history. The fact that compared with LD participants, HD individuals scored significantly lower on resilience to adverse events, and scored higher on noise annoyance (albeit at trend level), supports this suggestion, with noise annoyance also predicting attention allocation during training with white noise (at trend level).
Several limitations should be acknowledged. First, participants were individuals with high and low levels of depression symptoms. While stringent inclusion criteria were used (i.e. two depression measures at screening; score stability on the PHQ-9³), future research should replicate the present study using clinically diagnosed MDD patients. Still, using depression scores as a continuous, rather than a grouping, variable yielded similar results, strengthening our confidence in current findings. Second, as the present study aimed to explore selection history in depression, building on past research in the field (Anderson, 2017; Anderson et al., 2014; Brailean et al., 2014), participants were recruited based on depression scores. It is very likely, however, that anhedonia, a key feature of depression (American Psychiatric Association, 2013), plays a primary role in reward-based selection history in depression, possibly contributing to emergent results. Indeed, our sensitivity analysis with anhedonia scores yielded similar results to those obtained with depression scores. To further elucidate the specific role of anhedonia in reward-based selection history, future research could replicate the present one while recruiting participants based on anhedonia symptoms (e.g. SHAPS scores), or specifically recruit those high on anhedonia but low on depression. Relatedly, as anhedonia is a clinical feature of additional psychopathologies (e.g. PTSD), future research could also replicate the present study in these other conditions. Third, the present study did not include a follow-up assessment of attention allocation to examine the stability of near-transfer effects over time, which is especially important for the effects noted for HD participants under negative reinforcement (study 2). Future research in depression should explore this, as previously done for positive-reinforcement procedures (Shamai-Leshem et al., 2021).
Fourth, as non-emotional geometrical shapes were used, we could not explore whether negative reinforcement could counter/overcome attention biases to negative information characterizing depressed individuals (Suslow et al., 2020), especially as positive-reinforcement procedures have failed in doing so (Shamai-Leshem et al., 2021). Future studies could replicate the present study using emotional stimuli (e.g. sad/happy faces).
Fifth, rounded shapes were randomly chosen by the research team to serve as the target shape type, assuming no a priori differences between groups on attentional preference for rounded vs. angular shapes. Indeed, the pre-training assessment task showed no group differences in DT% (which was around 0.5 in both groups across both studies). Yet, we encourage future research to counterbalance the angular vs. rounded shapes as target shapes. Finally, reward-based selection history was explored using an established gaze-contingent paradigm previously used in depression (Shamai-Leshem et al., 2021). While advantageous in some aspects, this paradigm is 'deterministic' in nature as reinforcement is delivered using a 100% ratio: each fixation on rounded shapes resulted in music playing (study 1)/removal of noise (study 2). Hence, it does not entail the trial-wise dynamics of probabilistic reinforcement learning tasks used in past research on selection history in depression (Anderson, 2017; Anderson et al., 2014). As probabilistic reinforcement learning tasks have shown a bidirectional interaction between attention allocation and trial-and-error reinforcement learning processes (e.g. Leong, Radulescu, Daniel, DeWoskin, & Niv, 2017), future research could incorporate non-100% (positive or negative) reinforcement ratios within the present paradigm.
Current findings may have some clinical implications, especially for reinforcement-based interventions. In attention, research has utilized gaze-contingent attention modification procedures to modify patients' (biased) attention to dysphoric over positive/neutral stimuli (for reviews see Gotlib & Joormann, 2010; LeMoult & Gotlib, 2019; Suslow et al., 2020), hoping to alleviate depression symptoms (Möbius, Ferrari, van den Bergh, Becker, & Rinck, 2018; Shamai-Leshem et al., 2021; Woolridge, Harrison, Best, & Bowie, 2021). Especially relevant is a recent randomized controlled trial that used a similar gaze-contingent music reward procedure to divert participants' attention away from sad and toward happy faces (Shamai-Leshem et al., 2021). While online learning was observed during training, no significant differences in symptom reduction were noted between the active group and a placebo group that received non-contingent music throughout training (see also Möbius et al., 2018; Woolridge et al., 2021 for similar null findings). Importantly, the two groups also did not differ on pre-to-post changes in attention allocation (i.e. near-transfer effects). Considering current results, this lack of clinical efficacy may be attributed to aberrant reward-based selection history in depression, namely, failure to induce experience-based shifts in attention following training. Put differently, if near-transfer effects are not achieved post-training, why should far-transfer effects (i.e. symptom change) follow? (Lazarov, Abend, Seidner, Pine, & Bar-Haim, 2017a). Indeed, using the same procedure in social anxiety showed a significant reduction in attention allocation post-treatment, sustained 3 months following training (Zhu et al., 2022), which partially mediated a significant reduction in symptoms (Lazarov et al., 2017b). Thus, present findings strengthen the specificity of aberrant reward-based selection history to depression.
Taking a broader perspective, present results may also be relevant for other interventions, such as behavioral activation, that aim to induce or restore positive affect in depressed patients by encouraging rewarding activities (Hopko, Lejuez, Ruggiero, & Eifert, 2003). Yet, due to reward bluntness, this most often remains an unmet therapeutic goal (Craske et al., 2019). Current results accentuate the intricacies, including attentional ones, of relying on hedonic capacity in MDD interventions.
To conclude, present results implicate aberrant selection history in individuals with high levels of depression symptoms, but
only when based on positive reinforcement, that is, when using rewards. When negative reinforcement is used, no deficits emerge. Current findings may offer future avenues for both clinical and basic research on learning and attention in depression and anhedonia, highlighting the efficacy of negative, over positive, reinforcement in producing experience-based changes in attention allocation.

Notes
1 Motor confounds in attentional research can also be addressed when using reaction-time-based tasks (i.e. tasks entailing keypresses) by either modeling these using, for example, sequential-sampling models, or by using conditions where motor demands are matched, but factors expected to affect information processing are not.
2 Random intercepts were allowed to vary at the participant level (to estimate whether non-significant effects potentially resulted from type-II errors by enhancing statistical power, while accounting for inter-individual differences in baseline DT% on rounded shapes).
3 Future research should assess score stability of both depression measures (i.e. PHQ-9 and BDI-II).