Pupillary response in reward processing in adults with major depressive disorder in remission

Abstract Objective: Major depressive disorder (MDD) is associated with impaired reward processing and reward learning. The literature is inconclusive regarding whether these impairments persist after remission. The current study examined reward processing during a probabilistic learning task in individuals in remission from MDD (rMDD; n = 19) and never-depressed healthy controls (n = 31) matched for age and sex. The outcome measures were pupil dilation (an indirect index of noradrenergic activity and arousal) and computational modeling parameters. Method: Participants completed two versions (facial/nonfacial feedback) of a probabilistic reward learning task with changing contingencies. Pupil dilation was measured with a corneal reflection eye tracker. The hypotheses and analysis plan were preregistered. Result: Healthy controls had larger pupil dilation following losses than gains (p < .001), whereas no significant difference between outcomes was found in individuals with a history of MDD, resulting in an interaction between group and outcome (β = 0.81, SE = 0.34, t = 2.37, p = .018). The rMDD group also achieved a lower mean score at the last trial (t[46.77] = 2.12, p = .040) as well as a smaller proportion of correct choices (t[46.70] = 2.09, p = .041) compared with healthy controls. Conclusion: Impaired reward processing may persist after remission from MDD and could constitute a latent risk factor for relapse. Measuring pupil dilation in a reward learning task is a promising method for identifying reward processing abnormalities linked to MDD. The task is simple and noninvasive, which makes it feasible for clinical research.


Introduction
Major depressive disorder (MDD) is the most common mental health condition worldwide. It is highly disabling and associated with a wide range of negative outcomes including suicide, low occupational achievement, and other forms of psychopathology (Bernaras et al., 2019). Treatments for MDD remain only partially effective, with rates of clinical improvement around 50% (Cuijpers et al., 2014; Fournier et al., 2010). Even after successful treatment, the risk of relapse is high, with 50-80% experiencing additional depressive episodes during their lifetime (Burcusa & Iacono, 2007; Holmes et al., 2018).
A core symptom of MDD is anhedonia, or a reduced ability to experience pleasure and positive affect. Symptoms of anhedonia are often treatment resistant and may even increase in severity with standard pharmacological treatments (e.g., Craske et al., 2016). Behavioral and neurophysiological studies have shown that MDD, and anhedonia in particular, is linked to atypical processing of rewards (Cooper et al., 2018). Previous studies have documented reward processing impairments in MDD. However, the literature is inconsistent about the extent to which these impairments reflect acute MDD symptoms and whether they constitute vulnerability markers that are not state-dependent, specifically, whether they extend to currently asymptomatic individuals at elevated risk, such as patients in remission (rMDD) or relatives of individuals with MDD (Cléry-Melin et al., 2018). A large body of research has examined the role of dopamine (DA) in reward processing, both in healthy individuals (Schultz, 2007) and in MDD (Cléry-Melin et al., 2018). However, recent studies have shown that reward processing and decision making in healthy individuals are also closely linked to the locus coeruleus-noradrenergic (LC-NE) system (Braem et al., 2011; Van Slooten et al., 2018). This research has primarily used pupil dilation, an index of arousal that is modulated by activity in the LC-NE and cholinergic systems (Samuels & Szabadi, 2008).
So far, pupil dilation and LC-NE arousal have not been examined during reward learning in relation to rMDD. Research addressing this gap in the literature could be informative about the neural mechanisms underlying altered reward processing in MDD. Recently, Schneider et al. (2020) examined pupillary responses during reward anticipation in currently depressed individuals. Although no group differences were found between depressed and healthy individuals, blunted pupillary responses during reward anticipation were strongly correlated with the level of depressive symptoms.
A small number of previous studies have examined pupil dilation in rMDD, although not in the context of reward learning tasks. Two of these studies reported reduced pupil dilation to negative words in patients with MDD in remission, compared to nonremitters or healthy controls (Siegle et al., 2011; Steidtmann et al., 2010). Pupil dilation has also been examined as a longitudinal predictor of MDD recurrence. Kudinova et al. (2016) found that reduced pupil dilation to images of sad faces predicted recurrence of MDD during a 2-year follow-up period in women with a history of MDD. When analyzing responses to angry faces, the authors found evidence for a quadratic relationship, with both blunted and enhanced reactivity predicting recurrence of MDD.
The aim of the present study was to compare reinforcement learning and pupillary indices of reward processing in currently euthymic individuals with a history of MDD and in never depressed healthy controls. The background to this research question is introduced in the following sections.

Reward processing in MDD
In healthy individuals, reward processing is linked to a group of interacting brain mechanisms including the ventral striatum and the medial frontal cortex (Le Heron et al., 2018; Schultz, 2007). Behaviorally, reward processing can be divided into several subprocesses including expectation of reward ("wanting"), the consumption of the reward ("liking"), and learning of reward contingencies (Der-Avakian & Markou, 2012; Rømer Thomsen et al., 2015). Importantly, shared reward processing mechanisms have been found for a wide range of rewards, including money, food and drink, and symbolic rewards, such as the number of points gained in a game (Levy & Glimcher, 2012; Schultz, 2007). On a behavioral level, individuals with ongoing MDD display deficiencies across all three stages of reward processing (Cléry-Melin et al., 2018). Consistent with this, atypical brain responses in regions linked to reward processing have been observed in patients with MDD, including reduced activation in the ventral striatum and the anterior cingulate cortex (ACC; Chentsova-Dutton & Hanley, 2010; Wacker et al., 2009). One research area that has begun to unravel the link between reward processing and psychopathology is computational psychiatry, which applies, for example, reinforcement learning models that capture the internal learning and evaluation processes in a person. Reinforcement learning models can be applied to reward-related decision making in uncertain and volatile environments, arguably providing ecologically valid measures of reward processing in everyday life (Huys et al., 2016).
Computational modeling studies have documented a range of abnormalities in reward processing in MDD. For example, individuals with MDD show less exploration than healthy peers during reward learning (Cella et al., 2010; Huys et al., 2013). Specifically, they deviate significantly more from an optimal model (Blanco et al., 2013). Further, there is research indicating that the central feature in decision making in anhedonic patients is linked to the motivational aspects of reward behavior (Treadway & Zald, 2011). The results echo findings indicating that individuals with MDD are less willing to expend effort on rewards, less skilled at discovering the underlying task characteristics (Treadway & Zald, 2011), and display less efficient learning in a probabilistic response task (Pizzagalli et al., 2008). Importantly, individuals with MDD show learning impairments regardless of outcome value (win/loss) and environment (reward/punishment) in volatile, probabilistic tasks (Gagne et al., 2020).

Reward processing as a risk marker for MDD
Recent research suggests that reward abnormalities are not only seen in symptomatic individuals, but may also persist after remission (Pechtel et al., 2013; Weinberg & Shankman, 2017; Whitton et al., 2016), and may be present in nonaffected relatives of patients with MDD (Gotlib et al., 2010). If this is correct, abnormal reward processing could be a trait-like and heritable vulnerability factor that increases the risk for MDD. However, the literature is not consistent. Two studies using event-related potentials (ERPs) reported blunted responses to rewards (Weinberg & Shankman, 2017; Whitton et al., 2016). Similarly, behavioral indices of reduced reward responsiveness in rMDD were found by Pechtel et al. (2013) using a probabilistic reward learning task. In contrast to these findings, one study reported similar affective responses (indicated by subjective ratings) to rewards in individuals with rMDD and controls without a history of MDD, whereas both groups differed from currently depressed individuals (McFarland & Klein, 2009). One study using functional magnetic resonance imaging (fMRI) reported normalized activity in frontostriatal brain regions associated with reward processing after successful psychological treatment (Dichter et al., 2009). Surprisingly, one study (Ubl et al., 2015) reported enhanced neural responses to rewards rather than the expected blunted activity in the amygdala and hippocampus in individuals with rMDD compared to healthy controls. To sum up, further research is needed to understand the nature and extent of reward processing atypicalities in rMDD.

Pupil dilation as an index of reward processing and decision making
Pupil dilation increases during processing of motivationally salient, novel, or cognitively demanding stimuli and is closely linked to arousal and activity in the brain's LC-NE and cholinergic systems (Kleberg et al., 2019, 2020; Samuels & Szabadi, 2008). The LC-NE system interacts with brain areas implicated in aberrant reward processing in MDD such as the ACC and the amygdala (Aston-Jones & Cohen, 2005; Samuels & Szabadi, 2008). Typically, pupil dilation is induced by both positive and negative stimuli (e.g., both rewards and losses; Laeng et al., 2012). Recent studies have also shown that pupil dilation may map onto specific reinforcement learning processes such as the degree of exploration and rate of learning (Manohar & Husain, 2015; Van Slooten et al., 2018) and the uncertainty of the environment (Nassar et al., 2012). Taken together, this suggests that disrupted processing of rewards in MDD may be indexed by altered pupil dilation. However, pupil dilation during reward processing has not been examined in individuals with MDD or rMDD. Burkhouse et al. (2016) recently reported that adolescents with ongoing MDD as well as rMDD showed increased pupil dilation to emotional faces compared to healthy controls. This study, though, did not examine reward processing specifically.

Preregistration
The analysis plan and hypotheses were preregistered in the Open Science Framework (link: https://osf.io/tk4ac/).

Research questions and hypotheses
In the present study, we investigated pupil dilation in relation to reinforcement learning processes in a probabilistic learning task in an rMDD and a healthy control group. We hypothesized the following group differences:
1. Blunted pupillary reactivity to losses and gains in the rMDD group.
2. Blunted pupillary reactivity in expectation of gains and losses (i.e., after a choice has been made, but before the outcome has been presented).
3. Lower learning rate (alpha) in the rMDD group.
4. Less exploration (higher beta values) in the rMDD group.

Experimental paradigm
The experimental paradigm is shown in Figure 1. In the experimental task, participants chose repeatedly between two different stimuli. Choosing one of the stimuli was associated with a high probability of winning a point (85%) and a low probability of losing a point (15%). The other stimulus had the reverse probabilities. Feedback learning was therefore needed to master the task. Participants completed 75 trials, and the contingencies changed after every 15th trial, yielding five blocks of 15 trials, so a change of strategy was needed for optimal performance. Each trial began with a fixation cross presented for 1.5 s, followed by two stimuli presented to the left and right for a random interval between 0.2 and 1 s, after which participants were instructed to make a choice by a key press when a question mark ("?") appeared on the screen. Each stimulus covered approximately 5.25° of the visual field vertically and horizontally and was presented at 5.5° eccentricity from the center of the screen. The key press was followed by a 1.5 s interval (the expectation period) during which the stimuli remained on screen and the chosen stimulus was marked by a rectangle. Subsequently, the outcome was presented for 2 s (the feedback period). Two versions of the task were completed in randomized order. In the facial feedback condition, a loss was indicated by an angry face and a win by a happy face. In the nonfacial feedback condition, a loss was indicated by a stylized hand with the thumb pointing downward, and a win by the same hand with the thumb pointing upward. Feedback stimuli covered approximately 6.18° of the visual field horizontally and 3.9° vertically.
Positive and negative feedback stimuli were matched for luminance within each condition using the SHINE toolbox (Willenbockel et al., 2010) implemented in MATLAB (Mathworks, Inc.), but stimuli were not matched across conditions. Participants started the experiment with 100 points.
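The reward schedule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB implementation: the function name `simulate_task`, the random choice of the initially advantageous stimulus, and the default random chooser are assumptions made for the example, and stimulus timing and screen position are not modeled.

```python
import random

def simulate_task(n_trials=75, block_len=15, p_win=0.85,
                  start_points=100, choose=None, seed=0):
    """Simulate the two-choice probabilistic task with reversals.

    choose: optional callable (trial_index, history) -> 0 or 1.
            If omitted, choices are random (an assumption for the sketch).
    Returns the final score and a list of (choice, outcome, correct) tuples.
    """
    rng = random.Random(seed)
    good = rng.randrange(2)      # assumed: initial advantageous stimulus is random
    points = start_points        # participants started with 100 points
    history = []
    for t in range(n_trials):
        if t > 0 and t % block_len == 0:
            good = 1 - good      # contingencies reverse after every 15th trial
        choice = choose(t, history) if choose else rng.randrange(2)
        # Advantageous stimulus: 85% win / 15% loss; the other is reversed.
        p = p_win if choice == good else 1.0 - p_win
        outcome = 1 if rng.random() < p else -1
        points += outcome
        history.append((choice, outcome, choice == good))
    return points, history

final_score, trials = simulate_task(seed=1)
proportion_correct = sum(c for _, _, c in trials) / len(trials)
```

The two summary values at the end correspond to the behavioral outcomes reported later (score at the last trial and proportion of correct choices), here computed for a simulated random chooser.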

Participants
Participants were recruited from a database, administered by the Karolinska Institute, of individuals who had expressed interest in participating in research. Exclusion criteria were the presence of any current psychiatric condition (including MDD) or substance abuse. Participants were informed about these exclusion criteria in written form before signing up, and subsequently prescreened before the experimental session. In total, 50 individuals agreed to participate and were tested (no participant fulfilled the exclusion criteria). After recruitment, all participants were interviewed for current and lifetime history of mental health conditions according to diagnostic criteria in the DSM-5 (American Psychiatric Association, 2013) by experienced clinical psychologists using the Mini International Neuropsychiatric Interview (MINI; Sheehan et al., 1998). The MINI is a semi-structured clinical interview for mental disorders, which is validated for identifying MDD and other mental health conditions when used by trained clinicians (Lecrubier et al., 1997; Sheehan et al., 1998). Participants were included in the rMDD group if the assessment indicated that they had fulfilled DSM-5 criteria for one or more MDD episodes at any time point in their lifetime, regardless of whether they had received treatment and/or a diagnosis by a professional. The assessment was not complemented by medical records.
Based on the interview, 19 individuals were characterized as having a history of MDD (henceforth the rMDD group), while 31 did not have a lifetime history of MDD or other major psychiatric disorders (healthy controls). All individuals in the rMDD group were asymptomatic at the time of testing according to the clinical interview and had been in full remission for at least 4 months. One individual in the rMDD group had a previous diagnosis of generalized anxiety disorder (GAD). No other individual had any previous or ongoing psychiatric disorder. Further, the two groups were matched for age and gender proportion. Demographic information along with p-values for gender proportions, age differences, response times, and proportion of interpolated samples are shown in Table 1. As can be seen, groups did not differ in median response times or the proportion of interpolated samples in the pupil dilation analysis. Due to a technical error, eight individuals were missing data from the facial feedback condition, and ten individuals were missing data from the nonfacial condition. Four individuals in the rMDD group were on stable psychotropic medication (see Table 1). Exclusion of these participants did not change any of the results, and they were therefore retained. Power analyses using the R package simr (Green & MacLeod, 2016) indicated that the study had power above 80% to detect fixed effects explaining 5% or more of the total variance at an alpha level of .05, given the number of trials, and assuming a range of possible random effects variances.

Design
The design was a 2 × 2 mixed factorial, with group (rMDD/healthy control) as between-subjects factor and feedback stimulus (face/no face) as within-subjects factor.

Recording and processing of eye tracking data
Eye tracking data were recorded using a Tobii X-120 corneal reflection eye tracker (Tobii Inc, Danderyd, Sweden), which samples gaze at 120 Hz and pupil size at 40 Hz. Data were recorded using the Tobii MATLAB SDK, and stimuli were presented using PsychToolbox (Kleiner et al., 2007) for MATLAB (Mathworks, Inc.). Participants were asked to look at the screen during the entire experiment. No head fixation was used.
Raw data were processed using custom scripts written in MATLAB. The pupil data were filtered according to procedures described in an earlier publication from our group (Kleberg et al., 2019). The pupil size of the left and right eye was averaged. Gaps in the data shorter than 150 ms (for example blinks or movement artefacts) were replaced through linear interpolation, and the data were subsequently filtered by a moving median filter with a window corresponding to 100 ms. Previous studies have shown that corneal reflection eye trackers can both under- and overestimate pupil size when the distance between gaze position and the center of the screen increases (Brisson et al., 2013; Hayes & Petrov, 2016). We tried to minimize gaze position artefacts by presenting stimuli at a short distance from the center of the screen and counterbalancing the horizontal position of the stimulus associated with a higher probability of being reinforced (see Experimental paradigm). On average, 89.6% (SD = 9.9%) of the valid recorded gaze samples were within the area covering the stimuli. As can be seen in Table 1, no group differences were found in average Euclidean distance from the center of the screen. A post hoc analysis (see Supplemental materials) showed a very weak relation (average r = −.07, SD = 0.11) between Euclidean distance from the center of the screen and pupil size.
The average proportions of interpolated samples per participant and condition are shown in Table 1. As can be seen, no group differences were found (all p > .5). For a detailed description of the preprocessing of the eye-tracking data, we refer to the Supplemental materials.
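The three preprocessing steps named above (averaging the eyes, interpolating short gaps, and median filtering) can be illustrated with the following Python sketch. It is not the authors' MATLAB code, whose details are in the Supplemental materials. Several choices here are assumptions made for the example: missing samples are coded as NaN, a single valid eye is used when the other is missing, a gap's duration is counted as its number of missing 40 Hz samples, and the 100 ms window is taken as 4 samples.

```python
import math
from statistics import median

SAMPLE_HZ = 40                           # pupil sampling rate reported above
MAX_GAP = round(0.150 * SAMPLE_HZ)       # 150 ms -> 6 samples
MEDIAN_WIN = round(0.100 * SAMPLE_HZ)    # 100 ms -> 4 samples

def average_eyes(left, right):
    """Mean of the valid eye(s) per sample; NaN if both are missing (assumption)."""
    out = []
    for l, r in zip(left, right):
        valid = [v for v in (l, r) if not math.isnan(v)]
        out.append(sum(valid) / len(valid) if valid else math.nan)
    return out

def interpolate_short_gaps(x, max_gap=MAX_GAP):
    """Linearly interpolate NaN runs shorter than max_gap samples.

    Gaps at the start or end of the trace, and longer gaps, stay NaN.
    """
    y = list(x)
    n, i = len(y), 0
    while i < n:
        if not math.isnan(y[i]):
            i += 1
            continue
        j = i
        while j < n and math.isnan(y[j]):
            j += 1                       # j is the first valid index after the gap
        gap = j - i
        if i > 0 and j < n and gap < max_gap:
            a, b = y[i - 1], y[j]
            for k in range(i, j):
                y[k] = a + (b - a) * (k - i + 1) / (gap + 1)
        i = j
    return y

def moving_median(x, win=MEDIAN_WIN):
    """Moving median over `win` samples; NaN samples are left as NaN."""
    out = []
    for i in range(len(x)):
        if math.isnan(x[i]):
            out.append(math.nan)
            continue
        lo = max(0, i - win // 2)
        hi = min(len(x), i - win // 2 + win)
        out.append(median(v for v in x[lo:hi] if not math.isnan(v)))
    return out

def preprocess(left, right):
    """Average eyes, fill short gaps, then median-filter the trace."""
    return moving_median(interpolate_short_gaps(average_eyes(left, right)))
```

Leaving long gaps as NaN rather than filling them keeps the proportion of interpolated samples well defined, which is the quantity compared between groups in Table 1.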

Computational modeling of behavioral data
We fitted behavioral data with computational models for two-armed bandits. Five different models were fitted to each participant's data in each condition in order to assess which model best captured the participants' behavior. In essence, two different versions of two models were compared, as well as a random response model. One model started from the idea that participants applied a simple heuristic, win-stay-lose-shift, that is, staying with an option after a win and shifting after a loss. In our version, we also assumed that participants switched away from the rewarding action with a certain probability. The other model started from the idea that participants take the previous outcomes into account and learn the long-term value of choosing an action, the Q-learning model. The winning model was the Q-learning model (Watkins & Dayan, 1992), a variant of the Rescorla-Wagner model. For computational details, model comparisons, and parameter recovery, we refer to the Supplemental materials. The winning Q-learning model has two free parameters: the learning rate (α), ranging from 0 to 1 and determining the degree to which expected values are updated according to the most recent outcome, and the exploration parameter (β), ranging from 0 to ∞ and determining the balance between exploration and exploitation, with higher values corresponding to less exploration.
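To make the roles of α and β concrete, the sketch below implements a textbook two-option Q-learning model with a softmax choice rule and evaluates its negative log-likelihood for a sequence of choices and outcomes. The exact parameterization used in the study is given in the Supplemental materials, so the softmax form, the initial values of zero, the function names, and the coarse grid search are all assumptions made for illustration.

```python
import math

def q_learning_nll(choices, outcomes, alpha, beta, q0=0.0):
    """Negative log-likelihood of choices under Q-learning with softmax.

    choices:  sequence of 0/1 (which stimulus was chosen)
    outcomes: sequence of +1/-1 (point won or lost)
    alpha:    learning rate in [0, 1]
    beta:     inverse temperature >= 0 (higher = less exploration)
    """
    q = [q0, q0]                      # assumed initial expected values
    nll = 0.0
    for c, r in zip(choices, outcomes):
        # With two options, softmax reduces to a logistic of the value difference.
        p_choice = 1.0 / (1.0 + math.exp(-beta * (q[c] - q[1 - c])))
        nll -= math.log(p_choice)
        # Rescorla-Wagner style update of the chosen option only.
        q[c] += alpha * (r - q[c])
    return nll

def fit_grid(choices, outcomes):
    """Coarse grid search over (alpha, beta); returns (nll, alpha, beta)."""
    alphas = [i / 20 for i in range(1, 21)]       # 0.05 ... 1.00
    betas = [0.25 * i for i in range(1, 41)]      # 0.25 ... 10.0 (assumed cap)
    return min((q_learning_nll(choices, outcomes, a, b), a, b)
               for a in alphas for b in betas)
```

With β = 0 every choice has probability 0.5, which corresponds to the random response model mentioned above. A gradient-based or multi-start optimizer would normally replace the grid search in a real analysis.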

Statistical analyses
For all the models with pupillary response as dependent variable, we excluded trials where the response was more than 3 SDs below or above the individual mean, resulting in 48 participants, who were included in all analyses. For the analyses containing alpha and beta, we excluded the nonfacial condition for one participant who chose the same alternative throughout all trials in that condition, resulting in 49 participants. Due to a software problem, data from the facial feedback condition were lost for eight participants and from the nonfacial condition for ten participants. For this reason, we double-checked all analyses including stimulus using only data from participants who had completed both conditions. We used two types of models: (i) mixed linear models for pupillary response, which allowed us to capture the variance for each participant and include all trials for each participant, and (ii) Wilcoxon signed-rank tests for alpha and beta values, since they were not normally distributed. We followed Bates et al.'s recommendations for model comparison of mixed linear models (Bates et al., 2015). First, we specified three intercept models (for each outcome variable) with different complexity of the random effects. Next, we specified two models with different complexity of the fixed effects using the random effects structure from the winning model in step one. The decision criterion was the maximum likelihood estimate. For model comparison details, we refer to the Appendix, Tables A2-A3.
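The trial-level exclusion rule can be written compactly. The sketch below assumes that the SD is the sample standard deviation computed per participant over all of that participant's trials; the source does not specify this, and the function and variable names are hypothetical.

```python
from statistics import mean, stdev

def exclude_outlier_trials(responses_by_id, n_sd=3.0):
    """Drop trials more than n_sd SDs from each participant's own mean.

    responses_by_id: dict mapping participant ID -> list of pupil responses
                     (at least two trials per participant).
    Returns a dict with the same keys and the retained trials.
    """
    kept = {}
    for pid, values in responses_by_id.items():
        m, s = mean(values), stdev(values)   # sample SD (assumption)
        kept[pid] = [v for v in values if abs(v - m) <= n_sd * s]
    return kept
```

Because the threshold is computed within each participant, individuals with generally large or variable pupil responses are not penalized relative to others.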

Ethical approval
The study protocol was approved by the regional ethics committee of Stockholm, Sweden (DNR: 2018/1218-31, 2019-03603).

Results
Figure 2 shows pupil dilation as a function of time in the healthy control and rMDD groups for gain and loss trials. Table 2 shows descriptive and inferential statistics for the model parameters (α and β), mean score on the last trial, and differences in proportions of correct choices for each group. There were no significant differences between the groups for the model parameters, but there was a significant difference between groups in mean score on the last trial and percentage of correct choices over all trials. The proportion of correct choices per block and group is illustrated in Figure 3.

H1. Blunted pupillary reactivity to losses and gains in rMDD
In order to investigate whether the rMDD group displayed blunted pupillary responses to losses and gains, we ran a mixed linear regression with pupillary dilation after feedback as outcome variable and outcome (gain/loss) and group (rMDD/healthy control) as fixed effects, and block (5 in total) and ID as random effects. The best model included an interaction between group and outcome as fixed effects and a correlated random intercept for each participant and slope for each block: pupil response ∼ group * outcome + (block|ID). Results are shown in Figure 4 and Table 3. There was an effect of outcome (β = −1.12, SE = 0.21, t = −5.28, p < .001), amounting to a smaller pupillary response when the outcome was positive. No main effect of group was found, but the interaction effect between group and outcome was significant (β = 0.81, SE = 0.34, t = 2.37, p = .018) (depicted in Figure 4). Pairwise comparisons showed that healthy controls responded differently to gains and losses (β_diff = 1.23, SE = 0.21, Z-ratio = 5.28, p < .001), whereas no difference between outcomes was found within the rMDD group.
To examine whether results could have been modulated by the type of stimulus shown, we re-ran the analysis and included the two- and three-way interaction terms group * condition and group * condition * outcome. Neither of these interaction terms was significant (see Appendix, Table A1).
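For readers less familiar with the formula notation, the H1 model "pupil response ∼ group * outcome + (block|ID)" can be written out as below. This is a sketch under assumptions not stated in the source: block is treated as a numeric predictor, and the symbols (G for group, O for outcome, B for block, j for participant, i for trial) are introduced here only for illustration.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 G_j + \beta_2 O_{ij} + \beta_3\, G_j O_{ij}
          + u_{0j} + u_{1j} B_{ij} + \varepsilon_{ij}, \\
(u_{0j}, u_{1j})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

Here β3 is the group-by-outcome interaction tested in H1, and the unrestricted covariance matrix Σ corresponds to the correlated random intercept and block slope per participant.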

H2. Blunted pupillary reactivity in expectation of gains and losses (i.e., after a choice has been made, but before the outcome has been presented)
In order to investigate whether the rMDD group displayed blunted pupillary reactivity in expectation of gains and losses, we ran a mixed linear regression with pupil dilation before the feedback stimulus was presented as outcome variable, outcome on the previous trial (shifted outcome) and group (rMDD/healthy control) as fixed effects, and block and ID as random effects. The best model was pupil response ∼ shifted outcome + group + (block|ID). As can be seen in Table 4, there were no significant effects for either predictor variable.

H3 (lower learning rate in the rMDD group) and H4 (less exploration in the rMDD group)
As can be seen in Table 2, the rMDD group displayed neither reduced learning rates (H3) nor less exploration (H4). However, there was a significant difference between mean scores, where the rMDD group displayed lower mean scores than the healthy control group. There was also a significant difference in the proportion of correct choices between groups (see Figure 3 and Table 2).

Discussion
Previous studies have documented blunted responses to reinforcers in individuals with ongoing MDD, but there is an ongoing discussion about whether these abnormalities reflect a depressive state or whether they persist in remission. We addressed this question in a preregistered study, by comparing reinforcement learning and pupillary responses to gains and losses in individuals with MDD in remission (rMDD) and healthy controls using a probabilistic response task. Results showed that the pupil dilation response differentiated less between gains and losses in individuals with rMDD than in never-depressed individuals. Further, the rMDD group displayed lower mean scores at the last trial than the healthy control group as well as a smaller proportion of correct choices over trials. This adds to a growing literature suggesting that reward processing abnormalities previously found in acute MDD persist in remission (Pechtel et al., 2013; Whitton et al., 2016). To the best of our knowledge, this is the first study to demonstrate that pupil dilation, a peripheral index of arousal, is atypically sensitive to reinforcers in rMDD. Below, we discuss the results and the implications in relation to the hypotheses.
In healthy controls, pupil dilation responses were higher to losses than to gains. This is in line with previous research demonstrating that pupil dilation increases to motivationally salient or attention capturing events (Samuels & Szabadi, 2008). It should be noted, though, that this pattern of results is different from other physiological measures used in studies of reward processing in MDD, such as the BOLD signal in the ventral striatum, where gains typically result in higher responses (Schultz, 2007). In contrast to healthy controls, individuals with rMDD did not differentiate significantly between positive and negative outcomes. This result, which is in line with our hypothesis, suggests that individuals with a lifetime history of MDD have a reduced sensitivity to the hedonic value of stimuli in the environment.
Our results are consistent with the theory that altered responsiveness to the reinforcement value of stimuli in the environment persists after remission from MDD and is present in risk groups for MDD. Using different methodologies, similar conclusions were reached by two previous studies (Pechtel et al., 2013; Whitton et al., 2016). In contrast, typical reinforcement sensitivity in rMDD was reported by McFarland and Klein (2009). Differences in methods could account for the discrepancies, since subjective ratings of affect rather than objectively observable measures of reward processing were used by McFarland and Klein (2009). Further, McFarland and Klein (2009) used a reward processing paradigm in which participants had no actual control over the stimulus contingencies, and learning was therefore not possible. In contrast, the present and other studies finding attenuated reinforcement sensitivity in rMDD have examined reinforcement processing in volatile environments where learning is possible (Pechtel et al., 2013; Whitton et al., 2016).
Note. For participants with data from both conditions (face/no face), the mean value was computed for the last trials; for participants with data from only one condition, the single value on the last trial was entered in the t test.
Figure 3. Percentage of correct choices over all trials (y-axis), with standard error of the mean, for each block (x-axis) and group (legend), where gold represents the healthy control group (control) and pink the rMDD group.
Our study demonstrates, for the first time, that pupil dilation is a sensitive index of reinforcement processing abnormalities in rMDD. Interestingly, blunted reinforcement processing in rMDD was found in response to losses rather than gains, whereas studies using ERPs found altered responses to gains (Weinberg & Shankman, 2017; Whitton et al., 2016). Our results therefore suggest that rMDD may be characterized by attenuated responses to both gains and losses, although these two types of atypicalities may have different neural correlates. Contrary to our hypothesis that the pupillary expectation response would be smaller in rMDD compared to healthy controls for the expectation of gains and losses, we did not find any significant differences. This suggests that whereas processes related to reward administration or omission are affected in rMDD, expectation ("wanting") may be intact. However, since this is the first study examining the question in rMDD, more research is needed. A possible explanation for the lack of group differences in anticipatory responses is the fact that the contingencies changed every 15 trials, and hence all participants learned that contingencies could change, leading to less expectation in their pupillary response. It is also possible that a more consistent anticipatory pupillary response would have been found during the expectation period if a subsequent motor response rather than passive viewing had been required, as in a previous study in individuals with ongoing MDD (Schneider et al., 2020).
Our third and fourth hypotheses regarding lower learning rate (lower α values) and less exploration (higher β values) in rMDD compared with healthy controls were not supported by the data either, despite the fact that individuals with rMDD showed overall worse task performance. A recent study (Heo et al., 2021) investigated whether subclinical individuals displayed aberrant value integration and value-action conversion. To this end, the authors introduced separate learning rates for model-based and model-free reinforcement learning (RL) that allowed investigation of the ability to arbitrate between each RL system. The experimental paradigm was designed in such a way that two behavioral patterns could be mapped to each set of models: The indicator of model-free RL was choice consistency, and choice optimality was an indicator of model-based RL. Results showed that choice consistency and optimality as well as the ability to change between model-based and model-free RL were correlated with degree of depression. In relation to the current modeling results, it is plausible that the changing value-based environment, captured with model-free RL, was not well suited to detect the behavioral differences in the rMDD group. The fact that pupillary responses captured differences between rMDD and healthy controls in the current setup allows for the possibility that pupillary dilation could serve as an index for the shifting between model-based and model-free RL. However, this remains to be investigated empirically.
The group difference in pupillary responses between individuals with rMDD and healthy controls was not modulated by the type of feedback (face or nonfacial symbol). However, the conclusions which can be drawn regarding this effect are limited by a relatively small sample size. It should also be noted that the facial images used in the study showed expressions of either joy or anger. Previous eye tracking studies in individuals with ongoing depression (e.g., Klawohn et al., 2020) have shown atypical attention to facial expressions of sadness, which may be a more disorder congruent emotional expression than anger or joy. An interesting question for future studies is therefore whether facial expressions of sadness modulate reward learning in individuals with ongoing or remitted MDD.

Limitations and suggestions for future studies
In the current study, we did not have a continuous measure of symptoms of MDD and other mental health conditions to complement the binary classifications. Because of this, we were not able to examine whether interindividual variation in subclinical depressive symptoms or in specific symptom dimensions such as anhedonia or apathy was related to the observed results.
It should be noted, though, that all participants were interviewed by trained psychologists, who used a validated semi-structured interview to determine the presence of MDD symptoms. It is therefore not likely that clinically meaningful subclinical symptoms would have been detected by continuous self-rating scales. Importantly, previous studies of reward processing in rMDD have not found any relation between self-rated residual symptom level and reward processing (Pechtel et al., 2013;Whitton et al., 2016). The design with mixed expected and unexpected uncertainty did not allow for separate estimates of different types of uncertainty. Future research should investigate the ability in rMDD to estimate the different types of uncertainty in relation to both positive and negative outcomes. An additional limitation is that participants were able to freely move their gaze during the experiment. Therefore, the possibility that eye movements may have influenced pupil size cannot be ruled out (Brisson et al., 2013;Hayes & Petrov, 2016). However, it should be noted that the groups did not differ in the number of fixations or gaze position relative to the center of the screen and that only very weak correlations were observed between pupil size and gaze position. Therefore, group differences in eye movements are not likely to explain the results.
Figure: Effect plot of the interaction between outcome (x-axis) and group (right panel) on feedback pupillary response (y-axis). An effect plot takes the lower order terms as well as the random effects into account by plotting the marginal effects of the target variables setting the remaining covariates to their means. The means are hence the marginal means of each estimate (four in total) taking the remaining covariates and the random effects into account and the confidence intervals thereof.
Finally, it should be noted that a relatively high number of participants in the current sample had a lifetime history of MDD (∼38%), which can be compared to an estimated lifetime prevalence of 10-20% in recent studies (Hasin et al., 2018;Lim et al., 2018). A potential reason for the relatively high prevalence of rMDD in the sample is that participants were recruited from a database of research volunteers administered by a medical university, which may have attracted individuals with depressive symptoms. Unfortunately, no data on the educational or professional background of the participants were collected. An interesting question for future studies is whether treatment history modulates pupillary responses and reward learning parameters in rMDD. This question should ideally be addressed in longitudinal studies. Despite these limitations, the present study contributes to our understanding of reward processing in depression by demonstrating a persisting reduction in reward sensitivity, lower scores on the last trial, and a smaller proportion of correct choices in individuals with a history of depression, even after full symptomatic remission. Our results indicate that pupil dilation is a feasible marker of MDD-like altered reward processing.

Implications
Measuring pupil dilation is a noninvasive, relatively inexpensive method, which is potentially applicable in a clinical environment. The present results show that pupillary response has the potential to serve as an index of rMDD, and possibly also as a marker of trait vulnerabilities to MDD. An interesting avenue for future research is to examine pupillary responses to rewards in individuals with ongoing MDD undergoing various treatments. Another pertinent research area is to disentangle the causes of aberrant decision making: is it a trait, a state, or a marker of anhedonia? As noted in the introduction, altered reward processing may be implicated in the risk for relapse. Recent psychological treatments address reward processing impairments directly (Craske et al., 2016), and an important task is to examine the extent to which they are successful. Relatedly, research has shown that blunted reward learning is predictive of treatment outcome with antidepressant medication, but how do pharmacological treatments with selective serotonin reuptake inhibitors and serotonin and noradrenaline reuptake inhibitors affect noradrenergic transmission and decision making? Since both types of medication affect noradrenergic transmission, pupil dilation may be a particularly feasible method for understanding their effect on reward processing.
Supplementary material. The supplementary material for this article can be found at https://doi.org/10.1017/S1355617722000224