Myopia drives reckless behavior in response to over-taxation

Governments use taxes to discourage undesired behaviors and encourage desired ones. One target of such interventions is reckless behavior, such as texting while driving, which in most cases is harmless but sometimes leads to catastrophic outcomes. Past research has demonstrated how interventions can backfire when the tax on one reckless behavior is set too high whereas other less attractive reckless actions remain untaxed. In the context of experience-based decisions, this undesirable outcome arises from people behaving as if they underweighted rare events, which according to a popular theoretical account can result from basing decisions on a small, random sample of past experiences. Here, we reevaluate the adverse effect of over-taxation using an alternative account focused on recency. We show that a reinforcement-learning model that weights recently observed outcomes more strongly than those observed in the past can provide an equally good account of people’s behavior. Furthermore, we show that there exist two groups of individuals who show qualitatively distinct patterns of behavior in response to the experience of catastrophic outcomes. We conclude that targeted interventions tailored for a small group of myopic individuals who disregard catastrophic outcomes soon after they have been experienced can be nearly as effective as an omnibus intervention based on taxation that affects everyone.


Introduction
The real world comprises many situations where one is unsure about the outcomes ensuing from one's actions. These situations of risk are often structured such that a particular course of action results almost all of the time in small gains but also, on rare occasions, in catastrophic losses that can easily offset any previously accumulated gains. Choosing such courses of action is dangerous, yet in many situations people recklessly engage in them. For instance, people still text while driving or ride a bicycle without wearing a helmet. A recent paper (Yakobi et al., 2020, henceforth: YCNE) investigated the effectiveness of monetary incentives in the form of taxation as a means to regulate reckless behavior. YCNE studied situations where moderate taxation of a moderately risky option would lead to the desired effect of swaying people toward a safer option, but excessive taxation could drive people toward an even riskier, non-taxed option. Consequently, taxation was expected to produce a U-shaped pattern of reckless behavior, with increased recklessness for levels of taxation that are either too low or too high.
YCNE investigated this U-shaped pattern of taxation in two experiments using a decisions-from-experience task. In this task, participants made repeated decisions between three initially unknown options, comprising one relatively safe option, one moderately risky option that was subject to a tax, and one inferior, highly risky but non-taxed option (see Appendix for details). After each choice, participants would see the outcomes of all three options, allowing them to learn about the underlying properties of the options, but only the outcome of the chosen option affected the participant's bonus. Varying the level of taxation between three amounts (representing no, moderate, and excessive taxation), the expected U-shaped pattern emerged. YCNE put forth "reliance-on-small-samples" (Erev & Roth, 2014) as a mechanistic explanation of this result. According to this mechanism, people base their decisions on a random sample of past outcomes from memory. Because small samples have a natural tendency to under-represent rare events, this mechanism produces (as-if) underweighting of rare events and, in turn, preference for reckless behaviors that offer the best outcome most of the time.
YCNE successfully demonstrated how, in decisions from experience (see , for a recent meta analysis), policies based on economic incentives can backfire. They attributed this to a specific cognitive mechanism, where people base their decisions on a small, random sample of past experiences. Building on their work, this article puts forth an alternative cognitive explanation, one that arguably rests on weaker assumptions and enables analysis of individual differences in people's response to taxation.

Models of reckless behavior
To identify the psychological processes that best describe people's reckless behavior, YCNE evaluated several models embodying the reliance-on-small-samples hypothesis (Erev & Roth, 2014) and a so-called full-data model. (For some comparisons, they also included the accentuation-of-differences model; Spektor et al., 2019. However, for the focus of the present investigation, this model is not of relevance.) The full-data model takes all previous experiences into account and deterministically predicts choice of the option that has yielded the highest average outcome. As illustrated in Figure 2, people following the full-data model should quickly develop a strong preference for the safe option as the cumulative likelihood of experiencing catastrophic events increases. However, as can also be seen, people's actual preferences developed more moderately. Furthermore, people appeared to dislike both of the two risky options less than predicted by the full-data model. Two tendencies in the data are likely responsible for these behavioral patterns: stochasticity of choices and (as-if) underweighting of rare events. Small-sample models elegantly account for these patterns using a single mechanism: A small sample of outcomes introduces stochasticity, rendering choice proportions less extreme, as well as (as-if) underweighting, accounting for higher-than-expected preference for the risky options under taxation. Consequently, the small-sample models were found to clearly outperform the full-data model across all conditions (see YCNE Tables 1 and 2). On a qualitative level, however, it is important to note that the full-data model captured the patterns of results rather well (see also YCNE Figures 3 and 5). Moreover, the sample-size parameter in the small-sample models was estimated to be between 24 and 47, and the overall best-performing model was an ensemble model that averages the predictions of a two-stage sampling model with those of the full-data model.
These findings suggest that models that take into account many (or even all) samples might in principle be able to accurately describe people's behavior and are consistent with results from other decisions-from-experience paradigms, where the choice of the option with the higher average outcome (also known as the natural-mean heuristic) is considered the benchmark model.
An alternative to the full-data and small-sample models exists in recency-based models as formalized in the framework of reinforcement learning (Sutton & Barto, 1998). Recency-based models also produce probabilistic choices and (as-if) underweighting of rare events, but via a different psychological mechanism. Such models assume that people keep track of a long-run reward expectation $Q_{t,i}$ of option $i$ that is updated at each time $t$ with the incoming reward (or punishment) $r_{t,i}$. If people observe a better-than-expected reward, they adjust $Q_{t,i}$ upward and vice versa. A popular and simple implementation of this mechanism is given by the delta-rule model (Gershman, 2015):

$$Q_{t+1,i} = Q_{t,i} + \alpha \, (r_{t,i} - Q_{t,i})$$

In this model, the learning rate $\alpha$ controls the degree to which the expectations are updated. When $\alpha$ is constant over time, the model inevitably produces recency, which means that recent experiences receive more weight than earlier ones. The extent of recency varies with the value of $\alpha$. For instance, $\alpha$ = .10 implies that an experience ten epochs ago retains about 38% of its original weight, whereas the same experience's weight essentially drops down to zero under $\alpha$ = .90. Thus, $\alpha$ also controls the number of experiences that effectively influence choices, and with that (quite analogously to the small-sample models), the degree of (as-if) underweighting of rare events. The value of $\alpha$ also has a limited effect on stochasticity; however, models of this class typically include extra parameters for additional sources of choice stochasticity.
The recency-based account can be regarded as an instance of "reliance on small samples", yet it differs in important ways from the sampling-based models used by YCNE to implement this notion, which has implications for both theory and practice. First, the recency-based account can be considered more (cognitively) parsimonious. In contrast to sampling-based models, it does not require an explicit representation of all past experiences or a process of sampling from memory. Instead, people have only to memorize a single value and carry out only a minimal set of operations after each choice. Second, in contrast to the small-sample accounts, choices in the recency-based account always reflect all experienced information, even if the influence of an experience becomes negligible the further back it lies. Third, whereas in sampling-based accounts each experience has equal sway in the long run, the recency-based account predicts that recent outcomes will influence choices more than earlier ones.
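The delta rule and the recency it produces can be illustrated with a minimal sketch. The function names below are hypothetical and not taken from the authors' analysis code. The sketch further assumes that an outcome observed "ten epochs ago" has since been discounted by nine further updates, which yields retained weights close to the percentages quoted in the text.

```python
def delta_update(expectation, reward, alpha):
    """One delta-rule step: move the expectation toward the reward by a fraction alpha."""
    return expectation + alpha * (reward - expectation)


def run_delta_rule(rewards, alpha, initial=0.0):
    """Expectation after processing a sequence of rewards in order."""
    expectation = initial
    for reward in rewards:
        expectation = delta_update(expectation, reward, alpha)
    return expectation


def retained_weight(alpha, n_later_updates):
    """Share of an outcome's initial weight that survives n later updates.

    Each later update multiplies the weight of an earlier outcome by (1 - alpha).
    """
    return (1.0 - alpha) ** n_later_updates


# Illustrative learning rates: low, the aggregate estimate, and high.
for alpha in (0.10, 0.16, 0.90):
    print(alpha, retained_weight(alpha, 9))
```

With a constant learning rate the weights decay geometrically, so a learner with a high learning rate effectively responds only to the last one or two outcomes.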
We assessed whether the recency-based account can accurately describe the data of YCNE, including people's responses to varying levels of taxation (see Appendix for technical details and https://osf.io/q7pkf/ for the full analysis code). We fitted the delta-rule model to the aggregate choice proportions of YCNE's three between-subject conditions. The model's predictions were derived by determining, across all participants, the proportion of trials for which each option's expectation $Q_{t,i}$ was highest. A single parameter ($\alpha$) was used to fit, overall, 14 independent choice proportions (6 from Experiment 1, 4 from each condition from Experiment 2). A learning rate of $\alpha$ = .16 yielded the best fit with a resulting mean squared error of .006. Most predictions fell within the 95% confidence interval of the observed choice proportions and the model accurately accounted for the qualitative patterns of taxation (see Figure 3). Moreover, when we used the model to predict the data of one experiment on the basis of the respective other experiment, we observed mean squared errors of .006 (Experiment 1) and .010 (Experiment 2), outperforming all sampling-based models evaluated by YCNE except for the I-SAW2 model, which achieved a slightly better performance in Experiment 2 (see Table 1 for all within- and cross-experiment predictions). According to these aggregate-level analyses, the recency-based account given by the delta-rule model captures the aggregate data at least as well as the sampling-based accounts. However, aggregate-level analyses always bear the risk of misrepresenting the mechanisms that actually are at work at lower levels of analysis, sometimes leading to drastically wrong conclusions (e.g., Regenwetter & Robinson, 2017; Wulff & van den Bos, 2018; Birnbaum, 2011). Moreover, they can obscure crucial individual differences in both behavior and mechanism. This can be particularly problematic when a single identified mechanism serves as the basis for behavioral interventions. In the next section, we therefore use the delta-rule model to evaluate people's behavior at the individual and trial level.
Note to Table 1. Reinforcement learning = delta-rule reinforcement-learning model. Naïve sampler = small-samples model used by Yakobi et al. (2020) in Experiment 1. Two-stage naïve sampler = small-samples model that first eliminates one of the two riskier options and then compares the winner with the safe option, as used by Yakobi et al. (2020) in Experiment 1. Extended two-stage naïve sampler = Two-stage Inertia, Sampling and Weighting model (I-SAW2) used by Yakobi et al. (2020) in Experiment 2.

Individual differences in recency and reckless behavior
To test the recency-based account more rigorously and address possible aggregation problems, we fitted the delta-rule model separately to each individual's trial-level choices. To achieve this, the model had to be equipped with an additional mechanism that maps subjective expectations to choice probabilities, accounting for the stochasticity in people's behavior (Hey & Orme, 1994). We implemented an $\varepsilon$-greedy (Sutton & Barto, 1998) choice rule, which predicts the choice of the option with the highest subjective expectation with probability $1 - \varepsilon$ and a randomly selected option with the error probability $\varepsilon$. In analyses reported in the Appendix, we found the $\varepsilon$-greedy choice rule to fit participants' behavior better than a popular alternative, the softmax choice rule, and, more importantly, to produce substantially lower parameter correlations, implying a cleaner separation of the psychological mechanisms. Fitting separate learning rates and error probabilities to each individual's choices using maximum likelihood, we observed an overall sum of Bayesian information criteria (BIC; Schwarz, 1978) of 90,101. This value was considerably lower than that of an aggregate model fitting all trial-level choices using a single learning rate and error probability (BIC = 109,759) and that of an aggregate baseline model assuming random guessing (BIC = 126,780). Furthermore, we found the delta-rule model to produce lower BICs for 91.1% (224 out of 246) of individuals than an individual-level baseline model.
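As a rough illustration of how such BIC comparisons are computed, the following sketch uses the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of observations, and L the maximized likelihood. The helper names are hypothetical, and the numbers in the usage lines are made up for illustration; they are not the values reported above.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def guessing_log_likelihood(n_trials, n_options=3):
    """Log-likelihood of random guessing among equally likely options."""
    return n_trials * math.log(1.0 / n_options)


# Hypothetical participant with 100 three-alternative choices.
baseline = bic(guessing_log_likelihood(100), n_params=0, n_obs=100)
# Hypothetical fitted model with two free parameters and log-likelihood -80.
fitted = bic(-80.0, n_params=2, n_obs=100)
print(baseline, fitted, fitted < baseline)
```

Summing such per-participant values gives the overall BIC of the individual-level approach; lower values indicate a better trade-off between fit and complexity.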
The better performance of the individual-level models suggests meaningful individual differences, which also came through clearly in the distribution of individual-level parameter estimates: Learning rates followed a bimodal distribution (see Figure 4a), such that a vast majority of people fell into two clearly distinct groups: myopic and emmetropic learners. Myopic learners (32%) are characterized by a high learning rate of $\alpha \in [.85, 1]$, implying that only the last one or two observations form the basis of their choices. Emmetropic learners (64%), on the other hand, are characterized by a low learning rate of $\alpha \in (0, .15]$, implying that even the most distant experiences are still factored into their choices. The distribution of error rates, by contrast, was clearly unimodal and reflected a maximization rate of 70%, which is in line with previous research (Harless & Camerer, 1994). Furthermore, error rates barely covaried with learning rates ($r$ = .09), suggesting that the estimated learning rates reflect systematic differences in people's tendency to focus on recent experiences and are not merely the result of identifiability problems known for many computational models (Spektor & Kellen, 2018).
To evaluate whether the individual differences in learning rates reflect clear and systematic differences in behavior, we plotted the modal choices of all individuals ordered by their estimated learning rate, separately for all conditions (Figure 5). This analysis revealed that whereas most emmetropic individuals quickly learned to choose the safe option (dark blue), especially under high taxation, most myopic learners exhibited persistent preferences for whichever risky option offered the better outcome most of the time (gray = moderate risk, yellow = high risk). These patterns were most pronounced in the presence of an attractive safe option in Experiment 2.
The sustained preference for risky options observed for myopic individuals suggests that they might not have learned at all from their experiences. However, using a mixed-effects regression accounting for participant random effects nested within condition, we found preferences for the safe option to be substantially elevated immediately after the observation of an accident (estimate = 2.59, $p$ < .001), but not one (estimate = 0.82, $p$ = .45) or two (estimate = 1.14, $p$ = .58) trials later, relative to all other trials. Thus, consistent with the high learning-rate estimates, myopic individuals learned about and reacted to accidents, but then discounted them very quickly as they continued. Emmetropic individuals, by contrast, showed an increased preference for the safe option not only immediately after the accident (estimate = 2.30, $p$ < .001), but also one (estimate = 1.38, $p$ = .017) and two (estimate = 1.37, $p$ = .026) trials later. Furthermore, consistent with the lower learning rate of emmetropic individuals, preference for the safe option right after the accident was somewhat less pronounced than for myopic individuals.
The existence of two groups of individuals has critical implications for our understanding of reckless behavior in the face of taxation. Considering only moderate- and high-taxation situations, the data showed that myopic individuals experienced, on average, 3.08 accidents, whereas emmetropic individuals experienced only 1.82 accidents (see Figure 6). More importantly, compared to moderate taxation, myopic individuals suffered 0.8 accidents more under high taxation, whereas emmetropic individuals suffered only 0.36 more accidents. These analyses suggest that myopic individuals not only suffered considerably more accidents in general, but also that they were much more susceptible to the negative effects of over-taxation.

[Figure: panels for Experiment 1 under moderate tax and high tax; group label: Myopic]

Discussion
It is well established that people tend to choose as if they underweight small-probability events when they make decisions based on experience. This finding forms the basis of the so-called description-experience gap and it is the key to understanding people's responses to taxation in this case. As-if underweighting of rare events implies that people tend to prefer the option that yields the best outcome most of the time (Wulff et al., 2015; Erev et al., 2020). Under excessive taxation of moderately risky behaviors, an even more reckless option can suddenly become the option that is better most of the time, resulting in an increased preference for this option. The present investigation shows that different mechanisms embodying as-if underweighting can provide a good qualitative and quantitative account of how taxation affects behavior in such settings. Moreover, it uncovered the existence of important individual differences that could be of greater import than the question of which mechanism best accounts for people's behavior. Specifically, there were two distinct groups of people, myopic and emmetropic learners, who responded to the experience of accidents in qualitatively distinct ways. Accounting for these individual differences is crucial for understanding behavior and for deriving effective policies to prevent accidents due to over-taxation.
Analyses of aggregate behavior are always at risk of misrepresenting people's actual behavior (Wulff & van den Bos, 2018; Regenwetter & Robinson, 2017; Birnbaum, 2011) and, in the present investigation, this risk was real. The learning rate obtained by fitting the recency-based model to the aggregate choice proportions of both studies suggests a steady decay of the weight of past experiences, where an experience ten epochs ago receives about 20% of its original weight (see Figure 4b). However, there were almost no individuals who were accurately described by such a weighting scheme. Instead, individuals seem to assign to past outcomes a weight that is either well above that of the aggregate estimate, or one that is essentially zero. These differences imply that the groups effectively base their decisions on different experiences. Furthermore, they suggest that the groups could have relied on different mechanisms.

Emmetropic individuals might have made their choices using a recency-based mechanism with gradually diminishing weights as formalized in the delta-rule model. However, they could have also recruited a stochastic variant of the full-data model or a sampling-based model with a large sample size. All three mechanisms can account for the behavior of emmetropic individuals equally well because, in a stable environment, a large sample, whether of recent or of randomly drawn observations, will be representative of all observations. Myopic individuals, on the other hand, cannot have relied on either the full-data model or a pure sampling process. For them, only the recency-based account is able to capture the high weight given to the single most recent outcome.
Despite the recency-based account's ability to fit the behavior of emmetropic and, especially, myopic individuals, we think that it does not actually provide a complete account of their psychology. For instance, recency is often attributed to either memory limitations or adaptations to assumed changes in the environment (see Wulff & Pachur, 2016; Bornstein et al., 2017). However, neither of these two represents a compelling account of the two extreme forms of recency observed here, namely practically no recency (emmetropic) and maximum recency (myopic). Rather, it is likely that other factors not included in the recency-based account, such as risk preferences or goals (see, e.g., Hertwig et al., 2019), also play a role. For example, the differences between emmetropic and myopic individuals can also be construed as the pursuit of short- versus long-term goals (see Wulff et al., 2015; Lopes, 1981) or as maximization versus probability-matching strategies (Gaissmaier & Schooler, 2008; van den Bos et al., 2009). Moreover, there exist at least two behavioral phenomena that are difficult to reconcile with the assumptions of the delta-rule model or other reinforcement-learning accounts for that matter. First, people have been shown to possess accurate declarative memory representations of experienced samples beyond the subjective values stored by the delta-rule model. Second, people have been shown to expect temporal dependencies in the outcome sequence, which can produce the wavy-recency patterns presented by YCNE. Such expectations of dependencies cannot be accounted for by any model assuming stochastic independence in the outcome distributions between choices, including the delta-rule model. To account for these phenomena, cognitive models must be equipped with mechanisms that go beyond pure small-sample or recency-based mechanics.
Notwithstanding these challenges, our findings join similar results of previous studies (Spektor & Kellen, 2018; Erev & Haruvy, 2015) in demonstrating the existence of strong individual differences in experience-based settings. These differences should be accounted for in future modeling efforts, ideally using larger and more diagnostic data sets. One promising avenue to increase diagnosticity with respect to the question of sampling- or recency-based accounts exists in reversal-learning tasks in which reward contingencies undergo sudden, drastic changes. In such situations, only recency-based accounts will allow decision makers to adaptively respond to changes in the environment (e.g., Hampton et al., 2006).
Finally, returning to the topic of reckless behavior, we believe that the presence of two groups of people relying on potentially different mechanisms has crucial implications for policy development. We have shown that myopic individuals are already at a much greater risk of suffering accidents than emmetropic individuals and that this gap widens under higher levels of taxation. A targeted policy addressing myopic individuals, for instance by using boosts (Hertwig & Grüne-Yanoff, 2017), might be effective over and beyond an omnibus policy addressing everyone equally. The data suggest that had the smaller group (32%) of myopic individuals acted like emmetropic individuals, a total of 95 accidents would have been prevented. In contrast, placing everyone, myopic and emmetropic individuals, under a moderate (rather than a high) level of taxation prevented 115 accidents. Even more accidents (197) would have been prevented by the combination of both; that is, if everyone were placed under a moderate level of taxation and everyone had acted in an emmetropic fashion. This suggests that the overall best policy to prevent reckless behavior and accidents likely recruits both omnibus and targeted strategies.

Appendix
Yakobi et al. (2020) conducted two experiments using a variation of the multi-armed bandit problem (Sutton & Barto, 1998). In this task, participants repeatedly chose between three monetary lotteries for a total of 100 periods. Initially, participants had no information available about the options' outcome distributions. After each choice, participants were presented with random draws from each option, but obtained only the outcome of the chosen option (full feedback; see Figure 1 for a schematic illustration). Thereby, individuals were able to learn about the properties of the outcome distributions over time.

Experimental details
In every condition, participants faced a choice between a safe option, a medium-risk option, and a high-risk option. Depending on the experiment, the safe option yielded 3 points with a probability of .45 or 0 otherwise (Experiment 1), 0.60 points with certainty (Experiment 2, unattractive safe option), or 1.35 points with certainty (Experiment 2, attractive safe option). The medium-risk option yielded 2 points minus a tax with a probability of .97 and an outcome of -20 points with a probability of .03, the so-called accident. The amount of tax was implemented as a within-subject factor and varied between 0, 0.4, and 0.8 points in Experiment 1 and between 0.4 and 0.8 points in Experiment 2. The high-risk option always yielded 1.5 points with a probability of .94 and the accident otherwise.
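The payoff structure above can be checked with a short sketch that computes each option's expected value and its most frequent outcome. The payoffs and probabilities are those stated in the text; the function and variable names are hypothetical.

```python
ACCIDENT = -20.0

# Lotteries as lists of (outcome, probability) pairs.
HIGH_RISK = [(1.5, 0.94), (ACCIDENT, 0.06)]
SAFE_EXPERIMENT_1 = [(3.0, 0.45), (0.0, 0.55)]


def medium_risk(tax):
    """Taxed medium-risk option: 2 points minus the tax, or the accident."""
    return [(2.0 - tax, 0.97), (ACCIDENT, 0.03)]


def expected_value(lottery):
    return sum(outcome * prob for outcome, prob in lottery)


def modal_outcome(lottery):
    """Outcome that occurs most of the time."""
    return max(lottery, key=lambda pair: pair[1])[0]


for tax in (0.0, 0.4, 0.8):
    option = medium_risk(tax)
    print(tax, expected_value(option), modal_outcome(option))
print("high risk", expected_value(HIGH_RISK), modal_outcome(HIGH_RISK))
```

Under the highest tax the medium-risk option pays 1.2 points on most trials, less than the 1.5 points the high-risk option pays on most trials, even though the high-risk option has the lowest expected value of the three. This is the configuration in which over-taxation can push learners who underweight rare events toward the riskier option.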

Modeling details
Fitting the delta-rule model to aggregate choice proportions
We fitted the delta-rule model separately to the aggregate choice proportions of each of the three between-subject conditions (i.e., Experiment 1, Experiment 2: Unattractive safe option, Experiment 2: Attractive safe option) by minimizing mean squared error. Initial expectations $Q_{0,i}$ for each option $i$ were set to 0, which is the standard procedure in the literature (e.g., Gershman, 2015; Spektor et al., 2019). On each trial, expectations are updated according to the delta-learning rule:

$$Q_{t+1,i} = Q_{t,i} + \alpha \, (r_{t,i} - Q_{t,i}),$$

where $r_{t,i}$ denotes the reward of the respective option and $\alpha \in [0, 1]$ is the only free parameter in the model and determines the degree of recency. In the case of $\alpha$ = 1, individuals only consider the most recent trial, and lower values of $\alpha$ lead to an increasingly linear weighting scheme of all past observations. A single $\alpha$ was used to model chosen and non-chosen options.
Predictions for aggregate choice proportions were determined as the proportion of trials in which each of the options had the highest value. These predicted choice proportions were evaluated against the observed ones using mean squared error.
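The aggregate procedure can be sketched as follows. This is an illustrative reading of the description above, not the authors' code (which is available at the OSF link given in the main text). Two details are assumptions of this sketch: ties between options are split equally, and the learning rate is found by a simple grid search, since the optimizer used for the aggregate fit is not stated. All names are hypothetical.

```python
def predicted_proportions(outcomes, alpha, initial=0.0):
    """Share of trials on which each option holds the highest expectation.

    outcomes: list of trials, each a list with one reward per option (full feedback).
    The prediction for a trial uses the expectations held before that trial's outcomes.
    """
    n_options = len(outcomes[0])
    q = [initial] * n_options
    wins = [0.0] * n_options
    for rewards in outcomes:
        best = max(q)
        leaders = [i for i, value in enumerate(q) if value == best]
        for i in leaders:
            wins[i] += 1.0 / len(leaders)  # assumption: ties are split equally
        q = [value + alpha * (r - value) for value, r in zip(q, rewards)]
    return [w / len(outcomes) for w in wins]


def mean_squared_error(predicted, observed):
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


def grid_search_alpha(outcomes, observed, steps=100):
    """Assumption: coarse grid search over alpha in [0, 1]."""
    grid = [k / steps for k in range(steps + 1)]
    return min(
        grid,
        key=lambda a: mean_squared_error(predicted_proportions(outcomes, a), observed),
    )
```

In practice the predictions would be pooled over the outcome sequences of all participants in a condition before computing the error against the observed proportions.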

Fitting the delta-rule model to trial-level choices
Maximum-likelihood estimation was used to fit the delta-rule model to trial-level choices. This required that the model be endowed with a probabilistic choice rule. We used an $\varepsilon$-greedy model that chooses a randomly selected option with a probability of $\varepsilon$ and the option with the highest value otherwise. Formally, let $i^*$ on trial $t$ be the option with the highest expectation, then the probability of choosing it will be given by $p(i^*, t) = (1 - \varepsilon) + \varepsilon \times \frac{1}{3}$. If there are multiple options with the highest expectation, then the model selects one of these at random. Learning followed the same implementation as for aggregate choice proportions. Parameters were estimated using the Nelder-Mead algorithm as implemented in the scipy Python library. We ran the algorithm 100 times with random starting values and kept the best-fitting result.
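A minimal sketch of the trial-level likelihood under the delta rule with an epsilon-greedy choice rule is given below. The names are hypothetical. Returning infinity for parameter values outside their bounds is an assumption of this sketch, used here because Nelder-Mead is an unconstrained method, and the error probability is kept strictly between 0 and 1 to avoid taking the logarithm of zero.

```python
import math


def epsilon_greedy_probs(q, epsilon):
    """Choice probabilities: explore uniformly with probability epsilon,
    otherwise pick one of the options with the highest expectation at random."""
    n = len(q)
    best = max(q)
    leaders = [i for i, value in enumerate(q) if value == best]
    greedy_share = (1.0 - epsilon) / len(leaders)
    return [epsilon / n + (greedy_share if i in leaders else 0.0) for i in range(n)]


def negative_log_likelihood(params, choices, outcomes):
    """Negative log-likelihood of observed choices (option indices) given
    full-feedback outcomes (one reward per option and trial)."""
    alpha, epsilon = params
    if not (0.0 <= alpha <= 1.0 and 0.0 < epsilon < 1.0):
        return float("inf")  # assumption: reject out-of-bounds parameters
    q = [0.0] * len(outcomes[0])
    total = 0.0
    for choice, rewards in zip(choices, outcomes):
        total -= math.log(epsilon_greedy_probs(q, epsilon)[choice])
        q = [value + alpha * (r - value) for value, r in zip(q, rewards)]
    return total
```

A function of this shape could be passed to `scipy.optimize.minimize(negative_log_likelihood, x0, args=(choices, outcomes), method="Nelder-Mead")` from several random starting points `x0`, keeping the solution with the lowest value.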
Choice of choice rule. A popular alternative to $\varepsilon$-greedy is the softmax choice rule (e.g., Gershman, 2015; Spektor et al., 2019):

$$p(i, t) = \frac{\exp(\theta \, Q_{t,i})}{\sum_{j} \exp(\theta \, Q_{t,j})}$$

Unlike $\varepsilon$-greedy, the softmax choice rule implements a gradual trade-off between exploration and exploitation that co-depends on the value of the sensitivity parameter $\theta$ and the differences between the options. Specifically, choices under softmax become more deterministic the larger the value differences between the options and the higher $\theta$.
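For comparison, the softmax rule can be sketched in a few lines. The parameter name `theta` for the sensitivity parameter is an assumption of this sketch, and subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the resulting probabilities.

```python
import math


def softmax_probs(q, theta):
    """Softmax choice probabilities with sensitivity theta (hypothetical name)."""
    largest = max(q)
    weights = [math.exp(theta * (value - largest)) for value in q]
    total = sum(weights)
    return [w / total for w in weights]


# Choices become more deterministic as sensitivity or value differences grow.
print(softmax_probs([1.0, 0.0, 0.0], 1.0))
print(softmax_probs([1.0, 0.0, 0.0], 5.0))
print(softmax_probs([3.0, 0.0, 0.0], 1.0))
```

Because sensitivity and value differences enter the rule only through their product, a higher sensitivity can be offset by smaller learned value differences, which is one intuition for the identifiability problems discussed in this section.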
For two important reasons we relied on an $\varepsilon$-greedy choice rule rather than softmax: First, under a softmax choice rule models are known to have poor parameter identifiability (Stewart et al., 2018; Spektor & Kellen, 2018; Gershman, 2016) due to high correlations between the sensitivity and learning-rate parameters. Low parameter identifiability is highly problematic for research questions focused on interpreting parameter values, especially efforts involving the classification of individuals. Recovery analyses reported below show that identifiability was no problem under an $\varepsilon$-greedy choice rule. Second, the $\varepsilon$-greedy choice rule actually provided a better quantitative account of the individual-level data than the softmax choice rule, with 145 out of 246 individuals (59%) being better fit by the former. The $\varepsilon$-greedy choice rule also provided a better fit overall, with a sum of BICs of 90,101 compared to 97,717 for the softmax choice rule. These results are in line with reports that the $\varepsilon$-greedy choice rule is well suited to capture behavior in full-feedback paradigms (e.g., Yechiam & Busemeyer, 2005).
Parameter recovery. Reinforcement-learning models are known to have poor parameter identifiability, which translates into poor recoverability (see Gershman, 2016; Spektor & Kellen, 2018, for attempts to improve identifiability). To confirm that this would not affect parameter estimation for our delta-rule model using the $\varepsilon$-greedy choice rule, we ran a parameter-recovery study. For each sequence of outcomes of an individual in the original study, we drew 10 random parameter sets from all estimated parameters across studies and simulated the choices of a "virtual" participant whose choices stemmed 100% from the reinforcement-learning model with an $\varepsilon$-greedy choice rule and the respective parameters. We re-fitted that virtual participant's choices to obtain the recovered parameters. In total, the parameter recovery was therefore based on 2,460 sets of parameters. The parameter recoverability was excellent: Both parameters yielded a near-perfect correlation between the data-generating parameters and the recovered parameters, $r$ = .96 and $r$ = .95 (see also Figure A1).
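The logic of such a recovery study can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: the names are hypothetical, and the re-fitting step is passed in as a function `fit(choices, outcomes)` that returns a recovered (learning rate, error probability) pair, so that any estimation routine can be plugged in.

```python
import random


def simulate_choices(outcomes, alpha, epsilon, rng):
    """Simulate a virtual participant following the delta rule with epsilon-greedy choice."""
    q = [0.0] * len(outcomes[0])
    choices = []
    for rewards in outcomes:
        if rng.random() < epsilon:
            choice = rng.randrange(len(q))  # random exploration
        else:
            best = max(q)
            choice = rng.choice([i for i, value in enumerate(q) if value == best])
        choices.append(choice)
        q = [value + alpha * (r - value) for value, r in zip(q, rewards)]
    return choices


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)


def recovery_correlations(true_params, outcomes, fit, rng):
    """Correlate data-generating and recovered parameters.

    fit is a hypothetical callable: fit(choices, outcomes) -> (alpha, epsilon).
    """
    recovered = [
        fit(simulate_choices(outcomes, alpha, epsilon, rng), outcomes)
        for alpha, epsilon in true_params
    ]
    r_alpha = pearson_r([a for a, _ in true_params], [a for a, _ in recovered])
    r_epsilon = pearson_r([e for _, e in true_params], [e for _, e in recovered])
    return r_alpha, r_epsilon
```

High correlations between generating and recovered values indicate that the two parameters can be estimated separately from choice data of the available length.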