Information search with situation-specific reward functions

Published online by Cambridge University Press:  01 January 2023

Björn Meder*
Affiliation:
Max Planck Institute for Human Development, Center for Adaptive Behavior and Cognition (ABC), Lentzeallee 94, 14195 Berlin, Germany
Jonathan D. Nelson
Affiliation:
Max Planck Institute for Human Development, Center for Adaptive Behavior and Cognition (ABC), Lentzeallee 94, 14195 Berlin, Germany
* Email: {meder, nelson}@mpib-berlin.mpg.de or {bmeder, jonathan.d.nelson}@gmail.com.

Abstract

The goal of obtaining information to improve classification accuracy can strongly conflict with the goal of obtaining information for improving payoffs. Two environments with such a conflict were identified through computer optimization. Three subsequent experiments investigated people’s search behavior in these environments. Experiments 1 and 2 used a multiple-cue probabilistic category-learning task to convey environmental probabilities. In a subsequent search task, subjects could query only a single feature before making a classification decision. The crucial manipulation concerned the search-task reward structure. The payoffs corresponded either to accuracy, with equal rewards associated with the two categories, or to an asymmetric payoff function, with different rewards associated with each category. In Experiment 1, in which learning-task feedback corresponded to the true category, people later preferentially searched the accuracy-maximizing feature, whether or not this would improve monetary rewards. In Experiment 2, an asymmetric reward structure was used during learning. Subjects searched the reward-maximizing feature when asymmetric payoffs were preserved in the search task. However, if search-task payoffs corresponded to accuracy, subjects preferentially searched a feature that was suboptimal for reward and accuracy alike. Importantly, this feature would have been most useful under the learning-task payoff structure. Experiment 3 found that, if words and numbers are used to convey environmental probabilities, neither reward nor accuracy consistently predicts search. These findings emphasize the necessity of taking into account people’s goals and search-and-decision processes during learning, thereby challenging current models of information search.

Information

Type
Research Article
Creative Commons
The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright
Copyright © The Authors [2012]. This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Figure 1 Statistical environments to differentiate usefulness of features A and R under symmetric vs. asymmetric reward functions. In each environment, there are four stimuli (‘plankton’), constructed by combining two binary features, A (“eye”) and R (“claw”). The numbers above the items indicate their frequencies; the numbers below indicate the probability of belonging to Category x or y, respectively. The table at right provides detailed information on the two environments.

Figure 2: Information-search task illustrated. First one has to decide which feature to view (A or R, here “eye” and “claw” of plankton stimuli, respectively). The numbers show how likely one is to encounter a particular feature value, as well as the posterior probabilities of the two categories, given the feature value. Below the tree, the utility gain (Equation 5) of features (A, R) and feature values (a1, a2, r1, r2) is shown, for symmetric and asymmetric rewards. The height of the bars indicates the amount of utility gain; the width represents the frequency of occurrence. For example, in Environment 1, under the symmetric reward function, feature value r1 entails a high utility gain (0.440), but the probability of encountering this feature value is low (0.123). The tables provide detailed information on the two environments.
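The bar heights and widths described in the Figure 2 caption can be sketched numerically. The following is a minimal illustration, not the article's Equation 5 itself: it assumes that utility gain is the improvement in the best achievable expected payoff after viewing a feature value, and that erroneous classifications earn zero. The function names, probabilities, and rewards are hypothetical and are not taken from the article's environments.

```python
def best_expected_payoff(p_x, reward_x, reward_y):
    """Expected payoff of the best classification when P(Category x) = p_x.

    Assumption: a correct 'x' earns reward_x, a correct 'y' earns reward_y,
    and an error earns zero.
    """
    return max(p_x * reward_x, (1.0 - p_x) * reward_y)


def value_utility_gain(prior_x, posterior_x, reward_x, reward_y):
    """Gain from observing one feature value (the bar height in the figure)."""
    return (best_expected_payoff(posterior_x, reward_x, reward_y)
            - best_expected_payoff(prior_x, reward_x, reward_y))


def feature_utility_gain(prior_x, outcomes, reward_x, reward_y):
    """Expected gain of a feature: per-value gains weighted by P(value).

    outcomes is a list of (P(value), P(x | value)) pairs (hypothetical format).
    """
    return sum(p_value * value_utility_gain(prior_x, post_x, reward_x, reward_y)
               for p_value, post_x in outcomes)


# Hypothetical binary feature: a rare, diagnostic value and a common, weak one.
outcomes = [(0.2, 0.9), (0.8, 0.4)]
prior_x = sum(p_value * post_x for p_value, post_x in outcomes)  # about 0.5

gain_symmetric = feature_utility_gain(prior_x, outcomes, 1.0, 1.0)
gain_asymmetric = feature_utility_gain(prior_x, outcomes, 1.0, 3.0)
print(round(gain_symmetric, 3), round(gain_asymmetric, 3))
```

Under these made-up numbers the same feature has a different expected gain under the two reward functions, which is the kind of divergence the two environments were constructed to maximize.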

Figure 3: Payoff functions in a binary classification task (Category x vs. Category y)

Figure 4: Classification-learning task illustrated. A stimulus (“plankton”) is shown and must be categorized as x or y (“Species A” or “Species B”). In Experiment 1, if the item is correctly categorized, feedback in the form of a smiley appears; if incorrectly classified, a frowny face appears. The learning task in Experiment 2 was virtually the same, except that instead of a smiley or frowny face, points were associated with correct and incorrect classifications, with the number of points earned depending on the reward function (small inset picture, at top right). Erroneous classifications were associated with zero points.

Table 1 Search-task views to Feature A, which maximizes accuracy, across Experiments 1, 2, and 3

Table 2 Subjects’ median probability estimates.

Table 3 Learning difficulty across Experiments 1 and 2.

Table 4 Informative search-task classification decisions.

Table 5 Expected values (and standard deviations) of Features A and R (in €).

Figure 5 Information-search behavior: data and theoretical models. Dark grey represents Feature A, light grey Feature R. Empirical search-task results are displayed in the top row (% of subjects preferentially viewing Features A vs. R) and next-to-top row (mean views to Features A vs. R); subsequent rows show predictions of alternate informational OED models (Table A1). MaxVal and ZigVal (Martignon et al., 2008), two heuristic models, also prefer Feature R. None of these models captures the differences between Experiment 1 and Experiment 2, as none of these models makes different predictions according to the procedure during the categorization learning task. The final row, Learning-phase Reward, captures the idea that following experience-based learning people preferentially view whichever feature would have been most important, relative to the reward structure and goals in the learning task (see text and Figure 6).

Figure 6: Decision trees that might be established during the learning task. Depending on the goal of the classification task (maximizing overall accuracy in Experiment 1 vs. maximizing rewards in Experiment 2), features’ relative usefulness differs. In Experiment 1, subjects were trained to choose whichever category is most probable, given the presented stimulus. To most efficiently achieve this, with minimal feature views, Feature A should be the root node. By contrast, in Experiment 2 subjects learned to classify under asymmetric rewards, with the goal of categorizing stimuli in a way that maximizes expected reward. This goal is most efficiently achieved by first querying Feature R, which has higher usefulness than Feature A (i.e., higher utility gain). (In fact, categorizing stimuli based on the state of Feature R alone is sufficient to maximize expected rewards. Therefore, in the trees, both states of Feature A lead to the same decision.)

Table A1 Alternative optimal experimental design (OED) models.

Table A2 Analysis of Baron and Hershey’s (1988) scenarios in which study subjects chose which of two medical tests (T1 or T2) was most useful (Experiment 1, Cases 5–11).

Figure A1 Learning data from Environment 2, Experiments 1 and 2. Each subject is one row; each column is one feature configuration, with configurations sorted by frequency from left to right. Trials are plotted from top to bottom, and from left to right, for a particular subject and a particular configuration. In each trial, a decision that is consistent with the most probable category (Experiment 1) or the most rewarded category (Experiment 2) is plotted with a white rectangular pixel. Suboptimal decisions are plotted with black rectangular pixels. The top two panels show learning data from Experiment 1, in which people’s task was to classify stimuli according to which category is most probable (i.e., with no explicit reward function during learning). Most people (38/40) achieved the learning criterion. The bottom two panels show learning data from Experiment 2, in which an explicit asymmetric reward function applied in the learning phase. Only 19 out of 40 people achieved the learning criterion. The results show that subjects struggled a great deal with the conflict configuration (second column from left), for which accuracy and reward conflict (i.e., subjects had to choose the less likely category in order to maximize expected reward).
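The conflict described in the Figure A1 caption, where the reward-maximizing response is the less likely category, can be shown with a short sketch. This is an illustration only: the posterior probability and the reward values below are hypothetical and are not the article's parameters, and errors are assumed to earn zero points.

```python
def accuracy_choice(p_x):
    """Pick whichever category is more probable given the stimulus."""
    return "x" if p_x >= 0.5 else "y"


def reward_choice(p_x, reward_x, reward_y):
    """Pick the category with the higher expected reward (errors earn zero)."""
    expected_x = p_x * reward_x
    expected_y = (1.0 - p_x) * reward_y
    return "x" if expected_x >= expected_y else "y"


# Hypothetical conflict configuration: x is more likely, but y pays more.
p_x = 0.6
by_accuracy = accuracy_choice(p_x)
by_reward = reward_choice(p_x, reward_x=1.0, reward_y=3.0)
print(by_accuracy, by_reward)
```

With these made-up values the two rules disagree: choosing x is correct more often, while choosing y yields the higher expected reward.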