Introduction
I am grateful to the commentators for their thoughtful and insightful responses to my article. They have provided valuable and often surprising perspectives and enriched the discussion on the current state of visual attention research. Here, I will highlight important points of agreement or extensions of the ideas in the target article, as well as lingering points of confusion or disagreement. The main points of contention concerned whether – whatever the perceived problems with the state of attention research – we should call it a crisis and work towards a paradigm shift. First, the next section provides a brief refresher on the main points of the target article and clarifies those points based on commentary.
Clarifying the main points
In the target article, I argued that visual attention is in a Kuhnian crisis for a number of reasons. First, a number of supposedly attentional phenomena can instead be explained by the rich but lossy encoding of peripheral vision in terms of summary statistics. Due to the quirks of peripheral vision, one must not only worry about the observer’s eye movements (a point amplified by Godwin, Hart, & Barnhart), but also about the details of the stimulus and the task. Still other supposedly attentional phenomena might be predicted by an ideal observer. These results are problematic for an influential theory, Feature Integration Theory (FIT), and necessitate rethinking what we have learned about visual attention. Second, this points to problems with the methodologies of the visual attention paradigm not yielding the promised results. Third, this situation has facilitated a proliferation of types of attention and often vague proposed mechanisms. In short, the signs that visual attention is in crisis (Box A, target article) extend beyond merely non-attentional explanations for some supposedly attentional phenomena (Chapman, Addleman, & Störmer).
Nonetheless, a number of phenomena seem to require explanation by an attention-like mechanism. I proposed a list of such phenomena (Box B, target article) and, based on that analysis, suggested two candidates for a new theory. My preferred option reframes all perception as requiring performing a task, with limits on task complexity. Making sense of this requires rethinking tasks. One may be unaware of the task, and rather than corresponding exactly with one’s nominal task, it may often involve a sort of multitasking. If a task exceeds the complexity limit, the observer’s brain must simplify it, leading to poorer performance. One might want to use “attention” to refer to the flexible mechanism that chooses and implements a task, subject to this limit.
In the end, the target article does not outright dismiss a mechanism one might want to call “attention” (Ma et al.), and, in fact, suggests a new way to think about an attention-like limit and the mechanisms for dealing with that limit. I agree that inattentional blindness necessitates an additional attention-like mechanism (Most); that is why it appears in Box B. Similarly, as the target article states, search tasks can encounter task limits. In this sense, I agree with Wolfe that “any plausible model of visual search must include visual selective attention,” though I remain unclear what Wolfe or other commentators mean by “selective attention.” Wolfe also raises the topic of attention-like limits on perception even at the fovea; this already appears in the list in Box B under “Dual task.” I am not surprised that these errors are stochastic, given the noisy nature of vision. Perhaps Wolfe would like to add other phenomena to Box B, which seems reasonable. I do see, however, several difficulties. It is not clear to me at this point what lessons we can learn about attention from visual search. I am particularly wary of deriving mechanisms from the subjective experience of searching item-by-item. Considerable processing in peripheral vision seems to happen without awareness. Even if this processing proceeds in parallel, the different items might accumulate evidence at slightly different rates, leading to awareness of their identities at different times, a bit like popcorn popping.
A number of commentators point out that attention is not just about vision (Posner, Dudarev & Enns, Chapman, Addleman, & Störmer, and Prasad & Hommel). Prasad & Hommel, in particular, see this as a critical error, overly subdividing the field. I largely agree and merely attempted to stick to what I know and limit the scope of the target article. See Rosenholtz (2020), for instance, where I relate task complexity limits to working memory. Similarly, I wholeheartedly agree with commentators who point out that we need to worry about both sensory and attentional aspects of perception over time, not just space (Dux, Dell’Acqua, & Wyble; Junker & Huber; Belledonne & Yildirim; Denison). Again, this was not a theoretical division, merely limiting the scope to aspects I better understand. In the same vein, Koevoet & Van der Stigchel rightly protest my focus on classification tasks; this is simply easiest for me to conceptualize and easiest to explain to a broad audience. (I disagree that the target article is limited to categorical classification tasks; see Figure 8).
The nature of the visual representation would fundamentally determine the complexity of a given task. Stimuli with evolutionary importance might have developed more advantageous representations (Capparini, To, & Reid). Here, I have far more agreement with Chapman, Addleman, & Störmer than they suggest. Under limited task complexity, capacity limits in visual processing would be “predicted by the organization of the visual representations themselves,” plus the nature of the limit. If the visual representation makes two classes well separated by a linear classifier, one would imagine that discriminating between those classes would be simple by most definitions of task complexity. If one were limited in the number of neurons one could use, then performance would depend on how well the best N neurons discriminated between cats and dogs, rather than how well the entire brain could discriminate between the two. In addition, I have argued for the utility of thinking of the visual system as almost never merely doing tasks like cat vs. dog discrimination. Particularly in the real world, but also in the laboratory, vision at the bare minimum attempts to monitor the situation and keep track of the general layout of the scene. It is this compound task that potentially encounters complexity limits. Performance would also depend on the usual suspects, such as the strength of the observer’s prior information and the effort they exert (Koevoet & Van der Stigchel).
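The "best N neurons" intuition can be made concrete with a toy calculation. The sketch below is purely illustrative and is not a model from the target article. It assumes that each unit carries independent, equal-variance Gaussian noise, in which case the sensitivities of an optimal linear readout combine as the root sum of squares. The function name `pooled_d_prime` and the per-unit d' values are hypothetical.

```python
import math

def pooled_d_prime(unit_d_primes, n_units=None):
    """Sensitivity of an optimal linear readout over the best n_units.

    Assumes independent, equal-variance Gaussian noise in each unit, in which
    case d' values combine as the square root of the sum of squares.
    n_units=None means no limit (the whole population is used).
    """
    ranked = sorted(unit_d_primes, reverse=True)
    chosen = ranked if n_units is None else ranked[:n_units]
    return math.sqrt(sum(d * d for d in chosen))

# Hypothetical per-unit sensitivities for cat vs. dog discrimination.
units = [0.2, 1.0, 0.5, 0.1]
limited = pooled_d_prime(units, n_units=2)   # only the best two units
unlimited = pooled_d_prime(units)            # the entire population
```

Under these assumptions, a representation that concentrates the cat vs. dog information in a few units loses little when N is small, whereas a representation that spreads the same information over many weak units would lose more.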
In a related note, Most describes the role of attentional set as opposed to task limits in inattentional blindness. On the surface, task limits might merely be a constraint and attentional set the implementation of task requirements subject to that constraint. However, in rethinking the lessons from FIT, we must rethink whether attentional set might include far more sophisticated features (Kristjánsson & Chetverikov). Most points to experimental results in which observers better react to a blue motorcycle than a gold one when following blue arrows; surely, he suggests, this is feature-based attention. As a description of results, sure. As evidence that the visual system can only attend to simple features, I would wager not. The observer likely simultaneously gives preference to processing features relevant to the driving task. The task set might appear to include simpler features because monitoring blue stuff generally requires less effort than monitoring blue arrows.
Agreement on a number of issues
Commentaries expressed general agreement with a number of the target article’s main points. This includes the importance of peripheral vision and ideal observers as potential explanatory factors, the state of Feature Integration Theory, and the proliferation of ill-defined or underspecified attentional mechanisms. In addition, several commentators resonated with the second half of the target article: the suggested list of attentional phenomena, thoughts about tasks, and task limits.
Vanunu & Ratcliff broadly agree that peripheral vision accounts in part for change blindness and inattentional blindness. Wolfe concurs that at least some search puzzles may have been due to peripheral vision confounds. Posner agrees broadly that the field has sometimes confounded sensory and attentional effects. Carrasco, Chapman, Addleman, & Störmer, and Kristjánsson & Chetverikov reiterate the need to control for both perceptual factors and what can be explained by an ideal observer. Carrasco rightly points out that her laboratory has spent decades controlling for perceptual factors in order to study attention.
In spite of Wu’s prediction that “Commentators will likely debate her crossing out moves…,” essentially no essays obviously defended Feature Integration Theory. Kristjánsson & Chetverikov and Denison describe FIT as having been in trouble for a long time. From the commentaries, we might infer that others agree; Carrasco, for instance, concurs with past shortcomings in interpreting search experiments. Posner, in fact, cites evidence that a lack of binding can occur even with visual attention. To me, this looks like a Goldilocks effect, as described in the target article. Despite this lack of disagreement, FIT lives on in the field in the form of the assumptions that often go unquestioned. We should be alert to ideas about automatic processing, the availability of only simple features without attention, the need for attention for binding, and FIT-like notions of the purpose and mechanisms of selection.
A number of commentaries agreed with the focus of previous essays on the detrimental effects of the proliferation of ill-defined (Carrasco) or underspecified (Chapman, Addleman, & Störmer; Wu) types of attention or attentional mechanisms (Anderson, 2011; Hommel et al., 2019; Zivony & Eimer, 2021; Chun, Golomb, & Turk-Browne, 2011). But authors also argued for the correctness of their favorite distinctions, for example: Posner for alerting, orienting, and executive function; Dudarev & Enns for the correspondence between multiple reasons to prioritize and multiple kinds of selectivity; Carrasco for the exogenous (involuntary)/endogenous (voluntary) and feature/spatial/temporal distinctions; and Denison for multiple mechanisms generally. The sections on “Paradigm Shift = Bad?” and “Paradigm Shift Good” debate whether this precludes a unifying theory.
Several commentaries found the list of critical attentional phenomena valuable (Box B). Belledonne & Yildirim appreciate the connection between the phenomena listed and a fundamental part of attention: how goals impact our perception. Denison agrees that more modern theories of attention, for example, the normalization model and extensions thereof (Reynolds & Heeger, 2009; Jigo, Heeger, & Carrasco, 2021), do not attempt to capture all of the attentional phenomena one might want, and the list provides “a useful guide to theory development.” However, in some cases, reading between the lines, a number of commentaries seem to want to add phenomena to that list. I will address this along the way, and in more detail in the subsection on “Paradigm Shifts and Ambitious Science.”
Some of the more interesting agreement concerns how we might think about tasks, task-based selection, and even task limits. First, in terms of tasks and complex templates, Kristjánsson & Chetverikov cite evidence for summary-statistic-based, complex attentional templates, a welcome change from FIT’s claim that one can only select simple features. Tomasello resonates with complex tasks in discussing attention to the situation, rather than to objects. Morsella et al. describe what I would consider a task-based view of attention, which they call “priority-based,” in which one is likely unaware of the complex tasks performed by the system. Duncan & Theeuwes concur that their results show that one is not in full conscious control of the task set but rather processes information outside of the nominal task (see Box A, target article). Similarly, Chapman, Addleman, & Störmer describe attentional effects without an explicit nominal task.
In terms of formulating attention in terms of task-based selection and task limits, we can see agreement on several points. Wu finds it reasonable to think of task performance and selection as inseparable. Koevoet & Van der Stigchel stress the importance of considering effort if one is to study decision complexity, and I wholeheartedly agree. Belledonne & Yildirim describe a theory with a moment-to-moment evolution of attention over time, consisting of mini tasks or “targetable singleton computations,” each of which takes unit time. One might think of this time requirement as a different form of complexity limit. Poth & Poth point to the number of end effectors as an important additional limit, and I agree that this may have guided the development of the attentional system. Along these lines, several commentators from the domain of perception and action resonate with a number of the claims in the target article, from thinking of attention as selecting tasks (O’Bryan & Song) to notions of decision complexity; Prasad & Hommel describe this as “old news,” which I will take as a form of agreement. Krauzlis independently arrived at a task-based selection proposal based on physiology research, but cautions that those results also support the flexible pooling account.
The fact that similar ideas have arisen from thinking about perception for action (Dudarev & Enns, Prasad & Hommel), consciousness (Morsella et al.), evolution (Tomasello), and physiology (Krauzlis) gives me hope that we are on the right track.
Suggested extensions
A number of commentaries went beyond agreement with ideas in the target article to suggest extensions to those ideas. This includes the role of perceptual factors, the utility of banning “attention,” and thinking about task complexity. Many such suggestions lie outside of my expertise. However, I will speculate about attentional blink, the topic of three commentaries.
Francis & Thunell, for instance, mention that their concerns about statistical power apply to other attentional paradigms, not just object-based attention. They point in particular to studies that examine modulations of already small effects, e.g., looking for some sort of group difference in a spatial cueing effect.
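To illustrate why modulations of small effects demand large samples, here is a minimal power sketch. It is not Francis & Thunell's analysis. It uses the normal approximation to a two-sided, two-sample test, and the function name `two_group_power` and the example effect sizes are assumptions chosen for illustration.

```python
from statistics import NormalDist

def two_group_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison.

    Uses the normal approximation to the t-test; effect_size_d is the
    standardized group difference (Cohen's d) in the cueing effect.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size_d * (n_per_group / 2.0) ** 0.5
    # Probability of landing in either rejection region.
    return (1.0 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# Hypothetical scenarios: a medium vs. a small group difference.
power_medium = two_group_power(0.5, n_per_group=64)
power_small = two_group_power(0.2, n_per_group=64)
```

With the same number of observers per group, the smaller standardized difference yields much lower power in this approximation, which is the pattern of concern when a group difference is sought in an already small cueing effect.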
Capparini, To, & Reid report that both peripheral vision and attention change a great deal during development. They suggest it is important to extend understanding of peripheral vision to development, where many studies look at attention but far fewer examine peripheral vision. One needs to understand possible confounds in the same way as in the target paper.
Martelli, Mancuso, & Zoccolotti note that allocentric neglect leads to errors that sound a bit like crowding. They ask whether there might be a peripheral vision component. This question has particular relevance since search tasks are a popular experimental paradigm for studying the topic.
In an interesting collection of commentaries, Mattei and Junker & Huber suggest that attentional blink (AB) might be due at least in part to temporal crowding. Dux, Dell’Acqua, & Wyble, on the other hand, describe AB as obviously non-perceptual and teaching us something about the time course of attention.
I do think that Mattei and Junker & Huber have a valid point that some AB phenomena, such as lag-1 sparing, may have perceptual explanations. Experimenters create stimuli by presenting a sequence of discrete symbols, but the visual system inputs not a list of symbols but a spatiotemporal signal. Early in vision, it represents this as energy in multiple bands, not as items for selection. We know far less about temporal crowding than about spatial, but it seems likely that vision pools over significant portions of time as well as over space, even at the fovea. This could lead to temporal crowding with familiar signatures. In fact, the temporal phenomena of AB can sound eerily like spatial crowding: The stimulus appears to contain separate items, and yet, puzzlingly, it can be difficult to individuate and report them under certain circumstances. Crowding has well-known dependence on stimulus similarity, which could conceivably make discerning two letter targets separated by a more distinct number harder than discerning two adjacent letter targets. Even without crowding, sequential, similar target letters, such as a K, H, or B, may be grouped by motion into a K-H morph, or even a K-H-B morph, like a flip book animation. Retrospectively, the observer may be able to parse these chunks into a reasonable guess as to the individual letters, whereas a K-6-H sequence with more complex motions may not create such a comprehensible chunk.
That said, I agree with Dux, Dell’Acqua, & Wyble that better performance when one has to report only the second target is a clear sign of an attention-like mechanism, and probably deserves to be listed in Box B. It does seem like we learn something about the timing of some kind of attentional or task-switching mechanism, though it might be helpful to understand temporal crowding to disentangle the timings of perceptual as opposed to attentional mechanisms.
Hertzmann, meanwhile, takes note of a different point in the target article: a summary-statistic encoding has still-unrealized implications for understanding much of later visual processing. In particular, he rethinks 3D vision across eye movements through the lens of both a lossy encoding in peripheral vision and preferentially preserving ecologically valid information across fixations. This seems like a promising approach.
Most finds value in the exercise of banning the word “attention” when studying individual biases and anxiety disorders, though for a somewhat opposed reason. He suggests that researchers in that field may be finding a confusing pattern of results because they are examining different kinds of attention. One might also want to do Francis & Thunell’s analysis on what power is necessary to use a dot-probe task to study what sounds like individual differences or impact of training.
Finally, Mattei deserves a shout-out for reasoning about task complexity and about perception occurring as the result of a task. He focuses on ensemble perception and, by extension, whether attention is needed for action. Vanunu & Ratcliff also discuss ensemble perception, and correctly suggest that the ability to report the mean size of a set of circles cannot serve as an alternative to attention nor as a paradigm shift. However, this confuses the rich image statistic representation in my peripheral vision model with the task of reporting ensemble properties; see also Kristjánsson & Chetverikov, who clarify the difference between simple ensemble statistics and the summary statistics in the model.
Two other sources of confusion worth discussing
Several commentators associate attention with awareness. Dux, Dell’Acqua, & Wyble, as an example, describe the observer as faced with far too much information “for it all to be processed up to the level of consciousness.” The possible association between attention and awareness has been much debated (see Koch & Tsuchiya, 2007; Cohen, Dennett, & Kanwisher, 2016; Rosenholtz, 2020). I would presume that consciousness faces limits beyond those that govern attention. We know from perception for action that quite a lot of perception can occur without awareness. Along these lines, Hertzmann might want to be cautious about assuming that, when observers fail to notice redirected walking in VR, this is because their visual systems have thrown away inconsistent information; apparently researchers in that field note that even observers who fail to notice the trick appear more likely to experience motion sickness (N. Williams, personal communication, March 15, 2025). Personally, I would not attempt to fix the mismatch between awareness and action by proposing two separate systems (Most), as this adds considerable complexity to the theory.
More benignly, Duncan & Theeuwes have mistaken my point about the 40-ms effects in their distraction paradigm. They say that in the real world, 40 ms can matter a lot. I wholeheartedly agree, particularly in applications such as vision in driving (Wolfe et al., 2021). The target article did not actually say that I thought that 40 ms is not important, but rather that the observer might be unaware of or unconcerned by that additional time. In other words, the effect might depend on the effort the observer was willing to put in to ignore the distraction (Koevoet & Van der Stigchel). Along these lines, O’Bryan & Song report that, in real-world tasks, observers may not be distracted by irrelevant salient stimuli, and, in fact, sometimes show the opposite effect.
“All models are wrong, some are useful”
This famous quote from George Box (Box, 1979) is apt for several of the commentaries. These ask about both the quality of the peripheral vision model and the falsifiability of task complexity.
First, Bornet, Herzog, & Doerig provocatively ask whether we should “start a revolution with a refuted model.” Work by these researchers has demonstrated that the Texture Tiling Model (TTM) for peripheral vision appears not to preserve enough information to predict performance on some Mooney face tasks. They have also argued that TTM cannot explain grouping effects in Vernier acuity tasks. I argue that Vernier acuity is a special hyperacuity task, and quibble with whether they have performed the definitive test, but I necessarily leave this detailed discussion for another venue. Their suggestion echoes the concern that attention influences crowding (Carrasco), though those results are consistent with cueing improving performance rather than changing the peripheral representation. Bornet et al. also state that TTM is unlikely to be a good model of vision in general. TTM is not intended as a model of vision in general; one would say the same of a model of the retina, or of V1.
Even if we take as a given that TTM is wrong in these and no doubt other ways, it is nonetheless useful. In particular, the claim that visual attention is in crisis relies predominantly on predictions for visual search, scene perception, and dual-task performance. We have backed up all of these predictions with peripheral vision experiments, plus, in some cases, intuitions from TTM to help us understand how the sparse dual-task displays relate to the crowded search ones. Hulleman, together with Olivers (2017), provides corroborating evidence. FIT is in trouble – and attentional mechanisms need rethinking – without predictions from TTM. What TTM does provide is the glue that allows us to see that a single model can predict all of these results; in other words, that peripheral vision more parsimoniously explains that entire set of phenomena than attention does. Furthermore, a model can clearly be “wrong” and support a paradigm shift; just look at Copernicus’ version of heliocentrism.
Ma et al. ask whether the peripheral vision explanation for some phenomena gives us “greater precision in understanding cognition and behavior” than the old attentional explanation. They suggest asking “how much variance in [supposedly] attentional tasks can be accounted for by peripheral factors alone.” I certainly agree that we should not accept a new handwavy theory in place of the old, but TTM is a useful model in just this way. Unlike old word-model predictions for things like visual search, with ill-defined attention mechanisms, TTM can make quantitative predictions. Zhang et al. (2015), for instance, show that peripheral mechanisms explain 75% of the variance for the 10 search conditions studied. (We should not, I think, presume that all the remaining 25% of the variance is attention.) In addition, Chang & Rosenholtz (2016) showed the model could predict in advance the results for five new pairs of search conditions.
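For readers less familiar with "variance accounted for," the following sketch shows one common way to compute it, as the squared Pearson correlation between observed and predicted values. This is an assumption about the measure, not a description of how any cited study computed it, and the numbers are made up solely to make the example runnable.

```python
def variance_explained(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

# Made-up numbers for illustration only; not data from any cited study.
search_slopes_ms_per_item = [5.0, 12.0, 20.0, 31.0, 44.0]
model_predictions = [8.0, 10.0, 24.0, 27.0, 46.0]
r_squared = variance_explained(search_slopes_ms_per_item, model_predictions)
unexplained = 1.0 - r_squared  # may reflect noise, not necessarily attention
```

The unexplained remainder lumps together measurement noise, model error, and any genuine non-peripheral factors, which is why it should not be read as the contribution of attention.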
On the other hand, Koevoet & Van der Stigchel and Chapman, Addleman, & Störmer worry that task complexity remains ill-defined, and the latter ask whether that theory is testable. I agree that we would need to nail down how to operationalize task complexity, though I have found it useful to get intuitions even without that precision. Rosenholtz (2017) discusses testability in more detail. In brief, for a given measure of complexity, and a candidate visual representation, such as from a pre-trained neural network, one would want to ask whether, for example, hard dual tasks correspond to higher complexity than easy dual tasks. Here, Koevoet & Van der Stigchel have, I think, a critical point: one should do one’s best to normalize the effort exerted by the observers across conditions.
Paradigm shift = Bad?
Overwhelmingly, commentators most agreed with each other either in declaring visual attention not in crisis or not in need of a paradigm shift, and/or in suggesting that such a paradigm shift would be pointless or detrimental to science. Here, I review those arguments and respond briefly. In Paradigm Shift Good, I argue for the feasibility and benefits of a paradigm shift.
Someone already fixed the problems, and the field is making progress
In this category, commentaries suggest that there is no crisis and no need for a paradigm shift because, for example, Wolfe’s theory (Poth & Poth) has resolved the theoretical issues, or researchers have developed normalization theories of attention (Carrasco, Denison), or bottleneck models (Dux, Dell’Acqua, & Wyble) to explain a subset of behavioral and physiological effects. I believe the field can develop a coherent theory that explains a wider range of phenomena. These researchers also point to extensive, existing attempts to control perceptual factors by keeping the stimulus constant while varying the task. Such methodological changes do represent an important step. Better understanding of perceptual factors can help, by allowing one to study a broader set of phenomena while accounting for stimulus differences using a model. Controlling for decision effects – i.e., what would an ideal observer do – poses greater difficulties. Cueing does change the task (Wolfe). When a cue or prior knowledge reduces uncertainty about the location or identity of a target, that can improve performance even in an unlimited capacity system; as a result, performance improvements may not result from selection in the traditional sense. Studying attention is hard.
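The point about uncertainty reduction can be illustrated with a standard signal-detection calculation. The sketch below is a simplification under stated assumptions: a yes/no detection task, independent unit-variance Gaussian responses at each monitored location, and a max-rule observer (a common stand-in for the ideal observer, not identical to it). Per-location sensitivity is held fixed regardless of how many locations are monitored, which is what "unlimited capacity" means here. The function names and the d' value are hypothetical.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal cumulative distribution

def max_rule_accuracy(d_prime, n_locations, criterion):
    """Proportion correct when the observer responds 'present' if the
    largest of n_locations independent responses exceeds the criterion.
    Target-present and target-absent trials are equally likely."""
    p_noise_below = _PHI(criterion)
    p_target_below = _PHI(criterion - d_prime)
    false_alarm = 1.0 - p_noise_below ** n_locations
    hit = 1.0 - p_target_below * p_noise_below ** (n_locations - 1)
    return 0.5 * hit + 0.5 * (1.0 - false_alarm)

def best_accuracy(d_prime, n_locations):
    """Accuracy at the best criterion on a grid from -3 to 6."""
    criteria = [i / 100.0 for i in range(-300, 601)]
    return max(max_rule_accuracy(d_prime, n_locations, c) for c in criteria)

cued = best_accuracy(2.0, n_locations=1)    # cue removes location uncertainty
uncued = best_accuracy(2.0, n_locations=8)  # target could be at any of 8
```

In this sketch, accuracy is higher in the cued case even though nothing about per-location processing has changed, so a cueing benefit of this kind does not by itself demonstrate limited-capacity selection.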
Commentaries also objected to calling for a crisis or a paradigm shift because the field continues to make progress. Kristjánsson & Chetverikov point to improved understanding of how the brain prioritizes and gathers information. They suggest that loose definitions have been sufficient to guide a diverse set of experiments. Posner and Ma et al. point to progress in neuroscience, Dux, Dell’Acqua, & Wyble in attentional blink, and Carrasco regarding effects of selection on spatial frequency and contrast sensitivity. The list could go on. Herein, however, lies a misunderstanding. Labeling a field “in crisis” does not mean the field has not progressed. Ptolemaic astronomers productively made celestial measurements, and these were useful for guiding later theory. The questions are whether the old paradigm has fatal issues – a claim that no commentary really disagreed with – and whether a new paradigm consisting of theory, methods, and puzzles to solve has truly replaced it.
Contrary to these defenders of the current paradigm, Francis & Thunell argue against a paradigm shift because the anomalies might not be real, but rather the result of underpowered experiments. This certainly sounds to me like a sign of crisis, albeit of a different sort. I doubt this is the entire story, though it plausibly suggests delaying integration of some empirical results while waiting for the field to sort this out.
Concern about a paradigm shift
Several commentaries express what seem to me excessive concerns about the consequences of a paradigm shift. They worry that it would cause us to lose understanding of the different brain areas involved, of how attention influences sensory and memory processing, and of how the brain makes use of top-down information (Posner). It would risk a replication crisis and disrupt scientific progress by “preventing incremental science” (Poth & Poth). Re-interpreting existing experimental results “invalidates the entire approach of hypothesis testing” (Duncan & Theeuwes). A paradigm shift does not require a collective amnesia of past work, and healthy science does rethink the meaning of past experiments based on new understanding.
Attention is an apple and paradigm shifts are for oranges
A number of commentaries appear to argue that attention is simply not the kind of thing that can require a paradigm shift. Is attention simply self-evident? “Essential” because “it is a resource-limited process” (Ma et al.)? Commentaries state the obviousness of prioritization (Poth & Poth) and a need to focus – when someone selects something to respond to, they attend to that thing; “one hardly needs experiments” (Wu). (Some of these comments may simply reflect the misconception that the target article did away with attention entirely.) Commentators describe attention as effect or outcome, what we need to explain rather than the cause (Morsella et al., Dudarev & Enns, Wu). Kristjánsson & Chetverikov suggest we consider attention a general concept, like perception, and Denison posits that attention might not be a theory. To all of this, I ask: what non-obvious concept worthy of scientific study does require a theory? Let’s discuss that instead.
No point in a unifying theory
Other commentaries suggest that seeking a unifying theory is misguided. Wu asks whether there has ever been an agreed-upon attention paradigm, noting differences between dominant paradigms in behavior as opposed to physiological studies. Many different reasons to prioritize information supposedly correspond to many kinds of selectivity (Dudarev & Enns). There simply exist different kinds of attention and many different kinds of mechanisms, as indicated by use of different brain areas (e.g., Posner, Carrasco), with different timing and performance characteristics (Carrasco). Pinning down attention too narrowly can lead the field astray (Kristjánsson & Chetverikov). The next section addresses this apparent proliferation and asks what it means to have a unifying theory.
Paradigm shift good
Having gone through a multi-year exercise of rethinking visual attention, I understand that it looks like a significant amount of work. Discarding old understandings can be difficult and even painful, especially if we have put substantial effort into the old paradigm. The reader might question whether the rewards justify the effort. Having seen the benefits in understanding a broad array of phenomena and making predictions, I believe the effort is well worthwhile. Let us start by addressing the question of whether there might yet be hope for a unifying theory. I argue that we should not worry too much about the apparent diversity of mechanisms.
Is the proliferation of mechanisms important?
What does it mean for there to be a unifying theory, with one capacity limit and one flexible mechanism for dealing with that limit? At its most basic level, it means that we get predictive value from thinking that way. In vision science, linear systems analysis provides a classic example. One can make quite good predictions about detection thresholds for a diverse set of patterns – circles, checkerboards, noise – by measuring thresholds for narrow spatial frequency bands, and reasoning about how one combines those spatial frequencies to form the more complex patterns (e.g., Watson, 2000). To a first approximation, one need not worry about circle-detection, checkerboard-detection, and noise-detection mechanisms. Trained neural networks provide a more modern example. If one can train a network to extract generally useful features, and based on those features predict recognition of squirrels, apples, and toys, that provides a unifying theory of object recognition (a model that is, of course, wrong, but useful). Note that this does not imply that squirrel recognition occurs in the same brain area as apple recognition, nor that performance is the same for both tasks.
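As a rough sketch of the linear-systems style of prediction, the code below combines component thresholds using Minkowski (probability) summation, one common pooling rule in the detection literature. It is offered as an assumption-laden illustration rather than as the specific model used in any cited work; the function name `compound_threshold`, the exponent `beta`, and the example thresholds are hypothetical.

```python
def compound_threshold(component_thresholds, component_weights, beta=4.0):
    """Predicted contrast threshold for a pattern built from several
    spatial-frequency components, using Minkowski (probability) summation.

    component_thresholds: contrast threshold of each component alone.
    component_weights: each component's contrast per unit pattern contrast.
    beta: pooling exponent (an assumed value, not taken from the source).
    """
    pooled = sum((w / t) ** beta
                 for w, t in zip(component_weights, component_thresholds))
    return pooled ** (-1.0 / beta)

# Hypothetical example: two components, each detectable alone at 1% contrast.
single = compound_threshold([0.01], [1.0])
pair = compound_threshold([0.01, 0.01], [1.0, 1.0])
```

The same function covers any pattern that can be described by its component weights, which is the sense in which one mechanism replaces separate circle, checkerboard, and noise detectors.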
The apparent diversity of attentional mechanisms might not be real. Purported attention mechanisms may have proliferated for a number of reasons:
(1) The dominant theory at the time may have predicted the existence of multiple mechanisms
(2) Lumping together perceptual, attentional, and decision-making phenomena may have made it appear that more attentional mechanisms were needed
(3) Empirical results may have been confused with mechanisms
(4) Inherent task demands may not have been adequately considered
Consider the exogenous/involuntary vs. endogenous/voluntary distinction as an example (for simplicity, I will refer to this as exogenous vs. endogenous). I do not intend this discussion as a debunking of that distinction; Carrasco, for instance, argues for it on the basis of a differential effect on the contrast sensitivity function, and I will not address that here. Rather, this distinction provides a useful example because it demonstrates several of the potential pitfalls.
FIT predicted a difference between automatic, preattentive processing and processes that required attention. As a result, early work may not have adequately questioned the exogenous/endogenous subdivision. Now that we are rethinking FIT and automatic processes, this deserves a second look.
One can change the task from spatial attention to feature attention, but that does not mean that observers use different mechanisms to perform those tasks. Similarly, one instructs observers to follow an endogenous arrow cue, whereas internal goals might drive them to follow an exogenous cue. But this does not mean that the two do not have shared mechanisms. If they appear to utilize different brain regions, that might be because of non-attentional processes the two tasks do not share, such as arrow perception or interpreting the experimenter’s instructions.
One often sees arguments for an exogenous/endogenous distinction on the basis of differences in timing (e.g., Cheal & Lyon, 1991). However, exogenous and endogenous cues arguably have inherently different time courses. Observing a response to an exogenous cue after 50 ms and to an endogenous cue after 400 ms might not mean that the two require different mechanisms. If one endogenous arrow pointed to a second endogenous arrow, which finally pointed to the target location, that complex cueing might require 600 ms. Most of us would not consider that difference in timing evidence of a new attentional mechanism.
Paradigm shifts and ambitious science
A call for a paradigm shift is an appeal for a return to a bold, ambitious science of visual attention. This would mean a return to the promise that understanding a single capacity limit might allow one to predict performance for a wide range of tasks, from simple laboratory experiments through complicated, ecologically valid real-world perception for action. Ultimately, this is the promise of science: that we not merely test hypotheses and describe empirical results but build a deeper understanding that enables intuitions and extrapolation to experiments not yet run. In addition, it can be deeply satisfying to ponder a field’s worth of empirical results and attempt to make sense of them all.
Of course, not everyone needs to work towards this ambitious goal. Some may prefer to collect the empirical data that others use to build a theory. Other scientists may elect to develop a theory of a more targeted set of phenomenology. Science benefits from a diversity of approaches, skills, backgrounds, and personalities.
Those who do want to aim higher do not need to take as given my list of phenomena in need of explanation. Dudarev & Enns may want to add neural and behavioral results of planning and executing an action. Duncan & Theeuwes may want to add more distraction phenomena, and Wolfe may want to add examples from search. Godwin, Hout, & Barnhart may want to add some aspects of overt attention back into the fold. Carrasco and Chapman, Addleman, & Störmer might argue for including shifts in representation and appearance as a result of attention. Importantly, note that the intent of the list is not to capture all phenomena that might involve attention. (Recall, after all, that in the task-complexity theory, all perception requires attention, i.e., requires performing a task.) Rather, the list aims to be conservative by excluding phenomena that might have important sensory or decision confounds or otherwise be primarily governed by another mechanism. In so doing, I hope to minimize the chances of going down a blind alley in search of a theory.
Similarly, I have found the task-complexity way of thinking quite useful, but others may prefer a different unifying theory. Bornet, Herzog, & Doerig and Hulleman seem to prefer something closer to the flexible pooling idea, which Krauzlis also supports. Others may wish to (continue to) develop their own ambitious theory, such as Belledonne & Yildirim, Chapman, Addleman, & Störmer, or Wolfe.
I invite other researchers to tackle the crisis in visual attention head-on and work together to drive the paradigm shift forward. This ambitious endeavor needs diverse perspectives and contributions from throughout the scientific community. Together, we can push the boundaries of our understanding and make significant strides in cognitive science.
Acknowledgments
I would like to thank Jen Corbett, Niall Williams, Zoe Xu, and Robin Adelson for helpful discussions. Special thanks to Ted Adelson for his unwavering support and insightful conversations. Additionally, I am grateful to my fellow speakers at the 2024 Vision Sciences Society Symposium, “Attention: accept, reject, or major revisions?” (Alon Zivony, Britt Anderson, Wayne Wu, and Sarah Shomstein), for their camaraderie and moral support. I greatly appreciate their shared enthusiasm for discussing attention.
Target article
Visual Attention in Crisis
Author response
Toward a paradigm shift in visual attention