
Movement vigor: Frameworks, exceptions, and nomenclature

Published online by Cambridge University Press:  30 September 2021

Rory John Bufacchi
Affiliation:
Center for Life Nano- & Neuro-Science, Italian Institute of Technology (IIT), 00161 Rome, Italy; Department of Neuroscience, Physiology and Pharmacology, University College London (UCL), London WC1E 6BT, UK. rory.bufacchi@iit.it, giandomenico.iannetti@iit.it; www.iannettilab.net
Gian Domenico Iannetti
Affiliation:
Center for Life Nano- & Neuro-Science, Italian Institute of Technology (IIT), 00161 Rome, Italy; Department of Neuroscience, Physiology and Pharmacology, University College London (UCL), London WC1E 6BT, UK. rory.bufacchi@iit.it, giandomenico.iannetti@iit.it; www.iannettilab.net

Abstract

Shadmehr and Ahmed cogently argue that vigor of appetitive movements is positively correlated with their value, and that value can therefore be inferred by measuring vigor. Here, we highlight three points to consider when interpreting this account: (1) the correlation between vigor and value is not obligatory, (2) the vigor effect also arises in frameworks other than optimal foraging, and (3) the term vigor can be misinterpreted, thereby affecting rigor.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press


1. Vigor is not an obligatory readout of value

The optimal foraging framework states that organisms maximize utility. Maximizing utility brings about a correlation between the vigor of a movement toward a target and the reward offered by that target. On the basis of this correlation, Shadmehr and Ahmed suggest using vigor as a readout of value. However, one must be cautious when drawing such a conclusion, because it relies on reverse inference; it overlooks the basic tenet that correlation does not necessarily imply causation. We feel this is particularly important because there are many exceptions to the vigor-value correlation. An armchair ethologist might note that neither the leopard stalking its prey nor the poker player bluffing a hand betrays the target's value through vigorous action. Indeed, the vigor-value relationship is obligatory only when comparing two otherwise identical situations that differ solely in the value of an object. There are many other reasons for the vigor of a movement to change even when the target's value is constant. This weakens Shadmehr and Ahmed's claim that vigor can be used to reliably infer value.

Although the authors do discuss some other factors that modulate vigor, such as reward history and stimulus uncertainty, they provide such a compelling account of the correlation between vigor and value that it is easy to forget that there is no obligatory one-to-one mapping between the two. To clarify this point, we will consider cases where the usual positive correlation between expected value and movement vigor is reversed. These examples serve to highlight that care must be taken when using vigor to infer value.

An inverse relationship between vigor and expected reward: Vanishing expected reward. Shadmehr and Ahmed emphasize that smaller expected reward results in lower vigor. This is not necessarily the case when the reward is only obtainable for a limited amount of time.

Consider a situation where a reward is likely to disappear within a short time-window (e.g., multiple dogs trying to eat from the same bowl with limited food). Here, a fast appetitive movement has a higher probability of obtaining the reward than a slow movement. When the reward is expected to be available for a shorter time, the agent should move faster, yet it is also less likely to obtain the reward: the expected reward is smaller. Therefore, in such situations, one sees higher vigor for smaller expected reward (Kuefler, Avgar, & Fryxell, 2013). The early bird gets the worm.

This scenario reverses the usual relationship between expected reward and vigor, while it remains entirely consistent with optimal foraging theory. To demonstrate this, assume that the probability P that the reward is present decreases linearly over time T (Fig. 1A):

(1)$$P(T) = \begin{cases} 0 & \text{for } T < 0 \\ 1 - \dfrac{T}{T_{\rm end}} & \text{for } 0 \le T \le T_{\rm end} \\ 0 & \text{for } T > T_{\rm end} \end{cases}$$

Therefore, the expected reward E[r] decreases as a function of time:

(2)$$E[r] = \begin{cases} \alpha(1 - cT) & \text{for } 0 \le T \le \dfrac{1}{c} \\ 0 & \text{for } T > \dfrac{1}{c} \end{cases}$$

where c = 1/T end, and α is the magnitude of the reward if it is obtained. We now demonstrate that a decrease in expected reward can lead to an increase in vigor. Following the authors' convention, we write utility J as the temporally discounted sum of the expected reward E[r] and the effort e = −AT − B/T:

(3)$$J(T) = \begin{cases} \dfrac{\alpha(1 - cT) - AT - B/T}{1 + \gamma T} & \text{for } 0 \le T \le \dfrac{1}{c} \\[2ex] \dfrac{-AT - B/T}{1 + \gamma T} & \text{for } T > \dfrac{1}{c} \end{cases}$$

where A specifies the base metabolic rate, B the movement cost, and γ the temporal discount factor. We can find the optimal movement duration T* by setting dJ(T)/dT = 0:

(4)$$T^\ast = \frac{B\gamma + \sqrt{\alpha Bc + \alpha B\gamma + AB + B^2\gamma^2}}{\alpha c + \alpha\gamma + A}$$

This shows that as the reward vanishes sooner (i.e., c increases), the utility J decreases, but the optimum movement speed V* = 1/T* increases (Fig. 1B). In other words, when the expected reward is smaller, movement speed is faster.
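The closed-form optimum of Eq. (4) is easy to check numerically. The sketch below is our own illustration: the parameter values are assumptions (α = 20 matches the constant reward used in Fig. 1; A, B, and γ are arbitrary positive values). It compares the analytical T* against a brute-force maximization of J(T) from Eq. (3), and confirms that T* shrinks, so that the optimal speed V* = 1/T* grows, as c increases.

```python
import math

# Illustrative parameters (assumptions): alpha = 20 matches Fig. 1;
# A, B, and gamma are arbitrary positive values.
ALPHA = 20.0      # reward magnitude if obtained
A, B = 1.0, 1.0   # base metabolic rate and movement cost
GAMMA = 0.5       # temporal discount factor

def utility(T, c):
    """Discounted utility J(T) of Eq. (3), for 0 < T <= 1/c."""
    return (ALPHA * (1 - c * T) - A * T - B / T) / (1 + GAMMA * T)

def t_star(c):
    """Closed-form optimal movement duration, Eq. (4)."""
    num = B * GAMMA + math.sqrt(ALPHA * B * c + ALPHA * B * GAMMA + A * B + B**2 * GAMMA**2)
    return num / (ALPHA * c + ALPHA * GAMMA + A)

def t_star_numeric(c, n=200_000):
    """Brute-force maximizer of J over a fine grid on (0, 1/c]."""
    grid = ((i + 1) / n / c for i in range(n))
    return max(grid, key=lambda T: utility(T, c))

# The closed form agrees with the numerical optimum:
for c in (0.5, 1.0, 2.0):
    assert abs(t_star(c) - t_star_numeric(c)) < 1e-3

# As the reward vanishes sooner (larger c), the optimal duration shrinks,
# so the optimal speed V* = 1/T* rises even though expected reward falls:
assert t_star(2.0) < t_star(1.0) < t_star(0.5)
```

The monotonicity assertion is the crux: smaller expected reward, faster movement, exactly as optimal foraging predicts in this scenario.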

Figure 1. Decreasing expected reward can also enhance vigor. (A) We consider a situation in which the probability that a reward is still present decreases linearly with time, proportionally to a constant c (e.g., when many dogs try to eat a limited amount of food from the same bowl). (B) The utility of an appetitive movement toward a reward depends on the movement duration. For each value of c, the optimum duration is different (the actual reward is a constant value, here set to 20). An expectation that the reward vanishes more quickly (i.e., with higher c), corresponds to a shorter optimal movement time (black circles). (C) Therefore, when the reward is expected to be present for a longer time, vigor is lower. (D) Analogously, when expected reward is lowered by time-limiting its availability, movement vigor increases.

This effect is ubiquitous in marketing: make a product available for a short period and watch your sales soar. Individuals are faster and more likely to choose a product they think might go out of stock, even if its value is less than that of its alternatives (Byun & Sternquist, 2012; Inman, Peter, & Raghubir, 1997; Maimaran & Salant, 2019). Thus, the positive vigor-expected value correlation is not obligatory, and vigor cannot be exploited as an indirect measure of expected reward as readily as Shadmehr and Ahmed affirm.

Another example: defensive escape behavior. Most of the examples in the book relate to appetitive behavior, which leads to obtaining a reward. However, defensive and avoidance behaviors provide another example of disruption of the positive correlation between vigor and value: an increased negative valence of a state often results in increased vigor.

Consider a situation in which occupying a given area of space entails a non-zero probability of punishment per unit time (e.g., for a mouse, a room containing a cat). The mouse is more incentivized to exit the area if the probability of harm per unit time $P_h$ is greater. Assuming $P_h$ is constant, the expected harm from predation is:

(5)$$E[h] = \int_0^{T_{\rm esc}} \beta P_h \,{\rm d}T = \beta P_h T_{\rm esc}$$

where β is the magnitude of the harm if the dangerous event occurs. The utility of the escape can be written as:

(6)$$J(T) = \frac{-\beta P_h T - AT - B/T}{1 + \gamma T}$$

Thus, the optimal escape duration $T^\ast$ decreases as $P_h$ increases (Fig. 2):

(7)$$T^\ast = \frac{B\gamma + \sqrt{B(A + \beta P_h + B\gamma^2)}}{A + \beta P_h}$$

Therefore, the more threatening the environment, the more vigorous the escape, even though the absolute utility is always negative. This emphasizes that vigor does not depend on absolute utility per se, but rather on the differential utility between the current state and the state after the action (Friston et al., 2013; Jocham, Hunt, Near, & Behrens, 2012).
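The same kind of numerical check applies here. In this sketch (our own illustration; β = 300 matches the harm value used in Fig. 2, while A, B, and γ are assumed positive constants), the escape utility includes the temporal discount 1/(1 + γT) implied by the γ terms of Eq. (7), and the closed-form T* is verified to fall as the probability of harm rises.

```python
import math

# Illustrative parameters (assumptions): beta = 300 matches Fig. 2;
# A, B, and gamma are arbitrary positive values.
BETA = 300.0      # magnitude of harm if the dangerous event occurs
A, B = 1.0, 1.0   # base metabolic rate and movement cost
GAMMA = 0.5       # temporal discount factor

def escape_utility(T, p_h):
    """Escape utility, with the temporal discount implied by Eq. (7)."""
    return (-BETA * p_h * T - A * T - B / T) / (1 + GAMMA * T)

def t_star_escape(p_h):
    """Closed-form optimal escape duration, Eq. (7)."""
    return (B * GAMMA + math.sqrt(B * (A + BETA * p_h + B * GAMMA**2))) / (A + BETA * p_h)

def t_star_numeric(p_h, n=200_000, t_max=10.0):
    """Brute-force maximizer of the escape utility on (0, t_max]."""
    grid = ((i + 1) / n * t_max for i in range(n))
    return max(grid, key=lambda T: escape_utility(T, p_h))

# The closed form agrees with the numerical optimum:
for p_h in (0.01, 0.05, 0.1, 0.5):
    assert abs(t_star_escape(p_h) - t_star_numeric(p_h)) < 1e-3

# A more threatening environment (higher P_h) yields a faster escape:
durations = [t_star_escape(p) for p in (0.01, 0.05, 0.1, 0.5)]
assert durations == sorted(durations, reverse=True)
```

Note that every utility value here is negative; it is the difference in utility across escape durations, not its absolute level, that drives the vigor of the movement.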

Figure 2. Increasing expected harm can enhance vigor. (A) We consider an environment with a fixed probability per unit time (P/t) that an agent will be harmed (e.g., a room with a cat, for a mouse). As such, the expectation of harm increases linearly with time (the harm that the agent would experience if the dangerous event occurs is a constant value set at −300). (B) The utility of escaping from the dangerous region depends on the movement duration. For each value of P/t, this optimum duration is different, because moving also entails a cost. Here, greater probability of harm leads to a shorter optimal movement time. (C) Therefore, when probability of harm increases, vigor increases. (D) Analogously, expected harm correlates positively with the vigor of a harm-reducing movement.

To better appreciate the difference in the effect of cost and reward on vigor, we should start thinking about vigor as a velocity (i.e., a vector) rather than a speed (i.e., a positive scalar). A threatening situation can increase the velocity of a movement, but only of a movement in the direction that decreases cost. Therefore, when attempting to interpret vigor as a readout of value, one should be conscious of the effect of that movement.

Role of dopamine and serotonin in modulating vigor. The authors explain beautifully how dopamine levels in the SNr rise in rewarding environments and increase vigor. Serotonin produced in the dorsal raphe nucleus contributes to this effect by tracking reward history and acting as an antagonist to dopamine: it promotes sloth. Highly rewarding environments thereby increase vigor, whereas low-reward environments produce sloth. However, this account is incompatible with the effect of a threatening environment on vigor that we just demonstrated. Perhaps the solution again lies in not considering vigor an absolute (i.e., only positive) scalar variable, but in always bearing in mind its direction. This can resolve the seemingly confusing effect of serotonin, which promotes vigor in stressful environments (Seo et al., 2019). Whereas dopamine could be seen as promoting attraction, and therefore vigor in appetitive movements, serotonin might be seen as promoting avoidance: slowing appetitive behavior when the threat level is low, and promoting escape behavior when the threat level is high.

2. The vigor effect also arises in frameworks other than optimal foraging

Shadmehr and Ahmed explain why the vigor effect exists through the ethological optimal foraging framework. Other frameworks, such as reinforcement learning and active inference, can equally provide a why, while also having more explanatory power. However, they are more complex and more difficult to understand, and thus make it harder to gain clear behavioral insights.

Reinforcement learning (RL). As in the optimal foraging framework, movements in RL are selected to maximize utility. As a consequence, RL also shows that, all other things being equal, more valuable stimuli can result in more vigorous movements (Niv, 2009; Niv, Daw, Joel, & Dayan, 2007). In RL, an agent is given the goal of maximizing cumulative discounted future reward R by interacting with its environment through actions. The value Q of performing an action a in a state $s_t$ at time t under a policy π is the expectation E of the discounted cumulative reward R:

(8)$$Q^\pi(s_t, a) = E^\pi\left[\sum_{i = t}^{\infty} \gamma_{\rm RL}^i R_{t+i} \,\middle\vert\, s_t, a\right]$$

where $\gamma _{{\rm RL}}^i < 1$ is the standard RL discount factor.

This means that RL can yield formulae for the utility of an action (its value, in RL terminology) very similar to those of optimal foraging. For example, Eq. (3) – an expression of utility under optimal foraging – is the first term of the RL Eq. (8) under the following circumstances:

1. Time steps $i$ of variable length $T_i$ each represent one reaching movement

2. Action $a_i$ specifies $T_i$

3. We define $\gamma_{\rm RL}^i = 1/(1 + \gamma T_i)$

4. We define $R_i = \alpha(1 - cT_i) - AT_i - B/T_i$
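The correspondence above can be sketched in a few lines. In this minimal illustration (the parameter values are our own assumptions), the first ($i = t$) term of the RL return in Eq. (8), built from the mapped $\gamma_{\rm RL}$ and $R_i$, coincides exactly with the optimal-foraging utility of Eq. (3):

```python
import math

# Illustrative parameter values (assumptions, not taken from the commentary).
ALPHA, A, B, GAMMA, C = 20.0, 1.0, 1.0, 0.5, 1.0

def foraging_utility(T):
    """Eq. (3): discounted utility of a movement of duration T, for 0 < T <= 1/c."""
    return (ALPHA * (1 - C * T) - A * T - B / T) / (1 + GAMMA * T)

def rl_first_term(T):
    """First (i = t) term of the RL return in Eq. (8) under the mapping above:
    gamma_RL = 1/(1 + gamma*T) and R = alpha*(1 - c*T) - A*T - B/T."""
    gamma_rl = 1 / (1 + GAMMA * T)
    reward = ALPHA * (1 - C * T) - A * T - B / T
    return gamma_rl * reward

# The two expressions coincide for any admissible movement duration:
for T in (0.1, 0.2, 0.5, 1.0):
    assert math.isclose(foraging_utility(T), rl_first_term(T))
```

The remaining terms of Eq. (8) are what give RL its extra expressive power: they let the value of future states depend on the current action.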

Therefore, as Shadmehr and Ahmed themselves mention, “a reinforcement learning framework […] allows for a more dynamic model of learned reward and punishment values in changing environments. For example, in a reinforcement learning framework, one can consider more realistic scenarios in which the value of states in the future depends on the current action.” On the other hand, optimal foraging is useful exactly because it simplifies situations to a point where they can be manipulated intuitively, and with more tractable maths.

Active inference. Both RL and optimal foraging can be understood as part of wider frameworks postulating that biological entities aim to keep themselves alive and pass their genes on to future generations. Such frameworks state that agents keep certain variables within specific bounds, such as heart rate, temperature, and bodily integrity (Ashby, 1952). Selecting actions to maximize utility is an important part of this story, but not the whole of it. Instead, such wider frameworks show that an agent trying to stay alive will act to maximize utility (as in RL and optimal foraging) but also to explore its environment (Clark, 2013). For example, in active inference a direct consequence of staying alive is that “agents should sample […] the parts of the sensory environment that resolve most uncertainty about the causes of their sensations” (Parr & Friston, 2017).

Shadmehr and Ahmed show a strong relationship between certain features of a visual stimulus and the vigor of saccading toward it. They explain this effect through image reward: people prefer looking at faces over white noise. However, saccades contribute strongly to exploring the environment, and outside the laboratory they rarely lead directly to a reward. Furthermore, saccade vigor to aversive or threatening images is higher than to neutral stimuli (Schmidt, Belopolsky, & Theeuwes, 2015), and there is little difference in saccade vigor when pleasant and unpleasant scenes are matched (Nummenmaa, Hyönä, & Calvo, 2009). Should one then conclude that aversive and disgusting scenes are also rewarding to the viewer?

This question highlights an issue with the central tenet of the authors' perspective – in some instances, forcing observations to fit a framework not designed to accommodate them becomes detrimental (Chomsky, 1959). Unlike escapes and vanishing rewards, we feel that the vigor of saccades, and especially that of fast saccades to aversive stimuli, is not easily understood under the optimal foraging framework. Frameworks that do not rely solely on utility to explain actions might be more useful in this case. Notably, active inference provides a strong framework for modeling saccades (Friston, Adams, Perrinet, & Breakspear, 2012; Parr & Friston, 2018).

3. On the use of the term vigor

Throughout the book, the authors give different meanings to the word “vigor.” At the beginning, vigor is defined as the inverse of the sum of reaction and movement times. Although this appears to be the running definition, at some point (between-subject) vigor is defined as “the relationship between the velocity-amplitude function for movements of one individual with respect to those of the mean of the population” – a definition that clearly excludes reaction times. Elsewhere, it is stated that “manipulating the activity of fovea-related cells in the colliculus alters the reaction time and vigor of the macrosaccade,” and that “an increase in the effort expenditure […] should dampen changes in vigor and reaction time.” In these instances, the authors use vigor as a direct synonym of rotational speed, separate from reaction time.

Besides the definitional inconsistencies in the book, the term “vigor” is commonly understood as denoting strength or liveliness (as shown in the Oxford dictionary, reflecting its Latin etymology). In scientific writing, the rationale for choosing a word should be clear, and one must be wary of introducing new terms on literary merit alone. An epistemological perspective can contribute to this discourse: the incontrovertibility-of-terms rule (Gardiner & Java, 1993) states that within a field of science there should be a one-to-one mapping between terms and meanings. The occasional many-to-many mapping of the word “vigor” in this book might lead to confusion.

Given that there is currently no term that succinctly summarizes reaction and movement time in the way that vigor does, vigor defined as “the inverse of reaction time plus movement time” can be a novel addition to the scientific lexicon of motor control, provided one considers the caveats we have discussed. However, any empirical study demonstrating a correlation between vigor and a given experimental variable would not be complete without dissecting whether the effect of that variable is on either reaction time, movement time, or both. Therefore, we feel that the use of “vigor” as a scientific, rather than an evocative, term is up for debate.

Financial support

This study is supported by the Wellcome Trust Strategic Award (GDI, grant number COLL JLARAXR) and by the ERC Consolidator Grant (GDI, grant number PAINSTRAT).

Conflict of interest

None.

References

Ashby, W. R. (1952). Design for a brain. Chapman & Hall.
Byun, S. E., & Sternquist, B. (2012). Here today, gone tomorrow: Consumer reactions to perceived limited availability. Journal of Marketing Theory and Practice, 20(2), 223–234. https://doi.org/10.2753/MTP1069-6679200207
Chomsky, N. (1959). A review of B. F. Skinner's Verbal behavior. Language, 35(1), 26–58.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204. https://doi.org/10.1017/S0140525X12000477
Friston, K., Adams, R. A., Perrinet, L., & Breakspear, M. (2012). Perceptions as hypotheses: Saccades as experiments. Frontiers in Psychology, 3, 1–20. https://doi.org/10.3389/fpsyg.2012.00151
Friston, K., Schwartenbeck, P., FitzGerald, T., Moutoussis, M., Behrens, T., & Dolan, R. J. (2013). The anatomy of choice: Active inference and agency. Frontiers in Human Neuroscience, 7, 1–18. https://doi.org/10.3389/fnhum.2013.00598
Gardiner, J. M., & Java, R. I. (1993). Recognising and remembering. In Theories of memory. https://doi.org/10.4324/9781315782119-6
Inman, J. J., Peter, A. C., & Raghubir, P. (1997). Framing the deal: The role of restrictions in accentuating deal value. Journal of Consumer Research, 24(1), 68–79. https://doi.org/10.1086/209494
Jocham, G., Hunt, L. T., Near, J., & Behrens, T. E. J. (2012). A mechanism for value-guided choice based on the excitation-inhibition balance in prefrontal cortex. Nature Neuroscience, 15(7), 960–961. https://doi.org/10.1038/nn.3140
Kuefler, D., Avgar, T., & Fryxell, J. M. (2013). Density- and resource-dependent movement characteristics in a rotifer. Functional Ecology, 27, 323–328. https://doi.org/10.1111/1365-2435.12065
Maimaran, M., & Salant, Y. (2019). The effect of limited availability on children's consumption, engagement, and choice behavior. Judgment and Decision Making, 14(1), 72–79.
Niv, Y. (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53, 139–154.
Niv, Y., Daw, N. D., Joel, D., & Dayan, P. (2007). Tonic dopamine: Opportunity costs and the control of response vigor. Psychopharmacology, 191(3), 507–520. https://doi.org/10.1007/s00213-006-0502-4
Nummenmaa, L., Hyönä, J., & Calvo, M. G. (2009). Emotional scene content drives the saccade generation system reflexively. Journal of Experimental Psychology: Human Perception and Performance, 35(2), 305–323. https://doi.org/10.1037/a0013626
Parr, T., & Friston, K. J. (2017). The active construction of the visual world. Neuropsychologia, 104, 92–101. https://doi.org/10.1016/j.neuropsychologia.2017.08.003
Parr, T., & Friston, K. J. (2018). Active inference and the anatomy of oculomotion. Neuropsychologia, 111, 334–343. https://doi.org/10.1016/j.neuropsychologia.2018.01.041
Schmidt, L. J., Belopolsky, A. V., & Theeuwes, J. (2015). Potential threat attracts attention and interferes with voluntary saccades. Emotion, 15(3), 329–338. https://doi.org/10.1037/emo0000041
Seo, C., Guru, A., Jin, M., Ito, B., Sleezer, B. J., Ho, Y. Y., … Warden, M. R. (2019). Intense threat switches dorsal raphe serotonin neurons to a paradoxical operational mode. Science, 363(6426), 539–542. https://doi.org/10.1126/science.aau8722