In response to the focal article by Hernandez et al. (2025), which raises compelling ethical considerations for recognizing animals as workers, this commentary highlights a critical gap: the lack of standardized tools to measure animal performance in occupational roles. Whereas human workers’ performance and well-being are evaluated with structured appraisals and surveys, animal labor remains largely unmeasured by job-specific behavioral tools. Research has documented persistent challenges in measuring animal behavior, particularly a lack of reliable and valid tools and inconsistent terminology; these issues are especially evident in working dog evaluations but apply across species (Brady et al., 2018).
To extend I-O psychology principles to animal workers, structured tools such as Behaviorally Anchored Rating Scales (BARS) can be adapted to measure performance in occupational roles objectively. BARS are performance-appraisal tools that associate each numerical rating with specific, observable examples of on-the-job behavior representing different levels of effectiveness on the evaluated dimension (Smith & Kendall, 1963). Their job-specific behavioral anchors describe observable actions, which supports feedback, reduces bias, and enhances objectivity by standardizing behavioral observations (Schwab et al., 1975).
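To make this structure concrete, the sketch below represents a single BARS dimension as a mapping from numerical ratings to behavioral anchors. It is a minimal illustration under stated assumptions: the class name `BARSDimension`, the dimension, and its anchor wording are hypothetical, and matching an observation to an anchor by exact text is a simplification of the rater judgment a real BARS requires.

```python
from dataclasses import dataclass


@dataclass
class BARSDimension:
    """One BARS dimension: each numerical rating is tied to an observable behavior.

    Hypothetical structure for illustration, not a published instrument.
    """
    name: str
    anchors: dict[int, str]  # rating -> behavioral anchor text

    def rating_for(self, observed_anchor: str) -> int:
        """Return the rating whose anchor matches the behavior the rater selected."""
        for rating, anchor in self.anchors.items():
            if anchor == observed_anchor:
                return rating
        raise ValueError(f"No anchor on {self.name!r} matches: {observed_anchor!r}")


# Hypothetical dimension and anchors, for illustration only.
focus = BARSDimension(
    name="Task focus",
    anchors={
        5: "Maintains focus despite environmental distractions",
        3: "Briefly distracted but returns to task without handler cue",
        1: "Abandons task when distractions are present",
    },
)

print(focus.rating_for("Briefly distracted but returns to task without handler cue"))  # 3
```

In practice a rater chooses the anchor that best matches the observed behavior; the code only shows that each rating is defined by a behavior rather than by an abstract label such as "good" or "poor."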
Potential advantages of BARS over existing animal evaluations
Expanding the principles of industrial-organizational psychology to address the demands of animal workers requires precise, standardized tools to measure job-specific behaviors. Currently, most animal evaluations rely on generalized temperament scales or anecdotal checklists, which capture traits such as aggression or fearfulness without linking them directly to specific occupational tasks (Tetley & O’Hara, 2012; Kaiser & Müller, 2021). Because these tools are generalized or anecdotal, they lack the precision, clarity, and standardization required to evaluate performance and welfare within discrete roles (Debnath et al., 2015). For instance, the Canine Behavioral Assessment and Research Questionnaire (C-BARQ) effectively assesses aggression and fear in dogs, but it does not provide explicit connections to job performance criteria (Serpell & Hsu, 2001). Similarly, the Horse Grimace Scale identifies facial expressions of pain in horses but does not measure occupational task performance (Dalla Costa et al., 2014).
BARS offer a meaningful advancement in performance measurement because they directly link observable, job-specific behaviors to clearly defined performance anchors. By anchoring ratings to explicit behavioral examples, BARS are designed to reduce subjective interpretation and bias and thereby to increase interrater reliability (Schwab et al., 1975; Smith & Kendall, 1963). For example, rather than rating broad traits such as general fearfulness, a BARS developed for detection dogs might specify anchors such as “maintains focus during scent detection despite environmental distractions” or “responds promptly to handler cues under high-stress conditions” (Klieger et al., 2018). Instruments such as the Performance Monitoring Instrument (PMI) have begun to shift toward behavioral criteria but still lack the detailed, anchored performance levels characteristic of BARS, which limits their precision and consistency (Rooney & Clark, 2021).
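Interrater reliability of the kind BARS are meant to improve can be quantified once two raters score the same observations. The following sketch computes unweighted Cohen's kappa, which corrects raw agreement for the agreement expected by chance; the function name and the ratings are hypothetical and serve only as an illustration.

```python
from collections import Counter


def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Unweighted Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Raters must score the same, non-empty set of observations")
    n = len(rater_a)
    # Observed agreement: share of observations given the identical rating.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater assigned ratings independently
    # at their own observed base rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1:
        # Kappa is undefined (0/0) when both raters use one identical category;
        # this sketch treats that case as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical BARS ratings of six search trials by two raters.
a = [5, 4, 4, 3, 2, 5]
b = [5, 4, 3, 3, 2, 5]
print(round(cohens_kappa(a, b), 3))  # 0.778
```

Because BARS ratings are ordinal, a weighted kappa or an intraclass correlation coefficient would often be preferred in practice, since those indices give partial credit for near-misses such as a 4 versus a 3.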
Additionally, BARS can integrate ethical considerations into performance assessment by embedding welfare indicators directly into the rating system. Observable signs of stress, such as withdrawal, excessive panting, or retreating behavior, can serve as actionable anchors that prompt timely intervention and promote humane treatment (Landy & Farr, 1980; Campbell & Wiernik, 2015). This approach aligns with Browning and Veit’s (2021) argument that animals should be granted moral consideration based on their capacity to express consent or discomfort through behavioral responses. When such signals are interpreted as meaningful communication rather than as performance failure, BARS become not only evaluative tools but also instruments for safeguarding animal autonomy and welfare.
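One way to embed welfare indicators without scoring them as performance failure is to record them alongside, rather than inside, the performance rating. The sketch below assumes a hypothetical set of stress indicators taken from the examples above and a hypothetical rule that any observed indicator triggers an intervention flag while leaving the task rating untouched; the names `Observation` and `review` are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical stress indicators, drawn from the examples in the text.
STRESS_INDICATORS = {"withdrawal", "excessive panting", "retreating"}


@dataclass
class Observation:
    performance_rating: int  # BARS rating for the task dimension
    behaviors: set[str] = field(default_factory=set)  # behaviors noted during the trial


def review(obs: Observation) -> dict:
    """Report welfare signals separately instead of deducting them from performance."""
    signals = obs.behaviors & STRESS_INDICATORS
    return {
        "performance_rating": obs.performance_rating,  # not reduced by stress cues
        "welfare_signals": sorted(signals),
        "intervene": bool(signals),  # any stress cue prompts a welfare check
    }


print(review(Observation(2, {"retreating", "sniffing"})))
# {'performance_rating': 2, 'welfare_signals': ['retreating'], 'intervene': True}
```

Keeping the two outputs separate mirrors the argument above: a retreat is logged as communication that warrants a response, not as a lower score.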
BARS for animal workers could also be designed to capture the working relationship between humans and animals, an aspect often overlooked in traditional assessments and in the literature. Performance in animal work settings is inherently co-produced: animal behavior can be heavily influenced by handlers’ cues and treatment, intentional or otherwise. At the same time, handlers may have limited control because of factors such as an animal’s temperament or acute stress responses. Jamieson et al. (2018), for example, found that detection dogs showed significantly higher detection accuracy and fewer stress behaviors when paired with familiar handlers, demonstrating the direct influence of human factors on animal performance. Similarly, research by Hemsworth et al. (2000) in agricultural contexts has shown that stockpeople’s behavior significantly affects livestock productivity and welfare. These findings point to the importance of evaluating human behaviors alongside animal behaviors, making performance assessments more comprehensive and more reflective of actual working relationships.
To address these measurement gaps, BARS designed for animal workers should incorporate both human and animal behavioral anchors. Human behaviors, such as misusing commands or failing to respond to stress cues, should be evaluated explicitly alongside those of the animal. This dual-anchor approach would yield more accurate and complete assessments while also promoting accountability and targeted intervention.
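A dual-anchor record can be sketched as paired animal and handler dimensions rated on the same trial. The dimension names, the 1 to 5 scale, and the cutoff of 2 below are hypothetical assumptions chosen for illustration; the point is that low handler ratings are surfaced with the same weight as low animal ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualAnchorRating:
    """One trial rated on paired animal and handler dimensions (hypothetical, 1-5)."""
    animal_focus: int                # animal anchor, e.g., holds search pattern
    handler_cue_clarity: int         # handler anchor, e.g., issues commands correctly
    handler_responds_to_stress: int  # handler anchor, e.g., acts on stress cues


def flag_for_intervention(r: DualAnchorRating, cutoff: int = 2) -> list[str]:
    """List every dimension, animal or human, rated at or below the cutoff."""
    return [name for name, score in vars(r).items() if score <= cutoff]


trial = DualAnchorRating(animal_focus=2, handler_cue_clarity=1, handler_responds_to_stress=4)
print(flag_for_intervention(trial))  # ['animal_focus', 'handler_cue_clarity']
```

Here the low animal rating appears next to the low handler rating, which invites the question of whether unclear cues contributed to the lapse in focus rather than attributing it to the dog alone.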
Collectively, BARS represent a significant advancement within the broader domain of performance measurement, addressing critical limitations of existing tools by integrating behavioral specificity and psychometric rigor (Campbell & Wiernik, 2015), ethical safeguards and recognition of human–animal interdependence (Watters et al., 2021), and longstanding principles of structured evaluation (Debnath et al., 2015). This unified approach not only enhances the objectivity and validity of assessments but also reinforces humane treatment by embedding welfare-sensitive criteria into performance evaluation. By aligning evaluations with observable, job-relevant behaviors and actionable outcomes, BARS exemplify core principles of effective performance measurement such as reliability, fairness, and role specificity. Using BARS for animal workers directly advances a performance-measurement literature that has long argued that strong appraisal systems must minimize criterion deficiency and contamination while providing behaviorally specific feedback that supports development (Landy & Farr, 1980; Smith & Kendall, 1963). By replacing broad temperament checklists with job-relevant behavioral anchors (e.g., “maintains focus during scent detection despite distractions”), BARS extend these well-established principles to a previously neglected labor force, suggesting that foundational performance-appraisal theory may apply across species. This would not only fill a measurement gap in animal work settings but also offer a novel way to test and refine established performance-measurement models, underscoring the enduring relevance of BARS more than five decades after their introduction.
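Criterion deficiency and contamination can be pictured, in simplified form, as set differences between the behaviors a role actually requires and the behaviors a tool measures. The behavior lists and the helper `criterion_gaps` below are hypothetical, and treating the conceptual criterion as a finite set of named behaviors is a deliberate simplification.

```python
# Hypothetical behaviors: what a SAR role requires vs. what a generic checklist measures.
job_relevant = {"scent focus", "handler responsiveness", "rubble navigation", "indication accuracy"}
temperament_checklist = {"general fearfulness", "reaction to grooming", "handler responsiveness"}


def criterion_gaps(relevant: set[str], measured: set[str]) -> tuple[set[str], set[str]]:
    """Return (deficiency, contamination) under the set-based simplification."""
    deficiency = relevant - measured     # relevant behaviors the tool never captures
    contamination = measured - relevant  # captured content unrelated to the role
    return deficiency, contamination


deficient, contaminated = criterion_gaps(job_relevant, temperament_checklist)
print(sorted(deficient))     # ['indication accuracy', 'rubble navigation', 'scent focus']
print(sorted(contaminated))  # ['general fearfulness', 'reaction to grooming']
```

A role-specific BARS aims to shrink both sets: anchors are written for the behaviors the job requires, and content unrelated to the job is left out.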
Illustrative example: BARS for search and rescue dogs
Search and rescue (SAR) dogs perform demanding tasks that require exceptional focus, reliability, and adaptability in high-pressure environments. Effective evaluation of SAR dog performance is crucial for operational readiness, and BARS provide task-specific behavioral anchors that enhance consistency and objectivity (Klieger et al., 2018). To illustrate this approach, Figure 1 presents a conceptual BARS example for SAR dogs, showing how aligning behaviors with performance levels enables precise, actionable assessments to guide selection, training, and management.

Figure 1. BARS Example for SAR Dogs: Search Task Focus and Operational Reliability.
Note. This conceptual BARS example illustrates how task-specific behavioral anchors can assess SAR dog performance. It is not a validated instrument but is adapted from BARS development practices (Kell et al., 2017), foundational methodology (Smith & Kendall, 1963), and working dog literature (Brady et al., 2018).
This structured BARS example provides clear criteria for evaluating SAR dogs during selection, training, and performance management. In selection, evaluators can objectively identify candidates who demonstrate job-specific behaviors (e.g., resilience to distraction, navigation skills); in training, specific anchors can highlight areas needing development (e.g., handler responsiveness, susceptibility to environmental distractions); and in performance management, BARS facilitate consistent monitoring to detect performance declines or welfare concerns early. This structured approach to behavioral evaluation can promote both operational effectiveness and ethical oversight in high-stakes contexts (Landy & Farr, 1980).
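The early-detection use of BARS in performance management can be sketched as a comparison of an animal's recent ratings with its own baseline. The function name `declining`, the window of three ratings, the one-point drop threshold, and the rating history are all hypothetical; a validated program would set such thresholds empirically.

```python
from statistics import mean


def declining(ratings: list[float], window: int = 3, drop: float = 1.0) -> bool:
    """Flag when the mean of the latest `window` ratings falls at least `drop`
    points below the mean of all earlier ratings (hypothetical rule)."""
    if len(ratings) <= window:
        return False  # not enough history to form a baseline
    baseline = mean(ratings[:-window])
    recent = mean(ratings[-window:])
    return baseline - recent >= drop


history = [5, 5, 4, 5, 3, 3, 2]  # hypothetical weekly "search focus" ratings
print(declining(history))  # True
```

A flag of this kind would prompt a closer look, for instance checking the welfare anchors for stress cues, before any conclusion about the dog's capability is drawn.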
Challenges and future directions
Although BARS and other standardized, task-specific tools offer a promising path toward objectively evaluating animal performance and welfare, several challenges remain, including standardizing behavioral definitions across varied animal roles and species (Watters et al., 2021). “Good performance” has no universal definition; desirable behaviors vary by species and role. For instance, hypervigilance (typically associated with stress in animals) may be desirable in detection dogs but inappropriate in therapy dogs that work with children (Lazarowski et al., 2018).
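The role dependence of “good performance” can be expressed by keying anchors to role as well as to behavior, so that the same observed behavior earns different ratings in different jobs. The roles, behaviors, ratings, and the `rate` helper below are hypothetical and only mirror the hypervigilance example above.

```python
# Hypothetical role-specific anchors: the same observed behavior maps to
# different ratings depending on the job the animal performs.
ROLE_ANCHORS = {
    "detection": {"hypervigilant scanning": 5, "relaxed, low arousal": 2},
    "therapy": {"hypervigilant scanning": 1, "relaxed, low arousal": 5},
}


def rate(role: str, behavior: str) -> int:
    """Look up the rating for a behavior within a specific role's anchor set."""
    try:
        return ROLE_ANCHORS[role][behavior]
    except KeyError:
        raise ValueError(f"No anchor defined for {behavior!r} in role {role!r}") from None


print(rate("detection", "hypervigilant scanning"))  # 5
print(rate("therapy", "hypervigilant scanning"))    # 1
```

Raising an error for an undefined behavior, rather than guessing a score, reflects the need to define anchors separately for each role and species.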
Another challenge is ensuring that performance measurement tools are applied ethically and interpreted appropriately across diverse animal roles. Behaviors must be assessed in context; what appears to be noncompliance may reflect stress, confusion, or miscommunication rather than poor performance. This highlights the importance of well-trained evaluators who can accurately recognize subtle behavioral cues without anthropomorphism or bias (Watters et al., 2021).
Future research, development, and implementation of BARS and other performance measures should involve close collaboration among I-O psychologists, veterinarians, and ethologists. I-O psychologists can apply their expertise in constructing psychometric instruments such as BARS and in the statistical techniques used to assess reliability and validity. Veterinarians can provide species-specific insights, ethical guidance, and an understanding of behavior that helps avoid misinterpretation. Ethologists can help guard against anthropomorphic bias and ensure that behaviors are interpreted in the correct context.
To ensure BARS are both effective and scientifically sound, future research should include empirical validation addressing reliability, content validity, and criterion validity. If the goal of measuring performance is to inform future decisions, then demonstrating that BARS scores predict relevant outcomes would support their descriptive and predictive utility. Incorporating training histories for individual animals may also allow BARS to detect subtle differences in behavioral intensities over time.
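Criterion validity is commonly summarized as the correlation between scale scores and a later criterion measure. The sketch computes a Pearson correlation between hypothetical mean BARS ratings from training and a hypothetical later find rate; the function name, the data, and the choice of criterion are illustrative assumptions, not results from any study.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation, e.g., between BARS scores and a later outcome."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two equal-length samples with at least two points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data: mean BARS rating in training vs. later find rate in exercises.
bars_scores = [2.0, 3.0, 3.5, 4.0, 4.5]
find_rate = [0.40, 0.55, 0.60, 0.70, 0.80]
print(round(pearson_r(bars_scores, find_rate), 2))  # 0.99
```

With only five animals the estimate is purely illustrative; an actual validation study would need a far larger sample and would report uncertainty around the coefficient.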
Conclusion
Adapting BARS for working animals could yield meaningful gains in validity, reliability, and ethical oversight. Current animal evaluations rely on broad temperament scales and ad hoc checklists that vary by species and observer, creating well-documented gaps in standardization, terminology, and evidence of psychometric quality (Brady et al., 2018; Tetley & O’Hara, 2012). A BARS framework could align observable actions with role-relevant anchors (e.g., “maintains focus during scent detection despite distractions” for SAR dogs), which could reduce rater subjectivity, clarify performance expectations, and incorporate welfare indicators that treat stress cues or voluntary withdrawal as actionable data rather than failure (Klieger et al., 2018). Such metrics may transform performance appraisal into a shared human–animal responsibility, guiding selection, feedback, and humane intervention while safeguarding autonomy and consent. Successful implementation may require species-specific behavioral definitions, rater training, and multidisciplinary validation. However, the payoff may be a measurement system that elevates both scientific rigor and ethical standards across occupational contexts (Campbell & Wiernik, 2015; Watters et al., 2021), promoting animal welfare and agency.
Competing interests
We have no known conflict of interest to disclose.