Introduction
Due to their phylogenetic proximity to humans (Capitanio & Emborg 2008), non-human primates (NHPs) have long served as important models in biomedical and neuroscientific research, contributing significantly to our understanding of human physiology and cognition (Descovich et al. 2019). They continue to play a pivotal role in current research, with approximately 5,800 primates, predominantly macaques (Macaca mulatta and M. fascicularis), used in 2022 for scientific purposes in Europe (European Commission [EC] 2022). While macaques constitute less than 0.1% of total animal use, their value and indispensability in the study of numerous diseases and in the development of treatments have been widely documented and advocated (Friedman et al. 2017; Mitchell et al. 2018; Treue & Lemon 2023).
Nonetheless, as the 20th century heralded advancements in biomedical research and neurosciences, a parallel need emerged for a heightened awareness of animal ethics and a greater understanding of animal welfare (Baumans 2005). The recognition of animals as sentient beings has sparked intense debate worldwide regarding the legitimacy of their use in research (Duncan 2006; Proctor et al. 2013). These debates are fuelled by ethical concerns (Livingstone 2022) and the practical challenges of meeting the animals’ behavioural, environmental, and social needs in laboratory settings (Carvalho et al. 2018). Among the animal models currently used in research, NHPs are the most controversial due to their close genetic relationship to humans and their human-like traits (Mitchell et al. 2021). Therefore, in order to attain a better understanding and wider social acceptance of the research in which NHPs play a role, it is imperative to improve the living conditions of NHPs and to communicate transparently about their welfare (MacArthur Clark et al. 2019). Beyond the ethical realm, ensuring the welfare of these animals is of crucial importance for obtaining valid scientific results (Prescott & Lidster 2017; Truelove et al. 2020; Buchanan-Smith et al. 2023; see also the report of the Scientific Committee on Animal Health and Animal Welfare on the Welfare of Non-Human Primates used in Research 2002). Indeed, several studies have highlighted that inadequate living conditions can lead to various physiological and behavioural alterations. These changes not only indicate compromised welfare but also influence the reliability of collected data (Descovich et al. 2019; Cassidy et al. 2020). This relationship is summarised by Poole’s (1997) assertion that “Animal welfare equals good science”.
In this context, the increased interest in animal welfare issues has led to legislative regulations in many countries and the establishment of animal ethics committees (Baumans 2004). This focus on welfare has also underscored the need to apply the 3Rs principle (Replacement, Reduction, Refinement), described by Russell and Burch (1959) and included in European legislation in 2010 (Directive 2010/63/EU). This directive, along with other legislative texts and guidelines, emphasised the high priority of animal welfare considerations by establishing a legal framework to regulate the living conditions of captive animals (Blokhuis et al. 2008; Ingenbleek et al. 2012). However, an accurate definition and assessment of welfare is required to support the legislation and to ensure the effectiveness of any efforts made to maintain or improve animal welfare (Lundmark et al. 2014). In 2025, a new concept of positive animal welfare was proposed to facilitate a comprehensive and holistic approach to welfare assessment: the welfare of an animal is defined as “the animal flourishing through the experience of predominantly positive mental states and the development of competence and resilience. [It] goes beyond ensuring good physical health and the prevention and alleviation of suffering” (Rault et al. 2025). The notion of ‘positive mental states’ that appears in this definition is a key component of welfare; an assessment focusing solely on the characterisation of the animal’s environment and using resource-based measures is therefore insufficient (Welfare Quality® Protocol 2009). Moreover, animal welfare must be assessed at the individual level with quantifiable animal-based indicators, such as behaviour and health, which reflect the animal’s response to its environment and captivity conditions (Dawkins 2006). Combining these resource- and animal-based measures provides a more comprehensive understanding of welfare as a dynamic and multidimensional entity, revealing the complex interaction between animals and their environment (Jones et al. 2005; Leach et al. 2008).
Although significant efforts have been made in scientific studies to identify effective indicators for NHP welfare assessment (Tasker 2012; Kirchner & Bakker 2015; Truelove et al. 2020; Prescott et al. 2023), there is still a lack of standardised protocols to assess these animals in laboratory settings. To address this issue, it is essential to draw upon existing and successful frameworks developed in other sectors, and more specifically in the context of farm animals, for which welfare assessment protocols are historically more advanced. The Welfare Quality® protocol was developed to assess welfare in livestock species (particularly pigs, cattle, and poultry) through a combination of animal-based and resource-based measures collected at the individual level and repeated over time (Veissier et al. 2010). Its strength lies in its structured framework, which groups indicators under four principles: good feeding; good housing; good health; and appropriate behaviour. Similarly, the AWIN (Animal Welfare INdicators) protocol extended this approach to other species, such as horses, sheep, goats, and turkeys, integrating behavioural and physiological indicators while maintaining field applicability (AWIN 2015). These protocols, underpinned by the Five Domains model (Mellor et al. 2020), have demonstrated the feasibility and scientific robustness of multi-dimensional welfare assessments, and served as conceptual models for developing the PWIN framework for macaques.
In this article, we detail the content and development process of the protocol we propose to implement with the Primate Welfare INdicators (PWIN) project. This adaptation of the Welfare Quality® and AWIN frameworks (Welfare Quality® Protocol 2009; AWIN 2015) seeks to provide reliable, easily observable, and complementary species-specific indicators for M. mulatta and M. fascicularis. This project aligns with the 3Rs principle, focusing on the refinement of laboratory settings and practices for NHPs. The numerous studies synthesising the scientific knowledge on macaques and outlining their fundamental needs in order to enhance their living conditions (NC3Rs 2017; Cassidy et al. 2020; Truelove et al. 2020; Prescott et al. 2023) have enabled us to select the appropriate indicators for implementation in the PWIN protocol. In addition to the questionnaire section commonly used to assess resource- and animal-based measures, the PWIN project incorporates behavioural observations for each individual. This non-invasive approach is often overlooked in current animal welfare assessment guidelines (Descovich et al. 2019). The appeal of behavioural observations lies in their capacity to capture nuanced aspects of subjective experiences and emotional states, thereby highlighting the importance of conducting individual welfare assessments by recognising the unique needs, experiences, and perceptions of each animal (Fraser 2009).
Materials and methods
As previously described, the conceptual foundation of PWIN was inspired by established animal welfare assessment protocols, and the key principles were selected based on the Five Domains model (Mellor et al. 2020). An in-depth review of the literature using research databases (PubMed and Google Scholar) led to the selection of scientifically validated indicators for NHPs. Some of these indicators had been previously implemented in welfare assessment protocols for farm, laboratory and zoo animals. Additionally, two preliminary studies were conducted in 2020 to identify welfare indicators that were suitable for NHPs, to organise them according to their relevance, feasibility and reliability, and to determine which behaviours would be included in the ethogram (A Romain, M Valenchon & O Petit, personal communication 2021). Feasibility was a key consideration in selecting welfare indicators for PWIN (Scott et al. 2001), ensuring that the data could be easily collected by laboratory staff with appropriate training and performance evaluation. In line with this approach, only non-invasive indicators were chosen; any measurements that would involve physical restraint or procedures requiring laboratory analysis were therefore excluded, and remote observations were used to minimise disturbance to the animals.
Since welfare assessment should rely upon resource- and animal-based data (Blokhuis et al. 2010; Czycholl et al. 2015), the PWIN assessment tool has been designed to balance these two types of measures. Resource-based measures focus on the physical and social environment (environment-based), as well as husbandry practices and human-animal interactions (practice-based) (Whay et al. 2007). Animal-based measures assess the animal’s behaviour, physiological state and health, providing insights into its adaptation and overall welfare through characteristics such as age, sex, and body condition (Dawkins 2004). As a result, the final version of the PWIN tool comprises two distinct yet complementary sections (Figure 1): first, a questionnaire section designed to gather information on the environment, care practices and the animals’ physical condition and, secondly, an observation section with behavioural observations that provide useful insights into the animals’ mental and emotional states.
Figure 1. Examples of the type of measures and method of data collection used in the Primate Welfare Indicators (PWIN) welfare assessment tool for macaques (Macaca mulatta and M. fascicularis).

The full version of the protocol is available in Appendices A–G in the Supplementary material, and includes the questionnaire designed to assess criteria based on both the environment and the animal, along with the ethogram used for behavioural observations.
Questionnaire section
The use of questionnaires in animal welfare assessments is crucial for capturing qualitative and quantitative data across various dimensions of welfare (Gartner 2023). These tools enable systematic data collection from multiple observers, enhancing reliability and minimising bias. The choice to incorporate questionnaires into our welfare assessment protocol stems from their ability to provide a holistic view of animal welfare, their non-invasive nature, ease of implementation, and adaptability across different facilities and contexts (Gartner 2023). These attributes ensure that our evaluation method will be robust, sensitive to individual needs, and practical for routine use in enhancing animal welfare outcomes.
These questionnaires are designed to be applied to each macaque housed in the laboratory in order to meet the need for information on welfare status at the individual level.
Questionnaire development
Development of the tool started in 2020 by transposing parameters used in evaluation protocols for horses and farm animals to NHPs living in laboratory settings, and by narrowing down the selection to welfare indicators suitable for macaques (Welfare Quality® Protocol 2009). This work led to the identification of 33 welfare indicators. This selection was further informed by recommendations for primate care, legal regulations, and scientific literature. Recent studies by Truelove et al. (2020) and Prescott et al. (2023) notably supported the identification and prioritisation of welfare indicators for laboratory macaques based on their validity, reliability, and feasibility.
This comprehensive approach led to a total of 91 selected indicators (for the categories and subcategories, see Table 1; the complete list of indicators is available in Appendix A; Supplementary material). Environment-based (e.g. possibility of vertical climbing), practice-based (e.g. health surveillance) and animal-based measures (e.g. signs of pain) were grouped into four categories: housing (space, housing complexity, light and temperature); nutrition (dietary composition, number of feedings, methods of food distribution); health (physical condition, body condition score); and behaviours (social housing/isolation, appropriate behaviours, cognitive stimulations). In Mellor’s Five Domains model (Mellor et al. 2020), a fifth category considers the animal’s mental state as the consequence of the animal’s environment, physical condition, and past and present experiences. Because this fifth domain is difficult to measure directly, it is usually inferred from the first four categories and from behavioural observations (Mellor et al. 2020). The objective measures from the first four domains may highlight risks to the macaque’s welfare (e.g. limited, low-complexity space), while behavioural observations provide insights into the animal’s perception of these risks and how it copes with them (e.g. pacing). Consequently, the combination of these elements provides an insight into the animal’s overall state of welfare (Mellor 2012).
Table 1. Categories, subcategories and number of indicators of the Primate Welfare Indicators (PWIN) protocol for macaques (Macaca mulatta and M. fascicularis)

Information registered in breeding records (age, sex, etc) is also collected.
The choice of indicators in each category will be discussed in greater detail in the following sections (see Housing, Nutrition, Health, Behaviour).
Housing
The housing category includes eight subcategories with a total of thirty-four indicators. This category focuses on the physical and abiotic conditions to which animals are directly exposed, as well as care practices (Table 2).
Table 2. Summary of the ‘Housing’ subcategories in the Primate Welfare Indicators protocol for macaques (Macaca mulatta and M. fascicularis), with examples of key indicators

Nutrition
The nutrition category includes three subcategories, with a total of thirteen indicators. This category mainly concerns the water and food available to animals, the cleanliness of the drinking and feeding devices and the food presentation methods (Table 3).
Table 3. Summary of the ‘Nutrition’ subcategories in the Primate Welfare Indicators protocol for macaques (Macaca mulatta and M. fascicularis), with examples of key indicators

Health
The health category includes three subcategories with a total of fifteen indicators. The PWIN protocol focuses on assessing the physical condition of the animals, including coat condition, physical or behavioural symptoms such as lameness or a prostrated posture, while also supporting ongoing health monitoring and preventive care programmes (Table 4).
Table 4. Summary of the ‘Health’ subcategories in the Primate Welfare Indicators protocol for macaques (Macaca mulatta and M. fascicularis), with examples of key indicators

Behaviour
The behaviour category comprises five subcategories with a total of thirty-two indicators. This category focuses on the behaviours exhibited by the animal, the presence or absence of abnormal behaviours, and the management of enrichment and training programmes (Table 5).
Table 5. Summary of the ‘Behaviour’ subcategories in the Primate Welfare Indicators protocol for macaques (Macaca mulatta and M. fascicularis), with examples of key indicators

Scoring system
The scoring system for the PWIN assessment tool is designed to provide a quantitative measure of animal welfare. Each indicator is evaluated by a question that is assigned a weight based on the importance of the indicator in relation to the animal’s welfare. While most indicators are assigned a standard weight of 1, some indicators are weighted differently to reflect their relative importance in assessing macaques’ welfare. Weight values were therefore set at 0, 0.5, 1, or 2 depending on their significance. The higher the indicator’s impact on welfare, the greater the weight will be. For example, the indicator addressing social housing (housed alone or in a pair/group; Table 5) is assigned a weight of 2, given the crucial importance of social interactions for macaque species (Hannibal et al. 2017). Some items in the questionnaire are only informative and are not included in the calculation of the welfare score. These items are assigned a weight of 0 and are designed to support documentation and contextualisation of the assessment without influencing the numerical output. For example, certain questions invite the user to take a picture of specific structural or enrichment features of the housing environment.
The assignment of these weightings was the result of a structured development process combining a review of existing welfare assessment frameworks, scientific literature on primate welfare (Truelove et al. 2020; Prescott et al. 2023; Paterson et al. 2024), and iterative expert consultation involving veterinarians, ethologists, and technical staff from multiple facilities in France, including both zoos and laboratories. The process aimed to ensure that the most welfare-critical aspects (e.g. access to social interactions) were given higher weights in the overall scoring.
Each question includes a range of responses, from two (a closed question, e.g. the presence/absence of a light gradient) up to eleven options for questions requiring a greater level of detail (multiple choice question, e.g. a list of enrichments that are available in the cage). The options utilise a scoring system on a four-level scale (Table 6).
Table 6. Scoring system of the Primate Welfare Indicators protocol for macaques (Macaca mulatta and M. fascicularis), with a scoring grid for each indicator

* Based on legislation, guidelines and/or scientific articles.
Therefore, the contribution of each indicator to the overall welfare score of its category is modulated by two distinct parameters: the weight (W) assigned to the indicator, and the score (S) associated with each response option, representing the welfare quality of the observed situation (Table 6). This dual mechanism ensures that indicators with greater relevance — such as those related to social housing, clinical symptoms, or abnormal behaviour — have a proportionally greater impact on the global welfare assessment. It thereby avoids uniform weighting across indicators and supports a more sensitive and context-relevant evaluation.
Multiplying the indicator’s score $ {S}_i $ by the indicator’s weight $ {W}_i $ yields the adjusted indicator’s score ($ {AS}_i $):
$$ {AS}_i={S}_i\times {W}_i $$
For a given subcategory comprising n indicators, the subcategory score $ {SbS}_i $ is calculated as:
$$ Subcategory\ Score\;\left(\%\right)=\sum_{i=1}^n{AS}_i\times 100 $$
Finally, for each category comprising n subcategories, the final percentage score is calculated as:
$$ Category\ Score\;\left(\%\right)=\sum_{i=1}^n{SbS}_i\times 100 $$
After completing all the measurements for an animal, a bottom-up approach is used to generate an overall assessment of its welfare. The scores for individual indicators/questions are combined to produce subcategory scores (e.g. the space available), which are then aggregated to calculate category scores (e.g. housing). This approach allows for precise targeting of key concerns and situations that may adversely affect the welfare of the individual, which can be used to identify areas for improvement (Botreau et al. 2007).
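The bottom-up aggregation described above can be sketched in a few lines of Python. This is a minimal illustration and not the PWIN implementation: the indicator names, weights and scores are invented, scores are assumed to lie between 0 and 1, each subcategory sum is divided by the maximum attainable weighted score so that the result falls between 0 and 100%, and the category score is taken as the unweighted mean of its subcategory scores. The last two steps are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str      # hypothetical label, not taken from the PWIN questionnaire
    weight: float  # W_i: 0, 0.5, 1 or 2
    score: float   # S_i: assumed here to lie in [0, 1]


def adjusted_score(ind):
    """AS_i = S_i x W_i."""
    return ind.score * ind.weight


def subcategory_score(indicators):
    """Percentage score of one subcategory (normalisation is an assumption)."""
    max_total = sum(ind.weight for ind in indicators)  # reached when every S_i is 1
    if max_total == 0:
        return None  # only informative (weight 0) items
    return sum(adjusted_score(ind) for ind in indicators) / max_total * 100


def category_score(subcategories):
    """Unweighted mean of the available subcategory percentages (an assumption)."""
    scores = [subcategory_score(inds) for inds in subcategories.values()]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None


# Invented example data for one animal
behaviour = {
    "social housing": [
        Indicator("housed in a pair/group", 2, 1.0),
        Indicator("visual contact with conspecifics", 1, 0.5),
    ],
    "enrichment": [
        Indicator("enrichment rotation", 1, 0.5),
        Indicator("photo of enrichment devices", 0, 0.0),  # informative only
    ],
}

for name, inds in behaviour.items():
    print(name, round(subcategory_score(inds), 1))
print("behaviour", round(category_score(behaviour), 1))
```

Because the informative item carries a weight of 0, it changes neither the numerator nor the denominator, which matches the description of weight-0 items as documentation only.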
PWIN’s scoring system adopts a progressive approach that can go beyond regulatory compliance in terms of score justification. Instead of awarding maximum points when legal requirements are met, the highest scores are given when these standards are exceeded, thus providing a high level of welfare for the animals. For instance, while compliance with legal requirements for the space available for each individual has a weight of 1, a larger available space has a weight of 2 because it allows greater complexity of arrangements and a greater number of locomotor behaviours (International Primatological Society [IPS] 2007).
The main objective of these scores is to monitor changes and improvements over time, with the potential to support large-scale comparisons. The initial assessment serves as a baseline, providing a reference to measure the effectiveness of subsequent improvements. The state of welfare of an animal depends upon its living environment, past experiences and intrinsic characteristics (e.g. sex, age, species), all of which lead each individual to experience a given situation from its own unique perspective (Veissier et al. 2012). The PWIN tool therefore specifically targets the areas that require attention and recognises that establishing absolute reference values is challenging. Each animal effectively serves as its own benchmark within this framework. This comprehensive and systematic approach ensures thorough welfare assessments, supports continuous enhancements and ultimately enables laboratory teams to meet or even exceed high welfare standards for NHPs.
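Using each animal as its own benchmark amounts to comparing the category scores of a later campaign with those of the baseline campaign. The sketch below shows one possible way to do this; the function names, the tolerance parameter and all score values are hypothetical.

```python
def change_from_baseline(baseline, follow_up):
    """Follow-up minus baseline for every category scored in both campaigns."""
    return {
        category: round(follow_up[category] - baseline[category], 1)
        for category in baseline
        if category in follow_up
    }


def categories_needing_attention(changes, tolerance=0.0):
    """Categories whose score dropped by more than `tolerance` percentage points."""
    return sorted(cat for cat, delta in changes.items() if delta < -tolerance)


# Invented category scores (%) for one macaque
baseline = {"housing": 62.0, "nutrition": 80.0, "health": 91.0, "behaviour": 55.0}
follow_up = {"housing": 70.5, "nutrition": 80.0, "health": 88.0, "behaviour": 61.0}

changes = change_from_baseline(baseline, follow_up)
print(changes)
print(categories_needing_attention(changes))
```

In this invented example only the health score has decreased, so it is the only category flagged for follow-up.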
Behavioural observations section
Since behaviour provides direct information regarding how each animal perceives and interacts with its environment, it is a critical component in the process of assessing animal welfare (Dawkins 2004). Behavioural observations complement quantitative survey data by providing welfare indicators that might be overlooked in resource-based assessments (Truelove et al. 2020; Lutz & Baker 2023). This approach aligns with the recommendations of Truelove et al. (2020) and Prescott et al. (2023), who advocated the use of multi-dimensional assessment frameworks that holistically evaluate both positive (e.g. social play, allogrooming) and negative (e.g. self-harm behaviours) behavioural welfare indicators. In addition to providing a more detailed understanding of each individual’s state of welfare, behavioural monitoring plays a pivotal role in facilitating the identification of specific environmental deficiencies (Tiefenbacher et al. 2005; Vandeleest et al. 2011; Schapiro & Hau 2023). For instance, abnormal behaviours, such as pacing and self-biting, usually occur as the result of pain or discomfort (Lutz & Baker 2023). The early detection of these stress markers is crucial in identifying the cause of distress.
Sampling method
When selecting the sampling method, we prioritised behavioural monitoring through objective, quantitative and repeated measurements and ensured that this method was both feasible and practical for future implementation in laboratory settings. The continuous focal sampling method allows the precise monitoring of an animal’s behaviours, relying upon a descriptive recording of behaviours exhibited by a focal individual over a pre-defined period of time (Altmann 1974; Brereton et al. 2022). It is critical to obtain these measurements, since the significance of both abnormal and normal behaviours is often determined not only by their mere occurrence in the behavioural repertoire of an individual but also by their frequency and duration (Watters et al. 2021).
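To illustrate how frequency and duration can be derived from a continuous focal record, the following sketch summarises one session. It assumes that each behavioural bout is logged as a (behaviour, start, end) triple in seconds from the start of the session; this record format, the function name and the example times are assumptions made for illustration.

```python
from collections import defaultdict

SESSION_SECONDS = 600  # one 10-min focal session


def summarise_session(bouts, session_seconds=SESSION_SECONDS):
    """Count bouts and sum durations per behaviour for one focal animal.

    bouts: list of (behaviour, start_s, end_s) tuples.
    """
    counts = defaultdict(int)
    durations = defaultdict(float)
    for behaviour, start, end in bouts:
        if not 0 <= start <= end <= session_seconds:
            raise ValueError(f"bout outside session: {behaviour} {start}-{end}")
        counts[behaviour] += 1
        durations[behaviour] += end - start
    return {
        behaviour: {
            "bouts": counts[behaviour],                  # frequency
            "duration_s": durations[behaviour],          # total duration
            "time_budget": durations[behaviour] / session_seconds,
        }
        for behaviour in counts
    }


# Invented record of one session
bouts = [
    ("resting", 0, 240),
    ("foraging", 240, 420),
    ("pacing", 420, 450),
    ("resting", 450, 570),
    ("pacing", 570, 600),
]
summary = summarise_session(bouts)
for behaviour, stats in summary.items():
    print(behaviour, stats)
```

Reporting both the number of bouts and the share of the session reflects the point made above: two 30-second pacing bouts and a single 60-second bout give the same time budget but not the same frequency.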
Continuous focal sampling is the method best suited to our data collection, but it is demanding, as it requires a thorough knowledge of species-specific behaviours and exclusive attention to the focal animal (Brereton et al. 2022). Knowledge of macaque behaviours must be supported by a rigorously designed ethogram, with precise definitions available during observation (Table 7; Appendix B [see Supplementary material]), along with a training period for observers and the implementation of inter-rater reliability tests to ensure the reliability of the data collected (Koo & Li 2016). When combined with adequate observer training and concordance testing, this sampling method reduces observer bias by providing structured, objective criteria for behavioural recording, thus improving the reliability of the tool across a wide range of users (Amato et al. 2013).
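One common statistic for inter-rater reliability on continuous measures is the intraclass correlation coefficient (ICC), which is the subject of the Koo & Li (2016) guideline. The sketch below computes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The choice of this particular form, and of behaviour durations as the rated quantity, are assumptions made for illustration; the text does not specify which form is used in PWIN.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per rated session (or animal), one column per observer.
    Assumes a complete table with at least two rows and two columns.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between sessions
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between observers

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Invented example: seconds of pacing scored by two observers on five videos
pacing_seconds = [
    [30, 34],
    [0, 0],
    [120, 110],
    [45, 50],
    [10, 12],
]
print(round(icc_2_1(pacing_seconds), 3))
```

Koo & Li (2016) suggest reading values below 0.5 as poor, 0.5 to 0.75 as moderate, 0.75 to 0.9 as good, and above 0.9 as excellent reliability. Categorical records, such as the behaviour scored at a given instant, would call for a different statistic.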
Table 7. Categories and behaviours used in the ethogram developed in the Primate Welfare Indicators protocol for Macaca mulatta and M. fascicularis

In this study, continuous focal sampling is applied in repeated 10-min observation sessions. This duration was selected following a preliminary study in two captive settings during the early PWIN development in 2020. It offers notable advantages regarding both feasibility and representativeness: ten minutes is short enough to allow practical integration into routine monitoring schedules in laboratories, while repetition of the sessions ensures that an accurate and representative sample of the behaviours exhibited by the animal in various contexts is captured.
Ethogram development
The primary challenge lies in developing a species-specific ethogram for M. mulatta and M. fascicularis in laboratory environments that is detailed enough to capture behavioural differences between individuals yet concise enough for practical usage by a wide range of users. The ethogram (Table 7; Appendix B [Supplementary material]) was informed by an extensive literature review on the natural behaviours of these species (Xu et al. 2012; Polanco et al. 2021; Romain et al. 2024) and aimed to identify and select behaviours relevant in captive conditions to ensure the reliability and precision of welfare assessments. A comprehensive welfare assessment should integrate both negative and positive behavioural indicators, consider their frequency and duration, and identify the diversity of the behavioural repertoire. This multidimensional approach, which includes abnormal, anxiety-related and positive behaviours, provides a more holistic understanding of an animal’s welfare (Watters et al. 2021). Abnormal behaviours deviate from species-specific patterns and serve as significant markers of negative welfare, especially when these behaviours arise in captive environments (Broom 1986; Mason 1991). These include repetitive actions such as pacing or self-injury, which are often rare or absent in wild populations, and any behaviours that occur at unusually high frequencies in captivity (Novak et al. 2012). For instance, self-grooming, while normal, may become excessive under stress, potentially leading to hair loss (Honess et al. 2005; Novak et al. 2017). The identification of these behaviours is essential, as they signal welfare challenges and may reflect maladaptive responses to stress (Mason & Latham 2004). Anxiety-related behaviours, such as heightened vigilance (i.e. a high frequency of vigilance in the time budget), provide more direct insights into current stress conditions than abnormal behaviours that may be rooted in past experiences, and are critical for identifying welfare concerns (Tiefenbacher et al. 2005; Vandeleest et al. 2011). A high level of inactivity has also been associated with compromised welfare conditions, such as prolonged social distress (Cassidy et al. 2020). Inactivity-related behaviours (e.g. resting, observing the environment) were included in the ethogram to support systematic recording and contextual interpretation. In contrast, positive behaviours, including allogrooming, social play or foraging, meet physiological needs and foster social connections and are thus indicators of positive welfare (Aureli et al. 1999; Baker et al. 2012; Hannibal et al. 2017).
Based on this knowledge, seven behavioural categories were defined in the ethogram to facilitate its reading and use by users with varying levels of ethological knowledge (researchers, animal technicians, veterinarians). Each category includes a set of behaviours (46 in total) with titles and descriptions that minimise the risk of subjective interpretation by the observer. Since the aim is to carry out direct observations of the macaques, we chose to incorporate a category of behaviours directed towards the observer within the ethogram, in order to account for the potential impact of the observer’s presence on the behaviours expressed by the animals. This addition allows for a more comprehensive understanding of how the presence of the observer may influence the macaques’ actions.
Recommendations for PWIN protocol implementation
A complete welfare assessment should involve three consecutive phases: (1) data collection; (2) data analysis; and (3) decision-making, during which appropriate actions may be implemented in the facility. This process constitutes a welfare assessment campaign. For data collection, we recommend one questionnaire and at least four 10-min focal sampling sessions of behavioural observation per individual.
The frequency of campaigns should be determined by the dual need for regular monitoring and responsive adaptation to contextual changes. As a general recommendation, facilities should implement a complete welfare assessment at least once every six months for each individual. In this context, each of the three phases may extend over a two-month period to allow sufficient time for data collection, data analysis, and observation of the effects of implemented measures on macaques’ welfare.
This biannual interval should ensure continuity of care and enable the detection of gradual deteriorations in welfare, in line with standard good practices adopted in similar protocols (Can et al. 2017; Paterson et al. 2024). In addition, reassessment should be systematically conducted whenever a significant change occurs in the animals’ housing (e.g. the formation or separation of a pair or group, introduction or departure of a conspecific, initiation of an experimental procedure). Embedding flexibility into the assessment process allows welfare to be re-evaluated promptly in response to significant changes, ensuring that acute impacts are swiftly detected and addressed through timely interventions (Battini et al. 2015; Czycholl et al. 2018).
Welfare assessment schedule
We recommend establishing a pre-defined observation schedule to ensure balanced and systematic data collection. Each individual should be assigned to specific time slots using a pseudo-randomised approach, which guarantees an even distribution of observation sessions across individuals and throughout the day (Appendix G [Supplementary material]). This method helps to avoid temporal bias throughout the day and ensures that behavioural data are representative and comparable across the group.
The timing of welfare assessments during the day should be adapted to the type of data collection. Questionnaire-based evaluations can be completed at any time of the day, as they reflect structural or daily practice-related aspects (e.g. cleaning frequency, feeding schedule, enrichment programme) with low temporal sensitivity. In contrast, behavioural observations should be evenly distributed throughout the facility’s operational hours. This includes morning (e.g. 0800–1300h) and afternoon (e.g. 1300–1800h) periods of data collection. Observations should be equitably allocated across each 1-h time block (e.g. 0900–1000h, 1000–1100h) to ensure that behavioural patterns are captured across the full daily cycle. This distribution should follow a pre-established, pseudo-randomised observation schedule that balances effort across time blocks while avoiding systematic bias in temporal sampling. When it is not feasible to cover all these periods due to logistical constraints, sessions should at least be balanced between morning and afternoon to capture potential diurnal behavioural variation. This ensures temporal representativeness of the data and aligns with best practices recommended in similar protocols (AWIN 2015; Paterson et al. 2024).
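As an illustration only, a pseudo-randomised schedule of this kind could be generated as in the following Python sketch. The block boundaries reuse the example hours given above (0800–1800h); the function name `build_schedule`, the animal identifiers, and the "least-used block first, ties broken at random" rule are assumptions made for this example and do not reproduce the schedule in Appendix G.

```python
import random
from collections import Counter

# 1-h blocks based on the example hours in the text (assumed facility hours)
MORNING_BLOCKS = ["0800-0900", "0900-1000", "1000-1100", "1100-1200", "1200-1300"]
AFTERNOON_BLOCKS = ["1300-1400", "1400-1500", "1500-1600", "1600-1700", "1700-1800"]


def build_schedule(individuals, sessions_per_half_day=2, seed=0):
    """Assign each individual to 1-h blocks, balanced across the day.

    Each individual receives `sessions_per_half_day` morning blocks and the
    same number of afternoon blocks. The least-used blocks are preferred and
    ties are broken at random, so effort stays even across time blocks.
    """
    rng = random.Random(seed)  # fixed seed makes the schedule reproducible
    usage = Counter()
    schedule = {}
    order = list(individuals)
    rng.shuffle(order)
    for animal in order:
        slots = []
        for blocks in (MORNING_BLOCKS, AFTERNOON_BLOCKS):
            ranked = sorted(blocks, key=lambda b: (usage[b], rng.random()))
            chosen = ranked[:sessions_per_half_day]
            for block in chosen:
                usage[block] += 1
            slots.extend(sorted(chosen))
        schedule[animal] = slots
    return schedule, usage


# Hypothetical identifiers for five macaques
schedule, usage = build_schedule(["M1", "M2", "M3", "M4", "M5"])
for animal in sorted(schedule):
    print(animal, schedule[animal])
```

With five individuals and two sessions per half-day, each of the ten blocks is used exactly twice, and every individual is observed twice in the morning and twice in the afternoon.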
Data collection
Prior to data collection, training observations should be conducted to familiarise the observer with the ethogram and the process of continuous focal sampling observations. These sessions also serve to habituate the animals to the presence of observers in a controlled, non-interactive context. The training phase may vary depending on both the observer’s experience and the animals’ responses, but it is strongly recommended to perform at least two 10-min training sessions per observer prior to data collection. This approach, supported by recommendations from existing welfare assessment protocols (AWIN 2015; Paterson et al. 2024), is intended to minimise behavioural disturbances and improve intra-rater reliability across assessment sessions. Exposure of the animals to observer presence without consequences or interaction should also be planned as part of a general habituation strategy.
Regarding the questionnaire-based assessment, we recommend using the version of the questionnaire provided in Appendix A [Supplementary material], in which the scores and weights are not visible during completion. This design choice is intended to avoid influencing the assessor’s responses and to promote objectivity (Tuyttens et al. 2014). Questionnaires may be printed in advance and completed directly in front of the macaque. Alternatively, questionnaires can also be filled out remotely, provided that the assessor has access to all relevant information required for completion (e.g. the macaque’s body condition score, coat condition, cleanliness of water dispensers). In cases where certain items are not applicable (e.g. no outdoor housing available) or when information is not available, the questionnaire offers response options such as ‘Not applicable’ or ‘Do not know’. These responses are excluded from the welfare score calculation, ensuring that they do not bias the overall result.
In relation to behavioural observation sessions, it is essential that the assessor is physically present in front of the animal and follows the standardised observation protocol described below. To minimise observational bias, we recommend using the ethogram presented in Appendix B [Supplementary material], in which analytical categories (e.g. abnormal behaviours) are concealed. Additionally, to limit the influence of observer presence on animal behaviour, observers should follow a standardised non-intrusive protocol prior to and during focal sampling. Before beginning each observation session, the observer should remain still and silent inside the room for about two minutes or longer, avoiding any sudden movement or direct eye contact with the animal. This short habituation period is intended to allow the animal to return to baseline activity levels before behavioural recording begins. During the observation itself, the observer should remain stationary and avoid any interaction with the animal, maintaining a neutral posture and minimal visibility. Any disturbances occurring during or immediately before the observation (such as staff movements, equipment noise, or social disruptions) should be noted in a comment section of the observation to facilitate interpretation of the data.
Each individual should ideally be observed for a minimum total of 40 min, divided into two sessions in the morning and two in the afternoon. Wherever possible, increasing the number of sessions per individual is strongly encouraged, as it enhances the reliability of the resulting time-budget estimates. Observations can be recorded using paper and pencil, but the use of a mobile application specifically designed for behavioural data collection is strongly encouraged, as it improves both accuracy and ease of use. During the 10-min focal sampling session, each behaviour expressed by the animal must be recorded along with its precise start and end time. All behaviours should be recorded as soon as they occur, regardless of their duration (i.e. even a brief expression qualifies). Repetitive behaviours (e.g. pacing) should be recorded when the same action or movement pattern is expressed at least twice consecutively. For example, a sequence of pacing on the floor should initially be recorded as ‘walking’ until the pattern is repeated, at which point it should be recorded as ‘pacing’. The process should be repeated every time the animal changes behaviour, until the full 10-min session is completed. Behavioural frequencies can then be calculated post hoc by counting the number of times each individual behaviour or behavioural category was recorded during the session. If no dedicated application is used, a stopwatch should be started as soon as the first behaviour of the focal individual is recorded. In this case, only abbreviated behaviours should be used (e.g. ‘W’ for ‘walking’) together with the corresponding time stamps (Appendix C [Supplementary material]).
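For paper-and-stopwatch recording, the time-stamped abbreviations can be converted into behavioural bouts with durations before analysis. The Python sketch below is one possible way of doing this and is not part of the protocol itself; it assumes continuous recording (each behaviour ends when the next one starts, and the last one ends with the session). Apart from ‘W’ for walking, the codes (‘R’, ‘F’) and the function name `to_bouts` are hypothetical.

```python
SESSION_LENGTH_S = 600  # one 10-min focal session


def to_bouts(records, session_length=SESSION_LENGTH_S):
    """Convert (start_time_s, code) records into (code, start, end, duration).

    Assumes continuous focal sampling: a behaviour ends when the next one
    starts, and the final behaviour ends when the session ends.
    """
    ordered = sorted(records)
    bouts = []
    for i, (start, code) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else session_length
        bouts.append((code, start, end, end - start))
    return bouts


# Hypothetical session: resting, walking, foraging, then resting again
records = [(0, "R"), (120, "W"), (150, "F"), (200, "R")]
bouts = to_bouts(records)
for code, start, end, duration in bouts:
    print(f"{code}: {start}-{end} s ({duration} s)")
```

Because the bouts are contiguous, their durations sum to the full 600 s of the session, which provides a simple check that no time stamp was missed.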
Once collected, questionnaire data should be digitised, dated, and stored in a clearly organised format. For behavioural observation data, we likewise recommend using a structured Excel® spreadsheet to support the computation of accurate time budgets during analysis (Appendix D [Supplementary material]).
Data analysis
Questionnaire scoring should be carried out in accordance with the procedures described in Scoring system, using the weights and scores assigned to each indicator as presented in Appendix E (Supplementary material). The resulting scores can be compiled in a summary table similar to the questionnaire scoring sheet provided in Appendix F (Supplementary material).
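The exclusion of ‘Not applicable’ and ‘Do not know’ responses described under Data collection can be sketched as follows. This Python example is purely illustrative: the 0–1 score scale, the example weights, and the use of a weighted mean are assumptions made here, and the actual scores, weights, and aggregation rule remain those given in Scoring system and Appendix E.

```python
EXCLUDED = {"Not applicable", "Do not know"}


def weighted_score(responses):
    """Weighted mean of item scores, ignoring excluded responses.

    `responses` is a list of (answer, weight) pairs, where `answer` is either
    a numeric score or one of the excluded response options.
    Returns None when no scorable item remains.
    """
    scored = [(s, w) for s, w in responses if s not in EXCLUDED]
    total_weight = sum(w for _, w in scored)
    if total_weight == 0:
        return None
    return sum(s * w for s, w in scored) / total_weight


# Hypothetical subcategory with four indicators (scores and weights invented)
responses = [(1.0, 2), ("Not applicable", 3), (0.5, 1), ("Do not know", 1)]
subcategory_score = weighted_score(responses)
print(subcategory_score)  # weighted mean over the two scorable items only
```

Dividing by the summed weight of the answered items, rather than of all items, is what prevents excluded responses from lowering the result.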
Data collected from behavioural observations should be processed in two steps. First, data should be aggregated into behavioural categories (e.g. abnormal; Appendix C [Supplementary material]), and examined according to their durations, occurrences, and proportions within the individual’s time budget. Second, the analysis should be refined by examining specific behaviours within each category. For each 10-min focal session, time budgets should be calculated by summing the durations of all behaviours belonging to a given category and dividing this value by the total observation time (e.g. 50 s of feeding across 600 s of observation corresponds to 8.33% of time spent feeding). This computation should be applied to each behavioural category. The cumulative durations obtained by aggregating all focal sessions for a given individual over the entire study period may then be used to construct their overall activity budget.
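The time-budget computation described above can be sketched in Python as follows. The behaviour names, the category labels, and the function names are hypothetical placeholders rather than the categories of the PWIN ethogram; only the arithmetic (summed duration per category divided by total observation time) comes from the text.

```python
from collections import defaultdict


def time_budget(bouts, category_of, session_length=600):
    """Percentage of session time per category from (behaviour, duration_s)."""
    totals = defaultdict(float)
    for behaviour, duration in bouts:
        totals[category_of[behaviour]] += duration
    return {cat: 100 * secs / session_length for cat, secs in totals.items()}


def overall_budget(sessions, category_of, session_length=600):
    """Activity budget pooled over several focal sessions of one individual."""
    pooled = [bout for session in sessions for bout in session]
    return time_budget(pooled, category_of, session_length * len(sessions))


# Hypothetical mapping from behaviours to categories
category_of = {"feeding": "Feeding", "resting": "Inactivity", "walking": "Locomotion"}

# One 10-min session containing 50 s of feeding in total
session = [("feeding", 30), ("resting", 400), ("feeding", 20), ("walking", 150)]
budget = time_budget(session, category_of)
print(round(budget["Feeding"], 2))  # 8.33, as in the worked example above
```

Pooling durations before dividing, as in `overall_budget`, gives each second of observation equal weight across sessions.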
Frequency-based budgets can be derived following the same principle, by dividing the number of occurrences of behaviours within a category by the total number of recorded behavioural events (e.g. 20 feeding events out of 185 total events yields 10.81% of behaviour frequency for feeding).
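A frequency-based budget follows the same pattern, with counts of events in place of durations. In this minimal Python sketch the event names, category labels, and function name are hypothetical; the numbers reproduce the worked example above.

```python
from collections import Counter


def frequency_budget(events, category_of):
    """Percentage of recorded events falling in each behavioural category."""
    counts = Counter(category_of[event] for event in events)
    total = sum(counts.values())
    return {cat: 100 * n / total for cat, n in counts.items()}


# Hypothetical mapping and event list: 20 feeding events out of 185 in total
category_of = {"feeding": "Feeding", "resting": "Inactivity"}
events = ["feeding"] * 20 + ["resting"] * 165
freq = frequency_budget(events, category_of)
print(round(freq["Feeding"], 2))  # 10.81
```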
Once data are compiled, a first-level analysis should be conducted at the scale of welfare categories and subcategories to identify potential concerns. Scores should also be compared between individuals and/or with those obtained during previous assessment campaigns for the same individual. In case of suboptimal scores or marked differences in scores between individuals or across previous assessments, further examination should be carried out to isolate the indicators contributing to the issue. This should be followed by a contextual review to determine the likely causes (e.g. recent environmental or social changes, clinical issues) and can then be interpreted in light of behavioural data to evaluate potential impacts on observed activity.
Behavioural time budgets should be compared with data from previous assessment campaigns and/or with those of other individuals housed under similar conditions. The emergence or increase in abnormal behaviours, the reduction or absence of key welfare indicators such as affiliative or exploratory behaviours, or marked behavioural changes compared to previous assessments should prompt a targeted review of relevant questionnaire responses. Additionally, attention should be given to the time of day and contextual conditions in which specific behaviours are expressed, to identify recurring patterns. Behavioural observations are not associated with a numerical scoring system but serve as a complementary source of information. They contribute to the interpretation of questionnaire data by providing additional evidence regarding the animals’ behavioural engagement and potential welfare state.
Decision-making
Following data analysis, a collective decision-making phase should be initiated to define and prioritise actions to be implemented. In line with recommendations from the Welfare Quality® framework (Veissier et al. 2009; Blokhuis et al. 2010) and the Animal Welfare Assessment Grid (Wolfensohn et al. 2018), this stage should bring together the key staff members involved in the animals’ daily care and management. These include animal care staff, veterinarians, researchers, and behaviourists, whose complementary perspectives are essential for interpreting results and identifying appropriate measures.
The team should review the findings, establish priorities, and agree upon a set of corrective or preventive actions. Objectives should be defined at three temporal scales:
• Short-term actions, which can be easily and rapidly implemented to address immediate welfare concerns (e.g. addition of structures or visual barriers);
• Medium-term actions, requiring moderate planning and resource mobilisation (e.g. refinement of enrichment routines, behavioural monitoring plans);
• Long-term actions, focusing on structural or organisational aspects which require more time and resources to implement, such as facility design improvements or macaque training programmes.
The outcomes of this decision-making process should be documented in an action plan specifying implementation timelines and expected outcomes. A follow-up assessment should then be scheduled to evaluate progress and ensure the adaptive management of welfare conditions over time. This iterative cycle (assessment, analysis, decision, action, and reassessment) forms the foundation of a continuous improvement process consistent with evidence-based welfare management principles (Blokhuis et al. 2010; Clegg et al. 2015; Wolfensohn et al. 2018).
Workload considerations
The completion of one individual questionnaire generally takes about 30 min for a new assessor, and approximately 15 min for someone familiar with the tool and with access to all relevant information. For each macaque, a full welfare assessment campaign, including four behavioural observations and one questionnaire completion, requires approximately 1 to 1.5 h of the assessor’s time, depending on their level of experience with the protocol. In a facility housing around 20 macaques and involving five personnel in data collection, this equates to a total workload of 20 to 30 h per campaign, or 4 to 6 h per assessor. When this effort is distributed over the entire duration of an evaluation period (e.g. three months), the individual workload averages between 1.3 and 2 h per month.
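The workload figures above follow from simple arithmetic, which facilities can adapt to their own numbers. The short Python sketch below reproduces the example (20 macaques, five assessors, 1 to 1.5 h per macaque, three-month evaluation period); the function name and parameter names are illustrative.

```python
def campaign_workload(n_macaques, n_assessors, hours_per_macaque, months):
    """Return (total hours, hours per assessor, hours per assessor per month)."""
    total = n_macaques * hours_per_macaque
    per_assessor = total / n_assessors
    return total, per_assessor, per_assessor / months


low = campaign_workload(20, 5, 1.0, 3)   # experienced assessors
high = campaign_workload(20, 5, 1.5, 3)  # assessors new to the protocol
print(low)   # 20 h in total, 4 h per assessor, about 1.3 h per month
print(high)  # 30 h in total, 6 h per assessor, 2 h per month
```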
The number of assessors involved in the process depends upon available staff and institutional logistics. While the protocol can technically be implemented by a single trained assessor, we strongly recommend involving multiple team members in the welfare monitoring process. This facilitates the distribution of workload and supports collaborative interpretation of results, fostering dialogue between professionals with complementary expertise.
As an initial implementation strategy, we recommend that facilities begin with welfare assessment of selected individuals or groups, through random and/or targeted selection. Random selection involves assessing the welfare of a sample of the animals housed within the facility, with the sample size to be determined internally according to each facility’s constraints. Targeted selection may be applied as an alternative or complement, focusing on individuals or groups of particular interest to staff members (e.g. changes in social dynamics, recent housing changes). This approach will allow teams to gauge the additional workload and progressively incorporate assessments, data analysis, and team discussions into their operational routines. In the long term, random and/or targeted assessments may remain appropriate for facilities operating under logistical constraints or housing a large number of macaques, which may hinder the comprehensive implementation of the protocol for all individuals. To evaluate the effectiveness of implemented interventions, we recommend prioritising repeated assessments of the same individuals or groups across at least two welfare assessment campaigns before expanding sampling to further subjects. Importantly, insights from a single group may often support broader facility-level adjustments (Shepherdson & Wielebnowski 2015; Canadian Council on Animal Care [CCAC] 2021), provided that environmental, social, and management factors are comparable.
Examples of data visualisation
In the following section, examples of data visualisation are presented to illustrate the outcomes of both questionnaire-based and behavioural observation methods within the PWIN welfare assessment framework resulting from a preliminary testing phase. These two figures are based upon distinct fictitious individuals and are not derived from the same animal or dataset; they are presented independently and solely for illustrative purposes, to exemplify how data from the two components of the PWIN protocol can be visualised. For the questionnaire data, spider charts (or radar plots) are used to display scores across primary welfare categories and subcategories, providing a comprehensive overview of welfare indicators at both broad and detailed levels (Figure 2). These charts facilitate the rapid identification of strengths and areas of concern within the welfare profile by representing scores in a multidimensional format.
Figure 2. Example of a spider chart illustrating the scores obtained in the Health category using the Primate Welfare Indicators protocol for macaques (Macaca mulatta and M. fascicularis).

In addition, behavioural observations are summarised through time-budget analyses (Figure 3), offering insights into the animal’s activity throughout the day. These time budgets visually represent the proportion of time allocated to various behaviours, thus clarifying the distribution of behavioural patterns and their potential welfare implications. Together, these visualisation techniques contribute to a holistic view of welfare and support a more informed and targeted approach to welfare enhancement.
Figure 3. Example of a time budget created for a hypothetical animal using the Primate Welfare Indicators protocol for macaques (Macaca mulatta and M. fascicularis).

Data collection and visualisation within the PWIN framework can be facilitated by the use of the Akongo© software (https://akongo.eu), a dedicated digital platform developed for the standardised recording, storage, and graphical display of animal welfare data. Akongo© enables users to input questionnaire responses and behavioural observations directly and automatically generates visual outputs, such as radar charts and time-budget diagrams (Figures 2 and 3). This tool enhances reproducibility and comparability of welfare assessments across facilities by ensuring consistency in data structure and visual reporting.
Discussion
The PWIN protocol has been designed as a species-specific welfare assessment tool for laboratory macaques. It integrates both resource-based and animal-based measures with positive and negative welfare indicators to provide a complete evaluation of welfare at the individual level (Figure 4). This approach allows for the collection of data pertaining to the animal’s perception of its environment, thereby improving individual monitoring and highlighting any potential deficiencies in the environment. The combination of questionnaire responses and behavioural data offers a more comprehensive understanding of the animal’s welfare, facilitating the identification of any areas that require improvement. By cross-referencing these measures, it becomes possible to pinpoint specific factors that may contribute to suboptimal welfare, thus guiding necessary interventions. In addition, a scoring system that allows for detailed, progressive evaluations is intended to encourage continuous improvements in animal care.
Figure 4. Process of welfare assessment using the Primate Welfare Indicators protocol for macaques (Macaca mulatta and M. fascicularis).

The analysis of behavioural data within the PWIN framework focuses on both general activity patterns and specific behavioural expressions. The use of behavioural categories enables global comparisons with prior observations on the individual and/or expected trends (e.g. reduced abnormal behaviours following enrichment). It also limits the impact of intrinsic individual characteristics (e.g. age, sex, species) on the analysis, as each behavioural category includes behaviours with varying sensitivity to these factors (e.g. affiliative behaviours combining social play, allogrooming, and social resting). In contrast, examining specific behaviours within each category allows for the detection of atypical patterns or age-inconsistent expressions. Social play, for instance, is well documented as a highly age-dependent behaviour in primates, particularly frequent in juveniles and subadults where it contributes to developmental processes (Palagi 2011; Liao et al. 2018). A lack of social play in subadults living in social groups may signal reduced social engagement or potential welfare concerns (Held & Špinka 2011). Conversely, its absence in older animals, especially when other affiliative behaviours are present, should not be viewed as problematic (Liao et al. 2018). Behavioural data should therefore always be interpreted in light of individual characteristics (e.g. age, sex, species). Rather than establishing fixed thresholds for individual behaviours, welfare interpretation should be grounded in longitudinal monitoring and contextualised against age-specific behavioural profiles (Salas et al. 2024). This approach allows the identification of deviations from expected patterns and supports appropriate, evidence-based follow-up.
Moreover, establishing behavioural reference values for time budgets in captive animals is inherently challenging. In the case of captive macaques, assessors frequently turn to time budgets documented in wild populations as a proxy for species-typical behavioural baselines (Melfi & Feistner 2002; Howell & Cheyne 2019). While the aim of welfare in captivity is not to exactly replicate natural conditions, naturalistic time budgets provide a valuable starting point for identifying behaviours that should be encouraged (e.g. foraging, social interaction, or exploratory behaviour), and those that might warrant attention (e.g. prolonged inactivity or stereotypies). Instead of establishing strict thresholds for time allocated to specific behaviours, the focus should be placed upon ensuring that animals are able to express a full range of species-specific behaviours, and that deviations from previous patterns serve as the primary indicators of potential welfare concerns. Each animal’s behavioural background serves as its own reference point. Sudden increases in inactivity, declines in social interaction, or the appearance of stereotypies may signal welfare deterioration, particularly when they diverge from the individual’s prior behavioural benchmark. Over time, this approach enables the assessment of whether an individual’s behavioural trajectory is converging toward or diverging from what could be considered a normal behavioural pattern, informed by naturalistic baselines, personal trends, and comparison with other individuals with similar profiles (e.g. age, social context).
Accordingly, behavioural interpretation should be carried out with caution, as relying upon the duration or frequency of a single behaviour may lead to misrepresentation of the animal’s welfare state. For instance, while pacing is frequently used as an indicator of compromised welfare in captive primates, its validity remains debated. Poirier et al. (2019) note that it does not consistently reflect negative affective states and may be influenced by contextual factors (e.g. enclosure design, feeding anticipation). Therefore, within the PWIN framework, pacing and other locomotory stereotypies should not be interpreted in isolation but evaluated alongside other behavioural and contextual indicators to improve interpretative reliability. The same caution must be applied to allogrooming, which is recognised as a key affiliative behaviour in primates and is often used as a positive welfare indicator, as it reflects social bonding, tension reduction, and cooperative relationships (Schino 2001; Silk et al. 2003). However, its interpretation must remain context-dependent: elevated allogrooming rates or asymmetrical allogrooming patterns can also emerge in response to social instability or serve as a coping strategy under stress (Sonnweber et al. 2015; Aureli & Yates 2010). Careful consideration of social and environmental context is therefore necessary to distinguish affiliative from stress-related motivations. A similar consideration applies to inactivity-related behaviours (e.g. resting, observing the environment). While periods of inactivity are natural and necessary for primates, particularly during low-arousal periods of the day, sustained or contextually inappropriate inactivity may signal a welfare concern.
Fureix and Meagher (2015) highlight that heightened levels of inactivity, especially outside of expected resting phases, can reflect affective disturbances such as boredom or depressive-like states across a range of species. Likewise, MacLellan et al. (2022) have demonstrated that prolonged inactivity serves as a robust negative welfare indicator in laboratory rodents. These findings support the broader principle that excessive inactivity, particularly when associated with reduced social or exploratory behaviour, may reflect compromised welfare (Cassidy et al. 2020). Such patterns, whether involving increases or decreases in inactivity, should prompt further investigation and appropriate measures (e.g. assessing enrichment effects, adjusting social context).
Another important consideration in behavioural interpretation is the presence of a human observer in front of the focal animal, which can influence its behaviour. While habituation protocols are implemented to reduce such effects, the observation of behaviours directed toward the observer (e.g. vigilance, threat displays, or attempts to interact) can provide valuable information on the quality of the human-animal relationship. Elevated frequencies of such behaviours may indicate heightened arousal, anxiety, or hypervigilance, particularly in individuals with a history of negative human contact (Coleman & Maier 2010; Coleman & Pierre 2014). The PWIN protocol includes these types of behaviours in its ethogram (Appendices B and C [Supplementary material]) not only to account for observation bias, but also as potential markers of welfare and human-animal relationship quality. Their frequency and form should be interpreted in light of individual history, previous human-animal interactions, and facility practices. As suggested in studies on other captive species, observer-directed behaviour may also serve as an early indicator of chronic stress or unmet social needs (Waitt et al. 2002; Hosey 2008). Including these behaviours in the assessment framework helps capture the complexity of animal responses to human presence in laboratory settings.
The analysis of questionnaire-based data is also a key component for the successful implementation of the protocol. It is crucial to emphasise that the welfare scores generated by PWIN do not represent absolute or objective measures of an animal’s welfare state. Rather, they offer a structured, comparative framework to guide observation, stimulate discussion among stakeholders, and support longitudinal monitoring. In practice, the most informative use of PWIN arises from repeated assessments over time, allowing for the detection of trends or abrupt changes in individual or group profiles. The interpretative power of the tool lies in this dynamic dimension — welfare scores should always be interpreted relative to the animal’s own history and the environmental and procedural context.
As with behavioural observation sessions, when suboptimal scores are identified in categories or subcategories, interpretation should not be limited to isolated values. Instead, a transversal discussion between veterinarians, researchers, and animal care staff should be initiated to contextualise the findings and explore potential causes. This multidisciplinary exchange is crucial to avoid misinterpretation and to ensure appropriate, coordinated responses (Hemsworth et al. 2015; Martínez-Yáñez et al. 2025). For example, a marked decline in an individual’s weight, reflected in lowered health and behavioural scores, may stem from multiple factors: dental pain, which can be assessed through the dental health indicator; or a sudden social disruption, such as the loss of a social partner. In such cases, corrective actions may include veterinary intervention, temporary dietary adaptation, enrichment adjustments, and increased behavioural monitoring. This iterative, collaborative approach aligns with recommendations from established frameworks, which emphasise continuous feedback and the importance of stakeholder engagement in welfare decision-making (Fernandes et al. 2019).
Although excluded from the quantitative scoring, informational items (e.g. pictures of the housing layout) play a complementary role in the PWIN protocol. They enable fine-grained documentation of the environment and support the contextual interpretation of scored indicators. These qualitative elements can also be used to track changes over time and to provide visual or descriptive feedback to animal care teams. As such, they contribute to strengthening dialogue between observers and facility staff, improving transparency, and supporting continuous welfare improvement. While such documentation adds meaningful contextual value, it is not always feasible to collect these materials in all facilities due to logistical or confidentiality constraints. Nevertheless, their absence does not compromise the implementation or validity of the protocol.
Finally, while some structural or environmental indicators (e.g. space available) are relatively stable over time, their inclusion in each assessment contributes to consistency in longitudinal monitoring. Using a fixed set of indicators, whether static or dynamic, ensures comparability across repeated evaluations and allows detection of meaningful changes in welfare-relevant dimensions (Botreau et al. 2007; Wolfensohn et al. 2018; Arndt et al. 2023). Moreover, indicators perceived as stable may still evolve due to facility renovations, regulatory updates, or institutional changes, justifying their continued assessment within a comprehensive framework.
The implementation of the PWIN protocol in large primate facilities raises challenges related to feasibility and sampling design. In facilities housing several hundred to thousands of macaques, full individual assessments may not be feasible and sampling strategies could therefore be adopted. One possible option would be to follow existing welfare assessment frameworks, such as Welfare Quality®, which recommend assessing around 20–30% of individuals per social or housing category to obtain a representative overview (Welfare Quality® Protocol 2009; Battini et al. 2015; Hampton et al. 2019). An approach inspired by the AWIN framework could also be considered, combining a general screening phase with simplified tools and a more detailed follow-up assessment when potential welfare risks are identified (AWIN 2015). Alternatively, in very large facilities, a sentinel-group strategy, similar to rodent health-monitoring systems in which one sentinel cage per 50–80 housing units is recommended (Mähler et al. 2014), could provide an early-warning mechanism. Another possibility would be to select, within each social group, one individual per matriline, age class, sex, and social status, to capture variation within the population. Whichever sampling strategy is used, assessors should also maintain general visual monitoring of all group members, paying particular attention to low-ranking animals and to any signs of apathy, injury, or social withdrawal. Further studies will be needed to determine which of these approaches would offer the best balance between feasibility and representativeness for large facilities.
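To make the first of these options concrete, the following is a minimal sketch of how a facility might draw a random sample of roughly 20–30% of individuals per housing category. It is an illustration only and not part of the PWIN protocol: the function names, the category labels, the animal identifiers, and the rule of always assessing at least one animal per non-empty category are all assumptions introduced here.

```python
import random


def sample_size(n_animals, percent=25):
    """Animals to assess in one category, rounded up.

    Assumption (not from the protocol): at least one animal is
    assessed whenever the category is not empty.
    """
    if n_animals <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-n_animals * percent // 100)
    return min(n_animals, max(1, needed))


def draw_sample(animals_by_category, percent=25, seed=None):
    """Randomly select animals within each housing or social category.

    animals_by_category maps a (hypothetical) category label to a list
    of animal identifiers; a seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return {
        category: sorted(rng.sample(ids, sample_size(len(ids), percent)))
        for category, ids in animals_by_category.items()
    }


if __name__ == "__main__":
    # Hypothetical facility with two housing categories.
    facility = {
        "pair-housed": [f"P{i:03d}" for i in range(40)],
        "group-housed": [f"G{i:03d}" for i in range(300)],
    }
    for category, chosen in draw_sample(facility, percent=20, seed=1).items():
        print(category, len(chosen))
```

Sampling within each category rather than across the whole colony keeps small categories represented, which a single colony-wide draw would not guarantee.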
This protocol provides a practical framework for welfare monitoring, allowing research teams and animal care personnel to incorporate it into their daily routines with minimal disruption (Racciatti et al. 2022). By ensuring that the process is non-invasive and readily implementable, the protocol enhances its feasibility for consistent use in laboratory settings. Moreover, the temporal dimension of welfare makes it essential to conduct behavioural monitoring over an extended period, using repeated behavioural measurements to capture any changes occurring in the environment that may have an impact on the macaques over time. Repeated sampling also enhances welfare monitoring by enabling assessments of enrichment strategies, as it reveals the impact of environmental modifications on stress levels and/or behavioural diversity (Baumgartner et al. 2024).
The PWIN protocol is designed to evolve over time, integrating new approaches for the assessment of animal welfare or other methods of behavioural sampling (e.g. scan sampling), to further enhance its effectiveness. It is also designed in such a way that it can adapt to changing standards in housing, care practices, and emerging scientific knowledge on macaque needs and animal welfare, thus ensuring that it will remain relevant for ongoing assessments of laboratory animal welfare. Regular updates and adjustments will improve the tool’s reliability and accuracy in evaluating welfare. Finally, the current protocol can also be adapted for other species in laboratory settings, extending to further NHP species as well as phylogenetically more distant animals.
The assessment of macaque welfare in research is essential for ethical, scientific and societal reasons. This is underscored by various ethical frameworks and legislative requirements that designate animal welfare as a critical consideration in research design and implementation (EC 2016). The sentience and cognitive complexity of NHPs such as macaques require that welfare be prioritised to minimise harm and uphold ethical responsibilities (Beauchamp & Childress 1994). Ensuring robust welfare practices is also critical for maintaining the integrity of research: poor welfare conditions can result in confounding variables that significantly impact the validity and reproducibility of research findings (Carvalho et al. 2018; Descovich et al. 2019). Studies conducted under optimal welfare conditions are more likely to yield reliable and ecologically relevant results, enhancing their applicability to both human and primate health outcomes (Fraser 1995; Knight 2011). The relationship between welfare and data quality highlights the necessity of promoting high welfare standards in research environments. Beyond its scientific implications, welfare evaluation plays a key role in preserving public trust and acceptance of research involving NHPs.
Animal welfare implications and conclusion
Transparent practices and adherence to rigorous welfare standards demonstrate responsibility and bolster institutional credibility, addressing societal concerns regarding the ethical use of animals in science and also enhancing the quality of scientific research (EC 2016; Loss et al. 2021; Ormandy et al. 2019). Such operational tools, supported by ongoing training and refinement, ensure that welfare considerations are central to research activities.
For effective on-site assessments, it is essential that the protocol used not only meets criteria of validity, reliability and feasibility, but also allows for its repeated application over time by diverse observers including researchers, caretakers, and veterinarians (Knierim & Winckler 2009). The routine use of such tools empowers staff to adopt multiple observational perspectives, thereby improving macaque living conditions.
The PWIN protocol will be deployed in laboratories that differ in their protocols and installations, and will be tested by a team of experts in animal behaviour as well as by each laboratory’s caretakers, researchers and veterinarians. The ultimate goal of our project is the widespread adoption of welfare protocols, thereby promoting systematic evaluations that integrate animal welfare considerations into everyday laboratory practices.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/awf.2026.10061.
Competing interests
None.


