Rethinking the utility of the Five Domains model

The Five Domains model is influential in contemporary studies of animal welfare. It was originally presented as a conceptual model for understanding the types of impact that procedures may impose on experimental animals. Its application has since broadened to cover a wide range of animal species and forms of animal use. However, it has also increasingly been applied as an animal welfare assessment tool, which is the focus of this paper. Several critical limitations associated with this approach have not been widely acknowledged, including that: (1) it relies upon expert or stakeholder opinion, with little transparency around the selection of these individuals; (2) quantitative scoring is typically attempted despite the absence of clear principles for aggregating welfare measures and few attempts to account for uncertainty; (3) there have been few efforts to measure the repeatability of findings; and (4) it does not consider indirect and unintentional impacts, such as those imposed on non-target animals. These deficiencies raise concerns about testability, repeatability and the potential for manipulation. We provide suggestions for refining how the Five Domains model is applied, to partially address these limitations. We argue that the Five Domains model is useful for systematically considering all sources of possible welfare compromise and enhancement, but is not, in its current state, fit for purpose as an assessment tool. We call for wider acknowledgment of the operational limits of using the model as an assessment tool and for prioritisation of the studies needed for its validation, and we encourage improvements to this approach.


Introduction
Animal welfare science remains a young and dynamic discipline. Approaches to conceptualising and measuring animal welfare have evolved considerably since the modern discussion of animal welfare began in the 1960s (Broom 2011). As in any other branch of science, progress in our understanding of animal welfare relies upon challenge, debate and argumentation. However, there are suggestions that progress has been slowing in several major fields of science, which have become 'less disruptive' in recent decades. One recent large-scale bibliometric analysis showed that newer papers are increasingly less likely to break with the past in ways that push science in new directions (Park et al. 2023). This trend should be counteracted in animal welfare science.
In this paper, we highlight one animal welfare paradigm that, despite being nearly 30 years old and widely used in many (but not all) global regions, has not been robustly challenged: the Five Domains model. We specifically explore the limitations of the Five Domains model when used as an animal welfare assessment tool. We provide a brief review of how the Five Domains model has quietly evolved from a conceptual model into an assessment tool. In doing so, we aim to provide a constructive review of the limitations of this approach, provide examples of misuse, make suggestions for refinement, and discuss alternative approaches. We consider the possibility that the absence of 'disruptive' studies challenging this paradigm may be slowing the evolution of the animal welfare discipline. We begin with a brief review of when, how and where the model has been used.

The history of the Five Domains model
The Five Domains model was proposed approximately 30 years ago as a conceptual framework to simplify animal welfare considerations for research animals (Mellor & Reid 1994). It was built on the foundations of the Brambell Report (Brambell 1965) and the Five Freedoms model (Farm Animal Welfare Committee 1979). The model was initially created to assess the impact of a proposed animal experiment or use by considering all sources of possible welfare compromise, that is, negative welfare (at least for animals directly and intentionally impacted by humans).
The Five Domains model is based on the affective state (or feelings)-based conception of animal welfare, called 'hedonism' in the philosophical literature (Appleby & Sandøe 2002), which is one of the mainstream views of animal welfare (Beausoleil et al. 2018) and is stated transparently by the authors of the model (Mellor et al. 2020).
The model proposes four physical/functional domains (nutrition, environment, health and behaviour, the last recently renamed 'behavioural interaction' [Mellor et al. 2020]), and a fifth domain, the so-called mental state. The basic idea is that the welfare state reflects the sum of the animal's mental experiences (Harvey et al. 2022). Hence, the welfare status of an animal is a direct function of the feelings of the animal: the mental domain. The Five Domains model then, in turn, interprets the experiences of animals as a function of four other aspects or 'domains' of animals' lives: their nutritional state, the environment in which they live, their physical health, and their behavioural opportunities (Mellor & Beausoleil 2015). In short, the model says that animals' welfare is determined by the quality of their experiences, and our best evidence regarding the quality of their experiences comes from the four other domains.
Subsequently updated, the model has had various manifestations since its first conceptualisation (Mellor & Beausoleil 2015; Mellor 2017; Mellor et al. 2020) and has recently been extended to incorporate positive welfare states (Mellor & Beausoleil 2015) and human-animal interactions (Mellor et al. 2020).

From a conceptual framework to a measurement tool
Conceptual use of the Five Domains model aligns with what it was designed for. In this context, the outputs of the model are used to identify negative animal welfare impacts (risks) and positive animal welfare impacts (enhancements) associated with any animal manipulation activity. This conceptual way of applying the model has been framed as "risk assessment" (Sherwen et al. 2018) when the aim is to identify knowledge gaps and research priorities relating to negative welfare. However, "hazard identification" may be a more appropriate term in light of the risk analysis framework (EFSA Panel on Animal Health and Welfare 2012). Conceptual use of the model has also been framed as "identifying opportunities to promote positive welfare" (Kells 2021) when the focus is on positive welfare. Put another way, using the model as a conceptual tool can be thought of as a "focussing device" (Mellor 2017) for animal welfare discussions. Importantly, conceptual applications of the model do not attempt to quantify or rank welfare outcomes. But how do we go from thinking about different contributors to animal welfare to scoring, ranking or comparing them? This is much less clear.
This second type of application is importantly different from conceptual studies (although the distinction is not always recognised by the authors of such studies).In this case, the model is used as something it was not designed to be: a (purportedly) quantitative, scientifically robust animal welfare assessment tool that is often used to rank different techniques (Sharp & Saunders 2011).Over the past two decades an increasing number of authors and organisations have used the Five Domains model in this way.But, before this is discussed, it is necessary to reflect on how the model has been adapted into an assessment tool that is then used to assess animal welfare.

Generation of scores in assessments
Although there is considerable variation in the methods used by studies attempting to deploy the Five Domains model as a quantitative assessment tool, there are some common features (Mellor 2017). Mellor (2017) provides an instructive account of the operational details of using the Five Domains model as an animal welfare assessment tool, and an earlier account is provided by Sharp and Saunders (2011). First, a single candidate intervention (or even an individual animal) (Littlewood & Mellor 2016) or a group of interventions (e.g. 14 rodent control methods) (De Ruyver et al. 2023) is defined for assessment. Second, a panel of experts (animal welfare scientists) or stakeholders (technicians, community members or representatives of advocacy groups) is assembled. Third, summaries of the background literature are provided to each panel member by a convener. Fourth, the panel is asked to provide numerical or categorical scores for each domain, each technique and, in some cases, for more than one phase of the intervention studied, e.g. (a) prior to death, and (b) mode of death (De Ruyver et al. 2023). These scores are meant to reflect the magnitude of negative (and, more recently, positive) feelings an animal might experience in each of the Five Domains, both prior to death and via their mode of death (if a lethal method is used) (Baker et al. 2016). These scores are typically given on an ordinal scale of least-to-most suffering, e.g. 1-8 (De Ruyver et al. 2023), 0-5 (Hampton et al. 2016a) or A-D (Littlewood & Mellor 2016), in a per-category scoring system. The outputs are then used to rank the techniques assessed, e.g. a score of 4D (4 on a 0-5 scale for suffering prior to death, and D on a scale of A-H for suffering due to mode of death) for ground-based chest shooting of wild dromedary camels (Camelus dromedarius) (Hampton et al. 2016a). Such outputs are described as "systematic, holistic, data-based assessments" (Beausoleil et al. 2022).

Contemporary applications
Since the 1990s, the Five Domains model has been applied to a variety of animal groups impacted by human activities.The model has been particularly widely used by investigators in New Zealand, Australia, and Europe (Table 1).However, the Five Domains model has far from a global monopoly on animal welfare assessment, with comparatively little use in the global regions of North and South America, Asia and Africa (Table 1).Today, the Five Domains model is being used more and more widely to understand anthropogenic effects on the welfare of a wide range of animals, including research animals, wildlife, livestock, and companion animals (Table 1).

Animals used in research
The Five Domains model was first used in a regulatory context to systematically assess the welfare impacts of animal research activities in New Zealand (Mellor & Beausoleil 2015). For example, integer scores out of five are awarded for each of the four physical/functional domains (A, B, C and D), and these scores are aggregated into a composite score that is then used to appraise the harm imposed on the research animals. The model has subsequently been adopted by several animal research institutions, but this type of use is rarely published.

Free-ranging wildlife
The Five Domains model has seen perhaps the widest uptake in the field of wildlife management. It has been used for over ten years to assess practices used to kill and remove introduced wildlife species in Australia (Sharp & Saunders 2011; Hampton et al. 2016a; Harvey et al. 2020, 2021, 2023). It has since been applied to introduced wildlife in New Zealand (Beausoleil et al. 2016) and to nuisance ('pest') native wildlife species in the United Kingdom (Baker et al. 2016, 2022) and other countries such as Belgium (De Ruyver et al. 2023).
The model has also been used as an assessment and ranking tool to examine internationally practiced management actions such as the use of guardian animals to protect livestock from attacks by wild predators (Allen et al. 2019).More recently, the model has been applied to inform management actions in emergency scenarios involving native conserved wildlife, such as whale strandings in New Zealand (Boys et al. 2022a,b).

Zoos and aquaria
There is growing use of the Five Domains model for captive wildlife (Clegg et al. 2015; Sherwen et al. 2018), with significant uptake among zoos (Kagan et al. 2015; Ward et al. 2020). In fact, the animal welfare strategy of the World Association of Zoos and Aquariums (WAZA) recommends that zoos and aquariums apply the Five Domains model to assess animal welfare (Mellor et al. 2015).

Companion animals and horses
There have been a limited number of studies to apply the Five Domains model to the welfare of pet dogs (Canis familiaris)

Problems with use of the Five Domains model as a welfare assessment tool
What began as a way of conceptualising all of the complex inputs that might affect animal welfare, clearly rooted in an affective-state viewpoint, has, as we have seen, been developed into an assessment tool producing scores that purport to compare and rank the welfare impacts of different human interventions on a range of animals. However, we think there are several critical limitations to this kind of use of the Five Domains approach. These problems have not been widely acknowledged by the animal welfare community, and the growing use of the model as an assessment tool (Table 1) has gone largely unchallenged in the animal welfare literature, with few (if any) published studies critiquing these findings. We highlight some of the key methodological details of studies that have used the Five Domains model as an animal welfare assessment and ranking tool in Table 1. In populating this table, we searched for peer-reviewed studies that have generated assessment scores using the model. We did not include non-peer-reviewed studies, e.g. Sharp and Saunders (2011), nor peer-reviewed studies that applied the model to hypothetical animals or scenarios, e.g. Littlewood and Mellor (2016) and Harvey et al. (2020). We describe some of the critical problems with such studies below.

Subjective experiences to objective scores
One of the central challenges of animal welfare science is how to translate something fundamentally subjective, animals' feelings, into objective terms. Simply put, there is no direct way to measure the experiences or feelings of animals (Browning 2022b). One strength of the Five Domains model is that it starts with a clear and transparent notion of animal welfare, and the developers of the model recognise that this notion is subjective and therefore cannot be measured directly (Mellor 2016). So, how are animal welfare inferences made? This is a general challenge of quantitative animal welfare assessment: providing an objective view of a subjective state (Fraser & Duncan 1998).
Therefore, even if the welfare of animals is determined by the overall quality of their experiences, the study of their welfare has to be focused on detectable indicators of these experiences (Fraser 2008).If mental experiences such as pain, breathlessness and fear cannot be measured directly, they must be cautiously extrapolated from observable indicators of the animal's physical or physiological state or its behaviour.This task is not straightforward (Browning 2022a).How these judgements are made, who is qualified to make them, how their accompanying scores are reached, and what evidence is considered in reaching these conclusions, are just some of the obstacles in the way of making defensible conclusions.
A common answer to this problem is to gather a panel to make the assessment.

Selection of panel members
The assessments generated from the Five Domains model are derived through the use of panels and suffer from the problems inherent to the use of expert opinion or 'eminence' (Hampton et al. 2016b). This is of particular concern in contexts where there are gaps in scientific understanding, where model outputs may therefore fail to be evidence-based (Baker et al. 2016). Ten obvious questions here are: (1) who is selected to sit on a Five Domains panel?; (2) who decides on panel member selection?; (3) what criteria are used to include or exclude potential panel members?; (4) how many panel members are selected?; (5) how will the panel members interact?; (6) will a consensus be sought?; (7) if so, how will a consensus be reached?; (8) what happens if a consensus cannot be reached?; (9) who will scrutinise the panel's decisions?; and (10) how is impartiality assessed? Much hinges on the answers to these questions.
Humans are prone to confirmation and disconfirmation biases (Nickerson 1998): we interpret evidence to support conclusions we want to reach, and we see what we want to see. These biases certainly extend to animal welfare questions (Buddle et al. 2018), and the Five Domains model is not unique in suffering from these problems; they apply to any use of expert panels. So, the panel approach introduces considerable subjectivity to the process. For instance, some people sympathise more with certain animals than with others. Likewise, some people start the process with an attachment to one or another of the proposed methods (e.g. zookeepers or other zoo employees may be inclined to favour existing zoo husbandry practices) (Sherwen et al. 2018). Conversely, panel members familiar with and supportive of existing practices may be unduly critical of newly developed or newly proposed alternative practices or technologies (Johnson et al. 2019). There are also gender and ethnic disadvantages to consider (e.g. men 'silencing' women) (Shpungin et al. 2012), as well as the rarely considered benefits of including Indigenous knowledge and perspectives (Normyle et al. 2022).
Few of the studies that have used the Five Domains model as an assessment and ranking tool have reported how panel members have been selected (Table 1).This raises the important question of whether panel members are selected because of their expertise in animal welfare science, special knowledge/insight, relevance as a stakeholder, desire to volunteer, ideological alignment with colleagues, affirmative action criteria, personal relationship to the panel convenor, or something else.There is a serious concern that unstated conflicts of interest will be present in this environment, with a prime example being the possibility of a panel member that is funded by an industry and who demonstrates bias towards the funding agencies' interests (Van der Schot & Phillips 2013).

Experts or stakeholders?
It is often not clear whether panel members are selected as 'experts' or 'stakeholders'. Some published studies have specified the need for diverse and non-scientific backgrounds for panel members: "there would be merit in engaging panels or consultative networks with wide expertise and experience" (Mellor & Beausoleil 2015). Mellor et al. (2020) state that "Any assumption of the occurrence of negative affects must be supported by directly observed animal-based physical, physiological, clinical and/or behavioural evidence". The same authors go on to say, "This is equally the case for the presence of opportunities for animals to engage in rewarding behaviours." Clearly, there must be evidence, usually behavioural, that any such opportunities are actually used before their potential welfare-enhancing impacts can be considered. Only then can inferences be made about any aligned negative or positive affects. Finally, Mellor et al. (2020) posit that "This emphasises the general point that objective animal-based evidence (Domains 1 to 4) must form the foundations of any inferences about welfare-relevant affects (Domain 5)." Thus, the authors who developed and refined the conceptual model seem to recommend that this evidence be used by whoever is doing the assessment (individual or panel). Why, then, specify diverse and non-scientific backgrounds for panel members? The issue of scientific literacy must then be addressed: can non-expert panel members comprehend the relevant evidence? This leads to questions about exactly what the criteria are for choosing ('inclusion criteria') or not choosing ('exclusion criteria') potential panel members. These are important factors to consider when evaluating the findings of any panel, e.g. juries (Cullen & Monds 2020), but they are rarely reported in Five Domains assessment studies (Table 1).

Consensus or majority, confidential or discussive?
The next key question is how panel members give their views and how collective decision-making is achieved. Views may be given individually (this may be confidential, anonymous or blinded) or in a group process. There are merits to the former, as used by De Ruyver et al. (2023) to avoid "groupthink" (Resnik & Smith 2020), a phenomenon that occurs when a group of individuals reaches a consensus without critical reasoning or evaluation of the consequences or alternatives. Groupthink is based on a common desire not to upset the balance of a group of people. Not all studies that have used the Five Domains model as an assessment and ranking tool have reported which approach they have used (Table 1).
For consensus methods, there is the issue of disproportionate influence of dominant persons (Gavrilets et al. 2016). In other words, all conclusions reached are stated to be the product of the deliberations of the panel members but may effectively reflect which panel members were most opinionated or domineering. Some Five Domains assessment studies have recognised this pitfall and have instead required each panellist to independently generate scores, which are then discussed at a subsequent meeting of the panel, with final scores reported as average values (e.g. medians) (Beausoleil et al. 2016), rather than attempting to reach consensus.
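The independent-scoring-then-median approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the panellist labels, the 0-5 scale and all scores are hypothetical, and the four dictionary keys simply stand for the four physical/functional domains.

```python
from statistics import median_low

# Hypothetical independent scores (0 = no compromise, 5 = severe compromise)
# from three panellists for ONE technique; all values are illustrative only.
panel_scores = {
    "panellist_1": {"nutrition": 1, "environment": 3, "health": 4, "behaviour": 2},
    "panellist_2": {"nutrition": 0, "environment": 2, "health": 4, "behaviour": 3},
    "panellist_3": {"nutrition": 1, "environment": 5, "health": 5, "behaviour": 2},
}

def median_by_domain(scores):
    """Return the median of the panellists' independent scores for each domain.

    median_low is used so that, with an even number of panellists, the result
    is always one of the scores actually given (a valid ordinal category)
    rather than an in-between value such as 2.5.
    """
    domains = next(iter(scores.values())).keys()
    return {d: median_low(p[d] for p in scores.values()) for d in domains}

print(median_by_domain(panel_scores))
# {'nutrition': 1, 'environment': 3, 'health': 4, 'behaviour': 2}
```

Unlike an open consensus discussion, a single extreme scorer has limited leverage over the median: if panellist_3 had scored the environment domain 0 instead of 5, the environment median would move only from 3 to 2.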

Use of published and quantitative research
For many published Five Domains assessment studies, there are scant details provided of how previously published research is used to reach scores.As can be seen in Table 1, details of literature searches are rarely reported in studies that use the Five Domains model as an assessment and ranking tool.Mellor et al. (2020) refer, somewhat obliquely, to "scientifically informed best judgement" in describing how this process occurs.This is sound advice, but it leaves a number of important details to the discretion of the panel: who collates the scientific evidence, what evidence is deemed to be relevant to the context (e.g. are studies from other species considered), and who reviews whether the collation is appropriate?
There is considerable variation in how each Five Domains assessment reviews and presents available evidence to panel members. McGreevy et al. (2018), for example, stated that "each context leader supplied an overview of the context, as well as annotated references, to support welfare assessments during the workshop. This was distributed to the panellists as a handbook." By contrast, Sherwen et al. (2018) described their efforts to "systematically collect information from a team of experienced zoo personnel who included zookeepers, veterinarians, managers, and a welfare researcher/specialist to allow potential and/or current risks to animal welfare to be identified." While these efforts are admirable, these processes align more closely with roundtable discussions than with controlled scientific reviews.

Repeatability
We are unaware of any studies that have tested the repeatability of Five Domains assessments.Reliability is a core tenet of the scientific method and is defined as the extent to which measures are repeatable and consistent, i.e. the similarity between repeated measurements of the same item (Windschnurer et al. 2008).This is a fundamental requirement of any reliable scientific measurement method.
Reliability for animal welfare assessments can be classified into interobserver reliability and test-retest (or repeatability) reliability (Vaz et al. 2013). Interobserver reliability assesses the extent to which results differ among observers (Harley et al. 2021). Different, but similarly trained, observers should obtain the same results when assessing the same animals at the same time under the same circumstances but independently from each other (O'Callaghan et al. 2003). An interobserver reliability study of Five Domains panels would assess whether a panel assessing an identical practice repeats the findings of a separate panel. Test-retest reliability characterises the consistency of the method over time and, thus, the repeatability of the results (Vaz et al. 2013). These measures have been quantified for other animal welfare assessment systems, e.g. Welfare Quality® (Friedrich et al. 2020), but never for the Five Domains model, casting doubt over how reliable its outputs are.
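As one illustration of how interobserver agreement between two Five Domains panels could be quantified, the sketch below computes a linearly weighted Cohen's kappa for two sets of ordinal scores. The choice of statistic is our assumption for illustration (intraclass correlation coefficients are a common alternative), and the panel scores are hypothetical. The same function would serve for test-retest reliability if the two lists were one panel's scores at two time-points.

```python
def weighted_kappa(a, b, n_categories):
    """Linearly weighted Cohen's kappa for two raters on an ordinal 0..n-1 scale.

    1.0 = perfect agreement; 0.0 = agreement no better than chance.
    Disagreements are penalised in proportion to their distance on the scale.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    if n_categories < 2 or any(not 0 <= s < n_categories for s in list(a) + list(b)):
        raise ValueError("need >= 2 categories and scores in 0..n_categories-1")
    n, k = len(a), n_categories
    # Joint distribution of paired scores (rows: rater a, columns: rater b).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # linear disagreement weight
            disagree_obs += w * observed[i][j]
            disagree_exp += w * row[i] * col[j]  # expected if ratings were independent
    if disagree_exp == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - disagree_obs / disagree_exp

# Hypothetical 0-5 scores given by two independent panels to eight techniques.
panel_a = [4, 2, 5, 1, 3, 0, 2, 4]
panel_b = [4, 3, 5, 1, 2, 0, 2, 5]
print(round(weighted_kappa(panel_a, panel_b, n_categories=6), 2))  # prints 0.8
```

Where scikit-learn is available, `sklearn.metrics.cohen_kappa_score(panel_a, panel_b, weights="linear")` should give the same value for these data; the hand-written version is shown only to make the calculation explicit.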

Aggregation
To reach a conclusion about the net welfare outcome of an assessment based on the Five Domains model, it is necessary to specify how the values of the different criteria scored by a panel are to be aggregated. However, impact categories are ordinal, and differences between pairs of adjacent categories may not be linearly related (Baker et al. 2016). According to the latest (2020) outline of the model, this ambition cannot be fulfilled owing to the "limits imposed by an inability to determine the relative impacts of different affects when evaluating the notional overall negative-positive affective balance represented by QoL (quality of life), thereby precluding the possibility of elaborating an all-inclusive QoL metric" (Mellor et al. 2020).
Given these limitations, all that can effectively be done is to assess how different aspects of the first four domains contribute positively or negatively to overall welfare, but not how the many positive and negative inputs add up in terms of the net welfare of the animals in question. However, since conclusions in terms of net welfare outcomes are actually reached, the way aggregation is undertaken should be made transparent. It is therefore fair to ask how the scores awarded for each domain are weighted when added to one another. Further, do any of the domain scores strongly correlate with one another? That is, does one domain score predict the fifth domain, mental state, in a similar way to another domain score, making one of the two domain scores redundant? In a later section, we discuss how alternative welfare assessment models tackle the problem of aggregation in more robust ways.
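The aggregation problem can be made concrete with a small, entirely hypothetical example. Because the scores are ordinal, any order-preserving recoding of the 0-5 categories is an equally valid description of the same judgements; yet summing the domain scores into a composite can rank two techniques in opposite orders depending on which coding is used. The techniques, scores and the recoding below are arbitrary assumptions chosen only to make this point.

```python
# Hypothetical 0-5 scores for domains 1-4 (nutrition, environment, health, behaviour).
technique_x = [0, 0, 0, 5]   # one severe compromise
technique_y = [2, 2, 2, 0]   # several moderate compromises

def composite(scores, recode=None):
    """Sum domain scores, optionally after an order-preserving recoding."""
    if recode is not None:
        scores = [recode[s] for s in scores]
    return sum(scores)

# An equally valid ordinal coding in which higher categories are further apart
# (an assumption: nothing in the data says the gaps between categories are equal).
convex = {0: 0, 1: 1, 2: 2, 3: 4, 4: 7, 5: 11}

print(composite(technique_x), composite(technique_y))                  # 5 6
print(composite(technique_x, convex), composite(technique_y, convex))  # 11 6
```

Under the equal-interval coding technique Y appears worse; under the convex coding technique X does. A simple sum therefore embeds an unstated assumption about the spacing of categories, which is one reason the weighting and aggregation rules deserve explicit reporting.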

Uncertainty
Conclusions reached through use of the Five Domains model are ultimately qualitative in nature but are commonly denoted by numerical scores (e.g. 3 out of 5). What is often lacking (aside from a transparent explanation of how these scores are derived) is a measure of uncertainty, or the level of confidence that can be attached to these findings. In a few studies, panel members have been asked to nominate a confidence score (e.g. 0-3) to reflect their confidence in the scores produced for each technique assessed (Beausoleil & Mellor 2015; Beausoleil et al. 2016; Harvey et al. 2020; Baker et al. 2022). However, it is not always transparent how these confidence scores are arrived at, and many studies make no attempt to estimate uncertainty (Table 1).
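One possible way to attach an explicit uncertainty estimate to a panel's median score, sketched here as an assumption rather than a method used in the cited studies, is a percentile bootstrap over panellists. The scores are hypothetical. With small panels the resulting interval is coarse, which is itself informative about how much precision a single reported score can carry.

```python
import random
from statistics import median

def bootstrap_median_interval(scores, n_boot=2000, level=0.90, seed=1):
    """Percentile bootstrap interval for the median of one set of panel scores.

    Panellists are resampled with replacement n_boot times; the interval is
    read from the sorted medians of those resamples. The seed makes the
    result repeatable.
    """
    rng = random.Random(seed)
    boot_medians = sorted(
        median(rng.choices(scores, k=len(scores))) for _ in range(n_boot)
    )
    lo_idx = round((1 - level) / 2 * n_boot)
    hi_idx = round((1 + level) / 2 * n_boot) - 1
    return boot_medians[lo_idx], boot_medians[hi_idx]

# Hypothetical 0-5 scores from seven panellists for one technique and domain.
scores = [2, 3, 3, 4, 4, 4, 5]
low, high = bootstrap_median_interval(scores)
print(f"median {median(scores)}, 90% bootstrap interval {low} to {high}")
```

Reporting such an interval alongside each score (together with the panel size) would let readers see when two techniques' scores are not meaningfully different.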
The developers of the model have suggested that accessing scientifically informed expert opinion should minimise this uncertainty, but this has not been tested.Consequently, the outputs of the model may suggest a certain level of precision, but this is not calculated and so should be interpreted with caution (Mellor & Beausoleil 2015).Baker et al. (2016) recognised this limitation in their assessment of wildlife control methods, noting that "some rankings appeared counter-intuitive, highlighting the need for objective formal welfare assessments." Another source of uncertainty is derived from the fact that Five Domains assessments often consider procedural documents stating how practices should be conducted (Baker et al. 2016), rather than considering animal-based data documenting how they are conducted (Hampton et al. 2016b), including how often adverse events occur (Hampton et al. 2019).In general, this approach has taken the form of checklist audits assessing compliance with conditions prescribed in procedural documents to allow simple reporting to stakeholders.Hence, there is substantial uncertainty around whether the procedures that are assessed via Five Domains assessments are actually those that are performed.

Unintentional and indirect impacts
As the Five Domains model was conceived for a laboratory context, it focuses on animals intentionally impacted by human activities (Mellor & Reid 1994). This works well enough for controlled environments such as laboratories. However, a focus on a single species of animal becomes limiting in complex environments that contain multiple groups of animals (Fisher et al. 2019). In complex ecosystems, any single-species assessment excludes a large suite of processes that harm animals either unintentionally or indirectly (Hampton et al. 2022). For free-ranging animals, intentional and direct impacts constitute only a fraction of the ways in which human activities harm animals (Fraser & MacRae 2011). After all, most of the ways in which humans harm animals are neither intentional nor even direct. In fact, most derive from processes of which we may not be fully aware, such as the impacts of windows on wild birds (Loss et al. 2015) or extreme heat on wild bats such as grey-headed flying-foxes (Pteropus poliocephalus) (Mo et al. 2021). This realisation has led to the idea of 'One Welfare' (Pinillos et al. 2016).
There is no current pathway for incorporating these processes into Five Domains outputs, or at least there has been no attempt to do so thus far. We found no published studies that accounted for these effects (Table 1). It is particularly unfortunate that the Five Domains model has been so widely used for wildlife, where unintentional and indirect effects are so impactful (Fraser & MacRae 2011; Allen & Hampton 2020; Hampton et al. 2021). Nonetheless, claims have been made that the model allows holistic animal welfare assessments in wildlife contexts (Sharp & Saunders 2011), even though such assessments fail to account for processes that can profoundly contribute to the suffering of vast numbers of animals.
One example is an assessment of kangaroo (Macropus and Osphranter spp.) management options (Stephens 2021) that fails to account for poisoning of wildlife scavengers (Hampton et al. 2022). Similarly, wild rodent control assessments have been conducted (De Ruyver et al. 2023) without considering secondary poisoning of non-target wildlife species by anticoagulant rodenticides (Fisher et al. 2019). We acknowledge that failure to account for unintentional harms is shared by many proposed assessment tools that were designed for domestic animal contexts; this is not a weakness unique to the Five Domains model.

Misuse
We have concerns that the outputs of the Five Domains model may be communicated in a misleading way if they are presented as the product of a fully-fledged scientific assessment method.We are further concerned that this misrepresentation may progress to misuse (or even abuse) under certain circumstances.Due to the reliance on 'expert' opinion, the model is susceptible to manipulation by panel members with aligned professional interests advancing their own agendas by reaching a pre-conceived conclusion when assessing contentious practices.This may occur if panel members are hand-selected, and findings are communicated in opaque ways.Regrettably, the list of studies to have misused the model and/or failed to appropriately recognise limitations with model use includes some of the authors of this article (Hampton et al. 2016a;Allen et al. 2019).
At worst, the Five Domains assessment approach is highly manipulable if investigators wish to collude to reach a pre-agreed conclusion.In the very worst-case scenario, the outputs of a Five Domains assessment may amount to nothing more than the opinions of the loudest or most determined member of an opaquely selected panel, expressed as numerical scores without measures of uncertainty.

Overview of limitations
The limitations listed and discussed above are certainly not unknown (Beausoleil & Mellor 2015). We do not wish to imply that we are the first to identify them. They have been explicitly acknowledged by the developers of the model, who have suggested that the model helpfully advances the evaluation of animal welfare impacts, provided that its limitations are borne in mind (Beausoleil & Mellor 2015; Mellor 2017; Mellor et al. 2020). However, we observe that, increasingly, researchers applying the Five Domains model are not bearing these critical limitations (e.g. subjective experiences to objective scores, expert opinion, repeatability, aggregation, uncertainty, and unintentional and indirect impacts) in mind.
Failure to acknowledge the limitations of the Five Domains model gives rise to the representation of the model as a 'one-stop shop' for animal welfare considerations in some contexts.We are concerned that opportunistic organisations may use the Five Domains model as a public relations façade (Hampton et al. 2016b).This strategy has allowed claims that animal welfare concerns have been assessed or addressed without anything more than desktop exercises being undertaken.However, we feel that there are some achievable steps that could be taken to improve the scientific validity of the Five Domains assessment approach.

Suggestions for refinement
If the Five Domains model continues to be used as an animal welfare measurement tool, several refinements are necessary for it to become fit for this purpose. Like other such models, it will have limitations, which should be clearly stated. To enable refinement and transparency, a number of improvements could and should be made in the way that the model is used.
First, studies are urgently needed to explore repeatability. There is a substantial body of literature on measuring repeatability and reducing variability in qualitative panel assessments (Vaz et al. 2013; Friedrich et al. 2020). Such studies could assess, for example: (1) whether panels with different membership give the same scores to the same techniques; (2) whether panel size influences scores; and (3) whether the same panel gives the same scores when it reassesses the same techniques at different time-points.
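As a minimal illustration of the first kind of repeatability study, the sketch below compares two panels that have scored the same techniques. The technique labels, the scores and the helper `agreement_summary` are hypothetical and invented for illustration only; the sketch assumes both panels used the same integer ordinal scale.

```python
"""Hypothetical sketch: agreement between two panels scoring the same techniques.

All technique labels and scores are invented for illustration; they are not
taken from any published Five Domains assessment.
"""


def agreement_summary(panel_a, panel_b):
    """Compare two dicts mapping technique -> integer score on one ordinal scale."""
    shared = sorted(set(panel_a) & set(panel_b))
    if not shared:
        raise ValueError("panels share no techniques")
    # Absolute disagreement for each technique scored by both panels.
    diffs = [abs(panel_a[t] - panel_b[t]) for t in shared]
    n = len(shared)
    return {
        "n": n,
        "exact_agreement": sum(d == 0 for d in diffs) / n,  # identical scores
        "within_one": sum(d <= 1 for d in diffs) / n,       # at most one grade apart
        "mean_abs_diff": sum(diffs) / n,
    }


panel_a = {"technique_1": 2, "technique_2": 4, "technique_3": 3, "technique_4": 5}
panel_b = {"technique_1": 2, "technique_2": 3, "technique_3": 5, "technique_4": 5}
print(agreement_summary(panel_a, panel_b))
# {'n': 4, 'exact_agreement': 0.5, 'within_one': 0.75, 'mean_abs_diff': 0.75}
```

Simple percentage agreement does not correct for agreement expected by chance; a real repeatability study would more likely report an established statistic such as a weighted kappa or an intraclass correlation coefficient.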
Second, there is an onus on authors using the model to improve the transparency of their research by disclosing, at a minimum: (1) how their panel members were selected; (2) the size of their panel; (3) an overview of the literature provided to the panel; and (4) the process(es) used for resolving disagreement and reaching consensus.
Third, as shown by the study of De Ruyver et al. (2023), there is a need for refinement (or at least transparency) in the processes used to reach final group scores, whether these rely on anonymous or confidential voting or on discussion to reach consensus. An alternative to the approaches proposed previously may be a moderated discussion that culminates in anonymous scoring. The Delphi method uses a similar approach, in which the vital elements are anonymity, controlled feedback, and iteration to refine stated opinions and reach consensus (Nasa et al. 2021).
Fourth, improved methods are required to translate subjective assessments into numerical scores, including: (1) whether scores within domains can be assumed to be linearly related (e.g. does a score of 4 represent twice the impact of a score of 2, just as a score of 2 represents twice the impact of a score of 1?); and (2) how scores are aggregated across domains, including the weighting given to each domain. We acknowledge that there will never be a perfect solution to the problem of converting subjective assessments into numerical scores and aggregating them. Each solution will be based on ethical and methodological assumptions that can be debated (Sandøe et al. 2019). However, what can be achieved is a solution in which these assumptions are made transparent. This contrasts with the current situation, in which conclusions about net welfare outcomes are drawn from the Five Domains model in an opaque way.
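The aggregation problem can be made concrete with a small worked example. The sketch below uses invented scores on an assumed signed scale from -3 (severe compromise) to +3 (enhancement) to show that the ranking of two hypothetical practices depends on the aggregation rule chosen. The helpers `weighted_mean` and `worst_domain` are illustrative only and are not the procedure of any published Five Domains study; averaging the scores at all assumes they lie on an interval scale, which is the very assumption questioned above.

```python
"""Hypothetical sketch: how the choice of aggregation rule changes a ranking.

Scores are invented, on an assumed signed scale from -3 (severe compromise)
to +3 (enhancement). Higher aggregate values mean better welfare.
"""

DOMAINS = ("nutrition", "physical_environment", "health",
           "behavioural_interactions", "mental_state")


def weighted_mean(scores, weights=None):
    """Compensatory rule: positive domains can offset negative ones."""
    weights = weights or {d: 1.0 for d in DOMAINS}
    total_w = sum(weights[d] for d in DOMAINS)
    return sum(scores[d] * weights[d] for d in DOMAINS) / total_w


def worst_domain(scores):
    """Non-compensatory rule: a practice is judged by its worst domain."""
    return min(scores[d] for d in DOMAINS)


practice_x = {"nutrition": 2, "physical_environment": 1, "health": -3,
              "behavioural_interactions": 0, "mental_state": 0}
practice_y = {"nutrition": -1, "physical_environment": 0, "health": -1,
              "behavioural_interactions": 0, "mental_state": 0}

# Hypothetical weighting that counts health twice as heavily as other domains.
health_weighted = {d: (2.0 if d == "health" else 1.0) for d in DOMAINS}

for name, s in (("X", practice_x), ("Y", practice_y)):
    print(name,
          weighted_mean(s),                   # equal weights
          weighted_mean(s, health_weighted),  # health counted twice
          worst_domain(s))                    # worst single domain
```

Under equal weights practice X (0.0) ranks above practice Y (-0.4); doubling the weight on health produces a tie (-0.5 each); and judging each practice by its worst domain reverses the ranking (-3 versus -1). None of these rules is self-evidently correct, which is why the assumptions behind whichever rule is used need to be stated.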
Fifth, the way in which scientific uncertainty is estimated and communicated needs to be refined and made consistent.A simple approach to partially offset this problem is to convene two panels and compare their conclusions, or to publish and discuss the variability in the experts' scores (see, for example, Sandøe et al. [2022] for a way to do this).
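A minimal sketch of reporting spread alongside a central value, assuming two independently convened panels have each recorded individual expert scores for the same technique in the same domain. The scores and the 1 to 5 scale (5 = most severe) are invented, and `spread` is a hypothetical helper.

```python
"""Hypothetical sketch: report the spread of expert scores, not just a consensus."""
from statistics import median


def spread(scores):
    """Summarise individual expert scores instead of a single consensus value."""
    return {"n": len(scores), "median": median(scores),
            "min": min(scores), "max": max(scores)}


# Invented scores from two independently convened panels assessing the same
# technique in the same domain (hypothetical 1-5 scale, 5 = most severe).
panel_1 = [2, 3, 3, 4, 5]
panel_2 = [3, 4, 4, 5, 5]

s1, s2 = spread(panel_1), spread(panel_2)
print(s1)  # {'n': 5, 'median': 3, 'min': 2, 'max': 5}
print(s2)  # {'n': 5, 'median': 4, 'min': 3, 'max': 5}
print("medians differ:", s1["median"] != s2["median"])
```

Publishing the full range, rather than only a single agreed score, lets readers see both how much individual experts disagreed and whether two panels reached different conclusions.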
Sixth, unintentional and indirect impacts should be accounted for, particularly for wildlife studies, as per the 'harms' model (Fraser & MacRae 2011).
Seventh, there is a need for better, well-validated, animal-based measures for many species and management contexts (Harvey et al. 2023).

Comparison to alternative approaches
There are several alternative animal welfare assessment approaches currently in use that may be compared to the Five Domains model; each has its own strengths and weaknesses. Two notable examples are Welfare Quality® (Friedrich et al. 2020) and the Benchmark method (Sandøe et al. 2022). In contrast to the Five Domains, complex aggregation algorithms and repeatability measures have been developed for Welfare Quality® (Friedrich et al. 2020). The group behind the Welfare Quality® framework has developed a mathematical model to handle the various concerns relating to aggregation, notably to prevent positive values on some parameters from compensating for negative values on others (Botreau et al. 2007a,b; Veissier et al. 2011). The group behind the Benchmark method has developed a simpler approach in which experts first value different conditions belonging to different domains and subsequently weigh the relative importance of the domains (Sandøe et al. 2022). However, many of the problems described above relating to the selection of panel members may also apply to approaches such as Welfare Quality® and the Benchmark method.
The 'harms' model (Fraser & MacRae 2011) is a useful way to visualise the ecosystem-wide consequences of any anthropogenic activity by systematically assessing its negative animal welfare impacts. The model was designed to explicitly include processes that harm animals but may not be deliberate or widely known, sometimes referred to as 'invisible' harms (Finn & Stephens 2017). However, the model generates only a list of harmful processes; it makes no attempt to grade their severity or to aggregate them. It is useful for visualising the breadth of animal welfare impacts associated with any activity (Hampton et al. 2021), and this facet could potentially be incorporated into Five Domains assessments if there were a desire to make assessments truly holistic.

Why has this not been challenged before?
Given some of the concerns we have identified above relating to the use of the Five Domains model as a scientific assessment tool, it is reasonable to ask why this has not been challenged more often (or at all) over recent decades.We can only speculate here.
Much of the model's appeal lies in its intuitiveness and simplicity: any animal manipulation practice of interest can be assessed by sitting down with a few colleagues in a room for a day. In many cases, a publishable paper can be produced from these efforts, which can then be cited by others as 'evidence' that the assessment method is sound. Use of the model is undoubtedly convenient. It is quick and inexpensive and allows simple reporting to stakeholders. It allows animal welfare researchers to publish supposedly science-based assessments from the desktop without the logistical burdens of animal-based research, including obtaining licences from institutional animal research committees.
The appeal for industry bodies is understandable too, given that the cost of commissioning a Five Domains panel will be a fraction of that of a multi-year animal experiment or monitoring programme. The results can also be controlled more easily than those of independent research that produces incontrovertible results, through the selection of panel members or of consensus or voting processes. But we contend that this convenience comes at a price, and scientific legitimacy is the most important price that the animal welfare community may pay. We are not suggesting that the majority of practitioners resorting to Five Domains assessments do so because they have ulterior motives or are lax; rather, they are trying to weigh up welfare challenges that are currently difficult to compare empirically using the tools of science. Nonetheless, the approach remains concerningly susceptible to manipulation and misuse.

Animal welfare implications
For animal welfare science to maintain scientific legitimacy, it is essential that the most objective and transparent methods (recognising that there are likely to be several competing models) are used to empirically investigate how animals are affected by human interventions and activities. This legitimacy will undoubtedly be eroded if animal welfare publications are seen to amount to nothing more than assertions of eminence from groups that include community members or stakeholders who may lack an understanding of animal-based evidence and its welfare implications, and whose impartiality may be called into question. We contend that the Five Domains model is appropriate for conceptualising animal welfare impacts and enhancements. It also seems reasonable to use this qualitative tool to highlight areas of welfare risk, as well as areas requiring further investigation, as proposed by Sherwen et al. (2018). It could also be used as a 'hypothesis-generating' (Biesecker 2013) device, whereby the conclusions of a Five Domains panel are adapted into hypotheses about potential welfare impacts, which can then be explored using scientific methods (Baker et al. 2016). However, even in this context, the limitations of the model need to be appropriately recognised. In our view, the Five Domains model has not yet been developed into a suitable method for quantifying or ranking animal welfare impacts, and the problems we have outlined with this approach cannot be ignored. We suggest that using the outputs of the model, in its current state, to guide policy in the guise of a science-based approach is particularly fraught (Johnson et al. 2019). We think the model could be developed into a more credible and useful assessment tool if some of our suggested improvements are followed and if results are reported with full transparency about their limitations.

Conclusion
In conclusion, we want to be clear that we see great value in the Five Domains model as a way of thinking about animal welfare. It has undoubtedly been a substantial improvement on earlier influential paradigms in animal welfare, such as the Five Freedoms. It is a wonderful starting point for animal welfare conversations and is fit to serve as the nucleus for decision-making processes. However, like the Five Freedoms paradigm before it (McCulloch 2013), the model is not currently fit-for-purpose as an assessment tool. We contend that some attempts to use it as such a tool, especially in poorly studied contexts, misuse the model and may stifle empirical animal-based studies (Hampton et al. 2016b). To be clear, the limitations we have outlined here apply, to varying degrees, to other conceptual frameworks when they are used as assessment tools. We perceive a reputational risk if animal welfare scientists knowingly continue to use an approach that is neither transparent nor repeatable. If this trend is not arrested, we feel that there is a very real risk that animal welfare will increasingly be viewed by other stakeholders as a pseudoscience (a practice mistakenly regarded as being based on scientific methods) (Truran 2013). After all, a key marker of scientific claims is that they are testable and therefore falsifiable. We look forward to seeing the Five Domains model developed beyond its current form by the next generation of animal welfare scientists. If nothing else, we hope that this paper stimulates further discussion and some rethinking or revision of the use of the Five Domains as an integrated assessment of animal welfare.

Table 1. Methodological details of peer-reviewed studies that have used the Five Domains model as an animal welfare assessment and ranking tool. NR = not reported; NA = not applicable; SOP = standard operating procedure.
*Two scores reported for some lethal wildlife control techniques: Part A = before death, and Part B = mode of death.