A framework for evaluating the effectiveness of conservation attention at the species level

Abstract
It is essential to understand whether conservation interventions are having the desired effect, particularly in light of increasing pressures on biodiversity and because of requirements by donors that project success be demonstrated. Whilst most evaluations look at effectiveness at a project or organizational level, local efforts need to be connected to an understanding of the effectiveness of conservation directed at a species as a whole, particularly as most metrics of conservation success are at the level of species. We present a framework for measuring the effectiveness of conservation attention at a species level over time, based on scoring eight factors essential for species conservation (engaging stakeholders, management programme, education and awareness, funding and resource mobilization, addressing threats, communication, capacity building and status knowledge), across input, output and outcome stages, in relation to the proportion of the species’ range where each factor attains its highest score. The framework was tested using expert elicitation for 35 mammal and amphibian species on the Zoological Society of London's list of Evolutionarily Distinct and Globally Endangered species. Broad patterns in the index produced by the framework could suggest potential mechanisms underlying change in species status. Assigning an uncertainty score to information demonstrates not only where gaps in knowledge exist, but also discrepancies in knowledge between experts. This framework could be a useful tool to link local and global scales of impact on species conservation, and could provide a simple and visually appealing way of tracking conservation over time.


Introduction
In recent years there has been a surge in efforts to determine the effectiveness of conservation interventions (e.g. Christensen, ; TNC, ; Kapos et al., ; Black & Groombridge, ). It is essential to understand whether conservation interventions are having the desired effect, particularly in light of increasing pressures on biodiversity (Butchart et al., ) and the increased concern of donors that the projects they are supporting are working (Redford & Taber, ).
Most proposed methods focus on measuring effectiveness at either a project (e.g. O'Neill, ; Kapos et al., ) or organizational level (e.g. Christensen, ). Whilst assessing effectiveness at these levels is important, there is also a need to connect local successes with the global distribution and status of a species to understand the effectiveness of conservation directed at a species as a whole (Saterson et al., ; Kondolf et al., ; Redford et al., ). Although there is a shift to ecosystem-based approaches (MEA, ), most metrics of biodiversity, including measurement of conservation success, are expressed in units of species (Mace, ; Garnett & Christidis, ; Tobias et al., ). Hence there is a mismatch between the organizational or project-level evaluation tools available for conservationists and the species focus of many international conservation programmes.
Conceptual models for visualizing the components of a conservation intervention comprise the Stages of input, output, and outcome (two additional Stages, strategy and impact, are sometimes added to the beginning and end of the list, respectively; Woodhill, ; Ferraro & Pattanayak, ; Margoluis et al., ). Input denotes resources committed to an intervention, or plans for action (Woodhill, ). Outputs usually consist of tangible products, and are often reported as a measure of success, and can include money spent, awareness materials produced or people trained (Mace et al., ). However, presence of outputs does not guarantee they have had their desired effect. The outcome explicitly considers the impact that the outputs have had on the state of the target species. Although a better measure of success, outcomes are rarely reported as they are often intangible and therefore difficult to measure, or occur over longer time-scales than standard funding or reporting cycles (Kapos et al., ).
Conservation interventions are preceded by conservation attention; i.e. attention must be drawn to the plight of a species, and plans and preparations made, before interventions can be carried out. Sitas et al. () proposed an index of conservation attention that uses a simple scale to rate the conservation attention being directed at a species, measured by the presence and quality of a species action plan. This first attempt to understand conservation efforts at a broad scale across whole sets of species highlighted the important idea of conservation attention. However, the presence of a plan does not guarantee its actions will be carried out or that those actions will lead to successful conservation (Kapos et al., ). The ultimate goal of conservation (recovered species or ecosystems) can take several decades to be achieved (Redford & Taber, ). It can be demotivating to work towards a goal unlikely to be achieved in an individual's career-span (Mace et al., ), and where projects mention an intended time-scale it is seldom longer than a few decades (Redford et al., ). The Nature Conservancy's Site Consolidation Scorecard addresses this somewhat by creating a framework to measure the achievements of a protected area against predetermined criteria that, once met, are assumed to equip the protected area with the capacity to successfully conserve the biodiversity within its borders (TNC, ; Leverington et al., ). However, this scorecard approach has not been used for species conservation. It would therefore be of great utility to have an index of the effectiveness of conservation attention that charts progress over time from effective conservation inputs, through to outputs and outcomes, and that can be used to monitor the overall impact of conservation actions for a species and provide an early warning system if conservation attention has stalled.
One reason why the effectiveness of conservation attention is particularly difficult to assess is that most information regarding species conservation is not published, or is not easily accessible, sometimes even to others working on the same species (Brooks et al., ). Where data are published, the significant time lag between data collection and publication can mean the situation for a species has already changed from that reported in a publication (Leverington et al., ). Experts in species conservation are an untapped resource of up-to-date species information (Hockings, ) and can often draw on their knowledge of published and unpublished data to give an opinion on expected future developments, something that cannot be achieved by a non-expert reviewing the literature (Johnson & Gillingham, ). However, there are uncertainties associated with expert elicitation that include but are not limited to (1) a tendency towards overconfidence in estimates (O'Neill et al., ), (2) anchoring estimates to a provided or preconceived value or range, and subsequent inability to adjust the value adequately in relation to the anchor (McBride et al., ), (3) discrepancies with terminology (Johnson & Gillingham, ), (4) possession of information but an inability to express it, indicating poor survey design (Martin et al., ) or framing of information differently according to presentation (e.g. numbers vs percentages; McBride et al., ), (5) confirmation bias, in which answers are likely to be interpreted in the context of an expert's pre-existing beliefs (McBride et al., ), and (6) accessibility bias, whereby some pieces of information are more easily recalled than others (Martin et al., ).
The Evolutionarily Distinct and Globally Endangered (EDGE) index prioritizes species for conservation through a combination of the level of threat they are facing and the amount of evolutionary uniqueness that would be lost were the species to go extinct (Isaac et al., ). The EDGE programme at the Zoological Society of London (ZSL) focuses on conservation of the top  EDGE ranked mammals, amphibians, corals and, soon, birds (EDGE programme, a). Programme goals include raising awareness of the species concerned, and tracking the progress of their conservation (EDGE programme, b). ZSL proposes to track this progress using species report cards that summarize all aspects of the conservation of a species (Sinfield, ). These cards are based on the format pressure-state-response, used for understanding threats and their impacts (e.g. Zalidis et al., ; Mace & Baillie, ; Roura-Pascual et al., ).
Because the EDGE programme mobilizes a large pool of experts, and because it is focused on improving the conservation status of species as a whole, it is the ideal case study for exploring the potential for a species-based index of conservation attention that combines the simplicity of a species report card approach with information on the progress of conservation action within a species' range, enabling practitioners to track the achievement of milestones. Here we present a novel framework for assessing the effectiveness of the conservation attention directed at a species across the whole of its range. The framework, which considers all on-the-ground interventions relating to a species rather than focusing on a particular organization or project, links action at a local scale with change in the global status of a species. It is intended for use by species experts (such as members of the IUCN Species Survival Commission specialist groups) to monitor the effectiveness of conservation attention over time.
We developed a questionnaire based on this framework and, as a test case, used expert elicitation to assess the status of conservation attention for a set of EDGE species. We investigated the robustness of different scoring methods in producing consistent rankings of the effectiveness of conservation attention. We described the patterns of conservation attention obtained for our case study species and considered the information they may provide about the characteristics of conservation efforts for species assessed using the framework. We tested expectations about effectiveness of conservation attention for our case study species; for example, we expected that conservation attention would initially be highest at the input Stage (compared to output and outcome Stages) for those species that had only recently begun to receive conservation attention, and that over time the degree of attention at the output and outcome Stages would increase and approach that of inputs. We also considered future applications of the framework and its potential contribution to the field of conservation evaluation.

Methods
In November  ZSL held a workshop to gather ideas for developing species report cards, with a particular focus on EDGE species. Break-out groups discussed possible approaches for measuring the effectiveness of conservation attention at the level of a species, building on the Index of Conservation Attention proposed by Sitas et al. (). One outcome of this workshop was a draft framework split into input, output and outcome Stages and a set of simple indicators for each. Following the workshop we reviewed the literature to produce a list of  Factors that influence each Stage and that are deemed essential preconditions for effective species conservation ( Table ). A subset of eight Factors was then selected for use in the framework ( Table ) and potentially appropriate thresholds for categorical Levels of achievement for each Factor were devised.
The resulting draft framework was refined for some well-known EDGE species (or species for which much of the knowledge is relatively easily accessible; e.g. the Hispaniolan solenodon Solenodon paradoxus), using a combination of literature searches and interviews with individual experts. The framework was transcribed into a questionnaire in the form of a spreadsheet with drop-down lists for each Stage and Factor.
One hundred and seventy-one experts in mammals and amphibians prioritized highly by the EDGE programme (a) were identified and sent a copy of the spreadsheet-based questionnaire, with an introductory e-mail. Experts were identified through the EDGE community network or through the IUCN Species Survival Commission Species Specialist Groups directory. Experts were asked to complete the questionnaire, which, in addition to the components of the framework, asked for details of evidence supporting a respondent's choices, their degree of confidence that each answer they had given was correct (the potential answers for each question were very high, high, medium, low, don't know), and whether the population trend of the species was increasing, stable, decreasing or unknown. Respondents were also invited to provide feedback on the questionnaire's content and ease of use.
Forty-two completed questionnaires were received, representing  species (Supplementary Material ). Initially,  of the questionnaires were scored using six methods, to compare the sensitivity of the index of the effectiveness of conservation attention to different scoring strategies (Supplementary Table S, Table ). The scoring involves combining the Level and Scope of each Factor (Table ). Scope is the extent of a species' range across which a specified Level of a Factor is present. It is categorized as zero (% of the species' range), low (< % of the species' range), medium (-%) or high (> %). Initially, the second category of Scope was categorized as 'few scattered areas' but, following refinement of the framework for some well-known EDGE species during interviews with experts, we felt that it is more clearly and consistently represented as < %. Each of the six methods gave different weights to the Scope and Level parts of each combination. The six methods provided generally similar results in terms of the patterns of scoring between and within species, with minor differences depending on the relative weights assigned to Scope and Level. Based on this sensitivity analysis, the method chosen to score all questionnaires (Table ) uses a simple ordinal scale, which minimizes assumptions inherent in the more complicated systems. A total index of effectiveness of conservation attention is produced for each species, as well as for the eight Factors (Table ) and three Stages (input, output and outcome). The scoring system is based on a simple ranking given to the combination of the Level and Scope of each Factor at each Stage. The ranking (-) ascribes greater importance to high Levels than to Scope; it focuses on the highest Level of a Factor attained for a species, rather than the Level obtained at the widest Scope (higher numbers indicate more effective conservation attention). As an example, achieving a medium (M) Level of a Factor across a low (L) Scope of a species' range would be coded ML, which is given a score of . Once each Factor is scored at each Stage, all scores can be added to provide an overall index of the effectiveness of conservation attention for a species (out of a possible ) or scores can be amalgamated according to Stage or Factor (out of maximum scores of  and , respectively).
TABLE 1 Factors considered to be important for effective species conservation, compiled through a literature review following a workshop that looked at ways of evaluating effectiveness of conservation attention at the species level, with a selection of the literature where each is discussed. The first eight Factors (in italics) are addressed in the framework. Law and policy, and project management and leadership, are not included, as explained in the text.
Patterns in these scores were investigated and preliminary suggestions made on the potential causes of deviations from expected results. Where more than one person completed a questionnaire for a species, the scores were considered separately, and comparisons made between the scores produced by each assessor.
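The rank-based scoring described in this section can be illustrated with a minimal Python sketch. It is not the published scoring table: the rank values below are an assumption, generated by ordering the Level and Scope combinations by Level first and Scope second, which is one way to give Level more weight than Scope. The code "Z" for a zero (none/unknown) category, the function names and the example responses are all hypothetical.

```python
from itertools import product

# Ordinal categories for both Level and Scope, lowest to highest.
CATEGORIES = ["L", "M", "H"]

# ASSUMPTION: ranks follow a Level-first, Scope-second ordering, so any
# combination at a higher Level outranks every combination at a lower Level.
# The ranks actually used are those in the paper's scoring table.
RANK = {
    level + scope: rank
    for rank, (level, scope) in enumerate(product(CATEGORIES, repeat=2), start=1)
}


def component_score(level: str, scope: str) -> int:
    """Score one Factor at one Stage; "Z" (zero/unknown) in either part scores 0."""
    if level == "Z" or scope == "Z":
        return 0
    return RANK[level + scope]


def summarise(responses):
    """Add component scores into an overall total and per-Stage and per-Factor totals.

    responses maps (factor, stage) -> (level, scope).
    """
    total, by_stage, by_factor = 0, {}, {}
    for (factor, stage), (level, scope) in responses.items():
        score = component_score(level, scope)
        total += score
        by_stage[stage] = by_stage.get(stage, 0) + score
        by_factor[factor] = by_factor.get(factor, 0) + score
    return total, by_stage, by_factor


# Hypothetical responses for two Factors (illustrative only, not survey data).
example = {
    ("Engaging stakeholders", "input"): ("H", "M"),
    ("Engaging stakeholders", "output"): ("M", "L"),
    ("Engaging stakeholders", "outcome"): ("Z", "Z"),
    ("Addressing threats", "input"): ("L", "H"),
}
total, by_stage, by_factor = summarise(example)
```

Because the ordering puts Level first, a high Level over a small part of the range outranks a medium Level over most of it, which matches the stated emphasis on the highest Level attained. A different weighting only requires replacing the `RANK` mapping.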

The framework
The framework (Table ) of eight Factors (Table ) essential for species conservation unites information on local-scale projects into a global picture of the effectiveness of conservation attention across a species as a whole. Two of the  Factors were not included in the framework: project management and leadership, and law and policy. With respect to project management and leadership, management and business appraisal processes are likely to remain confidential within an organization. Therefore a species expert may be able to report on leadership and project management for their own organization but unable to provide this information for other organizations. Law and policies vary widely in their scope, power, implementation and effectiveness across countries and species, and so the creation of categories to adequately describe different levels of law and policy across the whole range of a species is impractical.
The Levels are described in a manner appropriate to each Factor and such that they form a nested hierarchy. This enables the tracking of change over time but involves judgements about the value of different types and degree of conservation attention. The framework's hierarchy is based on the literature and our personal experiences. It needs to be tested for a range of species to assess its robustness.
Changes in the conservation attention to a species over time, as measured by the framework, can be monitored through the use of scores. Final scores are obtained by dividing the score obtained from the questionnaire by the maximum possible score (, or, where expressed, the maximum score for the Stage, ) and then categorizing the value obtained into a -point scale. Supplementary Table S gives a worked example of the framework applied to a species, including the scores attained.
TABLE 3 System used to score completed questionnaires. Each component combines the Level and Scope (see text for definitions) of a Factor (Tables -) into a rank-based score, and these scores are combined to provide totals for each Factor (maximum score = ) and Stage (maximum score = ) and an overall score for effectiveness of conservation attention (maximum score = ). These scores are converted to a -point scale for analysis. A score of zero is given if either the Level or Scope, or both, is equal to zero; i.e. none/unknown.
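The conversion of a raw score to a point scale might be sketched as follows. The binning rule is an assumption: the sketch uses equal-width bins on the proportion of the maximum score and keeps zero as its own category, which the text does not specify. The function name, the maximum score of 30 and the 5-point scale in the example are hypothetical values chosen only for illustration.

```python
def to_point_scale(score: int, max_score: int, n_points: int) -> int:
    """Map a raw score onto 0..n_points.

    ASSUMPTION: equal-width bins on score / max_score, with a raw score of
    zero kept as category 0. The published binning rule may differ.
    """
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    if score == 0:
        return 0
    # Integer ceiling of score * n_points / max_score, avoiding float rounding.
    return -(-(score * n_points) // max_score)


# Hypothetical maximum and scale, for illustration only.
low_bin = to_point_scale(12, max_score=30, n_points=5)    # 12/30 = 0.40 -> 2
next_bin = to_point_scale(13, max_score=30, n_points=5)   # 13/30 = 0.43 -> 3
top_bin = to_point_scale(30, max_score=30, n_points=5)    # 30/30 = 1.00 -> 5
```

Dividing by the relevant maximum first means that totals for a Factor, a Stage and the whole species can be compared on the same scale even though their maxima differ.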

Expert feedback on the framework
Feedback was generally positive; respondents were keen to see the development of the framework and its associated research, and one felt it was a 'good initiative to assess the conservation level in [a] short time', although another felt 'these kinds of questionnaires ... force simplification and superficiality'. Some minor alterations were suggested, which will be implemented during full roll-out to the EDGE programme. There remain discrepancies between respondents' interpretations of the language used, which are hard to avoid in exercises such as this (Johnson & Gillingham, ). A glossary of key terms used in the framework will therefore accompany the questionnaire. Three respondents felt the framework could not be applied adequately to their species; these respondents returned completed questionnaires but with the caveat that they had struggled to fit the categories to their species. For some species the hierarchical order within a certain Factor may not be a suitable demonstration of the levels aspired to for better conservation. For example, for the Chinese giant salamander Andrias davidianus, local governments are the most important stakeholders to engage (Level Medium, Table ), whereas local people (Level High) have little control over threats to, and conservation of, the species (H. Meredith, pers. comm.).
Thirty of  respondents ( of  questionnaires; some respondents completed questionnaires for more than one species) rated their confidence in the answers they gave. Confidence was highest for inputs, and gradually decreased through to outcomes; the difference is statistically significant (Wilcoxon paired signed rank test, V = ,., P < .). This expected pattern was most noticeable in the decrease in confidence in the very high category and an increase in responses in the medium category.
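For readers unfamiliar with the test statistic, the following sketch shows how V in a paired Wilcoxon signed rank test can be computed from ordinal confidence ratings. It computes only the statistic, not the P value. The numeric coding of the confidence categories, the decision to drop pairs containing "don't know", and the ratings themselves are assumptions made for illustration; they are not the study's data or its exact procedure.

```python
# ASSUMPTION: ordinal coding of the questionnaire's confidence categories.
# "don't know" is treated as missing and the pair is dropped.
CONFIDENCE = {"low": 1, "medium": 2, "high": 3, "very high": 4}


def signed_rank_v(first, second):
    """Sum of the ranks of positive differences (first - second).

    Zero differences are dropped and tied absolute differences share the
    average of the ranks they occupy.
    """
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ordered = sorted(diffs, key=abs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i..j-1 hold ranks i+1..j; assign their average.
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    return sum(rank_of[abs(d)] for d in diffs if d > 0)


# Hypothetical paired ratings for five questionnaires (not the study's data).
inputs = ["very high", "very high", "high", "high", "medium"]
outcomes = ["high", "medium", "high", "medium", "high"]
pairs = [
    (CONFIDENCE[a], CONFIDENCE[b])
    for a, b in zip(inputs, outcomes)
    if a in CONFIDENCE and b in CONFIDENCE
]
v = signed_rank_v([a for a, _ in pairs], [b for _, b in pairs])
```

A large V relative to the total rank sum indicates that confidence in inputs tends to exceed confidence in outcomes within the same questionnaire.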
Four species were assessed by more than one person (three by two people, and one by three people). Scores never fully agreed, but for three of the four species the total scores from each assessment were within two points of each other. The patterns of scores for each stage were conserved even where scores themselves were different, except in the case of the black rhino Diceros bicornis, which is discussed below.
For species for which multiple questionnaires were received, the biggest discrepancies in scores related to the existence or not of an officially recognized species action plan. However, the management programme category had the most very high confidence responses over all three Stages combined. Existence of an officially recognized action plan should be something of which a respondent could be highly confident.

Patterns in the effectiveness of conservation attention
The framework is not intended to identify causal mechanisms behind conservation outcomes; rather, the scoring pattern for a species over time and across categories may act as an indicator prompting investigation of underlying processes. Given the small sample size, it was not possible to identify consistencies in the patterns of effectiveness of conservation attention for the species used to test the framework. However, it is valuable to consider the patterns that were obtained, to develop hypotheses that could be tested more fully as the framework is implemented. One expectation was that the first Stage (inputs) would score most highly, followed by outputs then outcomes. As time progresses, the initiation of various inputs should lead to increased outputs, which will then result in increased outcomes. In our sample, this pattern was evident for  of the  species, including the Asian elephant Elephas maximus (Fig. ). Deviations from the expected scoring pattern may provide a signature suggestive of a certain situation. For example, the score pattern for the red slender loris Loris tardigradus was -- (inputs-outputs-outcomes). The Malabar civet Viverra civettina had a similar pattern (--). The taxonomy of this species is under question; yet, although there are no species-specific actions in place, it may nonetheless be benefiting from the wider impacts of interventions targeted at its broader ecosystem (W. Duckworth, pers. comm.). Assessments of other species known to be in a similar situation to the civet will provide evidence of whether this signature is a reliable indicator of non-target species benefiting from wider conservation interventions.
A further example is the Kenyan subspecies of black rhinoceros Diceros bicornis michaeli (--). It is likely that the Kenyan subspecies is benefiting from general aspects of rhinoceros conservation programmes and projects, which may not be recorded within the activities for the Kenyan subspecies itself, but without which its conservation would be less comprehensive. The Peruvian yellow-tailed woolly monkey Oreonax flavicauda had an unusual scoring pattern (--). No supporting evidence was provided with the completed questionnaire, and without further knowledge of the conservation of this species it is difficult to speculate on the cause of this signature.

Differences between highest scoring Factors
Another pattern investigated was which Factors scored most highly for each species. It may be expected that, of the eight Factors, engaging with stakeholders is the first step in the process of conservation. Those who wish to work towards the conservation of a species will seek the backing of governments, the public, other organizations, and scientists, and their cooperation would be required for the ultimate success of many of the other Factors. Thus it could be expected that conservationists may focus on engaging with stakeholders early on, and thus that the conservation attention scores will be higher for this Factor than for others. Engaging stakeholders was the highest scoring Factor for . % of questionnaires returned.
However, when split by taxon (Fig. ), another pattern emerges. The distribution of the highest scoring Factor for mammal species (which comprise the majority of the species assessed) mirrors the patterns displayed for all species together. For amphibians, the highest scoring Factor was more often addressing threats than engaging with stakeholders. Patterns among the other Factors are similar to those for mammals. This pattern may be a result of the increasing number of threat assessments that are being undertaken to identify the spread of the infectious disease-causing fungus Batrachochytrium dendrobatidis.
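The tally of highest scoring Factors can be sketched as below. When two or more Factors tie for the top score within a questionnaire, each is counted once, so the tally can exceed the number of questionnaires. The function names and the scores are hypothetical and serve only to illustrate the counting rule.

```python
from collections import Counter


def top_factors(factor_scores):
    """Return every Factor that shares the highest score in one questionnaire."""
    best = max(factor_scores.values())
    return [factor for factor, score in factor_scores.items() if score == best]


def tally_top_factors(questionnaires):
    """Count how often each Factor is (jointly) the highest scoring."""
    counts = Counter()
    for factor_scores in questionnaires:
        counts.update(top_factors(factor_scores))
    return counts


# Hypothetical Factor totals for two questionnaires (illustrative only).
questionnaires = [
    {"Engaging stakeholders": 20, "Addressing threats": 20, "Communication": 11},
    {"Engaging stakeholders": 18, "Addressing threats": 9, "Communication": 5},
]
counts = tally_top_factors(questionnaires)
```

Running the same tally separately on the mammal and amphibian questionnaires would give the per-taxon comparison discussed above.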

Discussion
It is traditionally difficult to evaluate conservation effectiveness because programmes have varying, often subjective goals (Patton, ). The framework developed and tested here helps to address this limitation by operating at the level of species conservation, an aim to which all projects and interventions for a particular species are ultimately committed. Although intermediate goals of conservation success may vary (Howe & Milner-Gulland, ), the goal of persistence or recovery is a commonality for species conservation interventions (Redford et al., ).
Conservation attention provides an intermediate target against which effectiveness can be measured (Kleiman et al., ), which is achievable over a much shorter timescale than eventual population recovery. The broadness of the framework should allow its application to most species, most of the time. Given that EDGE species are broad taxonomically, geographically and in terms of conservation attention, they provided a relatively robust test of the general applicability of the framework. An indicator can never be perfect but must be a compromise between conflicting priorities (Jones et al., ). As with the IUCN Red List, the general potential of this broad approach outweighs the limitations presented by those species for which the framework may not be applicable (Mace et al., ). For those species, such as the Chinese giant salamander, where one category is not suitable, the rest of the framework can be used and a note made of the inapplicable aspects. The recognition that a species does not fit within the categories of the framework may be an important insight, identifying ways in which steps taken for the species' conservation should differ from the norm. As the number of examples increases, the framework may also provide insights into country-level or taxonomic patterns in deviations from the hierarchy, which may be useful for conservation planning; for example, assessing other Chinese species may corroborate experience with the giant salamander that stakeholder engagement is better focused on the State than local level.
By focusing on the highest Level of a given Factor attained for a species and the area covered at this Level, rather than on the Level found across the geographical range, assessing conservation of a species against the framework highlights current best achievements for a species, which can in turn demonstrate what is possible within the range of a species. This may motivate those in less highly scoring areas to investigate current effective practice and try to replicate it (Sutherland & Peel, ).
FIG. 2 The number of times each of the eight Factors was the highest scoring Factor (Tables -) for each questionnaire (for mammals n = , for amphibians n = ; note this is higher than the total number of questionnaires completed as sometimes two or more Factors were equally the highest scoring).
When assigning numerical values to qualitative data, inferences drawn from scores should be the same regardless of the scale applied (Wolman, ). The ranking system used to score this framework is the most parsimonious of those proposed (Supplementary Table S). For the subset of species on which all scoring systems were trialled, all systems displayed similar patterns in scoring between Stages and Factors, demonstrating that the conclusions drawn are not numerical artefacts of the system (Wolman, ) but reveal information about the relationships between components of conservation attention. Requiring quantitative measures of uncertainty during expert elicitation can discourage completion (Martin et al., ). The qualitative confidence categories provided in this study did not appear to deter participants (% of respondents completed these information fields). One would expect confidence in inputs to be highest and outcomes to be lowest, as inputs are tangible, easily quantifiable, and less open to varying interpretations than outcomes (Mace et al., ; Kapos et al., ), and this is what we found. As the effectiveness of conservation attention scores is tracked over time, there may be instances where the scores do not increase (i.e. selected categories do not change in subsequent assessments) but confidence in the answer given improves, for example where more information becomes available for a previously understudied species (such as many of the EDGE species). For future integration of scores between assessors we recommend applying a similar method to that of McBride et al. (), who shared completed forms amongst the experts, facilitated discussion of any differences and then asked experts to complete a second form taking into account the products of their discussion.
The non-specificity afforded by the lack of disaggregation of interventions by project or organization may foster honesty and candidness, removing the pressure put on practitioners to report successes rather than failures in order to appease donors (Redford & Taber, ). This also avoids the vested interest a project leader may hold in demonstrating the success of a project, potentially biasing their evaluations (Brooks et al., ). Obviously these advantages do not hold where conservation efforts are sufficiently limited that the activities of one project encompass all interventions underway for a species. An advantage in using the framework to assess the effectiveness of conservation attention is that it facilitates the compiling of key information about current species conservation. Our finding that experts may have high confidence that may be misplaced (e.g. in the presence of an official action plan) suggests that for some species a common understanding of the attention that the species is receiving is currently missing. The information contained within a completed questionnaire, particularly references to documentation provided as supporting evidence, can be an important centralized source for other species experts, which may avoid duplication of interventions and research. A number of respondents provided substantial additional information that would be of general use to species conservationists.
The framework can be used as part of a complementary toolkit for measuring and improving species conservation, by linking global and local scales of action and impact (Saterson et al., ) and tracking this over time. Although project-based, organizational or geographical evaluations are vital, evaluating at one scale may miss processes acting at another (Cundill & Fabricius, ). If coupled with in-depth analysis of Factors of particular interest in the conservation of a species (Redford et al., ), this framework could represent an important step in the development of methods to consider the effectiveness of conservation attention as it affects a species across its global range. It can also provide a simple and visually appealing method of tracking conservation progress over time, enabling investigations of the causes of stalled activities. This is information conservationists need to support assessments of the impact their efforts are having on stemming biodiversity loss.