What role should randomized control trials play in providing the evidence base for conservation?

Abstract The effectiveness of many widely used conservation interventions is poorly understood because of a lack of high-quality impact evaluations. Randomized control trials (RCTs), in which experimental units are randomly allocated to treatment or control groups, offer an intuitive way to calculate the impact of an intervention by establishing a reliable counterfactual scenario. As many conservation interventions depend on changing people's behaviour, conservation impact evaluation can learn a great deal from RCTs in fields such as development economics, where RCTs have become widely used but are controversial. We build on relevant literature from other fields to discuss how RCTs, despite their potential, are just one of a number of ways to evaluate impact, are not feasible in all circumstances, and how factors such as spillover between units and behavioural effects must be considered in their design. We offer guidance and a set of criteria for deciding when RCTs may be an appropriate approach for evaluating conservation interventions, and factors to consider to ensure an RCT is of high quality. We illustrate this with examples from one of the few concluded RCTs of a large-scale conservation intervention: an incentive-based conservation programme in the Bolivian Andes. We argue that conservation should aim to avoid a rerun of the polarized debate surrounding the use of RCTs in other fields. Randomized control trials will not be feasible or appropriate in many circumstances, but if used carefully they can be useful and could become a more widely used tool for the evaluation of conservation impact.


Introduction
It is widely recognized that conservation decisions should be informed by evidence (Pullin et al., ; Segan et al., ). Despite this, decisions often remain only weakly informed by the evidence base (e.g. Sutherland & Wordley, ). Although this is at least partly a result of continuing lack of access to evidence (Rafidimanantsoa et al., ), complacency surrounding ineffective interventions (Pressey et al., ; Sutherland & Wordley, ), and perceived irrelevance of research to decision-making (Rafidimanantsoa et al., ; Rose et al., ), there are limitations in the evidence available on the likely impacts of conservation interventions (Ferraro & Pattanayak, ; McIntosh et al., ). This has resulted in a growing interest in conservation impact evaluation (Ferraro & Hanauer, ; Baylis et al., ; Börner et al., ; Pressey et al., ), and in the creation of initiatives to facilitate access to and systematize the existing evidence, such as The Collaboration for Environmental Evidence () and Conservation Evidence (). Impact evaluation, described by the World Bank as assessment of changes in outcomes of interest attributable to specific interventions (Independent Evaluation Group, ), requires a counterfactual: an understanding of what would have occurred without that intervention (Miteva et al., ; Ferraro & Hanauer, ; Baylis et al., ; Pressey et al., ). It is well recognized that simple before-and-after comparison of units exposed to an intervention is flawed, as factors other than the intervention may have caused change in the outcomes of interest (Ferraro & Hanauer, ; Baylis et al., ). Simply comparing groups exposed and not exposed to an intervention is also flawed as the groups may differ in other ways that affect the outcome.
One solution is to replace post-project monitoring with more robust quasi-experiments, in which a variety of approaches may be used to construct a counterfactual scenario statistically (Glennerster & Takavarasha, ; Butsic et al., ). For example, matching involves comparing outcomes in units where an intervention is implemented with outcomes in similar units (identified statistically) that lack the intervention. This is increasingly used for conservation impact evaluations, such as determining the impact of establishment of a national park (Andam et al., ) or Community Forest Management (Rasolofoson et al., ) on deforestation. Quasi-experiments have a major role to play in conservation impact evaluation, and in some situations they will be the only robust option available to evaluators (Baylis et al., ; Butsic et al., ). However, because the intervention is not allocated at random, unknown differences between treatment and control groups may bias the results (Michalopoulos et al., ; Glennerster & Takavarasha, ). Historically, this problem led many in development economics to question the usefulness of such quasi-experiments (Angrist & Pischke, ). Each kind of quasi-experiment has associated assumptions that, if not met, affect the validity of the evaluation result (Glennerster & Takavarasha, ).
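As a minimal sketch of the matching idea described above, the following Python pairs each treated unit with the untreated unit closest to it on a single baseline covariate, then averages the outcome differences. The data, the choice of covariate and the function name `matched_effect` are hypothetical and serve only as illustration; real evaluations typically match on many covariates, and no amount of matching removes bias from differences that were not measured.

```python
def matched_effect(treated, untreated):
    """Mean outcome difference between treated units and their nearest matches.

    Each unit is a (baseline_covariate, outcome) pair. Matching is done with
    replacement on one covariate only; this is an illustration, not a
    substitute for a full matching analysis.
    """
    diffs = []
    for cov_t, out_t in treated:
        # Find the untreated unit closest on the baseline covariate.
        _, out_c = min(untreated, key=lambda unit: abs(unit[0] - cov_t))
        diffs.append(out_t - out_c)
    return sum(diffs) / len(diffs)


def mean_outcome(units):
    return sum(outcome for _, outcome in units) / len(units)


# Hypothetical units: (baseline forest cover in %, deforestation in % per year).
treated = [(80.0, 1.0), (60.0, 2.0)]
untreated = [(82.0, 1.5), (58.0, 3.0), (30.0, 6.0)]

matched = matched_effect(treated, untreated)
naive = mean_outcome(treated) - mean_outcome(untreated)
print(f"naive difference: {naive:.2f}, matched difference: {matched:.2f}")
```

In these invented data the naive comparison overstates the reduction in deforestation because one untreated unit differs strongly from all treated units at baseline; matching discards that comparison.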
Randomized control trials (RCTs; also known as randomized controlled trials) offer an outwardly straightforward solution to the limitations of other approaches to impact evaluation. By randomly allocating from the population of interest those units that will receive a particular intervention (the treatment group), and those that will not (the control group), there should be no systematic differences between groups (White, a). Evaluators can therefore assume that in the absence of the intervention the outcomes of interest would have changed in the same way in the two groups, making the control group a valid counterfactual.
This relative simplicity of RCTs, especially when compared with the statistical black box of quasi-experiments, may make them more persuasive to sceptical audiences than other impact evaluation methods (Banerjee et al., ; Deaton & Cartwright, ). They are also, in theory, substantially less dependent than quasi-experiments on any theoretical understanding of how the intervention may or may not work (Glennerster & Takavarasha, ). Randomized control trials are central to the paradigm of evidence-based medicine, and since the s tens of thousands of RCTs have been conducted; they are often considered the gold standard for testing the efficacy of treatments (Barton, ). They are also widely used in agriculture, education, social policy (Bloom, ), labour economics (List & Rasul, ) and increasingly in development economics (Ravallion, ; Banerjee et al., ; Deaton & Cartwright, ; Leigh, ). The governments of both the UK and the USA have strongly supported the use of RCTs in evaluating policy effectiveness (Haynes et al., ; Council of Economic Advisers, ). The U.S. Agency for International Development explicitly states that experimental impact evaluation provides the strongest evidence, and alternative methods should be used only when random assignment is not feasible (USAID, ).
However, there are both philosophical (Cartwright, ) and practical (Deaton, ; Deaton & Cartwright, ) critiques of RCTs. The statistical basis of randomized analyses is also not necessarily simple. Randomization can only be guaranteed to lead to complete balance between treatment and control groups with extremely large samples (Bloom, ), although baseline data collection and stratification can greatly reduce the probability of unbalanced groups, and remaining differences can be resolved through inclusion of covariates in analyses (Glennerster & Takavarasha, ). Evaluators also often calculate both the mean effect on units in the treatment group as a whole (the intention to treat) and the effect of the actual intervention on a treated unit (the treatment on the treated). These approaches will often give different results as there is commonly imperfect uptake of an intervention (a drug may not be taken correctly by all individuals in a treatment group, for example).
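The difference between the two estimates can be made concrete with a small Python sketch. It rests on two simplifying assumptions that are not taken from the text: no control unit obtains the intervention, and being offered the intervention changes outcomes only through uptake. Under those assumptions the treatment on the treated equals the intention to treat divided by the uptake rate. All data and the function name `itt_and_tot` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def itt_and_tot(treatment_outcomes, treatment_took_up, control_outcomes):
    """Return (intention-to-treat, treatment-on-the-treated) estimates.

    Assumes one-sided non-compliance: some treatment-group units decline the
    intervention, but no control-group unit receives it.
    """
    itt = mean(treatment_outcomes) - mean(control_outcomes)
    uptake = sum(treatment_took_up) / len(treatment_took_up)
    if uptake == 0:
        raise ValueError("no uptake in the treatment group; TOT is undefined")
    return itt, itt / uptake


# Hypothetical outcome scores; only half of the treatment group took up the offer.
treatment_outcomes = [6.0, 6.0, 4.0, 4.0]
treatment_took_up = [True, True, False, False]
control_outcomes = [4.0, 4.0, 4.0, 4.0]

itt, tot = itt_and_tot(treatment_outcomes, treatment_took_up, control_outcomes)
print(f"ITT = {itt:.1f}, TOT = {tot:.1f}")
```

With half the group taking up the offer, the effect per treated unit is twice the effect averaged over everyone offered the intervention, which is why low uptake can mask an intervention that works where it is adopted.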
Regardless of the polarized debate that the spread of RCTs in development economics has caused (Ravallion, ; Deaton & Cartwright, ), some development RCTs have acted as a catalyst for the widespread implementation of trialled interventions (Leigh, ). There are increasing calls for more use of RCTs in evaluating environmental interventions (Pattanayak, ; Miteva et al., ; Ferraro & Hanauer, ; Samii et al., ; Baylis et al., ; Börner et al., , ; Curzon & Kontoleon, ). As many kinds of conservation programmes aim to deliver environmental improvements through changing human behaviour (e.g. agri-environment schemes, provision of alternative livelihoods, protected area establishment, payments for ecosystem services, REDD+ programmes, and certification programmes; we term these socio-ecological interventions), there are lessons to be learnt from RCTs in development economics, which aim to achieve development outcomes through changing behaviour.
A few pioneering RCTs of such socio-ecological interventions have recently been concluded (although this list may not be exhaustive), evaluating: an incentive-based conservation programme in Bolivia known as Watershared, described here; a payment programme for forest carbon in Uganda (Jayachandran et al., ); unconditional cash transfers in support of conservation in Sierra Leone (Kontoleon et al., ); and a programme to reduce wild meat consumption in the Brazilian Amazon through social marketing and incentivising consumption of chicken (Chaves et al., ). We expect that evaluation with RCTs will become more widespread in conservation.
Here we draw on a range of literature to examine the potential of RCTs for impact evaluation in the context of conservation. We discuss the factors influencing the usefulness, feasibility and quality of RCT evaluation of conservation and aim to provide insights and guidance for researchers and practitioners interested in conducting high-quality evaluations. The structure of the text is mirrored by a checklist (Fig. ) that can be used to assess the suitability of an RCT in a given context. We illustrate these points with the RCT evaluating the Watershared incentive-based conservation programme in the Bolivian Andes. This programme, implemented by the NGO Fundación Natura Bolivia (Natura), aims to reduce deforestation, conserve biodiversity, and provide socioeconomic and water quality benefits to local communities (Bottazzi et
Under what circumstances could an RCT evaluation be useful?

When quantitative evaluation of an intervention's impact is required
Randomized control trials are a quantitative approach allowing the magnitude of the effect of an intervention on outcomes of interest to be estimated. Qualitative approaches based on causal chains or the theory of change may be more suitable where such quantitative estimates are not needed or where the intervention can only be implemented in a few units (e.g. White & Phillips, ), or when the focus is on understanding the pathways of change from intervention through to outcome (Cartwright, ). Some have argued that such mechanistic understanding is more valuable than estimates of effect sizes for practitioners and policymakers (Cartwright, ; Miteva et al., ; Deaton & Cartwright, ). To put this another way, RCTs can indicate whether an intervention works and to what extent, but policymakers often also wish to know why it works, to allow prediction of project success in other contexts.
FIG. 1 Summary of suggested decision-making process to help decide whether a randomized control trial (RCT) evaluation of a conservation intervention would be useful, feasible and of high quality. Items in the right-hand column without a box represent end-states of the decision-making process (i.e. an RCT is probably not appropriate and the researcher should consider using an alternative evaluation method).
This issue of external validity (the extent to which knowledge obtained from an RCT can be generalized to other contexts) is a major focus of the controversy surrounding use of RCTs in development economics (e.g. Cartwright, ; Deaton, ). Advocates for RCTs accept such critiques as partially valid (e.g. White, a) and acknowledge that RCTs should be considered to provide knowledge that is complementary to, not incompatible with, other approaches. Firstly, qualitative studies can be conducted alongside an RCT to examine processes of change; most evaluators who advocate RCTs also recognize that combining quantitative and qualitative approaches is likely to be most informative (e.g. White, b). Secondly, researchers can use covariates to explore which contextual features affect outcomes of interest, to look for those features in future implementation of the intervention (although to avoid data dredging, hypotheses and analysis plans should ideally be pre-registered). Statistical methods can also be used to explore heterogeneous responses within treatment groups in an RCT (Glennerster & Takavarasha, ), and RCTs may be designed to answer more complex contextual questions through trials with multiple treatment groups or other modifications (Bonell et al., ). Thirdly, evaluators may conduct RCTs of the same kind of intervention in different socio-ecological contexts (White, a), which increases the generalizability of results. Although this is challenging because of the spatial and temporal scale of RCTs used to evaluate socio-ecological interventions, researchers have undertaken a number of RCTs of incentive-based conservation programmes (Kontoleon et al., ; Jayachandran et al., ; Pynegar et al., ). Finally, the question of whether learning obtained in one location or context can be applicable to another is an epistemological question common to much applied research and is not limited to RCTs (Glennerster & Takavarasha, ).
In the RCT used to evaluate the Bolivian Watershared programme, the external validity issue has been addressed as a key concern. Similar socio-ecological systems exist throughout Latin America and incentive-based forest conservation projects have been widely implemented (Asquith, ). Natura is currently undertaking two complementary RCTs of the intervention in other parts of Bolivia. Researchers used a combination of both qualitative and quantitative methods at the end of the evaluation period to understand in more depth participant motivation and processes of change within treatment communities (Bottazzi et al., ) and to compare outcomes in control and treatment communities (Pynegar et al., ; Wiik et al., ).

When the intervention is reasonably well developed
Impact evaluation is a form of summative evaluation, meaning that it involves measuring outcomes of an established intervention. This can be contrasted with formative evaluation, which progressively develops and improves the design of an intervention. Many evaluation theorists recommend a cycle of formative and summative evaluation, by which interventions may progressively be understood, refined and evaluated (Rossi et al., ), which is similar to the thinking behind adaptive management (McCarthy & Possingham, ; Gillson et al., ). Summative evaluation alone is inflexible because once begun, aspects of the intervention cannot sensibly be changed (at least not without losing external validity). The substantial investment of time and resources in an RCT is therefore likely to be most appropriate when implementers are confident they have an intervention whose functioning is reasonably well understood (Pattanayak, ; Cartwright, ).
Natura has been undertaking incentive-based forest conservation in the Bolivian Andes since . Learning from these experiences was integrated into the design of the Watershared intervention as evaluated by the RCT that began in . However, despite this substantial experience developing the intervention, there were challenges with its implementation in the context of the RCT, which in retrospect affected both the programme's effectiveness and the evaluation's usefulness. For example, uptake of the agreements was low (Wiik et al., ), and little of the most important land from a water quality perspective was enrolled in Watershared agreements. Given this low uptake, the lack of an observed effect of the programme on water quality at the landscape scale could have been predicted without the RCT (Pynegar et al., ). Further formative evaluation of uptake rates and likely spatial patterns of implementation before the RCT was implemented would have been valuable.
What affects the feasibility of RCT evaluation?

Ethical challenges
Randomization involves withholding the intervention from the control group, so the decision to randomize is not a morally neutral one. An ethical principle in medical RCTs is that to justify a randomized experiment there must be significant uncertainty surrounding whether the treatment is better than the control (a principle known as equipoise; Brody, ). Experiments such as randomly allocating areas to be deforested or not to investigate ecological impacts would clearly not be ethical, which is why the Stability of Altered Forest Ecosystems project, for example, made use of already planned deforestation (Ewers et al., ). However, the mechanisms through which many conservation interventions, especially socio-ecological interventions, are intended to result in change are often complex and poorly understood, meaning that in such RCTs there will often be uncertainty about whether the treatment is better. Additionally, it is debatable whether obtaining equipoise should even always be an obligation for evaluators (e.g. Brody, ), as it is also important for policymakers to know how well an intervention works and how cost-effective it is (White, a). It may be argued that lack of availability of high-quality evidence leading to resources being wasted on ineffective interventions is also unethical (List & Rasul, ). Decisions such as these are not solely for researchers to make and must be handled sensitively (White, a).
Another principle of research ethics is that no one should be a participant in an experiment without giving their free, prior and informed consent. Depending on the scale at which the intervention is implemented, it may not be possible to obtain consent from every individual in an area. This could be overcome by randomizing by community rather than individual and then giving individuals in the treatment community the opportunity to opt into the intervention. This shows how implementers can think flexibly to overcome ethical challenges.
In Bolivia, the complex nature of the socio-ecological system, and the initial relative lack of understanding of the ways in which the intervention could affect it, meant there was genuine uncertainty about Watershared's effectiveness. However, had monitoring shown immediate significant improvements in water quality in treatment communities, Natura would have stopped the RCT and implemented the intervention in all communities. Consent was granted by mayors for the randomization and individual landowners could choose to sign an agreement or not. Although this was both more ethically acceptable and in reality the only way to implement Watershared agreements in this socio-ecological context, it led to variable (and sometimes low) uptake of the intervention, hampering the subsequent evaluation (Wiik et al., ).

Spatial and temporal scale
Larger numbers of randomization units in an RCT allow detection of smaller significant effect sizes (Bloom, ). This is easily achievable in small-scale experiments, such as those studying the effects of nest boxes on bird abundance or of wildflower verges on invertebrate biodiversity; such trials are a mainstay of applied ecology. However, increases in the scale of the intervention will make RCT implementation more challenging. Interventions implemented at a large scale will probably have few randomization units available for an RCT, increasing the effect size required for a result to be statistically significant, and decreasing the experiment's power (Bloom, ; Glennerster & Takavarasha, ). Large randomization units are also likely to increase costs and logistical difficulties. However, this does not make such evaluations impossible; two recent RCTs of a purely ecological intervention (impact of use of neonicotinoid-free seed on bee populations) were conducted across a number of sites throughout northern and central Europe (Rundlöf et al., ; Woodcock et al., ). When the number of units available is low, however, RCTs will not be appropriate and evaluations based upon analysing expected theories of change may be more advisable (e.g. White & Phillips, ). Such theory-based evaluations allow attribution of changes in outcomes of interest to particular interventions, but do not allow estimation of treatment effect sizes.
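The relationship between the number of randomization units and the smallest detectable effect can be sketched with a standard textbook approximation. The Python below assumes two equally sized arms, a two-sided test, and a normal approximation, with `sd` standing for the standard deviation of the outcome across randomization units; the function name and default values are illustrative assumptions, not figures from any study cited here. With very few units the normal approximation is optimistic, so real detectable effects would be larger than shown.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(units_per_arm, sd=1.0, alpha=0.05, power=0.8):
    """Approximate smallest true effect detectable with the given power.

    Uses MDE = (z_(1 - alpha/2) + z_(power)) * sd * sqrt(2 / n) for two arms
    of n units each (normal approximation; illustrative only).
    """
    if units_per_arm < 1:
        raise ValueError("units_per_arm must be at least 1")
    standard_normal = NormalDist()
    multiplier = standard_normal.inv_cdf(1 - alpha / 2) + standard_normal.inv_cdf(power)
    return multiplier * sd * sqrt(2 / units_per_arm)


# Detectable effect, in units of the outcome's standard deviation.
mde_by_units = {n: minimum_detectable_effect(n) for n in (5, 20, 80, 320)}
for n, mde in mde_by_units.items():
    print(f"{n:>3} units per arm -> MDE of about {mde:.2f} SD")
```

Under this approximation the detectable effect shrinks only with the square root of the number of units, so halving it requires four times as many randomization units, which is one reason large-scale interventions with few available units are hard to evaluate this way.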
For some conservation interventions, measurable changes in outcomes may take years or even decades because of long species life cycles or the slow and stochastic nature of ecosystem changes. It is unlikely to be realistic to set up and monitor RCTs over such timescales. In these cases, RCTs are likely to be an inappropriate means of impact evaluation, and the best option for evaluators probably consists of a quasi-experiment taking advantage of a historically implemented example of the intervention.
In the Bolivian case, an RCT of the Watershared intervention was ambitious but feasible ( communities as randomization units, each consisting of - households). Following baseline data collection in , the intervention was first offered in  and endline data was collected in -. Effects on water quality were expected to be observable over this timescale as cattle exclusion can result in decreases in waterborne bacterial concentration in ,  year (Meals et al., ). However, there was no impact of the intervention on water quality at the landscape scale (Pynegar et al., ), potentially because of time lags; nor did the programme significantly reduce deforestation rates (Wiik et al., ). A potential explanation is that impacts may take longer to materialize as they could depend on the development of alternative livelihoods introduced as part of the programme.

Available resources
Randomized control trials require substantial human, financial and organizational resources for their design, implementation, monitoring and evaluation. These requirements go beyond the additional cost of monitoring in control units, because design, planning, and subsequent analysis and interpretation require substantial effort and knowledge. USAID advises that a minimum of % of a project or programme's budget be allocated to external evaluation (USAID, ), and the World Health Organization recommends -% (WHO, ). The UN's Evaluation Group has noted that the sums allocated within the UN in the past cannot achieve robust impact evaluations without major uncounted external contributions (UNEG Impact Evaluation Task Force, ). As conservation practitioners are already aware, conducting a high-quality RCT is expensive (Curzon & Kontoleon, ).
Collaborations between researchers (with independent funding) and practitioners (with a part of their programme budget) can be an effective way for high-quality impact evaluation to be conducted. This was the case with the evaluation of Watershared: Natura had funding for implementation of the intervention from development and conservation organizations, and the additional costs of the RCT were covered by separate research grants. Additionally, there are a number of organizations whose goals include conducting and funding high-quality impact evaluations (including RCTs), such as Innovations for Poverty Action (), the Abdul Latif Jameel Poverty Action Lab () and the International Initiative for Impact Evaluation ().

What factors affect the quality of an RCT evaluation?
Potential for spillover, and how selection of randomization unit may affect this
Evaluators must decide upon the unit at which allocation of the intervention is to occur. In medicine the unit is normally the individual; in development economics units may be individuals, households, schools, communities or other groups; in conservation they could also potentially include fields, farms, habitat patches, protected areas, or other units. Units selected should correspond to the process of change by which the intervention is understood to lead to the desired outcome (Glennerster & Takavarasha, ).
In conservation RCTs, surrounding context will often be critical to the functioning of interventions. Outcomes may spill over, with changes achieved by the intervention in treatment units affecting outcomes of interest in control units (Glennerster & Takavarasha, ; Baylis et al., ), at least in cases where the randomization unit is not closed or somehow bounded in a way that prevents this from happening. For example, an RCT evaluating a successful community-based anti-poaching programme would suffer from spillover if population increases in areas associated with treatment communities resulted in these acting as a source of individuals for control areas. Spillover thus reduces an intervention's apparent effect size. If an intervention were to be implemented in all areas rather than solely in treatment areas (presumably the ultimate goal for practitioners), such spillover would not occur, and so it is a property of the trial itself. Such spillover affected one of the few large-scale environmental management RCTs: evaluation of badger culling in south-west England (Donnelly et al., ).
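The way spillover shrinks an apparent effect size can be shown with simple arithmetic. The Python sketch below assumes, purely for illustration, that control units receive a fixed fraction of whatever change the intervention produces in treatment units; the function name `apparent_effect` and the numbers are hypothetical, and real spillover is unlikely to be this uniform.

```python
def apparent_effect(true_effect, spillover_fraction):
    """Treatment-minus-control difference when part of the effect reaches controls.

    Assumes control units experience `spillover_fraction` of the change that
    the intervention causes in treatment units (an illustrative simplification).
    """
    if not 0.0 <= spillover_fraction <= 1.0:
        raise ValueError("spillover_fraction must lie between 0 and 1")
    change_in_treatment = true_effect
    change_in_control = spillover_fraction * true_effect
    return change_in_treatment - change_in_control


# Hypothetical example: a true gain of 10 animals per treatment area, with 30%
# of that gain dispersing into control areas.
without_spillover = apparent_effect(10.0, 0.0)
with_spillover = apparent_effect(10.0, 0.3)
print(f"measured effect without spillover: {without_spillover:.1f}")
print(f"measured effect with 30% spillover: {with_spillover:.1f}")
```

In this sketch the trial would report a smaller effect than the intervention truly has, even though the intervention itself is unchanged, which illustrates why the bias is a property of the trial design.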
Spillover is particularly likely if the randomization unit and the natural unit of the intended ecological process of change are incongruent, meaning the intervention would inevitably be implemented in areas that would affect outcomes in control units. Therefore, consideration of spatial relationships between units, and of the relationship between randomization units and the outcomes' process of change, is critical. For example the anti-poaching programme described above could instead use closed groups or populations of the target species as the randomization unit, with the programme then implemented in communities covering the range of each treatment group. Spillover may also be reduced by selecting indicators (and/or sites to monitor) that would still be relevant but would be unlikely to suffer from it (i.e. more bounded units or monitoring sites, such as by choosing a species to monitor that has a small range or ensuring that a control area's monitoring site is not directly downstream of that of a treatment area in an RCT of a payments for watershed services programme).
In the RCT of Watershared, it proved difficult to select a randomization unit that was politically feasible and worked for all outcomes of interest. Natura used community as the randomization unit, so community boundaries had to be defined, but these did not always align well with the watersheds supplying the communities' water sources. Although few water quality monitoring sites were directly downstream of another, land under agreements in one community was in some cases in the watershed upstream of the monitoring site of another, risking spillover. The extent to which this took place, and its consequences, were studied empirically (Pynegar, ). However, the randomization unit worked well for the deforestation analysis. Communities have definable boundaries (although see Wiik et al., ) and offering the programme by community was most practical logistically. A smaller unit would have presented issues of perceived fairness as it would have been difficult to offer Watershared agreements to some members of communities and not to others. The RCT of Jayachandran et al. () also selected community as the randomization unit.

Consequences of human behavioural effects on evaluation of socio-ecological interventions
There is a key difference between ecological interventions that aim to have a direct impact on an ecosystem, and socio-ecological interventions that seek to deliver ecosystem changes by changing human behaviour. Medical RCTs are generally double-blinded so neither the researcher nor the participants know who has been assigned to the treatment or control group. Double-blinding is possible for some ecological interventions such as pesticide impacts on non-target invertebrate diversity in an agroecosystem: implementers do not have to know whether they are applying the pesticide or a control (Rundlöf et al., ). However, it is harder to carry out double-blind trials of socio-ecological interventions, as the intervention's consequences can be observed by the evaluators (even if they are not the people actually implementing it) and participants will obviously know whether they are being offered the intervention.
Lack of blinding creates potential problems. Participants in control communities may observe activities in nearby treatment communities and implement aspects of them on their own, reducing the measured impact of the intervention. Alternatively, they may feel resentful at being excluded from a beneficial intervention and therefore reduce existing pro-conservation behaviours (Alpízar et al., ). It may be possible to reduce or eliminate such phenomena by selecting units whose individuals infrequently interact with each other. Evaluators of Watershared believed that members of control communities could decide to protect watercourses themselves after seeing successful results elsewhere (which would be encouraging for the NGO, suggesting local support for the intervention, but that would interfere with the evaluation by reducing the estimated intervention effect size). They therefore included questions in endline socioeconomic surveys to identify this effect; these revealed only one case in . , household surveys (Pynegar, ).
The second issue with lack of blinding is that randomization is intended to ensure that treatment and control groups are not systematically different immediately after randomization. However, those allocated to control or treatment may have different expectations or show different behaviour or effort simply as a consequence of the awareness of being allocated to a control or treatment group (Chassang et al., ). Hence the outcome observed may not depend solely on the efficacy of the intervention; some authors have claimed that these effects may be large (Bulte et al., ).
Overlapping terms have been introduced into the literature to describe the ways in which actions of participants in experiments vary as a result of differences in effort between treatment and control groups (summarized in Table ). We do not believe that behavioural effects inevitably invalidate RCT evaluation, as some have claimed (Scriven, ), as part of any intervention's impact when implemented will be because of effort expended by the implementers (Chassang et al., ). It also remains unclear whether behavioural effects are large enough to result in incorrect inference (Bulte et al., ; Bausell, ). In the case of the evaluation of Watershared, compliance monitoring is an integral part of incentive-based or conditional conservation, so any behavioural effect driven by increased monitoring should be thought of as an effect of the intervention rather than a confounding influence. Such effects may also be reduced through low-impact monitoring (Glennerster & Takavarasha, ). Water quality measurement was unobtrusive (few community members were aware of Natura technicians being present) and infrequent (annual or biennial); deforestation monitoring was even less obtrusive as it was based upon satellite imagery; and socio-economic surveys were undertaken equally in treatment and control communities.
Table (fragment; earlier rows and the column headings were not recovered). Each entry ends with its three column values in their original order.
[Entry truncated] ... Babad et al., 1982). Treatment-group interviewees also tend to give answers they believe evaluators wish to hear (experimenter demand; Levitt & List, 2011). Values: Increases; None/decreases; Increases.
Rational effort: Experimental participants decide how much effort to expend on implementing an intervention based upon their own expectations of the intervention's effectiveness; this closely parallels the Galatea effect (Babad et al., 1982). Values: Increases; None/decreases; Increases.
John Henry: Individuals in the control group increase effort in an attempt to compete with the intervention group (Saretsky, 1972; see also Bausell, 2015). Values: None; None/increases; None/decreases.

Conclusions
Scientific evidence supporting the use of an intervention does not necessarily lead to the uptake of that intervention. Policy is at best evidence-informed rather than evidence-based (Adams & Sandbrook, ; Rose et al., ) because cost and political acceptability inevitably influence decisions, and frameworks to integrate evidence into decision-making are often lacking (Segan et al., ). However, improving available knowledge of intervention effectiveness is nevertheless important. For example, conservation managers are more likely to report an intention to change their management strategies when presented with high-quality evidence (Walsh et al., ). Conservation science therefore needs to use the best possible approaches for evaluation of interventions.
As with any evaluation method, RCTs are clearly not suitable in all circumstances. Large-scale RCTs are unlikely to be a worthwhile approach to impact evaluation unless the intervention to be evaluated is well understood, either from theory or previous formative evaluation. Even when feasible and potentially useful, RCTs must be designed with great care to avoid spillover and behavioural effects. There will also inevitably remain some level of subjectivity as to whether findings from an RCT are applicable with confidence to a different location or context. However, RCTs can be used to establish a reliable and intuitively plausible counterfactual and therefore provide a robust estimate of intervention effectiveness, and hence cost-effectiveness. It is therefore unsurprising that interest in their use is increasing within the conservation community. We hope that those interested in evaluating the impact of conservation interventions can learn from the use of RCTs in other fields but avoid the polarization and controversy surrounding them. Randomized control trials could then make a substantial contribution towards the evaluation of conservation impact.