The year 2018 marks nine years since Cass Sunstein entered the US federal government as Administrator of the White House Office of Information and Regulatory Affairs (OIRA) and eight years since the creation of our employers, the UK Behavioural Insights Team (BIT). This period has seen many other attempts to apply behavioural science to government; various others, notably Sunstein (2014), Thaler (2015) and Halpern (2015), document and reflect on these developments. We do not attempt to cover the same ground here.
Instead, we first briefly summarise BIT's origins, before reviewing and assessing the current state of behavioural science in policy. This is followed by a more in-depth discussion of areas that have not been central to the application of behavioural science to policy so far. In our view, these represent either the most substantive critiques or the most interesting avenues for future investigation. We conclude by suggesting what changes would be needed to ensure these areas are explored in the future.
The origins of the Behavioural Insights Team
We start close to home. In 2010, BIT became one of the first government institutions (if not the first) dedicated to applying behavioural science to policy and public administration. Set up as part of the UK Prime Minister's Office and the Cabinet Office, the seven members were set three objectives to achieve in order to avoid triggering a ‘sunset clause’ that would see the team shut down on its two-year anniversary. These were: (1) transform at least two major areas of policy; (2) spread an understanding of behavioural approaches across Whitehall; and (3) achieve at least a tenfold return on cost.
In addition to these objectives, BIT also developed two main guiding principles, which are still core to its practices today. The first principle was to have a positive social impact, inspired by Richard Thaler's mantra of ‘nudge for good’. This principle has not only influenced BIT's choice of projects; it has also meant making our findings publicly available in order to spread the use of behavioural approaches as widely as possible. The second principle was to robustly evaluate the impact of its interventions. Often, this principle was realised by promoting the use of randomised controlled trials (RCTs) wherever possible, including in public administration contexts where they had previously been uncommon. We summarised this principle as ‘test, learn, adapt’ (Haynes et al., 2012).
We also adopted some strategies to address the specific challenges we faced in our early days. One was that we would focus initially on translating the best-evidenced interventions from the behavioural science literature in order to provide a proof of concept and some ‘quick wins’. This would have the additional benefit of addressing concerns that effects generated in laboratory studies would not translate into real-world contexts (Institute for Government and Cabinet Office, 2010). Secondly, it was decided to focus on revenue-producing or money-saving projects. Not only would such projects address Objective 3 and help build the case to maintain the team, but they could also often draw on practical advantages such as reliable data sources and established communication mechanisms. The combination of our initial focus on small changes to existing processes, the connection to Thaler and Sunstein's book (2008) and the generous support of Thaler himself led to the team attracting the nickname of ‘the Nudge Unit’.
Where are we now?
In order to meet increasing demand in the UK and abroad, BIT ‘spun out’ of government in 2014 and became a social purpose company. The organisation is still partially owned by the UK government, but is now shared with Nesta (the innovation charity) and the employees themselves – with each of these partners owning a third of the company. The team now has over 120 staff and offices in London, Manchester, New York, Sydney, Singapore and Wellington. It has strategic partnerships with several universities, including Harvard, University College London and Imperial College London. Its 2015–2016 annual update contained the results of 46 individual randomised trials (BIT, 2016). Despite these changes, BIT has maintained its focus on the guiding principles of social impact and robust evaluation.
At the same time, other UK government departments have established their own behavioural science teams, including HM Revenue and Customs (who have a team twice the size of BIT's original Cabinet Office incarnation), Public Health England, the Department for Work and Pensions and the Department for Education. This expansion has not been limited to the UK, however: many countries now explicitly use behavioural science in policy settings. The report on behavioural insights by the Organisation for Economic Co-operation and Development (OECD, 2017) contains 150 case studies of such applications gathered from around the world – and this list is illustrative rather than exhaustive. Dedicated teams have been set up in the US federal government, the United Nations, the World Bank and in more than a dozen other countries and organisations around the world. Many reports on the application of behavioural science worldwide have been written (e.g. World Bank, 2015). Other jurisdictions have made use of behavioural insights through non-governmental organisations and consultancies that work mainly or exclusively in the area (Whitehead et al., 2014). The scale and duration of activity led the OECD to conclude that “Behavioural insights can no longer be seen as a fashionable short-term foray by public bodies. They have taken root in many ways across many countries around the world and across a wide range of sectors and policy areas” (OECD, 2017, p. 13).
Much work has also gone into the development and dissemination of ideas to support (or question) these activities. Dedicated journals, such as this one, have been created to consider the relationship between behavioural science and policy. Several edited volumes on the topic are either in circulation or in press (Oliver, 2013; Shafir, 2013; Pykett et al., 2017). Many universities have set up centres, groups and networks dedicated to the application of behavioural sciences to public sector issues (and some to critique attempts to do so). An increasing number of international conferences explicitly bring together academics and practitioners interested in the topic – and there appears to be the demand to sustain them. The UK government now requires policy-makers to have received training in principles from behavioural science.
There is much for proponents of behavioural science to be pleased with here, but it is difficult to argue that it constitutes a revolution. There have been behavioural insights teams that have failed to get off the ground or that have been launched and failed to make a meaningful contribution – whether through contingent factors or deficiencies in ability. While behavioural science is much more widely used than it was, it has yet to sit alongside economics as a discipline dominant in the thinking of policy-makers. For example, BIT has conducted over 400 RCTs in the last seven years on policy issues including charitable fundraising, changing general practitioners’ prescribing behaviour, getting people back into work and stopping people from reoffending. However, this obviously only represents a tiny fraction of total public sector activity. Moreover, as discussed below, there is a danger that behavioural science is seen to offer merely technocratic tweaks, rather than the more wide-ranging reassessment of public administration that could be possible.
The ‘replication crisis’ currently gripping psychology (and other sciences) is also creating wider and more profound consequences for behavioural science and policy. A combination of questionable research practices and small sample sizes, particularly in laboratory experimental research, means that many previously accepted findings are now being called into question. The crisis should not be dismissed as of merely academic interest, since several of these findings are ones that have been – or could be – applied to policy problems. What the global community of behavioural scientists does next will determine whether policy-makers will continue to see behavioural science as a reliable source of policy ideas and approaches.
Where do we go from here?
Below we present two clusters of ideas. The first deals with the complications and challenges that face behavioural public policy: the long-term effects of interventions; repeated exposure effects; problems with proxy measures; spillovers and general equilibrium effects and unintended consequences; cultural variation; ‘reverse impact’; and the replication crisis. The second cluster concerns opportunities: influencing the behaviour of government itself; scaling interventions; social diffusion; nudging organisations; and dealing with thorny problems. We recognise that there may be overlaps between these topics.
Cluster 1: Complications and challenges
Long-term effects
One common concern is that we lack evidence that effects on behaviour endure for a substantial period. Despite many field experiments testing the application of psychology or behavioural economic interventions to practical problems (a good summary of which can be found in DellaVigna, 2009), relatively little is known about the long-term effects of many such interventions (Frey & Rogers, 2014). BIT has experienced this problem because it has been required to range widely across many policy areas, which has sometimes limited the incentives to return to an intervention in order to assess its long-term consequences. We are beginning to address this omission in some areas. We know, for example, that non-compliant Guatemalan taxpayers who received either a social norm or an ‘oversight message’ were significantly more likely to comply 12 months later, despite not receiving another letter in the interim (Kettle et al., 2016). We are also engaged in a long-term evaluation of the economic impact of receiving a growth voucher, one of the largest RCTs of its kind.Footnote 1
One underlying issue is that public officials and academics (particularly junior scholars) are rarely incentivised to choose studies where the main outcome measure will only be reported far in the future.Footnote 2 Obviously, a longer discussion is needed to deal with the various aspects of this problem, so we only sketch some ideas here. One is that public officials applying behavioural science should plan to achieve a balance between longer-term outputs and the short-term outputs that they often need to justify continued support and funding. Another possibility is to anticipate that personnel and attention will shift as time moves on and mitigate the consequences. For example, those running trials could be required to leave them in a state that allows others to revisit them and match interventions to longer-term outcomes. Initiatives like the ‘Datalab’ created in HM Revenue and Customs could allow this to happen.Footnote 3 The Datalab works by listing the datasets available to researchers, inviting proposals for research and then allowing researchers to access and analyse the anonymised data. Thus, the data associated with any behavioural tax compliance initiative could be accessed later by other researchers.
It is reasonable to expect to see more studies of long-term effects emerge as the field matures. From BIT's perspective, the need to focus more on ‘quick wins’ has subsided, and so we have more scope to pursue projects that pay off only in the longer term. In the meantime, we should also address the problem actively. For example, Frey and Rogers (2014) give a helpful overview of the different pathways that can be used to make long-term effects more likely. We can also identify at least three situations in which concerns over long-term effects may be avoided or assuaged.
Some interventions might have a ‘once and done’ property, and therefore have lasting effects without requiring any follow-up. Obvious examples here are those where an individual is being asked to complete a single action that changes their status in some way – for example, registering as a potential organ donor or enrolling in a workplace pension.
Another possibility is that an intervention may succeed in creating a new behavioural pattern that is sufficiently resilient to endure, even if the stimulus is withdrawn. Examples include when people are paid to visit the gym multiple times within a limited period, and then continue to show elevated attendance when payments are removed (Charness & Gneezy, 2009), or when a single letter leads to a sustained shift in the prescribing practices of doctors (Hallsworth et al., 2016). The initial changes may be unintentional, rather than constituting a carefully constructed intervention. Data from the London Underground shows that during a two-day strike in 2014, most people found a new route to work. What is interesting is that 5% of these people did not return to their old commute: the disruption led to new sustainable practices (Larcom et al., 2017).
An intervention might introduce a sustainable change to the decision-making environment that is likely to influence choices over the long term – for example, changes to the design and physical properties of a hospital waiting room. In this case, the behavioural stimulus endures, making it more likely that the behaviour will as well. A slight variation on this point concerns interventions where a behavioural analysis suggests that the best option is not to try to change behaviour, but rather to mitigate its consequences. In other words, the intervention does not actually require people to do anything differently. Perhaps the best examples here are the successful interventions to reformulate food to reduce levels of salt and sugar while not changing consumer purchasing patterns.
Repeated exposure effects
A related concern is that repeated exposure to behaviourally informed approaches or interventions will lead to diminishing returns. We can distinguish between two main cases here: first, ‘structured repetition’, where the same approach is deliberately used as a direct follow-up to an initial intervention in order to reinforce its effects (e.g. reminders)Footnote 4; and second, ‘unstructured repetition’, where individuals are exposed to the same kind of approach from different actors, at different times and in relation to different topics (e.g. when a social norm message appears in a different context). In both cases, the concern is that we may have a prior expectation that approaches become less effective with repeated exposure. The assumption underpinning this expectation is that the impact of an approach is driven by a novelty effect, wherein the approach: (a) succeeds in attracting attention in the first place; and (b) provides salient motivation to act in a particular way. Repeated exposure means the approach becomes less salient and novel, and thus less effective. (It is worth noting, therefore, that these concerns relate mainly to approaches that require attention from the individual, rather than those that are not immediately apparent – like default changes [see Hansen & Jespersen, 2013]. One would expect the latter to continue to be effective, since they rely more on automatic processes.)
Taking the ‘structured repetition’ question first, we can see that there is limited evidence available, which can lead us to compare wildly different studies. In some instances, we can see support for the diminishing returns hypothesis. For example, one BIT study showed that giving investment bankers sweets one year had a large impact on their willingness to donate a day's salary to charity. The next year, the sweets had the same impact on bankers who had not received sweets before. However, bankers who had received the sweets the first time were less likely to give the second time around, although they remained more likely to give than participants who did not receive sweets at all (Sanders, 2015). At the same time, there is also evidence for an alternative hypothesis: repeated structured exposure can reinforce the initial intervention, rather than undermine it. Allcott and Rogers (2014) analysed the effect of repeated exposure to home energy reports containing social norm information and energy-saving tips. They found that such repeated exposure had the effect of keeping consumption lower compared to a group for whom the reports were dropped (although after two years, consumption remained lower in both groups than it was in the pre-treatment period). It seems clear that various aspects of the target behaviour and intervention could determine whether habituation or reinforcement occurs. We may hope that, as the number of studies that involve structured repetition increases, we are able to start drawing conclusions about what variables are associated with one outcome or another. We feel relatively confident that time will answer many of these questions.
On the other hand, we can be less sure about whether and how issues around unstructured repetition will be addressed, mainly because they are likely to be more complex and less tractable to analysis. Essentially, the problem of unstructured repetition is caused by success. BIT has a mission to increase the use of behavioural approaches that can achieve policy outcomes and increase the public good. As a result, we try to promote solutions that seem to work (as is common in academia). To give an example, Hallsworth et al. (2017) considered the impact of including social norm messages (“9 out of 10 people pay their tax on time”) on tax compliance and found a significant and positive impact. Similar effects of social norm messages have been replicated in other fields, some of which are reviewed in John et al. (2014).
The problem is that if the use of social norm messages spreads to many other policy domains, we have to start considering the effect of receiving the same or similar interventions from different quarters at the same time. How effective would social norm messages be if they were found on our gas bills, tax letters, inducements to travel by public transport and reminders to attend class at the local college? In other words, what are the general equilibrium effects? We might be concerned that the unstructured nature of the approaches, associating the same message with different behaviours, might mean that habituation occurs, but not reinforcement.
Unfortunately, most of these studies to date cannot answer this question, since they took place while few other studies were being conducted on the same sample at any given moment. Since the community of people running such experiments has been small, it is likely that crossovers have been avoided. In the future, the main problem is likely to be one of coordination – since there are many different actors who could be using the same behavioural science approach, it is unlikely that any one study will be able to identify the other activities in play and take them into account. Perhaps the most we can do right now is to look at the studies that assess the impact of highlighting the use of behavioural science. Loewenstein et al. (2015) showed that emphasising to people that their behaviour had been influenced by a particular default setting did not undermine the effectiveness of the nudge (see also Bruns et al., 2016). Clearly, this is not exactly the same issue, but it could suggest that a higher profile for behavioural policy may not necessarily mean a loss of effectiveness.
Problems with proxy measures
One response to the difficulties in including long-term effects is to use proxy measures.Footnote 5 For example, many studies in education take attendance as an outcome measure (e.g. Kraft & Rogers, 2015; Chande et al., 2017). We might reasonably expect that school attendance would have a positive impact on grades, as shown in Gottfried's (2010) analysis of longitudinal data on school attendance and attainment. Similarly, one can infer long-term effects of behavioural interventions that reduce short-term unemployment (such as Altmann et al., 2015) on the basis that they prevent ‘employment scarring’ – the tendency for one protracted spell of unemployment to lead to others, even if the first spell can be attributed to bad luck, such as graduating into a recession (Arulampalam et al., 2001).
However, the causal relationship between the two types of measures may not be constant. This can be thought of as a behavioural science-specific form of the Lucas critique found in economics. Lucas (1976) argued that since consumption functions are not fixed and can respond to changes in circumstances, they are not ‘policy invariant’, and so may change in response to changes of policy. In economics, this critique remains important, as policies designed with a particular intended direction or magnitude of effect, based on current policy-response parameters in consumption functions, might be undermined because consumers’ consumption functions change in response to the policy itself.
The equivalent argument is that the relationship between short-term and long-term behaviours is not ‘nudge invariant’: the act of nudging someone can cause their short- and long-term behaviours to become untethered from each other. To illustrate, we can imagine that there is a well-established causal relationship between applying for jobs and leaving unemployment. It need not hold, however, that increasing job applications necessarily leads to increasing employment. For example, it is possible that marginal job applications are of a lower standard, or are less well targeted, than those that preceded them.Footnote 6 Or we might speculate that a quick return to work may be less insulating from the psychological aspect of employment scarring if people are aware that they were nudged into work.
This problem is likely to be particularly acute with measures that are currently considered to be reliable proxies but where no causal chain exists. Self-control in children (e.g. measured by a ‘marshmallow test’) has been shown to predict several later-life outcomes and particularly later-life self-control (Mischel et al., 1989). Given the existence of a problem early on in life, many would argue that it is at this stage that interventions should be targeted. Although this makes sense, we have no reason to expect (given the lack of a causal link between marshmallows now and alcohol later) that today's proxy and tomorrow's outcome are inextricably linked, or even linked that strongly. Hence, it may be possible that we can change one behaviour (self-control while young) without influencing the other(s) at all. This would not invalidate the intervention, as self-control when young may have all manner of direct positive benefits in the short and medium term. However, behavioural scientists should be cautious about extrapolating too far from their findings and should, through long-term data monitoring, attempt to identify whether relationships between variables that exist in the absence of interventions continue to do so in their presence.
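The distinction between a proxy that merely predicts an outcome and one that causes it can be made concrete with a small simulation. The sketch below is purely illustrative: it assumes a hypothetical unobserved trait that drives both an early proxy score and a later outcome, and an intervention that raises the proxy directly without touching the trait. The function names, effect sizes and sample size are assumptions chosen for illustration, not estimates from any of the studies cited here.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation, written out by hand to stay version-independent."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_proxy(n=20000, boost=1.0, seed=0):
    """Simulate a proxy and an outcome that share a hidden common cause.

    Every number here is an illustrative assumption.
    """
    rng = random.Random(seed)
    proxy = {True: [], False: []}
    outcome = {True: [], False: []}
    for _ in range(n):
        trait = rng.gauss(0, 1)        # unobserved common cause
        treated = rng.random() < 0.5   # random assignment to the intervention
        # The intervention moves the proxy directly...
        p = trait + rng.gauss(0, 1) + (boost if treated else 0.0)
        # ...but the outcome depends only on the hidden trait, not on the proxy.
        y = trait + rng.gauss(0, 1)
        proxy[treated].append(p)
        outcome[treated].append(y)
    return {
        # How well the proxy predicts the outcome when nobody is nudged.
        "control_correlation": pearson(proxy[False], outcome[False]),
        # Difference in means between treated and control groups.
        "proxy_effect": statistics.fmean(proxy[True]) - statistics.fmean(proxy[False]),
        "outcome_effect": statistics.fmean(outcome[True]) - statistics.fmean(outcome[False]),
    }


if __name__ == "__main__":
    for name, value in simulate_proxy().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the proxy predicts the outcome reasonably well among untreated individuals and the intervention shifts the proxy by roughly the assumed boost, yet the estimated effect on the outcome stays close to zero. This is the pattern that long-term data monitoring would be needed to detect.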
Spillovers and unintended consequences
The argument about proxy measures can be taken in a more troubling direction. Rather than just questioning the relationship between immediate and proxy measures, we could point to the possibility that interventions are having unintended and unmeasured effects elsewhere. In this view, the focus in the experimental paradigm on specific, predefined outcome measures becomes a source of weakness as well as strength. Research on applying complex adaptive systems to policy emphasises how we may shift the dial on one outcome measure, even as problems mount elsewhere unobserved (Dörner, 1996; Dolphin & Nash, 2012).
In some senses, this phenomenon is unsurprising, because governments may be pursuing a multitude of policy goals at the same time. Fisman and Golden (2017) offer a good example of the hidden trade-offs between these goals that interventions may reveal. They documented a US initiative that attempted to reduce fraud committed by grocery stores participating in a government food voucher scheme intended to improve nutrition. The fraud consisted of artificially inflating the prices charged for the goods covered by the scheme and was only possible because stores were not required to reveal the prices they charged to cash customers. The new initiative made changes that revealed the discrepancy. As hoped, the fraud disappeared. But the change also had the unintended consequence that many retailers simply stopped selling the high-nutrition foods altogether, since they were no longer profitable. Those who retained the foods increased their prices by nearly 10%. The result was poorer nutrition for both those on the programme and those in the local area who used the same stores.
Again, this argument about unintended consequences is not specific to behavioural policy interventions (see Cartwright & Hardie, 2012). But since behavioural science itself studies how such spillovers are triggered, there is a case to be made that it should be particularly sensitive to them. Here we can point to the large literature on how ‘licensing effects’ may mean that attempts at self-control in one domain create indulgence in another, how virtuous behaviour in one situation may increase dishonesty elsewhere and so on (Khan & Dhar, 2006; Merritt et al., 2010).
True universals and cultural variation
As noted above, many more people and groups have been applying behavioural science to policy around the world. One description of what these people are trying to do is “to bring a more realistic model of human behaviour into … policy and regulation” (Halpern, 2016). This is a useful shorthand and is not strictly speaking incorrect. But it also suggests that behavioural public policy has a tendency to treat human beings as a broad class, with variation occurring primarily at the individual level. Although we might, as in the case of stereotype threat, allow for the possibility of systematic differences by gender (Spencer et al., 1999) and ethnicity (Steele & Aronson, 1995), behavioural scientists are relatively quiet on the issues of nationhood and culture.Footnote 7
As Henrich et al. (2010) noted, most people in the world are not ‘WEIRD’ – Western, educated, industrialised, rich and democratic – but a large proportion of the people who apply behavioural science are (as are the subjects of their studies). This issue may be particularly acute for undergraduate participants in the labs of American universities, but it is still true to a great extent for participants on Amazon's Mechanical Turk and for participants in many field experiments conducted by BIT, as these mostly take place in the UK, Australia and the USA.
This perspective makes it much more troubling that we lack field experimental evidence on the effectiveness of the same interventions in different contexts. We can make some general predictions, however. There is relatively strong evidence that Western societies have a stronger concept of individual, autonomous actors than East Asian societies, where a more collectivist perspective is often more prevalent (Heine, 2008). This difference may have an impact on the relative effectiveness of certain interventions. For instance, Bond and Smith's (1996) study suggests that this difference means that social norm effects are larger in non-Western societies. Interestingly, Kettle et al. (2016, p. 36) found that social norms in tax letters were similarly effective in Guatemala, even though they stated that “65 percent of people pay their tax on time,” a very different number from that found on UK tax letters.Footnote 8
The growth of teams around the world that draw on the same literature and develop similar interventions may allow us to address this challenge. If new teams are able to conduct very similar replication studies, it could be possible to establish which (if any) heuristics and biases are common to which societies and, for those that are not universal, what factors moderate or mediate their effects. This process is likely to identify issues, contexts and cultures that challenge behavioural scientists to develop and test new theories and interventions.
Reverse impact
In recent years, the UK higher-education sector has been encouraged to document how its research has ‘impact’, defined as “an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia” (Penfield et al., 2014, p. 21). Interestingly, behavioural scientists working on policy issues may experience the situation in reverse. Rather than publishing peer-reviewed research that may then influence government action, they may alter government actions and then attempt to publish the results in peer-reviewed journals. In other words, the impact comes first.
The issue is that there may be several barriers to academic publication, many of which BIT has experienced in its history and tried to surmount. Publication is usually entirely dependent on the approval of senior officials, who may see little reason to approve this request, particularly if there is any transfer of data involved (even anonymised data). The issues are particularly acute if the people responsible for the study are public officials, as BIT staff members were until spring 2014. In this case, there are often few resources provided to support the publication process, which is likely to be seen as a luxury. But even if resources are supplied by academic partners, there may be differences over the proposed timing, framing and conclusions of any potential publication. While BIT has always been pressing the case for publication, it has not always been successful.
Should we be concerned by the lack of ‘reverse impact’? Here, we should distinguish between initiatives that are not made public and those that are made public but do not go through the additional step of peer-reviewed publication. The former create obvious problems in terms of reducing the transparency of government, as well as potentially creating a ‘public file drawer’ problem if it is null results that are selectively held back. The latter are less serious but may mean that (depending on exactly how they have been reported) the quality and reliability of their results cannot be fully assessed. The worry is that if not all the studies have been prepared and structured carefully, this may introduce quality uncertainty, where even good studies are suspected of being potentially flawed (Akerlof, 1970). One obvious solution for this second problem is for public but non-peer-reviewed reports to provide enough detail to allow a reasonable judgement of their quality.
The replication crisis
Finally, we want to mention the challenge that the replication crisis poses for behavioural public policy. BIT takes many of its intervention ideas from existing lab studies or field experiments and is therefore dependent on their reliability. If this reliability is questionable, there are two main consequences for behavioural public policy: first, and most importantly, it would mean that we have wasted resources trying to implement concepts that are not viable – resources that could have been allocated more profitably; and second, it could damage the trust that policy-makers and the public have in behavioural science, with the result that they become reluctant to use the approach in future.
This is a challenge that will not go away immediately, since new questionable practices are still being discovered regularly. International efforts on pre-registration, data sharing, identification of statistical irregularities and replication consortia (such as the Many Labs Replication Project; see Klein et al. Reference Klein, Ratliff, Vianello, Adams, Bahník, Bernstein and Cemalcilar2014) are all part of the emerging solution to the crisis. However, as behavioural scientists, we are all responsible for helping the field overcome this challenge. In particular, those behavioural scientists who are helping to form policy (and thus determine the allocation of limited public resources) have a special responsibility to review the relevant evidence critically and in detail, bearing in mind the problems that have recently emerged. Robustly evaluating the ensuing interventions becomes even more important. Finally, and most painfully, we need to accept with humility that some of our own trials may have been one-off findings. The implication is that we should replicate an intervention as a matter of course before trying to scale it up. This would present a significant challenge, since the public sector is often still reluctant to evaluate interventions in the first place.
Cluster 2: Opportunities
Those who apply behavioural science to policy have focused most of their energies on using new approaches to improve policy outcomes. Much less attention has been paid to behavioural science as a tool to improve the way government itself functions (Lodge & Wegrich, Reference Lodge and Wegrich2016). This division of resources reflects how incentives are structured: individuals and groups using behavioural science may not seek to challenge fellow public sector actors because they need them as allies for implementing interventions. Yet there are strong reasons for doing so.
The main reason is that the critique of government itself is almost certainly correct. Government is vulnerable to biases and pathologies in the way it acts; no serious observer could argue otherwise. Moreover, many of these traits are the same ones that behavioural science has discussed extensively in relation to individuals. A minister visiting a hospital or factory may be exposed to a particular piece of information that acts as a breeding ground for confirmation bias. A finance officer will rely heavily on the previous year's budget as the default when allocating resources, even if ‘zero-based budgeting’ is applied (Footnote 9). There are real gains to be made here; the task should not be dismissed as too difficult or mundane. Behavioural scientists have much to add to existing studies on organisational behaviour in this regard.
The other reason is more tactical. Critics of behavioural public policy often use the fact that government is also vulnerable to biases to justify the claim that governments should simply refrain from nudging (or other applications) altogether (Waldron, Reference Waldron2014). This does not necessarily follow: a stronger argument is that such vulnerability points towards the need for more behavioural science, not less, applied to government itself. Yet, in general terms, behavioural scientists have allowed the critics' argument to go mostly unchallenged, partly because they have not developed the arguments or generated the examples to counter it. With both these reasons in mind, BIT is currently working on a self-funded project on this issue.
Scaling interventions represents a specific opportunity for taking a behavioural approach to government itself. The application of behavioural science has often adopted a basic approach of ‘experiment and then scale’: a trial is conducted on a sample (which nonetheless might be relatively large) and the best-performing variant is adopted more widely. In many cases, interventions are constructed to ensure that they can be integrated into existing large-scale practices, so they have a clear path to wider adoption.
Perhaps the best examples are those that concern changes to messages that are already being sent out, since the change itself can require little expenditure. Hence, many of BIT's studies have concerned modification of letters (e.g. Hallsworth et al., Reference Hallsworth, List, Metcalfe and Vlaev2017), text messages (e.g. Haynes et al., Reference Haynes, Green, Gallagher, John and Torgerson2013) or web forms (Kettle et al., Reference Kettle, Hernandez, Sanders, Hauser and Ruda2017). Other kinds of intervention also offer a feasible route to scaling up, even if they are not as cheap or easy as message changes. Examples might be changes to the way payments are structured, such as using lotteries (SBST, 2016), different match rates (Huck & Rasul, Reference Huck and Rasul2011) or social incentives (BIT, 2016) (Footnote 10).
However, despite these apparent advantages, the proportion of behavioural science interventions that reach scale could be much higher. Even when interventions are scaled, this does not necessarily mean that they retain their original effectiveness. A well-known example concerns surgical checklists, a patient safety measure that requires the surgical team to check whether certain actions have been carried out. Checklists had been highly successful in improving patient outcomes in previous studies, reducing surgical complications from 11% to 7% (Haynes et al., Reference Haynes, Weiser, Berry, Lipsitz, Breizat, Dellinger and Merry2009). However, when their use was mandated across hospitals in Canada, that effect disappeared, despite self-reported compliance being over 90% (Urbach et al., Reference Urbach, Govindarajan, Saskin, Wilton and Baxter2014). Checklists also did not improve overall outcomes in a hospital in the Netherlands; patients for whom the checklists were used correctly had improved outcomes, but the checklists were only used for 39% of the procedures (Van Klei et al., Reference Van Klei, Hoff, Van Aarnhem, Simmermacher, Regli, Kappen and Peelen2012). In these cases, both the context and the way the checklists were implemented, as well as a potential lack of intensive training or monitoring of compliance, may have affected their effectiveness (Urbach et al., Reference Urbach, Govindarajan, Saskin, Wilton and Baxter2014).
It is important to note that the failure of innovations to reach scale is often observed in the public administration literature: the issue is not specific to behavioural science interventions (Bason, Reference Bason2010). Nevertheless, there are at least two reasons why failure to scale is particularly relevant to behavioural public policy. The first is that recent efforts in this field have tended to couple behavioural science with an emphasis on the importance of experimentation (Haynes et al., Reference Haynes, Goldacre and Torgerson2012; SBST, 2015; Ames & Hiscox, Reference Ames and Hiscox2016). There are many good reasons for this approach. However, the danger is that too much focus is placed on trials and trial results as ends in themselves (particularly in the light of our comments about proxy measures above). Furthermore, it is arguable that academic incentives encourage innovative results and highly cited publications, but fail to encourage researchers to scale their interventions. We need to ensure that the individuals and organisations applying behavioural science to policy consider scalability when selecting projects and that they are incentivised to look beyond the successful completion of a trial. Luckily, this is something government and other organisations (e.g. the Education Endowment Foundation and the Health Foundation) are becoming increasingly interested in, providing new opportunities for trials in this area.
The second reason is that one could argue that the issue of scaling is, at heart, one of organisational behaviours – and thus behavioural science itself may have something useful to say about the problem. We are aware, of course, of the contribution that implementation science has made in this regard. But we argue that more of the attention and resources dedicated to behavioural public policy should be directed to this question (as for government activity in general). For example, the literature on diffusion of innovations has established that homophily (i.e. the tendency for individuals to associate with similar others) plays a significant role in how ideas do or do not reach scale (e.g. Yavaş & Yücel, Reference Yavaş and Yücel2014). It seems likely that behavioural science could contribute to this field by, for example, offering insights into how the source of a message affects its persuasiveness or into how social norms function. But we are still some way from using behavioural science to give policy-makers recommendations about how to use homophily to maximise the likelihood that others take up their innovation. How best to scale up and spread interventions is an area where there is a strong need for further research that uses the same robust methods that are associated with behavioural science trials.
Many applications of behavioural science to policy have adopted a simple, unidirectional model of influence: a public sector actor attempts to influence (usually) an individual, organisation or group. This approach is clearly important and covers many common policy situations. However, it notably excludes avenues for influence that exist between individuals, organisations or groups. Sacerdote's (Reference Sacerdote2001) canonical study of social influence, in which undergraduates at Dartmouth College were randomly assigned to roommates, found that spatial proximity (and, presumably, the relationships that form as a consequence) significantly influences Grade Point Average (GPA) at the end of students’ first year of college. Moreover, it is likely that this estimate is a lower bound, since most friendships are formed through associative matching rather than random assignment.
Arguably, behavioural science's focus on individual decision-making means it has neglected relevant insights from theories about social networks and systems thinking (Ormerod, Reference Ormerod2010). Obviously, this is a generalisation, and many valuable studies of this kind do exist (e.g. Drago et al., Reference Drago, Mengel and Traxler2015; Kim et al., Reference Kim, Hwong, Stafford, Hughes, O'Malley, Fowler and Christakis2015). But the potential gains to policy-makers are so large that they justify more work. In a world of limited resources, a better understanding of how to harness peer-to-peer transmission of behaviour could mean that the same or better outcomes are achieved at much lower cost (since far fewer contacts from government would be needed). The same is true if governments are attempting to limit this transmission.
BIT has conducted some preliminary studies on how to create ‘network nudges’. One example compared three requests: asking an email recipient (at an investment bank) to click on a link to donate to charity; asking the recipient to “reach out and email their friends and colleagues”; and asking them to reach out and tell their acquaintances “about the huge contribution their donation can make.” The proportion of investment bankers making a donation under these three variants was 12.4%, 23.6% and 38.8%, respectively (BIT, 2016). This is clearly a very simple example, however, and much more work is needed to identify the best ways to use social networks.
Most behavioural science interventions have focused on individuals, most commonly in their roles as citizens and consumers, and less frequently, but increasingly, as employees. There has been much less work that targets organisations, and there are a few likely reasons why. One is that (generally, but not exclusively) the main academic disciplines informing behavioural policy have tended to use the individual as their unit of analysis. Another is that regulators, which constitute one of the main points of contact between government and organisations, have traditionally taken a legalistic approach that is averse to experimentation and innovation. Recent developments show that this attitude is changing (Financial Conduct Authority, 2013). In addition, the drive for robust evaluation may have guided attention towards individuals. Since there are fewer organisations than people, randomised trials that use organisations as the unit of analysis may be more likely to encounter problems with statistical power.
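The point about statistical power can be made concrete with a rough calculation. The sketch below is illustrative only: it uses a standard normal approximation for a two-sided comparison of two proportions, and the baseline rate (30%), the treated rate (35%) and the sample sizes are hypothetical figures chosen for the example rather than results from any trial discussed here. The function name `power_two_proportions` is likewise our own label.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the usual normal approximation and ignores the (tiny) chance of
    rejecting in the wrong direction. Assumes the two rates are not both
    0 or both 1, so the standard errors are non-zero.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)

    # Standard error under the null (pooled rate) and under the alternative.
    p_bar = (p_control + p_treated) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treated * (1 - p_treated) / n_per_arm
    )

    diff = abs(p_treated - p_control)
    return z.cdf((diff - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Hypothetical scenario: the same 5-percentage-point effect, tested with
    # sample sizes typical of organisations versus individuals.
    for n in (200, 2000, 20000):
        power = power_two_proportions(0.30, 0.35, n)
        print(f"n per arm = {n:>6}: approximate power = {power:.2f}")
```

Under these assumed figures, a trial with a few hundred organisations per arm would usually miss an effect that a trial with tens of thousands of individuals per arm would detect almost every time. Real trials with organisations may also involve clustering or stratification, which this simple approximation does not capture.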
The more important point is that many policy-relevant decisions are made by organisations. The majority of fossil fuels are consumed by organisations, through transportation of products, the powering of factories and so on. Although consumer behaviour is an important part of preventing and mitigating climate change, any strategy that does not incorporate changing business behaviour is destined to fail. This argument also applies to the products that firms make. Policy interventions may try to reduce carbon emissions by getting people to drive less (Leape, Reference Leape2006) or to reduce calorie consumption using food labels (Roberto et al., Reference Roberto, Larsen, Agnew, Baik and Brownell2010). However, in both cases, the problem could be solved upstream if firms made cars that used less petrol or reformulated their products to contain fewer calories – something the UK's new sugared drinks levy will aim to address.
How far can we translate findings from individual psychology to organisational psychology, or is this attempt hopelessly naïve? After all, businesses are not people, but they are made up of people (Footnote 11). If the government were to write a letter to all grocery stores asking them to stock more low-calorie drinks and fewer high-calorie ones, that letter, if it were read at all, would be read by a person, and so to the extent that people can be nudged, we could expect businesses to be ‘nudgeable’.
This argument may not follow, however. Many firms may have created processes to prevent single employees from making large decisions – indeed, these processes may have been created explicitly to guard against cognitive biases. Two or more decision-makers may not be less biased than one, but if only one of them has been nudged, the intervention's effect may be reduced. There is also a question of identity (Akerlof & Kranton, Reference Akerlof and Kranton2000). When people are working for a firm, we do not expect them to behave the way that they do at home. The evidence suggests that relatively weak, short-term prompts that invoke a particular identity, such as being ‘a voter’ rather than simply ‘voting’ (Bryan et al., Reference Bryan, Walton, Rogers and Dweck2011), can have substantive effects on subsequent behaviour. Employees may have even more intense and sustained exposure to prompts that encourage behaviours that are in line with corporate identities (Akerlof & Kranton, Reference Akerlof and Kranton2005). Finally, we must acknowledge the great variety of organisational forms. For example, it is plausible that smaller firms may have less of a distinct identity from the individual(s) running them, which could lead them to behave more like individuals. Since organisation size is often recorded, there is a strong case for trying to answer this question by gathering together existing evidence on how size interacts with behavioural science interventions.
Governments have often approached the behaviour of organisations through the lens of regulation. At this stage, we do not know enough about how behavioural science can improve, complement or even replace regulation, despite the many valuable contributions of Cass Sunstein (Reference Sunstein2011). BIT has been considering this question, from the 2010 MINDSPACE report – which stresses that its framework “does not attempt to replace [legislation and regulation] … it extends and enhances them” (Dolan et al., Reference Dolan, Hallsworth, Halpern, King and Vlaev2010, p. 49) – to its recent report on applying behavioural insights to regulated markets (Costa et al., Reference Costa, King, Dutta and Algate2016). However, it is clear that we have made much less progress here than we have in the domain of individual decisions. Oliver's (Reference Oliver2015) discussion of ‘budging’ rightly emphasises the potential gains that can be made in the field of behavioural regulation; clearly, more work is needed here.
One criticism of behavioural policy is that its usefulness is limited to domains where one-off behaviours with binary desirable decisions are prevalent, such as tax compliance, attending appointments or enrolling in a pension plan. It is certainly the case that the majority of government teams that work in behavioural science have typically focused their early work on changing these kinds of behaviours. We (Hallsworth & Sanders, Reference Hallsworth, Sanders and Spotswood2016; Sanders & Halpern, Reference Sanders and Halpern2016) have been among those advocating for pursuing so-called ‘low-hanging fruit’ early in the life of a behavioural insights team. The social value of this work has now become easy to downplay, particularly when policy-makers have become accustomed to ‘tax letter trials’ producing effects.
However, we argue it would be disappointing if compliance were the only application of behavioural science active in policy ten years from now. Our hope is that behavioural science has as much to offer on some trickier, more complex problems. When we think of a particularly challenging area of government policy, we often think of areas where the traditional tools of regulation, incentives and information have been extensively utilised and yet the problem persists. At first, such areas can seem daunting and best avoided, but it is precisely here that new tools should be used. If we consider that behavioural economics came into existence to explain phenomena that standard economic analysis found inexplicable, perhaps behavioural science can solve problems that standard economic tools have found insoluble.
The last few years have seen the beginnings of these changes at BIT in the UK. We are making dedicated attempts to consider how behavioural science can be applied to relieve poverty (Gandy et al., Reference Gandy, King, Streeter Hurle, Bustin and Glazebrook2016); to curb recidivism, where financial and regulatory incentives are ineffective for a substantial number of people; and to prevent corruption and bribery, which may be so culturally pervasive as to evade traditional tools. Work in these areas is still in its early stages. As a community of practice, we must not become disheartened if/when we do not achieve immediate successes in these areas. Where we do start to make progress, we must guard against the temptation to overstate our successes and to declare prematurely that the battle has been won.
Summary and conclusion
Tackling these challenges will require sustained effort and new collaborations. We should be prepared for the fact that interventions over the next few years will be harder to implement and may produce more ambiguous effects. As the replication crisis rolls onwards, we must similarly be ready for the possibility that many more things that we had previously taken to be true will cease to be credible.
We should, however, remain optimistic. A lot of progress has been made in a relatively short space of time. Breaking economists’ intellectual hegemony over government was never going to be straightforward, but – as is already apparent – it is worth the effort.
We are grateful to colleagues at the Behavioural Insights Team for numerous conversations over the years that have informed the content of this paper.