Introduction
Missed medical appointments – commonly referred to as ‘no-shows’ or ‘Do Not Attend’ (DNAs) – represent a persistent and costly challenge to healthcare systems worldwide. DNAs result in underutilised clinical capacity, inefficient resource allocation and reduced access to care for other patients (Dantas et al., 2018). In the UK, the National Health Service (NHS) estimates the annual cost of missed general practitioner appointments alone at over £216 million (Werner et al., 2023), while in the United States, missed visits are estimated to cost the health system US$150 billion annually (£112 billion) (Mooney, 2024). Similar patterns appear in low- and middle-income countries (LMICs); for example, specialty outpatient clinics in Santa Maria, Brazil, reported an average DNA rate of 9.4% from 2016 to 2021, resulting in substantial costs for municipal health budgets (Jacobi et al., 2023). Comparable inefficiencies have been documented in Kenya, Nigeria and other LMIC settings (Opon et al., 2021; Fatoye et al., 2024; Pacheco & Leal de Souza, 2025). Likewise, in high-income countries such as Denmark, DNAs remain a significant source of inefficiency and strain on healthcare resources (Nørgård et al., 2025).
Beyond the financial burden, persistent non-attendance disrupts workflows, creates scheduling inefficiencies and increases administrative effort, often leading to staff frustration when time slots remain unused (Groden et al., 2021; Nørgård et al., 2025). Improving attendance enhances predictability, enabling clinicians to plan caseloads, allocate resources more efficiently and focus on patients who need care (Boone et al., 2022). Even small improvements can have cascading system-level effects; for example, an AI-driven scheduling intervention improved capacity utilisation by 6% (Toker et al., 2024). These pressures make reducing DNAs a key policy priority. Because DNAs ultimately result from individual behaviour, addressing them effectively requires interventions grounded in behavioural as well as operational insights.
The intuitive response to DNAs is often to fine patients. However, evidence suggests that such penalties have a poor track record, as they do not reduce DNAs relative to non-fined patients (Blæhr et al., 2018). In addition, fines affect patients disproportionately: patients of lower socioeconomic status (SES) might have to reduce spending on essential items to pay them, while higher-SES patients might have the knowledge or personal connections to ‘navigate’ the system and have fines removed on appeal, which furthers inequities. The study described in this paper provides policymakers and hospital administrators with an alternative method that is more equitable and effective.
The use of behavioural insights to reduce DNAs
Forgetfulness is the most frequently reported reason for DNA (Kaplan-Lewis & Percac-Lima, 2013; Parsons et al., 2021; Wilson & Winnard, 2022; Charton et al., 2024). In a retrospective study of 5,604 patients, 927 (16.5%) missed their appointments, with forgetting cited as the most common reason (n = 97, 35.5%) (Kaplan-Lewis & Percac-Lima, 2013). A systematic meta-review of 20 studies similarly found that forgetfulness was the most frequent cause of non-attendance in 5 of them (Parsons et al., 2021). A review of 21 NHS studies published between 2016 and 2021 also reported forgetfulness as the primary reason for missed appointments (Wilson & Winnard, 2022).
To address this issue, healthcare providers have increasingly turned to reminders, typically delivered a few days before a scheduled appointment. Such reminders are low-cost, scalable interventions that may be conceptualised as ‘behavioural interventions’ from the perspective of this journal. More generally, behavioural interventions can be defined as suggestions for new policy initiatives or procedures, or changes to existing ones, that result from systematically applying theoretical and methodological insights from the behavioural sciences to policy practices – from the identification and definition of the challenges, over analysing their nature, to testing and suggesting potential interventions for policy improvements dealing with these challenges (Hansen, 2019; Hallsworth, 2023).
Some interventions are found post-hoc to align with these insights and are referred to as behaviourally aligned; other interventions are inspired by these insights and referred to as behaviourally informed; and yet others are tested according to these insights and referred to as behaviourally tested (Sousa et al., 2016). In this perspective, the increasing use of reminders can be conceptualised as originally behaviourally aligned, but increasingly behaviourally tested, interventions aimed at mitigating problems that arise from humans’ limited attention, such as forgetfulness, overlooking and relegation of information (Hansen, 2019; Werner et al., 2023). One explanation for the increasing tendency to test behavioural interventions in the domain of public health is the natural compatibility of methods from the behavioural sciences – such as randomised controlled trials – with those traditionally used in clinical research and healthcare.
Today a substantial body of evidence, including meta-analyses and systematic reviews, has demonstrated that mobile phone reminders are generally effective at improving appointment adherence (Gurol-Urganci et al., 2013). A Cochrane review concluded that text messaging increases attendance compared to no reminder, with an estimated improvement in attendance rates of 11–12 percentage points across studies (Gurol-Urganci et al., 2013). Systematically reviewing the various channels available, Stubbs et al. (2012) find that letter reminders, on average across 7 studies, reduced DNAs by 7.6%, while the more cost-effective technological channel of SMS reminders, across 12 studies, on average reduced DNAs by 8.6%. Another systematic review confirms the effect of SMS reminders and adds that the effect is unchanged whether reminders are used in the primary health sector or in hospital outpatient clinics, whether the reminder is sent 24, 48 or 72 hours prior to the appointment, and whether it is sent to younger or older patients (Eriksen & Kjellberg, 2013). A third meta-study finds reminders to be even more effective, with an average reduction in DNAs of 34% independent of whether reminders were sent the day before or the week before the appointment (Hasvold & Wootton, 2011). However, this study also finds that automatic calls and SMS reminders reduced DNAs on average by 29%, while the less cost-effective approach of reminding people by manually calling them reduced DNAs by 39% on average, thus suggesting that there is more to DNAs than merely forgetting.
This point is confirmed by research which points out that patients’ tendency to attend or miss appointments – besides the psychological factors of forgetfulness and limited attention – correlates with structural factors such as transportation or work commitments (Werner et al., 2023) as well as demographic and clinical factors, such as younger age, male sex and mental health diagnoses (Nørgård et al., 2025). Still, as such structural, demographic and clinical factors are difficult to change, applied behavioural science has tended to study how cost-effective framing of the content of reminders may ‘nudge’ patients to keep their appointments without applying additional coercion or financial incentives that would otherwise exacerbate the unequal influence of structural, demographic and clinical factors. A notable example in this stream of research is Hallsworth et al. (2015), who found that simply including DSNs (descriptive social norms; ‘9 out of 10 patients attend’) reduced DNAs and, perhaps more surprisingly, that including the SIC of a missed appointment (specific institutional costs; ‘Not attending costs the NHS £160’) significantly reduced DNAs compared to a standard reminder. Other experiments have corroborated and extended this point that there is more to DNAs than merely forgetting. For instance, Berliner Senderey et al. (2020) conducted a large-scale A/B testing trial in Israel, finding that emotionally resonant messages – particularly those invoking guilt – reduced the DNA rate by nearly 7 percentage points compared to control groups. Similarly, Groden et al. (2021) found that behavioural messaging through appointment cards and signage in a US primary care clinic improved both visit adherence and long-term retention among high-cost, high-need patients.
While these results suggest that interventions based on integrating behavioural insights in reminders to reduce DNAs can yield real and measurable improvements in health care delivery when implemented at scale, there is also a need for caution. Teo et al. (2023) have cautioned against assuming the universal applicability of such nudges, particularly in complex care environments such as mental health. In a pragmatic trial at a U.S. Veterans Affairs medical centre, they found that incorporating persuasive nudges into appointment letters failed to produce statistically significant reductions in DNAs. More generally, recent meta-reviews (e.g., Hummel & Maedche, 2019; Mertens et al., 2022) and subsequent methodological critiques (Maier et al., 2022; Szaszi et al., 2022) have sparked debate about the actual effectiveness of applying behavioural insights and nudging in public policy. However, besides the confused inclusion criteria used in these meta-reviews, this debate – even where distinctions are made between categories of interventions (e.g., defaults, simplification, social reference) – tends to evaluate behavioural interventions as a unified class independently of their relation to the specific behavioural problems they aim to address or the psychological mechanisms that supposedly mediate effects (Szaszi et al., 2018; reiterated in Szaszi et al., 2022).
The latter is particularly problematic as behavioural interventions do not constitute a single, uniform class of interventions with universal effects, but rather a diverse set of insights whose effects are mediated by different psychological mechanisms, which work differently relative to different contexts and behavioural problems (Hansen, 2019). Thus, attempting to evaluate ‘the average effectiveness of behavioural interventions’ is like attempting to evaluate ‘the average effectiveness of medication’ without regard for the medications being different or for whether a study matches a medication with the relevant disease (Hansen, as cited in ScienceDK, 2022).
BIAS: a systematic approach for testing behavioural insights concepts
In the wider literature on Behavioural Insights and nudging, the debates about the effectiveness of behavioural interventions have led to an emerging discussion in what is becoming two streams of reception (see Footnote 1). The first, which might be termed scientific universalism, seeks generalisable insights and adheres to the nomothetic foundations of behavioural science. The second, a growing stance which might be termed contextualism, adopts a more idiographic position, asserting that contextual factors not only may moderate outcomes, but are even a precondition for the very efficacy of behavioural interventions – such that their real-world effectiveness cannot be understood apart from the cultural and subjective perceptions of those nudged. Proponents of contextualism often point to debates arising from recent meta-reviews and methodological critiques as evidence in their favour.
At our research centre, we adhere to the former stream for reasons of consistency with the behavioural sciences that go beyond the scope of this paper. Here it suffices to say that we have developed and follow a structured research approach that conceptualises and advances Behavioural Insights as Applied Science (BIAS). This approach draws explicit parallels to methodologies from medical and clinical research when it comes to evidence-building (see Footnote 2). Rather than treating Behavioural Insights as a collection of universal and generalisable persuasive techniques that work independently of the nature of behavioural problems, BIAS frames the field as a cumulative and systematic science, focused on identifying, developing and rigorously testing Behavioural Insight Concepts (BICs) relative to such problems; but rather than treating behavioural problems merely as a matter of contextualism, it works on the assumption that there exist generic behavioural problems sharing an underlying psychological structure.
To elaborate, a BIC is a structured intervention concept that integrates behavioural insights and is developed to address a specific generic behavioural problem, e.g., missed appointments, that may be adapted further to specific domains, e.g., DNAs in healthcare settings. BIAS insists, like Comparative Effectiveness Research (CER) in the clinical sciences (see Greenfield & Rich, 2012, for CER principles), that interventions must be evaluated in relation to which treatment will work best, in which patient, and under what circumstances. However, the behavioural problem, once diagnosed or described, is regarded as having an underlying generic psychological structure – parallel to diseases in medical research – which comprises the most crucial component of these circumstances. Like CER, BIAS insists on evaluating BICs in relation to the type of behavioural problem they target, rather than pooling everything into generic meta-estimates of averages across domains. Typically, the development of a BIC is embedded within a broader diagnostic framework, in our case BASIC (Hansen, 2019), which ensures that interventions are grounded in a behavioural understanding of the problem context rather than merely assumed. Without a proper diagnosis, ‘researchers risk testing interventions that either appear effective but cannot be scaled or sustained due to poorly understood mechanisms or fail entirely without a clear explanation’ (Osman et al., 2020). The implication is that this results in wasted resources.
In particular, the BIAS approach currently structures the testing of BICs into three sequential experimental phases, closely aligned with the logic of clinical trials:
1. Phase 1: Providing Experimental Proof of Concept. The first phase establishes efficacy, that is, it establishes whether the BIC can affect behaviour under controlled conditions, providing initial evidence that the underlying behavioural insights translate into measurable change.
2. Phase 2: Confirming Efficacy through Replication. The second phase evaluates whether the BIC reliably produces effects across similar, if not identical, instances of the same underlying behavioural problem, rather than in a single controlled setting. The aim is not to test every possible context, but to establish whether the mechanism embedded in the BIC is robust enough to support generalisation beyond the original trial. This step addresses concerns about replicability without collapsing into contextualism: variation across domains or populations is considered only to the extent that it tests whether the BIC addresses the same generic behavioural problem through the same underlying mechanism. A successful Phase 2 replication shows that the concept’s efficacy is not an artefact of the original trial’s design, setting or statistical luck (cf. Spiegelhalter, 2019), thereby justifying progression to Phase 3, where moderators, context interactions and other refinements are examined systematically.
3. Phase 3: Studying Moderations, Moderators and Comparative Dimensions. The third phase examines the conditions and refinements under which a BIC works, addressing both theoretical and practical considerations for policy implementation. This includes systematically identifying moderators across what we call the ‘3Cs’ – cognition (e.g., individual differences such as age, gender, cognitive style), context (institutional settings, delivery mechanisms) and culture (shared norms and values shaping interpretation of interventions). In addition to moderators, Phase 3 investigates moderations, encompassing both variations of the intervention and targeted refinements to enhance performance, which, if significant, may be treated as a sort of sub-BICs in themselves. Beyond mechanism-level analysis, Phase 3 expands the evidential base by comparing the BIC with alternative interventions (akin to comparative effectiveness in medicine), assessing cost–benefit profiles, potential side effects and ethical acceptability. Finally, this phase allows exploration of sub-hypotheses that can inform theory development and guide adaptive design in future applications.
A central motivation behind BIAS is to address the mixed findings in the Behavioural Insights literature and practice. We assume that such mixed results often arise not simply from contextual variability but from the absence of proper behavioural diagnosis prior to intervention. Without diagnosis, practitioners risk overlooking the fact that many behavioural challenges – such as DNAs – are instances of generic behavioural problems that recur across contexts. When practitioners fail to establish whether an intervention aligns with the actual behavioural problem at play, interventions may target the wrong problem, leading to inconsistent outcomes. These inconsistencies are then often attributed to ‘contextual factors’, while meta-analyses, which typically pool diverse interventions as a single class, conclude limited or no effects on average. BIAS positions behavioural diagnosis as a foundational step in testing BICs. By embedding diagnosis and structured testing within a systematic framework, BIAS aims to improve the validity and replicability of BICs and strengthen their cumulative contribution to public policy. In particular, BIAS provides an evidential hierarchy that serves to coordinate expectations and inform meta-reviews, as is done in clinical research.
A quasi-experimental Phase 2 trial of SMS reminder framing to reduce missed hospital appointments
Returning to health policy, Hallsworth et al. (2015) is widely regarded as a landmark study in applying Behavioural Insights to healthcare. Conducted across 2 large-scale randomised controlled trials involving approximately 19,800 outpatient appointments in the UK, the study tested whether integrating behavioural insights into SMS reminders could reduce DNAs. Trial 1 (four arms; N ≈ 10,000) compared a standard reminder (control) with three variations: a DSN (‘9 out of 10 patients attend’), a message highlighting the SIC of a missed appointment to the NHS (‘Not attending costs the NHS £160’) and an ‘Easy Call’ message facilitating cancellation. The cost-framed message significantly reduced DNAs from 11.1% to 8.4% (OR 0.74, 95% CI [0.61, 0.89], p < 0.01), while a DSN message showed no effect, but slightly increased cancellations (8.8% to 10.5%, OR 1.23, 95% CI [1.02, 1.48], p = 0.03). Trial 2 (four arms; N ≈ 9,800) retained the SIC message as the reference and introduced two new insights: empathy (‘Please be fair to others waiting’) and accountability (reminding patients that a missed appointment would be recorded), along with a general cost message. Again, the SIC message produced the lowest DNA rate (8.2%), compared to 9.9% for the general cost framing (OR = 1.22) and 10.7% for empathy (OR = 1.33; see Footnote 3).
In the present context, we take experiment 1 in Hallsworth et al. (2015) to provide Phase 1 experimental proof of concept that the sub-BIC of integrating the SIC of missed appointments into the well-tested BIC of SMS reminders can successfully reduce DNAs, whereas integrating a DSN did not show any effect, though it slightly increased cancellations. Hallsworth et al. do not explain at greater length the diagnostic assumptions behind the addition of these sub-BICs. Relative to the SIC, it is said that ‘The patient may not be aware that missing an appointment incurs a cost. Even if they are, the cost may not be salient, since it is likely to be seen as just an “opportunity cost” – i.e., the loss of an opportunity to do something more productive’ and that there is emerging evidence highlighting the influence of the SIC on behaviour. Relative to a DSN, it is said that ‘There is evidence from other fields that people overestimate the extent to which others perform acts that are non-optimal (or which cause harm to others), and that correcting these perceptions can change behaviour’. Still, we take these explanations to be well-aligned with diagnostic assumptions that may be identified using BASIC (Hansen & OECD, 2019), which associates a DSN with influencing Choice through a preference for social conformity, and the SIC with supporting Determination relative to keeping one’s appointment by leveraging social norms about not imposing costs on the collective, amplified by its specificity.
From the perspective of BIAS, however, the evidential base for these sub-BICs remains incomplete. Hallsworth et al. provide Phase 1 evidence of efficacy of highlighting the SIC in SMS messages under controlled conditions, but as experiment 2 did not include a control, only retaining the SIC message as a reference, it does not count as a Phase 2 replication. Thus, before making claims about robust efficacy, progressing to Phase 3 (where moderators, refinements and comparative dimensions are more systematically explored), or implementing the intervention at scale, BIAS insists on establishing whether these effects replicate in a new trial that targets the same generic behavioural problem through the same hypothesised mechanism, but in a similar operational context. Phase 2 replication is thus not an exercise in contextual exploration, but a test of actual efficacy and, in turn, diagnostic robustness: do the sub-BICs perform consistently when applied to another health system with comparable reminder infrastructure and appointment processes, given the same generic problem and comparable delivery mechanisms? Confirming or disconfirming this assumption is critical because, if the effects do not replicate, the efficacy of highlighting the SIC as well as the diagnostic inference underlying it could be called into question.
The present study addresses this gap by conducting a quasi-experimental Phase 2 trial at a major Danish hospital. Using a large sample across three outpatient departments, we examine the impact of SMS reminders framed either with a DSN or the SIC relative to a standard reminder. This design allows us to evaluate the robustness of the high-profile sub-BIC of highlighting the SIC under real-world conditions and to contribute to the cumulative science of Behavioural Insights by adhering to the structured testing approach advocated by BIAS.
Methods
As our experiment assesses whether previously observed causal effects replicate under comparable operational conditions, the structure of this methods section is adapted from Cronbach’s (1982) and Shadish et al.’s (2002) concept of UTOS – the units that receive the conditions being contrasted, the treatments themselves, the observations made on the units and the settings in which the study is conducted. Rearranging these and adding the goal of the experiment (as determined by its phase), we get the mnemonic GUSTO, which stands for Goals, Units, Setting, Treatment and Observations, and we report on these elements in the sequence most suitable for laying out the experiment. Including ‘Goals’ reflects BIAS’s emphasis on an evidential hierarchy: each phase serves a distinct purpose in cumulative testing (proof of concept, replication or exploration). Structuring the methods section around GUSTO ensures transparency about how design choices align with the phase-specific goal and the overarching process of building robust evidence.
Goal
The experiment is a Phase 2 replication, testing whether the sub-BICs of DSNs and institutional costs, respectively – when integrated into the established BIC of text reminders – affect patient behaviour in the setting of hospital appointments. Specifically, the study examines whether these framings increase advance cancellations and thereby, or independently, reduce DNAs, the consequence of the underlying behavioural problem of choosing not to attend.
Units
The experimental unit was the individual outpatient appointment scheduled within eligible departments. The trial took place at Sydvestjydsk Sygehus (South-western Jutland Hospital, hereafter SVS), a regional hospital in Southern Denmark serving approximately 390,000 outpatient visits annually. Only departments using the scheduling system Bookplan were included, as this system allows for customised SMS reminders – a requirement for implementing the treatment conditions – unlike the national Nem-SMS system, which is restricted to 160 characters (see Figure 1). All included patients had attended at least one prior appointment, ensuring both phone number registration in Bookplan and consent for SMS communication. Patient confidentiality was preserved through anonymisation of CPR numbers, and ethical clearance was granted by Danish Regions (the coordinating body for Denmark’s five regional health authorities) and by SVS.
The figure shows the messages for different groups within the Cardiology department as they were designed to appear on the phone of patients.

These represent sub-BICs (significant moderations) of the established BIC of text reminders. Assignments rotated every 2 months (see Table 1). The experiment was discontinued on 16 March 2020 following nationwide COVID-19-related changes that paused SMS systems and reprioritised hospital operations.
Assignment of hospital departments to intervention conditions (Social Norm, Cost, Control) across three consecutive periods during the experimental phase

Patients in all conditions received at least one SMS reminder 2 days or less before an appointment, and only these were modified. Messages sent earlier retained the hospital’s standard text to ensure consistency across conditions and avoid message overlap. Although patients might receive more than one reminder text within 2 days before the appointment, they were not given different messages to avoid overlap with other treatment conditions. Departmental staff implemented the schedule, and compliance was verified via internal reporting to the research team.
The dataset included all outpatient appointments in the three departments between 1 October 2019 and 16 March 2020. For each appointment, the following variables were extracted, with 20,867 appointments retained in the final sample (see Figure 2). Table 2 summarises the distribution by treatment group and gender.
Criteria used to filter the dataset, resulting in 20,867 data points.

This table summarises the total number of appointments included in the study (N = 20,867) across three hospital departments – Endocrinology, Pulmonary and Cardiology – by treatment group and gender.

Observations (outcome measures)
Two primary outcomes were evaluated:
• DV1: Advance Cancellation Rate – The proportion of appointments cancelled by patients at least 1 day before the scheduled date, reflecting behavioural compliance with the request to reschedule rather than default to inaction.
• DV2: No-show Rate (DNAs) – The proportion of appointments where patients neither attended nor cancelled in advance, representing the behavioural problem of non-attendance.
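To make the two outcome definitions concrete, the following is a minimal Python sketch of how they could be computed from appointment records. The `Appointment` record and its field names are hypothetical and are not taken from the study's dataset. The sketch also assumes that a same-day cancellation does not count as an advance cancellation and therefore falls under DV2, which is one literal reading of the definitions above rather than a confirmed coding rule.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Appointment:
    """Hypothetical record; field names are illustrative assumptions."""
    attended: bool
    # Whole days between cancellation and the scheduled date; None if never cancelled.
    cancelled_days_before: Optional[int] = None


def is_advance_cancellation(appt: Appointment) -> bool:
    # DV1: cancelled at least 1 day before the scheduled date.
    return appt.cancelled_days_before is not None and appt.cancelled_days_before >= 1


def is_dna(appt: Appointment) -> bool:
    # DV2: neither attended nor cancelled in advance.
    # Assumption: a same-day cancellation (0 days before) is counted as a DNA.
    return not appt.attended and not is_advance_cancellation(appt)


def outcome_rates(appointments: Iterable[Appointment]) -> Tuple[float, float]:
    """Return (advance cancellation rate, DNA rate) as proportions of all appointments."""
    appts = list(appointments)
    if not appts:
        raise ValueError("at least one appointment is required")
    n = len(appts)
    dv1 = sum(is_advance_cancellation(a) for a in appts) / n
    dv2 = sum(is_dna(a) for a in appts) / n
    return dv1, dv2
```

Both rates use all appointments in the group as the denominator, so the two outcomes are mutually exclusive shares of the same total.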
Both outcomes allow us to evaluate whether the effects of interventions grounded in these assumed mechanisms – social conformity for DSN and strengthened determination through highlighting the SIC – replicate when applied to a new instance of the same behavioural problem under real-world hospital conditions. We have summarised the experimental design of this study in Figure 3.
Experimental design.

Results
We hypothesised that including a DSN message in the SMS reminder would increase behavioural compliance by influencing decisions to attend or cancel in advance, as reflected in higher advance cancellation rates and/or lower DNA rates compared to the Control group (H1). Alternatively, we hypothesised that emphasising the SIC of missed appointments to the healthcare system would produce similar effects relative to the Control group (H2). To evaluate these hypotheses, we conducted a series of logistic regression analyses using cancellation and DNA as outcome variables. In Analysis 1, we compare outcomes between patients in the DSN group and the Control group, and between the SIC group and the Control group.
As noted in the Introduction, demographic variation and departmental differences may influence observed effects. To account for this and ensure robustness, we conduct two additional analyses. In Analysis 2, we include patient age and gender as covariates. In Analysis 3, we examine whether treatment effects are consistent across the three participating departments by segmenting the analysis by specialty.
Analysis 1
To test whether cancellation and DNA rates differed between treatment groups and the Control group, we estimated logistic regression models comparing outcomes for patients in the DSN (Treatment 2) and SIC groups (Treatment 3) relative to the Control group (Treatment 1). Formally, this analysis can be summarised as follows for each outcome variable:
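One specification consistent with this description is sketched below. The notation is an assumption introduced here for illustration: Y_i is the binary outcome (advance cancellation or DNA) for appointment i, DSN_i and SIC_i are indicator variables for the two treatment groups, and the Control group is the reference category.

```latex
\operatorname{logit}\,\Pr(Y_i = 1)
  = \ln\frac{\Pr(Y_i = 1)}{1 - \Pr(Y_i = 1)}
  = \beta_0 + \beta_1\,\mathrm{DSN}_i + \beta_2\,\mathrm{SIC}_i
```

Under this form, exp(β_1) and exp(β_2) are the odds ratios of the DSN and SIC groups relative to Control.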
Percentages for the different outcomes by treatment variables are shown in Figures 4 and 5, and the results for the inferential statistics are shown in Table 3.
Figure 4. Cancellation rate by treatment group. Error bars represent Wilson score 95% CIs.

Figure 5. DNA rate by treatment group. Error bars represent Wilson score 95% CIs.
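For readers unfamiliar with the Wilson score interval used for these error bars, the calculation can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `wilson_interval` and the counts in the usage line are hypothetical, since group sizes are not reported in this section.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Return the (lower, upper) Wilson score interval for a proportion.

    events: number of appointments with the outcome (e.g. DNAs)
    n:      total number of appointments in the group
    z:      normal quantile; 1.96 gives an approximate 95% interval
    """
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p_hat = events / n
    denom = 1 + z**2 / n
    # The interval is centred on a value shrunk slightly towards 0.5.
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts for illustration only: 245 events in 10,000 appointments.
lower, upper = wilson_interval(245, 10_000)
print(f"{lower:.4f} to {upper:.4f}")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0 and 1 and behaves well for low event rates such as those reported here.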

Table 3. Logistic regression results showing the effect of each treatment on the odds of cancellation and no-show. Each treatment condition was compared to the Control condition. Only the effect of the SIC treatment on the cancellation rate was statistically significant.

Cancellation rates
No significant difference in advance cancellation rates was observed between the Control group and the DSN group. Patients in the SIC group had an advance cancellation rate of 3.42%, compared to 2.45% in the Control group (OR = 1.41, 95% CI [1.16, 1.72], p < 0.001). This corresponds to a 41% increase in odds of cancelling in advance, although the absolute difference was under 1 percentage point.
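The relationship between the reported percentages and the odds ratio can be checked with a short calculation. The sketch below is illustrative (the helper names `odds` and `odds_ratio` are ours) and works from the rounded percentages quoted above, so it should match the regression estimate only approximately; with treatment entered as a dummy variable and no covariates, the logistic regression odds ratio should coincide with this crude ratio.

```python
def odds(p):
    """Convert a probability (0 < p < 1) into odds."""
    return p / (1.0 - p)


def odds_ratio(p_treatment, p_control):
    """Crude odds ratio of the treatment group relative to the control group."""
    return odds(p_treatment) / odds(p_control)


# Advance cancellation rates reported in the text (rounded percentages).
sic_rate = 0.0342
control_rate = 0.0245

or_sic = odds_ratio(sic_rate, control_rate)
abs_diff_pp = (sic_rate - control_rate) * 100  # difference in percentage points

print(f"Odds ratio: {or_sic:.2f}")                                    # 1.41
print(f"Absolute difference: {abs_diff_pp:.2f} percentage points")    # 0.97
```

The contrast between the two printed figures illustrates why a sizeable relative effect can coexist with a small absolute change when the baseline rate is low.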
DNA rates
No statistically significant differences in DNA rates were observed between the DSN group and the Control group, between the SIC group and the Control group, or between the two treatment groups.
In sum, highlighting the SIC significantly increased advance cancellations, whereas neither treatment significantly reduced DNAs.
Analysis 2
To test whether demographic variables influenced the outcomes, the analyses above were repeated with age and gender included as covariates in the model for each outcome variable. More specifically:
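A plausible form of the covariate-adjusted model, assuming age enters as a linear term in years (consistent with the per-year odds ratios reported below) and gender as an indicator for male patients; both the coding and the notation are our assumptions:

```latex
\operatorname{logit}\Pr(Y_i = 1) = \beta_0 + \beta_1\,\mathrm{DSN}_i + \beta_2\,\mathrm{SIC}_i
  + \beta_3\,\mathrm{Age}_i + \beta_4\,\mathrm{Male}_i
```

Under this coding, $\exp(\beta_3)$ is the odds ratio per additional year of age and $\exp(\beta_4)$ is the odds ratio for men relative to women.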
Figures 6 and 7 present outcome percentages by gender, and Table 4 provides the corresponding inferential statistics. Inclusion of demographic variables did not alter treatment effects; therefore, we report only the demographic results below.
Figure 6. Cancellation rate by sex. Error bars represent 95% CIs for the model-standardised mean probability; because estimates are averaged over a large sample with low event rates, the CIs are very narrow.

Figure 7. DNA rate by sex. Error bars represent 95% CIs for the model-standardised mean probability; because estimates are averaged over a large sample with low event rates, the CIs are very narrow.

Table 4. Logistic regression results, controlling for age and gender, by treatment model. Odds ratios, confidence intervals and significance are reported for each predictor variable.

Cancellation rates
Older age was associated with significantly lower odds of cancelling in advance (OR = 0.98, 95% CI [0.97, 0.98], p < 0.01), indicating that each additional year of age slightly reduced the likelihood of cancellation. No significant differences in cancellation rates were observed for gender.
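Because the odds ratio for age is expressed per year, its effect compounds over larger age gaps. As a rough worked example, assuming age enters the model as a linear term in years and using the rounded estimate of 0.98:

```latex
\mathrm{OR}_{10\ \text{years}} = \left(\mathrm{OR}_{1\ \text{year}}\right)^{10} \approx 0.98^{10} \approx 0.82
```

That is, a patient ten years older would have roughly 18% lower odds of cancelling in advance, holding the other predictors constant.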
DNA rates
Older patients were also significantly less likely to DNA (OR = 0.98, 95% CI [0.96, 0.99], p < 0.01), with the odds of DNA decreasing gradually with age.
Male patients were significantly more likely to miss appointments without cancelling: the DNA rate was 5.13% for men vs 3.62% for women (OR = 1.56, 95% CI [1.32, 1.84], p < 0.001).
These demographic patterns did not modify treatment effects, which remained consistent with Analysis 1 (see Footnote 4).
Analysis 3
To examine whether the results from Analysis 1 remained consistent across clinical subspecialties, we repeated the logistic regression analyses separately for each department.
Cardiology department
Cancellation rates
In the Cardiology Department, patients in the SIC group had an advance cancellation rate of 2.04%, compared to 0.84% in the Control group (OR = 2.47, 95% CI [1.47, 4.15], p < 0.001), indicating that integrating the SIC in the SMS reminder increased the likelihood of cancelling in advance. No other significant findings were observed.
DNA rates
Patients in the DSN group had a significantly lower DNA rate of 4.03%, compared to 5.23% in the Control group (OR = 0.76, 95% CI [0.60, 0.97], p = 0.027), indicating that integrating the DSN in the SMS reminder reduced DNAs in this department. No other significant findings were observed.
Endocrinology department
Cancellation rates
In the Endocrinology Department, patients in the DSN group had an advance cancellation rate of 4.76%, compared to 3.04% in the Control group (OR = 1.59, 95% CI [1.22, 2.09], p < 0.001), suggesting that integrating the DSN in the SMS reminder effectively increased advance cancellations in this department.
Patients in the SIC group had a cancellation rate of 4.46%, also significantly higher than in the Control group (OR = 1.49, 95% CI [1.16, 1.91], p < 0.01). No other significant findings were observed.
DNA rates
Patients in the SIC group had a DNA rate of 5.09%, which was significantly higher than the 3.94% in the Control group (OR = 1.31, 95% CI [1.04, 1.64], p = 0.021), suggesting that highlighting the SIC increased DNA rates in this department. No other significant findings were observed.
Pulmonary department
Cancellation rates
In the Pulmonary Department, patients in the SIC group had an advance cancellation rate of 2.97%, compared to 4.68% in the Control group (OR = 0.62, 95% CI [0.41, 0.95], p = 0.029), indicating that integrating the SIC in the SMS reminder reduced the likelihood of advance cancellations in this department. No other significant findings were observed.
DNA rates
No significant differences in no-show rates were observed across groups.
Summary of Analysis 3
Treatment effects varied across departments, with no consistent pattern across specialties. Figures 8 and 9 summarise the descriptive statistics of department-wise DNA and cancellation rates and Table 5 summarises the inferential statistics.
Figure 8. Cancellation rate by department and treatment group. Error bars represent Wilson score 95% CIs.

Figure 9. DNA rate by department and treatment group. Error bars represent Wilson score 95% CIs.

Table 5. Logistic regression outcomes by department, showing how the DSN and SIC treatments affected the odds of cancellation and DNA relative to Control.

Discussion
This study illustrates why Phase 2 diagnostic replication is essential for advancing BIAS. Hallsworth et al. (2015) provided Phase 1 proof of concept for integrating the SIC and a DSN into SMS reminders for hospital appointments, reporting that SIC messages significantly reduced DNAs from 11.1% to 8.4% in Trial 1, while DSN messages had no aggregate effect but slightly increased cancellations. Although the study included two trials involving approximately 19,800 appointments, the second trial did not function as a replication but as an exploratory refinement: it retained the SIC message as the reference and replaced the DSN with new framings (empathy and accountability), thereby shifting the intervention logic rather than testing robustness.
However, the evidential base for these interventions remains incomplete without systematic Phase 2 replication. Behavioural interventions are not designed to eliminate behavioural problems entirely but to mitigate them by shifting probabilities of action under real-world conditions. Whether such mitigation occurs depends on whether the BIC effectively addresses the behavioural problem via the assumed mechanisms driving the problem, and whether those mechanisms remain operative alongside other constraints. Phase 2 serves this purpose: moving beyond initial proof of concept to test whether a BIC and its diagnostic assumptions demonstrate efficacy across new instances of the same underlying behavioural problem. This logic contrasts with interpretations that attribute mixed results primarily to contextual idiosyncrasies, such as cultural differences. While context may matter, premature contextual explanations risk obscuring a more fundamental question: does the BIC align with the correct behavioural diagnosis and perform consistently under comparable operational conditions? Establishing this is a necessary step before progressing to Phase 3, where moderators, moderations and contextual interactions can be examined systematically.
Across more than 20,000 outpatient appointments in three hospital departments, neither integrating a DSN nor highlighting the SIC produced consistent improvements in patient attendance relative to the standard reminder. In our study, highlighting the SIC significantly increased advance cancellations overall, but the absolute effect was small and did not translate into fewer missed appointments (DNAs). By contrast, Hallsworth et al. (2015) reported that highlighting the SIC reduced DNA rates from 11.1% to 8.4% in their Trial 1 (OR = 0.74), while integrating a DSN showed no aggregate effect. In our study, integrating a DSN similarly showed no significant effect in the aggregate analysis (Analysis 1), although a modest reduction in DNAs was observed in the Cardiology department (Analysis 3). Conversely, in the Endocrinology department, highlighting the SIC was associated with a slight increase in DNAs despite higher cancellation rates, while in the Pulmonary department, it reduced cancellations rather than increasing them. Table 6 compares Hallsworth et al.’s original study with our replication.
Table 6. Comparison of Hallsworth et al. (2015) (Experiment 1) with our replication.

a Statistically different from Control.
These mixed patterns suggest that while our design mirrored the original intervention’s logic, its effects did not replicate under comparable operational conditions. However, the demographic associations observed in our study – older patients less likely to miss or cancel, and men more likely to DNA – align with both prior research (Nørgård et al., 2025) and the patterns reported by Hallsworth et al. (2015). This consistency supports the interpretation that the present study functioned as a valid Phase 2 replication within the same diagnostic frame, despite divergent results on treatment efficacy. This highlights the need for systematic Phase 2 testing before progressing to Phase 3 explorations of contextual moderators, moderations and comparative effectiveness.
Strengths and limitations
This study has several strengths that enhance its relevance for both research and practice. It uses a large sample of more than 20,000 hospital appointments across three distinct specialties, providing robust statistical power and variation in patient populations. The quasi-experimental design, implemented under routine operational conditions, ensures high ecological validity and policy relevance. The design differed from the randomised trials reported by Hallsworth et al., but the time-blocked allocation helped maintain comparability and internal validity while accommodating hospital scheduling constraints.
At the same time, certain limitations warrant caution. The absence of randomisation means that unobserved temporal or organisational factors could have influenced outcomes, although the time-blocked design mitigates some of this risk. Additionally, the study did not capture structural or psychological variables – such as transport constraints, comorbidities or planning habits – that may affect responsiveness to reminder framings. Finally, treatment fidelity was maintained to mirror the original intervention design, focusing only on message content. While this is appropriate for Phase 2 replication, it limits conclusions about whether additional operational features – such as interactive links or multiple prompts, which would belong in Phase 3 testing – could improve effectiveness. These limitations underline the importance of further Phase 2 replications and controlled comparisons before broader implementation or progression to Phase 3 refinement testing.
In the Introduction, we wrote that missed medical appointments might be due to forgetting. However, what patients report may reflect surface-level explanations that mask deeper issues. Several authors have suggested that forgetfulness may sometimes serve as a socially desirable explanation, masking more uncomfortable or stigmatised reasons such as anxiety, ambivalence about treatment or dissatisfaction with care (Quinn et al., 2008; Parsons et al., 2021). Patients might prefer to describe their absence as simple forgetfulness rather than admit to financial strain, emotional distress or mistrust in the healthcare system. Recognising this possibility is important, as it suggests that the apparent simplicity of ‘forgetting’ may conceal deeper barriers that require more nuanced and supportive interventions.
We might never be able to detect the real reasons why patients ‘forget’. However, with advances in artificial intelligence, healthcare providers now have access to more predictive and data-driven methods that might help BI professionals design and test more nuanced, and perhaps better, interventions. One promising example is deep learning, a branch of machine learning that uses multi-layered neural networks to automatically detect patterns in large datasets (Dashtban & Li, 2022). Dashtban & Li (2022) combined electronic patient records with socioeconomic and environmental data to predict non-attendance in the healthcare system. Based on those data, they were able to categorise the risk of DNA for different patients. This approach outperformed traditional statistical models.
The point is that this method allows us to ask more nuanced questions, such as whether a subset of patients at risk of DNA is more likely to attend if called in advance, giving staff the opportunity to address barriers such as transportation or personal challenges – something that text reminders alone cannot achieve (Dashtban & Li, 2022). BI professionals should at least experiment with using AI methods to assist them in designing interventions to be tested.
Policy implications
From a policy perspective, these findings argue for caution. While highlighting the SIC reliably increased advance cancellations in aggregate analyses, it did not reduce DNAs overall and produced inconsistent patterns across departments, including a small but statistically significant increase in DNAs in one specialty. Consistent with Hallsworth et al. (2015), integrating a DSN in the SMS reminder showed no aggregate effect and only a limited benefit within a single department. Taken together, these results do not justify the broad implementation of either sub-BIC at scale. We would like to stress to policymakers and hospital administrators that the lack of replication here does not mean that behavioural interventions to reduce DNAs do not work. Instead, our results support a ‘stop-and-hold’ position: before such interventions are embedded into routine practice, further Phase 2 replications are needed to determine whether these effects are robust, or whether their variability reflects underlying diagnostic limitations or competing constraints that limit behavioural responsiveness.
Although we found little evidence that the interventions work as intended, implementing them would probably not yield worse outcomes than current practice (i.e., sending generic appointment reminders). However, we hope that policymakers and hospital administrators who wish to implement behavioural interventions to reduce DNAs – such as the ones described in this paper – reach out to BI professionals to test interventions first and so avoid unnecessary spending on the infrastructure needed to put these interventions in place, especially in health systems operating under tight efficiency constraints.
Diagnostic implications
Our findings, considered alongside prior evidence, suggest two plausible explanations for the inconsistent effects observed in this Phase 2 trial. First, the performance of integrating a DSN or highlighting the SIC may depend on operational features beyond message content, such as message timing, interactivity or opportunities for personalisation. Studies reporting stronger effects often used designs that incorporated these elements: Berliner Senderey et al. (2020), for example, included confirmation links and obligation-based language, while Groden et al. (2021) required patients to sign and record their next appointment. Such enhancements may amplify the intended mechanism – or, as discussed below, they may engage a different mechanism altogether. Compared to these designs, our intervention – a single SMS sent 2 days prior without additional features – may not have incorporated elements that strengthen the relevant behavioural mechanisms.
A second explanation concerns the nature of the behavioural problem. DNAs are rarely a matter of isolated choice; they involve sustaining a prior intention in the face of competing goals. Framings that integrate a DSN or reference the SIC appeal to conformity or abstract responsibility but do little to reinforce individual commitment to act. Evidence from prior studies aligns with this interpretation: the interactive features in Berliner Senderey et al. (2020) and the signing requirement in Groden et al. (2021) may both operate by activating a stronger commitment rather than by appealing to social conformity or cost imposed on the collective. Similarly, Hallsworth et al. (2015) observed higher DNA rates among patients whose previous appointments had been cancelled, suggesting that commitment fragility predicts DNAs. Finally, unmeasured structural or psychological factors – such as competing goals, psychological challenges or differences in planning habits – may also account for the absence of consistent effects, although these were beyond the scope of the present analysis. If the commitment hypothesis proves promising, interventions that actively reinforce determination – such as confirmation requests or commitment prompts – would warrant new Phase 1 testing before progressing to replication.
Advancing BI as applied science
These findings underscore the value of approaching Behavioural Insights through a structured evidential hierarchy, as advocated by BIAS, rather than ad hoc experimentation or meta-analyses that aggregate heterogeneous interventions. If this framework had been adopted systematically following Hallsworth et al. (2015), subsequent efforts could have prioritised Phase 2 diagnostic replications – such as the one presented here – before progressing to Phase 3 activities such as exploring moderators, making cross-intervention comparisons or pooling results in meta-analyses. Instead, the field often moved prematurely to implementation and broad claims of generalisability, leaving key diagnostic assumptions untested. The BIAS approach addresses this problem by situating each BIC within a staged process of validation: Phase 1 establishes proof of concept, Phase 2 tests robustness across new instances of the same behavioural problem, and only after consistent evidence emerges does Phase 3 examine moderators, refinements and comparative effectiveness. This sequencing matters: without Phase 2 replication, it is impossible to know whether null or mixed effects reflect contextual idiosyncrasies, suboptimal operational features or a deeper mismatch between mechanism and behavioural problem. More broadly, adopting a BIAS framework would allow meta-analyses to aggregate evidence within clearly defined diagnostic categories rather than collapsing diverse interventions into a single estimate of ‘average effectiveness’. We hope that future research and policy evaluation in Behavioural Insights will increasingly adopt such structured approaches – both to strengthen the cumulative science of the field and to enable meta-analyses that produce meaningful, problem-specific insights rather than misleading generalisations.
AI declaration
The authors confirm that AI tools (specifically our AI colleague ‘Sally’ – a ChatGPT-based agent) were used to assist with language refinement and phrasing during the final check of this manuscript, the cover letter and this declaration. All content, including research decisions, analysis and narrative interpretations, is the original work of the authors. In line with Cambridge University Press policy:
• AI tools have not been listed as authors.
• All use of generative AI has been transparently declared.
• The authors remain fully accountable for the integrity of the work.