Enhancing Reporting of After Action Reviews of Public Health Emergencies to Strengthen Preparedness: A Literature Review and Methodology Appraisal

ABSTRACT
Objective: This literature review aimed to identify the range of methods used in after action reviews (AARs) of public health emergencies and to develop appraisal tools to compare methodological reporting and validity standards.
Methods: A review of biomedical and gray literature identified key approaches from AAR methodological research, real-world AARs, and AAR reporting templates. We developed a 50-item tool to systematically document AAR methodological reporting and a linked 11-item summary tool to document validity. Both tools were used sequentially to appraise the literature included in this study.
Results: This review included 24 highly diverse papers, reflecting the lack of a standardized approach. We observed significant divergence between the standards described in AAR and qualitative research literature, and real-world AAR practice. The lack of reporting of basic methods to ensure validity increases doubt about the methodological basis of an individual AAR and the validity of its conclusions.
Conclusions: The main limitations in current AAR methodology and reporting standards may be addressed through our 11 validity-enhancing recommendations. A minimum reporting standard for AARs could help ensure that findings are valid and clear for others to learn from. A registry of AARs, based on a common reporting structure, may further facilitate shared learning.
(Disaster Med Public Health Preparedness. 2019;13:618-625)

Public health emergencies, such as infectious disease outbreaks, floods, and terrorist attacks, impact societies severely but are relatively rare for individual countries. This national rarity provides an impetus to learn systematically from emergencies when they do occur, so as to strengthen public health emergency preparedness and response planning. 1 One such learning approach is to conduct an after action review (AAR), or a lessons learned document. These documents are completed after a public health emergency has occurred and draw on quantitative and qualitative methods to identify strengths and weaknesses in the public health emergency preparedness system. By addressing any weaknesses identified, they aim to improve preparedness, response, and recovery capacities and capabilities, ultimately lessening the impact of future incidents. 2,3 Typically, documentation and other quantitative fact-finding methods help establish a skeleton timeline of events, whereas different forms of qualitative investigation, such as personal testimony, provide richer insights into how and why events unfolded. Combined, these approaches aim to establish the root causes of the event and to identify what lessons can be learned for the future. 2-9
Despite the crucial role of AARs in linking the past with the present and future, there is no widely used or standardized approach to conducting AARs of public health emergencies. In particular, there is no indication of whether insights gained are valid or based on robust methodologies. 1,9
This literature review aimed to identify the range of methods used to produce AARs to improve emergency preparedness planning and to develop appraisal tools to compare their methodological reporting and validity standards, with a focus on qualitative methods.
response to an emergency (theoretical or "table-top" exercises were excluded), were within the geographic scope of the literature review (the European Union, Australia, Canada, New Zealand, and the United States), and were published in English from January 2000 to August 2015.
Search strategies were structured around 2 major concepts: AARs and emergency preparedness. Searches combined free text and thesaurus terms (where available), including synonyms such as "post-event analysis" and "critical incident review" and techniques used within AARs such as "facilitated look back" and "root-cause analysis" (Supplemental Information [SI] 1). Additional search terms and synonyms were identified by scanning the abstracts of articles identified through a scoping search. Additional AARs were identified by searching the Endnote Library for a previous review undertaken for the European Centre for Disease Prevention and Control (ECDC), looking for evaluations of emergency response. 10,11 Reviews were sifted for relevance first on title and abstract and then on full-text review (Figure 1, PRISMA diagram). Studies excluded at the full-text stage can be found in SI-2.
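The two-concept structure described above can be sketched as a Boolean query, with synonyms joined by OR within each concept and the concepts joined by AND. This is only an illustration under stated assumptions: the AAR terms are those quoted in the text, the preparedness terms are hypothetical placeholders (the actual terms are listed in SI-1), and the function names are invented for this sketch.

```python
# Illustrative sketch only. The AAR terms are those quoted in the text;
# the preparedness terms are hypothetical placeholders, not the terms
# actually used in the review (see SI-1 for those).
AAR_TERMS = [
    "after action review",
    "post-event analysis",
    "critical incident review",
    "facilitated look back",
    "root-cause analysis",
]
PREPAREDNESS_TERMS = [  # hypothetical examples
    "emergency preparedness",
    "public health emergency",
]


def or_block(terms):
    """Join the synonyms for one concept with OR, quoting each phrase."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*concepts):
    """Join concept blocks with AND so a record must match every concept."""
    return " AND ".join(or_block(concept) for concept in concepts)


query = build_query(AAR_TERMS, PREPAREDNESS_TERMS)
print(query)
```

Real database interfaces differ in their syntax for phrase searching and thesaurus terms, so a string like this would need adapting to each source searched.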

Development of Appraisal Tools
We developed 2 appraisal tools to systematically document the methods used in AARs, to compare methodological reporting and validity between diverse AARs, and to act as a benchmark of theoretical best practice.
We adapted the approach of Woloshynowych, 12 which related to the analysis of after actions in health care, to an emergency public health context by triangulating it with 9 contemporary AAR templates. 5,13-20 The templates were identified through targeted scoping searches in Google, using synonyms for AARs and templates. These templates were multi-sectorial, coming from after action reports, a significant event analysis, and peer assessments in the fields of US national defense, 14 US state government, 13 UK medicolegal, 17 Canadian health care insurance, 20 international emergency public health, 5,16 a UK hospital, 15 and patient safety agencies (see SI-2). 18,19 Further tool modifications were made in consultation with an expert advisor to increase the tool's relevance to emergency public health. This resulted in a 50-item appraisal tool (SI-3).
Adapting the approach of Piltch-Loeb, 5 we developed an additional 11-point summary tool of factors that boosted methodological rigor in case study and qualitative data collection and analysis.
The original Piltch-Loeb 10-point tool remained intact with minor revisions in definitions to better reflect the context of AARs in emergency public health. We added an 11th factor to capture whether the AAR had ultimately achieved its aim.

Appraising the After Action Reviews
The 50-item appraisal tool (SI-3) and 11-item summary measure (SI-4) were applied sequentially to each AAR. First, the 50-item tool was used to systematically document the methods undertaken by each AAR, before being summarized in the 11-item measure, allowing for a simpler comparison of methodology and validity across diverse reviews.
AARs were reviewed against each item on the summary validity tool and assigned one of 3 codes:
Fully met (++): These criteria have been fully and often comprehensively met, and we have little doubt that these criteria have been met.
Partially met (+): The criteria have been met in some regards, but there is significant doubt about the comprehensiveness or there are clear elements missing, preventing a higher rating.
Not met (-): These criteria are not met or have not been reported.
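The three-code scheme lends itself to a simple tabulation of how often each criterion was met across reviews. The sketch below is a minimal illustration, not the authors' procedure: the criterion names are taken from the summary tool, but the review labels, the ratings, and the names `Rating` and `tally_by_criterion` are hypothetical.

```python
from collections import Counter
from enum import Enum


class Rating(Enum):
    """The three codes assigned against each summary validity criterion."""
    FULLY_MET = "++"
    PARTIALLY_MET = "+"
    NOT_MET = "-"  # not met, or not reported


# Hypothetical ratings for three made-up reviews; not data from the study.
ratings = {
    "AAR-A": {
        "Sustained engagement": Rating.FULLY_MET,
        "Multiple data sources": Rating.FULLY_MET,
        "External peer review": Rating.NOT_MET,
    },
    "AAR-B": {
        "Sustained engagement": Rating.PARTIALLY_MET,
        "Multiple data sources": Rating.FULLY_MET,
        "External peer review": Rating.NOT_MET,
    },
    "AAR-C": {
        "Sustained engagement": Rating.NOT_MET,
        "Multiple data sources": Rating.PARTIALLY_MET,
        "External peer review": Rating.NOT_MET,
    },
}


def tally_by_criterion(ratings):
    """Count how many reviews received each code for each criterion."""
    tally = {}
    for per_review in ratings.values():
        for criterion, rating in per_review.items():
            tally.setdefault(criterion, Counter())[rating] += 1
    return tally


tally = tally_by_criterion(ratings)
for criterion, counts in tally.items():
    # Show every code, including those with a zero count.
    print(criterion, {rating.value: counts[rating] for rating in Rating})
```

Keeping "not met" and "not reported" under one code mirrors the definition above; a reviewer who wanted to separate the two would need a fourth code.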
A sample of 3 AARs was independently coded by a second reviewer to test the reliability of the coding instrument and to clarify initial rating definitions. The second rater was blind to the first rater's scores and rationales. Given the size of the sample, inter-coder agreement was not calculated. Differences between the 2 raters were discussed and changes agreed by consensus. This led to revisions in the wording of some criteria and scoring guidance to improve clarity and therefore scoring consistency. Definitions of the criteria and additional notes used to guide rating decisions are described in SI-3.
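Inter-coder agreement was not calculated in this study because the double-coded sample was small. For a larger double-coded sample, one common option is to report raw percent agreement alongside Cohen's kappa, which corrects for agreement expected by chance. The sketch below is an illustration only: the two raters' codes are hypothetical and the function names are invented here.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters assigned the same code."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal code frequencies.
    expected = sum(
        counts_a[code] * counts_b[code] for code in set(rater_a) | set(rater_b)
    ) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two raters on the same eight items.
rater_1 = ["++", "+", "-", "+", "-", "-", "++", "+"]
rater_2 = ["++", "+", "-", "-", "-", "+", "++", "+"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```

Because the three codes are ordered, a weighted kappa that penalizes "++" versus "-" disagreements more heavily than adjacent-code disagreements may be a better fit; the unweighted form is shown here for simplicity.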

RESULTS
Overview
Our search identified 24 published AAR documents, relating to 22 distinct AARs (Table 1).

Appraisal of After Action Reviews
There was great diversity in the structure, scope, and level of methodological reporting in the 24 reviews identified, potentially reflecting a lack of a standardized approach (Table 2). The majority drew heavily on qualitative methods, but the use of established techniques to ensure rigor was routinely missing from the published reports.
Validity boosting measures most frequently reported in the 24 reviews included spending adequate time to observe the setting, people, and incident documentation; sampling a diverse range of views; using multiple sources of data collection; and utilizing multiple perspectives during the analysis. However, these techniques were generally reported in brief, with few reviews fully meeting all 4 basic validity dimensions.
The criteria that were most commonly unmet in these reports were acknowledging a theoretical basis for the review methodology; describing how the reviewers handled discordant evidence; having an external peer-review process; and ensuring respondents to the reviews had an opportunity to validate that their views had been reflected accurately in the final analysis and report (see Table 2).
The majority of AARs showing depth and insight (9 fully met this validity measure) also clearly reported using multiple data sources (7 of 9) and sustained engagement (5 of 9). Other AARs demonstrated depth and insight without reporting clear methods (see Table 2). 29,34,35,44

Suggestions
Based on the systematic assessment of methods and validity measures in 24 AARs, we suggest 11 measures to improve the reporting and validity of reviews more widely (Table 3).

DISCUSSION
To our knowledge, this is the first review to systematically document methods used in public health emergency preparedness AARs across a range of hazards and to formulate suggestions to improve future practice based on principles of qualitative research best practice.
The strengths of this review include our inclusive definition of an AAR, our inclusion of non-health-care specific after actions and reporting templates, and the development of tools rooted in after action methodological research. These tools were applied to a variety of real-world AARs in the field of emergency preparedness spanning multiple hazard types.
The most common data collection methods used by the 24 AARs were document review (typically preparedness plans and protocols compared to execution), focus groups, formal public consultations, in-depth interviews, public discussion forums, questionnaires, site visits, and workshops.
Most reviews (17 of 24) did not report a theoretical framework to guide investigation; of those that did, all reported a comparative or case study methodology. This represents a small fraction of the diverse range of approaches available to after action investigators, including the after action technique 4,8; after action analysis 7,45; root-cause analysis 46-48; facilitated look-backs 49; peer assessment approach 6; realist evaluation 5,9; bow-tie analysis 39; and serious case reviews. 50
Underlying methodologies were frequently unreported, so the report validity remained ambiguous. Although a lack of reporting of basic methods to safeguard validity does not necessarily imply that they were not considered or followed, it does significantly increase doubt surrounding the methodological basis of the review and the validity of its conclusions.

Limitations
Our review searched for reports from a diverse range of after actions, but the analyzed sample was small (n = 24) and subject to reporting and selection bias, and may not represent the full spectrum of incident reports available. For example, we excluded 16 studies with insufficient methods for analysis (see SI-2: Excluded Studies) and all reviews not published in English.
Three of the 24 included reviews were used to test and develop early versions of both appraisal tools before their final application to the remaining 21 reports, further reducing the number of independent reviews appraised.
Most AAR reports were not clear on how their data analysis led to generalizable insights by reviewers or how discordant information was handled. 22,28,29 As such, it was not clear to what extent certain views or data had been explored or discounted, for example, if they did not fit with the emerging researcher consensus. This risked introducing perception bias into the analysis and conclusions drawn.

Table 3
1. Sustained engagement: Reviews should have adequate time for observing the setting, incident documentation, and speaking with a range of people to build a good understanding of the event and its context. Sustained and repeated engagement with the people and processes involved in the incident over time has a higher chance of achieving deeper and valid insight.
2. Validation of conclusions drawn: Reviews should report the methods they have used to gather and analyze information, and clearly report how these led to the recommendations made. To aid readability, these can be in an appendix but should be easily available for those wanting to assess the review's validity. *
3. Selection of respondents: The number and type of respondents interviewed should be clearly described to allow readers to understand which individuals, groups, or data were used to inform the review, for example, documenting the number of people interviewed, their job titles, and their role in the emergency response. Without this, readers do not know whether important perspectives, reports, or data were excluded, so are less able to evaluate the review for selection bias.
4. Multiple data sources: Reviews should use multiple sources of data collection to ensure that a variety of information is considered, reducing the risk that one potentially biased data source dominates, and increasing the likelihood that fundamental underlying causes and relevant contributory factors of the emergency will be appropriately described. It is common for the most comprehensive reviews to include a combination of personal testimony (through different types of interviews, questionnaires, etc.), document review (PHEP protocols, guidelines, relevant reports on the incident, safety reports before the incident, etc.), and one or more site visits.
5. Multiple observers: Use of multiple observers to review the interpretation of data can help uncover perception bias and ensure that insights are more roundly developed.
6. Case selection: Reviews should describe their rationale for selecting the data they did (the people to interview, protocols for analysis, etc.) to allow readers to clearly understand any potential selection biases. This would include, for example, any sampling strategy for participants selected for interviews; for example, only emergency first responders from health were interviewed due to resource constraints vs. interviews included emergency first responders from police, fire, and ambulance services to get a breadth of perspective.
7. Theoretical construct: Reviews may benefit from being more closely aligned with qualitative theoretical frameworks to ensure that recommendations arising address fundamental causes. Reviews should consider applying basic qualitative methods and validity checks to increase the validity of insights gained.
8. Validation by respondents: Initial review findings should be checked by respondents to the review to ensure the accuracy and relevance of the findings.
9. External peer review: Validity of the review may be increased by sharing preliminary or draft findings with public health/emergency response experts not involved in the emergency response for critical comment. Peer review may introduce a fresh and independent perspective on the findings, as well as point out any gaps in the review or analysis. This may also serve to facilitate learning across different sectors and geographies, increase awareness, and build and expand professional networks.
10. Discordant evidence: Reviews should discuss any evidence that contradicts initial findings, explanations, and developing theories alongside the main consensus views. The report should show how discordant evidence (from personal testimony, reports, site visits, or in forming improvement plans) has been documented and reconciled. This encourages open and critical assessment of emergent themes when forming key findings and conclusions from the review.
11. Depth and insight: Reviews should seek to uncover and report active and latent failures, contributory factors and underlying causes of the emergency, and make specific recommendations to improve systems for preparedness for public health emergencies. Reviews should be explicit in stating how the data were obtained and interpreted to reach the fundamental insights gained to enable recommendations for improvement to emergency preparedness systems.
PHEP = public health emergency preparedness.
* The development of an evidence-based minimum reporting standard for after action reviews, similar to the Consolidated Standards of Reporting Trials (CONSORT) statement for randomized controlled trials, may facilitate this process and comparisons between AARs. See http://www.consort-statement.org/.

CONCLUSIONS
We suggest that the lack of methodological reporting provides a strong case for the development of an evidence-based minimum reporting standard for AARs, akin to the CONSORT statement for randomized controlled trials. Such a standard could benefit after action reports in 2 ways. First, it may ensure that a wider range of robust methods is considered before and during the review; second, it may ensure that methods are more clearly reported in the end report itself, allowing an external assessment of validity. The 11-point summary tool presented here allows a simple validity comparison to be made across a range of diverse AARs, and could be further developed and refined in the future.
It is noteworthy that critical incident registries have been adopted in transport, health care, and workplace safety industries, but not in emergency preparedness. 5 We thus advocate an AAR registry (similar in nature to the US government's Lessons Learned Information Sharing program) in Europe, to facilitate cross-border learning that will further strengthen emergency preparedness. 51 The 11-point summary validity tool presented here could contribute to such an initiative by promoting an AAR design that is as robust and credible as possible.

Acknowledgment and Author Contributions
This publication is based upon a report produced by Bazian Ltd and commissioned by the ECDC under Direct Service Contract ECD.5860. Robert Davies provided input into project design, performed data extraction, performed data synthesis, and coauthored this manuscript; Elly Vaughan managed the project at Bazian, provided input into project design, designed and ran literature searches, performed data extraction, contributed to the synthesis, and coauthored this manuscript; Dr Robert Cook provided input into project design, reviewed draft reports, and provided project oversight; Dr Graham Fraser, Dr Massimo Ciotti, and Dr Jonathan Suk initiated the study and commissioned the work, provided technical guidance throughout the study, and coauthored this manuscript; Dr Katie Geary provided expert advice throughout the project design and execution, including refining the appraisal tools for a public health emergency context.