A framework to develop semiautomated surveillance of surgical site infections: An international multicenter study

Abstract
Objective: Automated surveillance of healthcare-associated infections reduces workload and improves standardization, but it has not yet been adopted widely. In this study, we assessed the performance and feasibility of an easily implementable framework to develop algorithms for semiautomated surveillance of deep incisional and organ-space surgical site infections (SSIs) after orthopedic, cardiac, and colon surgeries.
Design: Retrospective cohort study in multiple countries.
Methods: European hospitals were recruited and selected based on the availability of manual SSI surveillance data from 2012 onward (reference standard) and on the ability to extract relevant data from electronic health records. A questionnaire on local manual surveillance and clinical practices was administered to participating hospitals, and the information collected was used to pre-emptively design semiautomated surveillance algorithms standardized for multiple hospitals and for center-specific application. Algorithm sensitivity, positive predictive value, and reduction in the number of charts requiring manual review were calculated. Reasons for misclassification were explored using discrepancy analyses.
Results: The study included 3 hospitals, in the Netherlands, France, and Spain. Classification algorithms were developed to indicate procedures with a high probability of SSI. Components concerned microbiology, prolonged length of stay or readmission, and reinterventions. Antibiotics and radiology ordering were optional. In total, 4,770 orthopedic procedures, 5,047 cardiac procedures, and 3,906 colon procedures were analyzed. Across hospitals, standardized algorithm sensitivity ranged between 82% and 100% for orthopedic surgery, between 67% and 100% for cardiac surgery, and between 84% and 100% for colon surgery, with 72%–98% workload reduction. Center-specific algorithms had lower sensitivity.
Conclusions: Using this framework, algorithms for semiautomated surveillance of SSI can be successfully developed. The high performance of standardized algorithms holds promise for large-scale standardization.

HAI occurred. Only procedures classified as high probability undergo manual chart review to confirm the presence of an HAI. This automated preselection increases standardization and reduces the number of charts requiring manual review by >75% while retaining the possibility of interpreting data on clinical signs and symptoms. [8] In the past decade, multiple publications have described automated surveillance with good performance, [9][10][11][12] but the automation of surveillance has not yet been adopted widely. [13] Extension to other hospitals may have been hampered by previous efforts being restricted to single hospitals with differences in information technology (IT) systems, data availability, and local diagnostic and treatment practices, or by relying on data-driven algorithm development. Absence of guidance on how to develop, implement, and maintain such systems limits their widespread adoption. A framework to develop algorithms for semiautomated surveillance, applicable to routine care data and not requiring complex data-driven modeling, may facilitate the development of reliable algorithms implementable on a larger scale.
In this observational, international multicenter cohort study, we assessed the performance and feasibility of such a framework for pre-emptive algorithm development for semiautomated surveillance (Fig. 1) applied to the surveillance of deep incisional and organ-space SSIs after hip and knee arthroplasty (THA and TKA), cardiac surgery, and colon surgery. In addition, we evaluated the generalizability of algorithms between hospitals.

Methods
In this retrospective cohort study, we assessed the performance of a framework for the development of semiautomated surveillance by comparing the developed algorithms to routinely performed traditional manual surveillance for the detection of deep incisional and/or organ-space SSI (reference standard), according to locally applied definitions. We focused on SSIs after THA and TKA, cardiac surgery, and colon surgery because these are high-volume procedures commonly targeted by surveillance programs.

Hospital and patient inclusion
European hospitals participating in the CLIN-NET network [14] were recruited for this study. We considered only hospitals with long-term experience with manual SSI surveillance after THA and TKA and cardiac surgery, and (optionally) other surgical procedures, from 2012 onward. In each hospital, the feasibility of data extraction from the EHR for the same period was assessed. Dates of (re)admission and discharge, surgical records, microbiology results, and in-hospital mortality were the minimum requirements for participation. Optional data included demographics, antibiotic prescriptions, outpatient clinic visits, temperature, other vital signs, radiology ordering, clinical chemistry, and administrative data. Data collection was performed retrospectively between July 2017 and May 2019. The targeted number of SSIs during the surveillance period was 60-80 per center, so that performance could be estimated with sufficient precision. All procedures included in the participating hospitals' targeted surveillance were included nonselectively in the study. Local approval and waivers of informed consent were obtained from all centers.
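The link between the number of SSIs and the precision of a sensitivity estimate can be illustrated with a confidence interval. The sketch below uses the Wilson score interval; this is an assumption made for illustration, because the text does not state which method was used to set the 60-80 target. The function name `wilson_interval` and the 90% detection rate in the example are hypothetical.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must be between 0 and total")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical example: an algorithm that detects 90% of the reference SSIs.
for n_ssi in (10, 60, 80):
    detected = round(0.9 * n_ssi)
    low, high = wilson_interval(detected, n_ssi)
    print(f"{detected}/{n_ssi} detected: 95% CI {low:.2f} to {high:.2f}")
```

With only a handful of SSIs the interval around the sensitivity estimate is wide; it narrows as the number of reference SSIs grows, which is the motivation for setting a target number of SSIs per center.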

Automated surveillance framework
The framework was applied to each hospital by the coordinating center (University Medical Center Utrecht, The Netherlands) and consisted of the following steps:
Step 1: Inventory of local surveillance and clinical procedures and availability of electronic routine care data. Centers completed a questionnaire for each targeted procedure collecting information on (1) local surveillance procedures (ie, SSI definitions, selection of the surveillance population), (2) clinical procedures regarding standard of care and diagnostic and therapeutic practice in case of SSI suspicion, and (3) the availability of relevant electronic routine care data (see the supplementary material online: Questionnaire SSI Surveillance Methods). If needed, hospitals were contacted for a follow-up discussion.
Step 2: Algorithm design. Algorithms were pre-emptively designed based on the information collected in the inventory (step 1). Algorithms classified surgical procedures as having a high probability of SSI if they met the criteria of different components representing different possible indicators of HAI, adapted to targeted procedures. In selecting data and defining component criteria, the following considerations were taken into account: (1) the relevance of data to serve as an HAI indicator for hospitals; (2) data availability across hospitals; (3) robustness to small changes in clinical practice or documentation; (4) ease of application irrespective of local IT systems and epidemiological support; and (5) a primary focus on optimizing sensitivity (ie, a high detection rate) followed by positive predictive value (PPV) because the framework is set up for semiautomated surveillance. For each targeted procedure, both a standardized algorithm based on common clinical practices across all hospitals and a center-specific algorithm adapted to specific local clinical procedures were developed. If available, previously developed algorithms were used as the standardized algorithm and were validated.
Step 3: Classification as high or low probability of SSI. Algorithms were applied to the data extracted from each center to classify procedures as high or low probability of an SSI.
Steps 4 and 5: Assessing and refining algorithm performance. Results of the semiautomated algorithms were compared to traditional surveillance to determine accuracy and efficiency of SSI detection. First, based on group-level analysis of results per algorithm component (without evaluating individual procedures) and discussion with each hospital, structurally missing data or miscoding were corrected. Second, a detailed case-by-case discrepancy analysis was performed to obtain insight into reasons for misclassification. All false-negative cases, and a selection of false-positive and concordant cases, were reassessed by each center, blinded for algorithm outcome. If needed, additional corrections in algorithm application were made. Errors in manual surveillance potentially discovered after reassessment were not reclassified.
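Steps 2 and 3 can be sketched in code. The Python example below is a minimal illustration and not the algorithms used in the study, which are described in the supplementary material. The names `ComponentFlags` and `classify`, the threshold `min_positive`, and the default mandatory component are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ComponentFlags:
    """Component results for one procedure.

    How each flag is derived (time windows, culture types, codes) is
    center- and procedure-specific and is not modeled here.
    """
    microbiology: bool                  # relevant culture result present
    admission: bool                     # prolonged length of stay or readmission
    reintervention: bool                # eg, drainage or debridement
    antibiotics: Optional[bool] = None  # optional component; None if data unavailable


def classify(flags: ComponentFlags,
             min_positive: int = 2,
             mandatory: Sequence[str] = ("microbiology",)) -> str:
    """Return 'high' or 'low' probability of SSI.

    min_positive and mandatory are hypothetical parameters, not study values.
    """
    values = {
        "microbiology": flags.microbiology,
        "admission": flags.admission,
        "reintervention": flags.reintervention,
    }
    if flags.antibiotics is not None:
        values["antibiotics"] = flags.antibiotics
    # A procedure that fails any mandatory component is always low probability.
    if not all(values.get(name, False) for name in mandatory):
        return "low"
    return "high" if sum(values.values()) >= min_positive else "low"


print(classify(ComponentFlags(microbiology=True, admission=True, reintervention=False)))
print(classify(ComponentFlags(microbiology=False, admission=True, reintervention=True)))
```

Only procedures returned as high probability would go on to manual chart review. Passing `mandatory=()` removes the mandatory component, which trades positive predictive value for sensitivity.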

Analyses
The primary end point of this study was the determination of the accuracy and efficiency of SSI detection by the semiautomated algorithms, as measured by sensitivity, PPV, and workload reduction as proportion of charts requiring manual review, compared to traditional manual surveillance. Analyses were performed at the procedural level. In addition, discrepancy analyses explored reasons for misclassification. Finally, the results of this study provide an indication of the feasibility of broad adoption of this semiautomated surveillance framework.
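The three performance measures can be computed from two flags per procedure. The sketch below assumes a hypothetical representation (parallel lists of booleans) and a hypothetical function name, `surveillance_metrics`; workload reduction is taken as 1 minus the proportion of procedures flagged for manual review.

```python
def surveillance_metrics(reference_ssi: list[bool],
                         high_probability: list[bool]) -> dict[str, float]:
    """Sensitivity, PPV, and workload reduction at the procedure level.

    reference_ssi[i]    -- SSI according to manual surveillance (reference standard)
    high_probability[i] -- procedure flagged by the algorithm for chart review
    """
    if len(reference_ssi) != len(high_probability):
        raise ValueError("inputs must have the same length")
    n = len(reference_ssi)
    if n == 0:
        raise ValueError("at least one procedure is required")
    true_positives = sum(r and h for r, h in zip(reference_ssi, high_probability))
    n_ssi = sum(reference_ssi)
    n_flagged = sum(high_probability)
    return {
        "sensitivity": true_positives / n_ssi if n_ssi else float("nan"),
        "ppv": true_positives / n_flagged if n_flagged else float("nan"),
        "workload_reduction": 1 - n_flagged / n,
    }


# Hypothetical data: 100 procedures, 5 reference SSIs, 10 flagged (4 of them SSIs).
reference = [True] * 5 + [False] * 95
flagged = [True] * 4 + [False] + [True] * 6 + [False] * 89
print(surveillance_metrics(reference, flagged))
```

In this made-up example, 4 of 5 SSIs are detected (sensitivity 80%), 4 of 10 flagged procedures are SSIs (PPV 40%), and 90 of 100 charts no longer need review (workload reduction 90%).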

Results

Inclusion of hospitals and surveillance characteristics
Initially, 4 hospitals were recruited. However, the required EHR data extraction for 1 hospital was not possible, and this hospital was therefore excluded from further analyses. The included hospitals were Amphia Hospital (Breda, Netherlands), Dupuytren University Hospital (Limoges, France), and Bellvitge University Hospital (Barcelona, Spain), referred to as hospitals A-C (random order). Table 1 provides an overview of the surveillance population, definitions used in manual surveillance, and the number of procedures that could be linked to EHR data and hence could serve as reference data to evaluate algorithm performance. In total, 4,770 THA and TKA procedures, 5,047 cardiac surgeries, and 3,906 colon surgeries were included, with 38 SSIs for THA and TKA, 94 SSIs for cardiac surgeries, and 230 SSIs for colon surgeries. An overview of available EHR data is presented in Table 2. Because data for antibiotics from hospital B were only available for 2016, the framework was applied to data without antibiotics and additionally to a subset of data including antibiotics.

Algorithm development
All pre-emptively developed classification algorithms included components reflecting microbiological culture results, admissions (prolonged length of stay or readmission), and reinterventions, with criteria adapted to the algorithm (standardized or center specific) and targeted procedure. Additionally, algorithms including antibiotic prescriptions or radiology ordering components were defined to accommodate the differences between centers in data availability. Procedures were classified as high probability of SSI if a patient scored positive on a combination of components, including a mandatory component for some algorithms. For THA and TKA, the algorithm developed by Sips et al [15] was applied in hospitals where data on antibiotics were available. A schematic overview of the algorithms for THA and TKA is provided in Figure 2 as an example. Detailed algorithm descriptions and a flowchart of framework application are provided in the supplementary material online.

Algorithm performance
Table 3 presents the performance of all algorithms prior to case-by-case discrepancy analyses, and the results of these discrepancy analyses are presented in Table 4. Overall, the performance of the standardized algorithms was high in terms of sensitivity and workload reduction. Center-specific algorithms often achieved higher workload reduction, but at the cost of sensitivity. Standardized algorithms for surveillance of SSIs after THA and TKA had a sensitivity ranging from 81% to 100% across hospitals. Upon reconsideration, all missed SSIs were deemed "no SSI" in the discrepancy analyses; thus, 100% of cases could be detected. A workload reduction of >95% was achieved in all centers. The center-specific algorithms yielded a sensitivity ranging from 50% to 100% and a workload reduction ranging from 87% to 98% across hospitals.
For cardiac surgery, the sensitivity of the standardized algorithms appeared nearly perfect, with a 73%-96% workload reduction, except for the algorithm including antibiotics in hospital B. Of 9 SSIs, 3 were missed; 1 SSI was reconsidered by the center as "no SSI" in the discrepancy analyses. The sensitivity of the center-specific algorithms across centers ranged from 44% to 95%, with a 90%-97% workload reduction.
The standardized algorithm for SSI detection after colon surgery showed >90% sensitivity in all centers, but it was lower for the algorithm without antibiotics in hospital B. All centers achieved a workload reduction between 72% and 82%. Sensitivity of the center-specific algorithms ranged from 49% to 82% across centers.
No formal comparisons were performed, but standardized algorithms with and without an antibiotics component showed comparable overall sensitivity. The PPV and workload reduction were better in algorithms including an antibiotic component for cardiac surgery only. To further assess generalizability of this finding, the performance of standardized algorithms without an antibiotics component was evaluated for hospital A as a sensitivity analysis (see the supplementary material online: Detailed Algorithm Description, flowchart A). This analysis yielded the following results: for THA and TKA, a sensitivity of 100%, a PPV of 19%, and a workload reduction of 97%; for cardiac procedures, a sensitivity of 94%, a PPV of 18%, and a workload reduction of 93%; for colon surgery, a sensitivity of 93%, a PPV of 33%, and a workload reduction of 80%.
For all targeted procedures, most SSIs missed by the standardized algorithms (ie, false negatives) were reassessed as "no SSI" in the discrepancy analyses; hence, they were correctly classified by the algorithm. For colon surgery, reasons for missed SSIs included not fulfilling the mandatory microbiology component (algorithms without antibiotics) and incomplete extraction of microbiology data (hospital B) and procedures (ie, drainage or debridement; all hospitals). The SSIs missed by the center-specific algorithms could be explained by the component criteria being too specific (eg, antibiotics after THA/TKA and cardiac surgery in hospital B and microbiology after colon surgery in hospital C). The main reasons for falsely identified high-probability cases were errors in the reference data (SSI after reconsideration in the discrepancy analyses), superficial SSIs or other complications, and patients with pre-existing infection being included in the surveillance population.

Feasibility of framework application
Application of this semiautomated surveillance framework was feasible in the 3 hospitals included in the study; algorithms with good performance were developed, applied, and validated. The questionnaire (framework step 1) provided sufficient information for pre-emptive algorithm development, although application of the algorithm required collecting further details regarding selection of data and technical specification from IT specialists, infection control practitioners, and clinicians. The fourth hospital had to be excluded from this study because historical data extraction was impossible for most data despite considerable effort. In hospital B, data on microbiology results and reinterventions could not be completed due to changes in the IT system. Factors enhancing the feasibility of this framework as encountered during this study are presented in Table 5.

Note. PPV, positive predictive value.
a A detailed description of applied algorithms is provided in the supplementary material online.
b Sensitivity is defined as the proportion of procedures with SSI in the reference surveillance data that were classified by the algorithm as high probability of an SSI.
c The positive predictive value is defined as the proportion of procedures with SSI in the reference surveillance data among all procedures that were classified as high probability by the algorithm.
d The workload reduction is defined as the reduction in the number of procedures that require manual confirmation (ie, procedures that were indicated by the algorithm with a high probability of an SSI) as compared to all procedures in traditional surveillance.
e Results of discrepancy analyses: reclassification of false-negative cases; 100% case detection if corrected.

Discussion
This retrospective study assessed a framework for the development of semiautomated HAI surveillance in 3 different European hospitals, focusing on deep incisional and organ-space SSIs after THA and TKA, cardiac procedures, and colon surgery. This framework achieved a high detection rate of SSI (sensitivity) as well as a 72%-98% reduction of manual chart review workload when it was applied in hospitals that differed with respect to surveillance population, HAI definition, and clinical procedures. Because this method of pre-emptive algorithm development relies on a limited number of data sources that can often be extracted from the EHR and does not require complex modeling, calibration, or dealing with missing data, [16] we expected it to be accessible to many hospitals.
The algorithms, developed without any prior knowledge other than information obtained from the survey and interviews, performed well. The 'standardized algorithms' developed with the purpose of being applicable in multiple hospitals had high sensitivity while achieving a substantial workload reduction, given that data extraction was complete. These algorithms offer possibilities for larger-scale standardization of semiautomated surveillance. Although algorithms with a more specific component definition, tailored to each center's characteristics, often achieved a higher positive predictive value, the gain in workload reduction was limited and did not offset the loss in sensitivity. Components appeared to be too specifically defined, either because clinical procedures in case of an SSI were less standardized than anticipated or because the reasoning for SSI diagnosis was less straightforward. Furthermore, the standardized algorithm components are likely more robust to changes in clinical practice over time and are easier to implement and maintain than specific definitions. Hence, for the purpose of semiautomated surveillance, algorithms with more generally defined components are preferable.
The algorithm previously developed for semiautomated surveillance of SSI after TKA and THA [15] was validated in 2 hospitals in this study; all SSIs were detected and similar reductions in workload were achieved, although the number of SSIs was limited. The algorithm without antibiotics performed similarly. This latter observation also held true for the other types of surgery. In previous studies, data on antibiotics were added to enhance case finding, [17][18][19] but this addition was not essential in this study, and the same held true for data on radiological interventions. Although additional information could be used to optimize algorithm performance, it is feasible to develop well-performing algorithms that rely solely on microbiology results, admission and discharge dates, and procedure codes. These algorithms may be broadly adoptable because extraction of this information in an easy-to-process format was possible in all hospitals except one (where historical data in the legacy EHR could not be accessed), and no computation is required (eg, deriving changes in clinical chemistry results). These elements do not depend on interpretation by medical coders; therefore, they are potentially more reliable than, for example, billing codes. [12,13]
In algorithms without data on antibiotic prescriptions, fulfilling the microbiology component was mandatory to limit the number of false-positive cases. Based on the survey (framework step 1), we anticipated that cultures were taken whenever deep incisional or organ-space SSI was suspected; however, the discrepancy analyses revealed that the SSI determination was not always based on microbiology. The appropriateness of a mandatory component can be discussed because culture-negative infections can also meet the definition of deep incisional or organ-space SSI. [20] More flexibility could have increased the sensitivity, but there is always a trade-off between sensitivity and PPV. Accepting missed cases could be another alternative as long as comparability is ensured. [21]
Application of this semiautomated surveillance framework was feasible, but this study revealed several important conditions for success. Early involvement of IT specialists was essential because they provided an overview of available data and because the complexity of data management depends on IT support or the medical intelligence department. Equally important is knowledge of local clinical procedures and registration practices to understand what fields or codes should be used in analyses. Hence, data extraction requires close collaboration between infection control practitioners, clinicians, and IT specialists.
The discrepancy analyses revealed procedures misclassified by manual surveillance, which highlights the importance of a validated reference surveillance population when testing the performance of models. Because the selection of cases for the discrepancy analyses was nonrandom, the results of reclassification could not be reliably extrapolated to the entire population and performance estimates were not recalculated. Hence, the performance of the algorithm was likely underestimated.
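The direction of this underestimation can be illustrated with the cardiac surgery result reported for hospital B (9 SSIs in the reference data, 3 missed, 1 of the missed cases reconsidered as "no SSI"). The sketch below is a hypothetical recalculation that considers only the reassessed false-negative case; the study itself did not recalculate performance estimates, and the function name `sensitivity` is an assumption.

```python
def sensitivity(detected: int, missed: int) -> float:
    """Proportion of reference SSIs classified as high probability."""
    total = detected + missed
    if total == 0:
        raise ValueError("no reference SSIs")
    return detected / total


reference_ssi = 9        # SSIs according to manual surveillance
missed = 3               # false negatives of the algorithm
reclassified_no_ssi = 1  # missed case reconsidered as "no SSI"

detected = reference_ssi - missed
before = sensitivity(detected, missed)
# Illustrative only: drop the reclassified case from the reference SSIs.
after = sensitivity(detected, missed - reclassified_no_ssi)
print(f"reported: {before:.0%}, if corrected: {after:.0%}")
```

Here 6 of 9 SSIs detected gives about 67%, whereas 6 of 8 would give 75% if the reference data were corrected for this one case.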
This study has some limitations. Even though this study included hospitals from different countries that varied in applied HAI definitions, procedures in surveillance, and clinical and surveillance practices, the generalizability of the results of this framework to other centers may be limited. Only 3 hospitals were included in this study, all from Western Europe. Furthermore, differences in the availability of high-quality data for algorithm application (eg, microbiology results) or clinical registration practices will likely have an impact on the applicability of the presented algorithms in new settings, although similar steps in algorithm development could be undertaken. Performance estimation was further limited by a lower total number of SSIs than anticipated and by incomplete data extraction; both are consequences of limited availability of EHR data. Also, we did not measure net time in workload reduction; this cannot be estimated as a linear reduction proportionate to the charts needed for review. Because implementation of semiautomated surveillance was outside the scope of this study, no guidance on investments in human resources and material was developed in this study. More detailed practical guidance could be obtained from implementation studies.
This study was a proof-of-principle investigation of a framework for semiautomated HAI surveillance algorithm development that has promise for broader implementation. Algorithms with good performance can be developed without the need for specific modeling by each hospital and based on limited data sources only. Further validation could provide insight into the feasibility of broader applications of this method, both in other hospitals and for other targeted HAIs. With standardized, semiautomated surveillance on a larger scale, the number of surveyed procedures can be expanded to facilitate local quality improvement, (inter)national comparisons, or outcome measurements in clinical trials.

Table 5 (factors enhancing the feasibility of framework application; rows as extracted)
Data management support: In all hospitals, multiple systems for registration of routine care data were running in parallel. IT, medical intelligence specialists, and data managers facilitated algorithm application.
Validation of extracted data: For correct application of the algorithm, validation of data quality is important (corrections had to be made for structurally missing data).
High-quality reference data for algorithm validation: In the discrepancy analyses, cases in the manual surveillance were reclassified (SSI to "no SSI" and vice versa), limiting performance estimation of the algorithms.
Note. Enhancing factors increase the feasibility of framework application, based on lessons learned during the framework application.