Semiautomated surveillance of deep surgical site infections after colorectal surgeries: A multicenter external validation of two surveillance algorithms

Abstract Objective: Automated surveillance methods increasingly replace or support conventional (manual) surveillance; the latter is labor intensive and vulnerable to subjective interpretation. We sought to validate 2 previously developed semiautomated surveillance algorithms to identify deep surgical site infections (SSIs) in patients undergoing colorectal surgeries in Dutch hospitals. Design: Multicenter retrospective cohort study. Methods: From 4 hospitals, we selected colorectal surgery patients between 2018 and 2019 based on procedure codes, and we extracted routine care data from electronic health records. Per hospital, a classification model and a regression model were applied independently to classify patients into low- or high probability of having developed deep SSI. High-probability patients need manual SSI confirmation; low-probability records are classified as no deep SSI. Sensitivity, positive predictive value (PPV), and workload reduction were calculated compared to conventional surveillance. Results: In total, 672 colorectal surgery patients were included, of whom 28 (4.1%) developed deep SSI. Both surveillance models achieved good performance. After adaptation to clinical practice, the classification model had 100% sensitivity and PPV ranged from 11.1% to 45.8% between hospitals. The regression model had 100% sensitivity and 9.0%–14.9% PPV. With both models, <25% of records needed review to confirm SSI. The regression model requires more complex data management skills, partly due to incomplete data. Conclusions: In this independent external validation, both surveillance models performed well. The classification model is preferred above the regression model because of source-data availability and less complex data-management requirements. The next step is implementation in infection prevention practices and workflow processes.

surgeries are therefore incorporated in most SSI surveillance programs.
In most hospitals, surveillance is performed manually. However, this is experienced as labor intensive, and possibly inaccurate and is prone to subjectivity and low interrater agreement, thus limiting comparisons between hospitals. [9][10][11] The increasing availability of data stored in the electronic health record (EHR) offers opportunities for (partially) automating SSI surveillance, thereby reducing the workload and supporting standardization of the surveillance process. To date, several studies have published (semi)automated methods to automate SSI surveillance after colorectal surgery. Unfortunately, most of these are not feasible for Dutch hospitals (1) because they include elements that are not representative of the Dutch clinical setting and practice, (2) because they have insufficient algorithm performance, (3) because processing time is delayed or (4) because they are too complex for application in real life. [12][13][14][15][16][17][18] Two published semiautomated surveillance algorithms targeting deep SSI after colorectal surgery may be feasible for the Dutch setting: a classification algorithm 19 and a multivariable regression model. 20 The classification algorithm was pre-emptively designed based on clinical and surveillance practices from a French, a Spanish, and a Dutch hospital. The sensitivity was 93.3%-100% compared to manual surveillance, and the algorithm yielded a workload reduction of 73%-82%. The regression model was developed using data from a Dutch teaching hospital; we used it to predict the probability of deep SSI for each individual patient. This 5-predictor model had a sensitivity of 98.5% and a workload reduction of 63.3%. 20 External validation or actual implementation studies of new methods for automated surveillance are scarce. 21,22 As reported by 2 systematic reviews, only 23% of the included studies used a separate validation cohort 23 and only 25% of automated surveillance were used in clinical routine. 
24 Hence, knowledge about generalizability of automated surveillance models is limited, and information about the path toward actual implementation is needed. 22,25,26 In this study, we present an independent and external validation of the previously developed classification and regression model in new cohorts of patients that underwent colorectal surgeries in different types of Dutch hospitals. 21 We investigated the feasibility of data requirements for both algorithms. If feasible and externally valid, these models can be implemented in SSI surveillance practices and workflow processes.

Study design
In this retrospective cohort study, 4 Dutch hospitals (1 academic, 2 teaching, 1 general), each with different, or different versions of, EHR systems, extracted the data needed for algorithm application. To obtain insights in hospitals' clinical practice and patient care, a questionnaire adapted from a previous study 19 was filled in by the hospital staff at the start of the study (Appendix 1 online). Feasibility of the data collection (a precondition for implementation) was evaluated by assessing the completeness of the surveillance population (denominator) and the ability of the hospitals to automatically collect case-mix variables from their EHR. Thereafter, we applied the 2 surveillance algorithms to the extracted data. Model results were compared with conventional (ie, manually annotated) surveillance. 11 Approval for this study was obtained from the institutional Review Board of the University Medical Centre Utrecht (reference no. 20-503/C) and from the local boards of directors of each participating site. Informed consent was waived given the observational and retrospective nature of this study.

Surveillance population and data collection
The hospitals identified patients aged >1 year undergoing primary colorectal resections in 2018 and/or 2019 based on procedure codes in EHR data. Hospitals could use other data sources to establish inclusion rules to construct the surveillance population and to distinguish secondary procedures or resurgeries. For the patients included in the surveillance population, structured data were extracted from the EHR including demographics, microbiological culture results, admissions (ie, prolonged length of stay or readmission), resurgeries, radiology orders, antibiotic prescriptions, and variables for case-mix correction (see Supplementary Table S1 in Appendix 2 online).

Outcome
The outcome of interest was a deep SSI (deep incisional or organ-space) within 30 days after surgery according to the Dutch surveillance protocol. 27 In short, patients having purulent drainage from the deep incision or from a drain that is placed through the wound, or having an abscess, a positive culture from the organ space, or signs and symptoms of infection in combination with wound dehiscence and a positive culture of deep soft tissue, or other evidence of infection by direct examination were considered to have a deep SSI. The criterion of a positive culture is not applicable in case of anastomotic leakage or perforation following the surgery. In each hospital, infection control practitioners (ICPs) manually screened patients to identify deep SSIs. This manual surveillance was considered the reference standard. All ICPs performing manual chart review received training to ensure the quality of data collection and case ascertainment. 11 Moreover, all hospitals participated in an on-site visit to validate the conventional surveillance. Details about this on-site validation visit are described below.

Feasibility of data collection
To evaluate the feasibility of the data collection, we evaluated the completeness of the surveillance population (denominator data) by comparing the patients selected by procedure codes with patients included in the reference standard. Additionally, we compared agreement between the case-mix variables (ie, risk factors: age, sex, ASA classification, wound class, stoma creation, malignancy and anastomotic leakage) that were extracted from the EHR with the case-mix variables that were collected during conventional surveillance.
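The two feasibility checks described above can be illustrated with a minimal Python sketch. It is not the study's code (the analyses were run in SAS and R); the function names, the patient identifiers, and the choice to count a missing EHR value as a disagreement are assumptions made for illustration only.

```python
def population_overlap(code_selected, reference):
    """Compare patients selected by procedure codes with the reference standard.

    Both arguments are non-empty collections of patient identifiers.
    """
    code_selected, reference = set(code_selected), set(reference)
    only_codes = code_selected - reference       # selected by codes, not in reference
    only_reference = reference - code_selected   # in reference, missed by codes
    return {
        "pct_codes_not_in_reference": 100 * len(only_codes) / len(code_selected),
        "pct_reference_not_found": 100 * len(only_reference) / len(reference),
    }


def percent_agreement(ehr_values, manual_values):
    """Percentage of shared patients whose EHR value equals the manual value.

    Assumption: a missing EHR value (None) counts as a disagreement.
    """
    shared = ehr_values.keys() & manual_values.keys()
    agree = sum(
        1
        for pid in shared
        if ehr_values[pid] is not None and ehr_values[pid] == manual_values[pid]
    )
    return 100 * agree / len(shared)


# Hypothetical toy data, not taken from the study.
overlap = population_overlap({1, 2, 3, 4, 5}, {3, 4, 5, 6})
agreement = percent_agreement(
    {3: "II", 4: None, 5: "III"},
    {3: "II", 4: "II", 5: "III", 6: "I"},
)
```

In this toy example, 2 of the 5 code-selected patients are absent from the reference standard, 1 of the 4 reference patients is not found by codes, and the case-mix variable agrees for 2 of the 3 shared patients.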

Algorithm validation
Model validation of the classification model
The classification algorithm was based on the development study, using 5 elements: antibiotics, radiology orders, (re)admissions (ie, prolonged length of stay, readmissions or death), resurgeries, and microbiological cultures (Fig. 1a and Supplementary Table S2 in Appendix 2 online). All extracted data were limited to 45 days following the colorectal surgery to enable the algorithm to capture deep SSIs that developed at the end of the 30-day follow-up period. In accordance with the development study, 19 patients were classified into low probability of having had a deep SSI (≤1 element excluding microbiology, or 2-3 elements and no microbiology) and high probability of having had a deep SSI (4 elements excluding microbiology, or 2-3 elements and microbiology). High-probability patients required manual SSI confirmation, and low-probability patients were assumed free of deep SSI. If discrepancies were found between the clinical practice reported in the questionnaire and the algorithm, we evaluated whether an adaptation of the classification algorithm could have improved performance. When an algorithm element could not be computed due to incomplete data (eg, discharge date is missing so length of stay cannot be computed), the patient scored positive on that element.
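The classification rule described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' SAS implementation: the element names and the function name are hypothetical, and the sketch assumes that "≤1 element excluding microbiology" means low probability regardless of the microbiology result. The exact element definitions are in Supplementary Table S2 and are not reproduced here.

```python
# The four elements counted "excluding microbiology" (names are illustrative).
NON_MICRO_ELEMENTS = ("antibiotics", "radiology", "readmission", "resurgery")


def classify_deep_ssi(elements):
    """Return "high" or "low" probability of deep SSI.

    `elements` maps element names to True, False, or None. None (or an
    absent key) means the element could not be computed; as described in
    the text, the patient then scores positive on that element.
    """
    def scored(name):
        value = elements.get(name)
        return True if value is None else bool(value)

    n_non_micro = sum(scored(name) for name in NON_MICRO_ELEMENTS)
    microbiology = scored("microbiology")

    if n_non_micro == 4:
        return "high"   # 4 elements excluding microbiology
    if n_non_micro >= 2 and microbiology:
        return "high"   # 2-3 elements and microbiology
    return "low"        # <=1 element, or 2-3 elements without microbiology


# Hypothetical patient: 3 non-microbiology elements, no positive culture.
example = classify_deep_ssi({
    "antibiotics": True, "radiology": True, "readmission": False,
    "resurgery": True, "microbiology": False,
})
```

Under this reading, the example patient is classified as low probability because the microbiology element is not met, which is the pattern that the adaptation to local culture-taking practice was meant to address.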

Model validation of the regression model
The regression model utilizes wound class, hospital readmission, resurgery, postoperative length of stay, and death to calculate the probability of deep SSI. Coefficients estimated in the development setting 20 were multiplied with the predictor values of this validation cohort to estimate SSI probability ( Fig. 2 and Supplementary Table S3 in Appendix 2 online). In accordance with the cutoff point in the development study, patients were classified into low probability of deep SSI (≤0.015) and high probability of deep SSI (>0.015). High-probability patients required manual SSI confirmation, whereas low-probability patients were assumed free of deep SSI. In case a predictor could not be automatically extracted by the hospital or had missing values, the predictor collected by the manual surveillance was used to evaluate algorithm performance.
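As an illustration of how such a model yields a per-patient probability, the sketch below assumes a logistic form, which the text does not state explicitly. The intercept and coefficients are hypothetical placeholders, not the published estimates (those are in Supplementary Table S3), and the predictor coding is likewise assumed. Only the 0.015 cutoff is taken from the text.

```python
import math

CUTOFF = 0.015  # cutoff point from the development study, as reported in the text

# HYPOTHETICAL values for illustration only; NOT the published coefficients.
EXAMPLE_INTERCEPT = -6.0
EXAMPLE_COEFFICIENTS = {
    "wound_class": 0.3,
    "readmission": 1.5,
    "resurgery": 2.0,
    "length_of_stay_days": 0.05,
    "death": 1.0,
}


def predicted_probability(intercept, coefficients, predictors):
    """Logistic model (assumed form): p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))


def classify(probability, cutoff=CUTOFF):
    """High probability if strictly above the cutoff, otherwise low."""
    return "high" if probability > cutoff else "low"


# Two hypothetical patients.
uneventful = {"wound_class": 2, "readmission": 0, "resurgery": 0,
              "length_of_stay_days": 5, "death": 0}
complicated = {"wound_class": 2, "readmission": 1, "resurgery": 1,
               "length_of_stay_days": 14, "death": 0}

p_low = predicted_probability(EXAMPLE_INTERCEPT, EXAMPLE_COEFFICIENTS, uneventful)
p_high = predicted_probability(EXAMPLE_INTERCEPT, EXAMPLE_COEFFICIENTS, complicated)
```

With these placeholder coefficients, the uneventful patient falls below the cutoff and the patient with a readmission and a resurgery falls above it. Because the cutoff is low, most of the discrimination comes from the size of the coefficients relative to the intercept.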

On-site visit
All hospitals participated in an on-site visit to validate the conventional surveillance. This process was executed by 2 experienced surveillance advisors of the Dutch national HAI surveillance network who were blinded for the outcomes of both the reference standard and the algorithms. For each hospital, a sample of 20 patients was taken from the data according to the hierarchical rules (Fig. 3). All false-negative results were included, to confirm their deep SSI status. Additionally, records from every other group (false-positive, true-positive, and true-negative results) were included until 20 were gathered. The group size of 20 patients was based on the time capacity of the validation team.

Statistical analyses
After data linkage, descriptive statistics were generated. To evaluate data feasibility, missing data patterns were described, and no techniques such as multiple imputation were performed to complete the data. Both models were applied to the data extractions, and results were compared with the reference standard. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and workload reduction were calculated overall and were stratified per hospital. Workload reduction was defined as the proportion of colorectal surgeries no longer requiring manual review after algorithm application. A discrepancy analysis was performed in case of any false-negative results (ie, missed deep SSI); the algorithm elements were checked in the original data. Data cleaning and statistical analyses for the classification model were carried out in SAS version 9.4 software (SAS Institute, Cary, NC). For the regression model, we used R version 3.6.1 software (R Foundation for Statistical Computing, Vienna, Austria).
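The performance measures defined above follow directly from a 2x2 table. The sketch below is illustrative and is not the study's SAS or R code. The counts are back-calculated from the overall percentages reported for the original classification model (672 patients, 28 deep SSIs, sensitivity 85.7%, specificity 92.1%); they are an inference from those figures, not values copied from Table 2.

```python
def surveillance_metrics(tp, fp, fn, tn):
    """Performance of a semiautomated algorithm against the reference standard.

    tp/fp: high-probability records with/without deep SSI
    fn/tn: low-probability records with/without deep SSI
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Proportion of surgeries no longer requiring manual review,
        # ie, all low-probability records.
        "workload_reduction": (tn + fn) / total,
    }


# Back-calculated counts (assumption, see lead-in): 24 of 28 deep SSIs
# detected, 593 of 644 patients without deep SSI classified as low probability.
metrics = surveillance_metrics(tp=24, fp=51, fn=4, tn=593)
```

These counts reproduce the reported overall sensitivity, specificity, PPV (32.0%), and NPV (99.3%). They imply that 75 of 672 records (about 11%) would need manual review, which is in line with the 8%-13% reported per hospital.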

Feasibility of data collection
Completeness of the surveillance population
The exact surveillance population could not be reconstructed because there were no separate procedure codes or potential inclusion rules to reliably distinguish secondary procedures or resurgeries from primary procedures (range, 8.7%-22.0%, Table 1). Vice versa, 0%-25% of patients in the reference standard were not identified when using inclusion rules based on procedure codes (details in Table 1). Thus, 672 colorectal surgery patients were included in this study, and 28 had deep SSIs (4.1%).

Completeness of data collection
Electronic collection of the minimum required data set from the EHR was feasible for all variables except wound class. Hospital A used text mining to establish the wound class. For hospitals B and C, wound class as collected during manual surveillance (reference standard) was used. For hospital D, wound class information was not available in the source data. Figure 4 shows the percentage of agreement between the case-mix variables extracted from the EHR and those collected manually. Disagreement was mostly related to incomplete data: variables either were not registered in the original source or were not available from source data at all.

Algorithm validation
The original classification model had an overall sensitivity of 85.7% (95% CI, 67.3%-96.0%) ranging from 72.7% to 100% between hospitals, a specificity of 92.1% (95% CI, 89.7%-94.0%), PPV of 32.0% (95% CI, 21.7%-43.8%) and an NPV of 99.3% (95% CI, 98.3%-99.8%). For the performance per hospital, see Table 2. Only 8%-13% of the records required manual review after algorithm application. In hospitals C and D, respectively, 1 and 3 deep SSIs were missed by the algorithm (Table 3). In contrast to hospitals A and B, both hospitals had reported in the questionnaires that microbiological cultures were not consistently taken in case of suspected infection, and this was reflected in the percentage of patients meeting the microbiology element. Therefore, we adapted the algorithm and classified patients with 1 element (ie, radiology order, antibiotics, readmission, or resurgery) as low probability (Fig. 1b). This model resulted in higher sensitivity (overall sensitivity, 100%; 95% CI, 87.7%-100.0%) but at the cost of lower PPV and less workload reduction (Table 2).

Table 1 footnotes (recovered from the source; the marker of the first footnote is missing):
Explanation of mismatch: manual review of a random sample of these records showed these were mainly revision/secondary procedures, and for hospital C surgeries performed at another hospital location that are excluded from manual surveillance.
c Explanation of mismatch: Hospital B: incorrect inclusions in reference standard as they did not meet inclusion criteria (no primary procedure). Hospital C: These surgeries were registered as executed by internal medicine department, while for the extractions only resections performed by surgery department were selected. Hospital D: According to the national surveillance protocol the resection with the highest risk is to be registered in case of more resections during the same surgery. Hospital included the wrong procedure in these cases.

Table 3 (partially recovered). Deep SSIs missed by the original classification algorithm
[Patient label and elements not recovered.] Explanation: No reoperation was performed. The antibiotic treatment was not identified by the algorithm as these were home-administered antibiotics, which were not included in the data selection.
Patient 2, Hospital D. Elements met a: 3. Elements not met: Microbiology b, Readmission. Explanation: Reoperation took place 3 days after surgery, during the hospitalization of the index surgery; no readmission needed.
Patient 3, Hospital D. Elements met a: 3. Elements not met: Microbiology b, Resurgery. Explanation: Patient had an endosponge placement; however, this reintervention is not registered as resurgery and performed as outpatient treatment by an internist, gastroenterologist or endoscopist from the gastrointestinal and liver diseases specialty while for the data extractions only resurgeries performed by same specialty as index surgery were selected.
a Algorithm elements are radiology orders, antibiotics, (re)admissions, resurgeries, and microbiology. Patients needed 4 elements excluding microbiology, or 2-3 elements and microbiology to be classified as high probability by the algorithm. See also Fig. 1 and Appendix 2 (online).
b Both hospitals had reported in the questionnaires that cultures were not consistently taken in case of suspected infection.
Due to the small sample size and low number of deep SSIs, discrimination and calibration were not evaluated.
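The width of the confidence intervals reported for sensitivity illustrates how little precision 28 deep SSIs allow. The text does not state which interval method was used. The sketch below assumes an exact (Clopper-Pearson) binomial interval, which appears consistent with the reported bounds. The function names are illustrative, and the bounds are found by simple bisection rather than with a statistics library.

```python
from math import comb


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def _solve_increasing(f, target, iterations=60):
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: binom_tail_ge(k, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha/2,
    # ie, P(X >= k + 1) equals 1 - alpha/2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: binom_tail_ge(k + 1, n, p), 1 - alpha / 2)
    return lower, upper


# 28 of 28 deep SSIs detected (adapted classification model).
adapted_ci = clopper_pearson(28, 28)
# 24 of 28 detected; the 24 is back-calculated from the reported 85.7%.
original_ci = clopper_pearson(24, 28)
```

For 28 of 28, the lower bound has the closed form 0.025^(1/28), roughly 0.877, which matches the reported 87.7%. Even a perfect observed sensitivity therefore leaves room for a true sensitivity well below 100% at this sample size.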
No discrepancies were found during the on-site validation visit in hospital D. In the other 3 hospitals, on-site validation revealed 5 additional deep SSIs: 2 were overlooked in the conventional surveillance and 3 were initially classified as superficial SSIs. All additional deep SSIs were classified correctly as high probability by both the (modified) classification model and the regression model. Other findings of the on-site validation of the reference standard, though not essential for the assessment of the algorithms, were reclassifications of superficial SSIs to no SSI (n = 1), missed superficial SSIs (n = 2), and incorrect inclusions (n = 8).

Discussion
This study demonstrated the external validity, both temporal and geographical, of 2 surveillance algorithms that identify patients with a high probability of deep SSI after colorectal surgery. Both had a high detection rate for deep SSI and can be used for semiautomated surveillance and, thus, to further improve efficiency and quality of SSI surveillance. Both the classification model, especially when adapted to local practices, and the regression model performed very well. To select a model for use within an organization, we considered other aspects of implementation. First, in case of incomplete data, the original development study of the regression model used multiple imputation techniques. For the classification model, the patient scored positive on the algorithm element that could not be computed due to incomplete data. This was a more convenient method for which no complex data management techniques were required. Second, according to the original study, patients with a dirty-infected wound (ie, wound class 4) were excluded from the cohort of the regression model. However, according to the national surveillance protocol, these cases should have been included in the surveillance. In addition, in 2 hospitals, wound class was not available in a structured format for automated extraction, hindering algorithm application. Third, the classification model could easily be adapted to local practices. For the regression model, a sufficient sample size was required for redevelopment or recalibration in case of low predictive accuracy. This aspect may be challenging for hospitals performing few colorectal resections. Therefore, the (modified) classification model is more feasible and sustainable for real-life implementation within hospitals, improving standardization and benchmarking. We know from a previous study that the classification model has also been successful in other European countries and in low-risk surgeries such as hip and knee arthroplasties. 
19,28 For both algorithms, however, several hurdles remain for implementation. The exact surveillance population could not be automatically selected by procedure codes, but a change in the current inclusion criteria or target population could be considered. In this study, 10%-22% of surgeries detected by procedure codes did not concern a resection, were not the main indication for surgery (but performed concomitant to other intra-abdominal surgeries), or were not the first colon resection for that patient. Also, the variables necessary for case-mix adjustment are sometimes difficult to extract automatically. Although the search for a proper case-mix correction is ongoing, 14,29-32 automated extraction of a minimal set of risk factors is necessary to interpret the surveillance results and to maintain the workload reduction delivered by (semi)automated surveillance.
Two findings in this study emphasize that close monitoring, validation of algorithm components, and future maintenance are important to maintain alignment with clinical practice and to guarantee high-quality surveillance. First, as appeared from the questionnaire, 2 hospitals did not consistently obtain microbiological cultures in case of suspected deep SSI. We advise researchers to first verify whether algorithms align with clinical practice and subsequently to consider adapting algorithms to any differences. 23,[33][34][35] Second, new treatment techniques should also be evaluated regularly and algorithms adapted accordingly. Endosponge therapy is increasingly used after anastomotic leakage; however, this intervention is often not registered, or is regarded not as resurgery but as outpatient treatment performed by a different specialty than the initial colorectal surgery. Each hospital should therefore periodically evaluate care practices and algorithm elements to select the appropriate resurgeries or to include recently introduced interventions, such as endosponge therapy, within the resurgery element in the surveillance algorithm.
This study had several strengths. We performed an independent external validation in independent patient data from different types of hospitals, as well as a temporal validation. Apart from algorithm performance, automated selection of patients and case-mix variables were investigated as well, which are prerequisites for actual implementation.
This study also had several limitations. First, both algorithms targeted deep SSIs only, but in colorectal surgeries 20%-50% of SSIs are superficial. 6,36 Debate continues regarding the inclusion of superficial SSI in surveillance programs given their subjective criteria and limited clinical implications. 28,37,38 Second, we aimed to validate all published automated surveillance systems that appeared applicable to Dutch practice; however, automated surveillance systems may have been developed by commercial companies that were not published in scientific literature and were therefore not included. Third, the small sample size and low number of deep SSIs resulted in large confidence intervals for the individual hospitals and impeded the evaluation of discrimination and calibration. 39,40 Although a larger validation cohort is preferred, the numbers used in this study reflect the reality of surveillance practices. Although underpowered, the overall sensitivity and hospitals' individual point estimates were satisfying, and this study provided valuable insights into implementation. Fourth, for both manual-and semiautomated surveillance, postdischarge surveillance was limited to the initial hospital. In the Dutch setting, patients return to the operating hospital in case of complications, so this will likely not lead to underestimation of SSI rates. SSI benchmarking or widespread implementation of this semiautomated algorithm may be hampered for countries without this follow-up. Last, as actual widespread implementation of automated surveillance is still limited, [24][25][26] this study provides insights into validity and data requirements needed for implementation of semiautomated SSI surveillance after colorectal surgery. However, this study did not include a full feasibility study including economic, legal, and operational assessments. 
We emphasize that successful implementation also depends on organizational support, information technology knowledge, staff acceptance, change management, and possibilities for integration in workflows.
In this independent external validation, both approaches to semiautomated surveillance of deep SSI after colorectal surgery performed well. However, the classification model proved preferable to the regression model because of source-data availability and less complex data-management requirements. Our results revealed several hurdles when automating surveillance. The targeted surveillance population could not be automatically selected by procedure codes, and not all risk factors were complete or available for case-mix correction. The next step is implementation in infection prevention practices and workflow processes to automatically identify patients at increased risk of deep SSI.