
Leveraging a large language model to support expansion of surveillance activities to include cardiovascular implantable device infections in a large, integrated national healthcare system

Published online by Cambridge University Press:  23 January 2026

Dipandita Basnet
Affiliation:
Center for Health Optimization and Implementation Research (CHOIR), VA Boston Healthcare System, Boston, MA, USA Boston University School of Public Health, Boston, MA, USA
Hillary J. Mull
Affiliation:
Center for Health Optimization and Implementation Research (CHOIR), VA Boston Healthcare System, Boston, MA, USA Department of Surgery, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
Daniel J. Morgan
Affiliation:
VA Maryland Healthcare System, Baltimore, MD, USA Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD, USA
Samuel W. Golenbock
Affiliation:
Center for Health Optimization and Implementation Research (CHOIR), VA Boston Healthcare System, Boston, MA, USA
Rebecca P. Lamkin
Affiliation:
Center for Health Optimization and Implementation Research (CHOIR), VA Boston Healthcare System, Boston, MA, USA
Judith M. Strymish
Affiliation:
Section of Infectious Disease, VA Boston Healthcare System, Boston, MA, USA Harvard Medical School, Boston, MA, USA
Kimberly Harvey
Affiliation:
Center for Health Optimization and Implementation Research (CHOIR), VA Boston Healthcare System, Boston, MA, USA
Kaeli Yuen
Affiliation:
Department of Veterans Affairs, Office of the Chief AI Officer, Washington, DC, USA
Marin L. Schweizer
Affiliation:
William S. Middleton VA Hospital, Madison, WI, USA University of Wisconsin-Madison, Madison, WI, USA
Dimitri Drekonja
Affiliation:
Section of Infectious Diseases, Minneapolis VA Healthcare System, MN, USA
Maria C. Rodriguez-Barradas
Affiliation:
Section of Infectious Diseases, Michael E. DeBakey VA Medical Center, Houston, TX, USA Department of Medicine, Baylor College of Medicine, Houston, TX, USA
Westyn Branch-Elliman*
Affiliation:
Section of Infectious Diseases, VA Greater Los Angeles Healthcare System, Los Angeles, CA, USA Department of Medicine, UCLA David Geffen School of Medicine, Los Angeles, CA, USA Center for the Study of Healthcare Innovation, Implementation, and Policy, Greater Los Angeles VA Medical Center, Los Angeles, CA, USA
Corresponding author: Westyn Branch-Elliman; Email: westyn.branch-elliman@va.gov

Abstract

Background:

Surveillance activities are emerging as exemplar use cases for large language models (LLMs) in health care. The aim of this study was to evaluate the potential for LLMs to support the expansion of surveillance activities to include cardiovascular implantable electronic device (CIED) procedures.

Methods:

A validated machine learning-based infection flagging tool was applied to a cohort of VA CIED procedures from 7/1/2021 to 9/30/2023; cases with ≥10% probability of CIED infection underwent manual review. Then, a weighted random sample of 50 infected and 50 uninfected cases was reviewed with generative artificial intelligence (GenAI) assistance. GenAI prompts were iteratively refined to extract and classify all components of infection-related variables from clinical notes. Data extracted by GenAI were compared with manual chart reviews to assess infection status and extraction consistency.

Results:

Among 12,927 CIED procedures, 334 (2.58%) had ≥10% probability of CIED infection. Among 100 sampled cases, 50 of 50 uninfected cases were correctly categorized. Among 50 infection cases, GenAI identified all CIED infections, but the timing of events and the attribution to a preceding procedure were incorrect in 7 of 50 cases. The overall specificity of the GenAI-assisted process was 100% and the sensitivity for accurately classifying timing and attribution of CIED infection events was 82%. Errors in timing improved with iterative prompt updates. Manual chart reviews averaged 25 minutes per chart; the GenAI-assisted process averaged 5–7 minutes per chart.

Conclusions:

LLMs can help streamline the review process for healthcare-associated infection surveillance, but manual adjudication of output is needed to ensure the correct timeline of events and attribution.

Information

Type
Original Article
Creative Commons
License: CC BY
This is a work of the US Government and is not subject to copyright protection within the United States. Published by Cambridge University Press on behalf of The Society for Healthcare Epidemiology of America.
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© US Department of Veterans Affairs, 2026

Background

Mandatory healthcare-associated infection (HAI) surveillance activities are emerging as an exemplar use case for the application of generative artificial intelligence (GenAI) in health care. Previous work applied GenAI to central line-associated bloodstream infection (CLABSI) and catheter-associated urinary tract infection (CAUTI) surveillance in limited, retrospective settings and/or on synthetic datasets. [1-5] These proof-of-concept studies show promise, although ongoing challenges include limited access to the full range of data available in the electronic health record (EHR). [1] Prior work also does not address the real-world challenges of operationalizing these tools. [5,6] Curated datasets are inherently filtered and direct the large language model (LLM) to the important information; the tools are therefore spared the harder task of first locating relevant information in the EHR and reconciling complex, often contradictory findings. Additionally, LLMs are computationally complex models that require substantial computing infrastructure to run against the entire corpus of data available in the EHR. [7-10] Even when that infrastructure is available, simply running these models on complete healthcare datasets can take substantial time, depending on cohort size and the health system. Potential solutions to address computing power and complexity include expanding cloud capacity, applying more specific models, and curating the dataset so that the LLM can be directed toward a smaller corpus of unstructured data. [11] Although expanding infrastructure is costly, reducing the unstructured dataset size and directing the GenAI tool to a curated sample with a higher true-event probability can enhance feasibility. Real-world evidence for the feasibility and accuracy of this phased approach, in which the GenAI is directed only to the highest-probability cases requiring detailed review, remains limited.

Our group previously developed, validated, and implemented a trigger-tool-based mechanism to support the expansion of HAI surveillance to include cardiovascular implantable electronic device (CIED) infections. [3,12] The machine learning-based surveillance algorithm uses a combination of structured variables and text note searches to flag potential infection cases. [3,12] After real-world testing and updates to include community care (CC) data, the surveillance algorithm flagged 2.58% of cases and had a positive predictive value (PPV) of 74.0%. [3] Although this PPV is not high enough for the algorithm to serve as a stand-alone HAI reporting tool without additional manual review in most settings, the algorithm was highly effective at ruling out uninfected cases, reducing the cohort requiring manual review to confirm the presence of CIED infection from 12,927 to 334. [3] We hypothesized that LLMs newly available in the VA healthcare system, specifically GPT-4o accessed through Azure OpenAI, could be integrated into the surveillance workflow to more fully automate the process. Thus, the aim of this study was to evaluate the accuracy of GenAI for performing the second stage of CIED infection review, including assessment of each element of the infection determination process, following a reduction in cohort size using less computationally complex informatics strategies (Figure 1).

Figure 1. Hybrid process for directing GenAI tools to support sustainable scaling for healthcare-associated infection surveillance.

GenAI systems have substantial computing infrastructure requirements that limit the feasibility of scaling in healthcare systems. A phased approach, in which less computationally complex algorithms are first applied to direct the GenAI tools to the highest probability cases and then GenAI tools are applied to a smaller, more curated dataset, may provide a pathway to feasible and sustainable deployment of GenAI tools for healthcare applications, such as healthcare-associated infection surveillance.

Methods

Cohort creation

Details of the creation of the retrospective cohort used in this work have been previously described. [3,12,13] Briefly, CIED procedures conducted in the VA healthcare system from 7/2021 to 9/2023 were identified using Current Procedural Terminology (CPT) codes. The previously developed machine learning algorithm for flagging CIED procedures with a high probability of infection (>10%) [3,13] was then applied, and all flagged cases underwent manual review to adjudicate the presence or absence of a procedure-related CIED infection. The machine learning tool draws on VA clinical data, including text note flags from clinical notes, as well as Community Care data; the latter include admissions, CPT codes, and medication dispensing data but not clinical note text. [3,13] To test the quality of the GenAI-assisted chart review process, a stratified random sample of manually reviewed cases with CIED infection (N = 50) and without CIED infection (N = 50) underwent a second round of review using GenAI (VA GenAI, GPT-4o). The stratified random sampling strategy was used to ensure that a diversity of VA sites was included; the sample size was based on the feasibility of a second manual review and on previously published studies using similar approaches.
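
A stratified draw of this kind can be sketched as follows. This is illustrative only: the round-robin over sites is our own way of depicting "ensuring site diversity," not the study's actual sampling code, and the tuple layout of `cases` is an assumption.

```python
import random
from collections import defaultdict

def stratified_sample(cases, n_per_stratum=50, seed=0):
    """Draw n cases per infection stratum, spreading picks across sites.

    `cases` is an iterable of (case_id, site, infected) tuples. Within each
    stratum (infected / uninfected), cases are grouped by site and picked
    round-robin so no single facility dominates the sample.
    """
    rng = random.Random(seed)
    sample = []
    for infected in (True, False):
        by_site = defaultdict(list)
        for case_id, site, flag in cases:
            if flag == infected:
                by_site[site].append(case_id)
        pools = list(by_site.values())
        for pool in pools:
            rng.shuffle(pool)        # randomize order within each site
        rng.shuffle(pools)           # randomize site order as well
        picked = []
        while len(picked) < n_per_stratum and any(pools):
            for pool in pools:       # one pick per site per pass
                if pool and len(picked) < n_per_stratum:
                    picked.append(pool.pop())
        sample.extend(picked)
    return sample
```

In practice the per-stratum target (50 and 50 here) is fixed by review capacity, as described above, rather than by a power calculation.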

CIED infection definitions

CIED infection status was determined based on the presence of at least 2 symptoms (eg, erythema, pain, swelling, or drainage/pus at the insertion site, fever, chills, sweats), and/or echocardiographic evidence of infection (lead involvement or valve involvement/endocarditis), and/or positive wound site or blood cultures with directed treatment, and/or clinician diagnosis of infection (see Supplementary Material 1 for the chart review tool). [13] Infection diagnosis required a minimum of 2 symptoms plus treatment or a clinician diagnosis. Cases with possibly compatible symptoms but a clear alternative diagnosis (eg, pain and swelling attributed to hematoma) were not classified as infections. Any clinician diagnosis with treatment was classified as an infection. Pocket infections were defined as infections confined to the battery pocket/wound site; systemic/lead infections required evidence of systemic infection with either positive blood cultures or echocardiographic evidence of infection. Aligned with Centers for Disease Control and Prevention surgical site infection (SSI) definitions for procedures with device placement, CIED infections that occurred within 90 days of a procedure and were not present at the time of the procedure were considered procedure-related. [14] Infections that occurred outside the 90-day window, and procedures performed to manage an existing infection (eg, the infection predated the procedure), were classified as not procedure-related.
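
The decision rules above can be expressed as a small classifier. The sketch below is a simplification for illustration: the field names (`symptom_count`, `clear_alternative`, etc.) are ours, and it omits the echocardiographic and culture criteria that reviewers also weighed.

```python
from dataclasses import dataclass

@dataclass
class CIEDCase:
    symptom_count: int          # site erythema, pain, swelling, drainage, fever, chills, sweats
    treated: bool               # directed antimicrobial treatment given
    clinician_diagnosis: bool   # documented clinician diagnosis of CIED infection
    clear_alternative: bool     # eg, hematoma clearly explaining the symptoms
    days_after_procedure: int   # infection onset relative to the index procedure
    preexisting: bool           # infection already present at the time of the procedure

def is_infection(case: CIEDCase) -> bool:
    """Simplified version of the manuscript's CIED infection rule."""
    if case.clear_alternative and not case.clinician_diagnosis:
        return False            # compatible symptoms but clear alternative diagnosis
    if case.clinician_diagnosis and case.treated:
        return True             # any clinician diagnosis with treatment counts
    return case.symptom_count >= 2 and case.treated

def is_procedure_related(case: CIEDCase) -> bool:
    """NHSN-style 90-day attribution window for device-associated SSIs."""
    return (is_infection(case)
            and not case.preexisting
            and 0 <= case.days_after_procedure <= 90)
```

Encoding the rule this way also makes explicit why timeline errors matter: `is_procedure_related` flips on the onset date alone, which is exactly where the GenAI tool erred most often.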

Development of prompts and GenAI infection determination process

VA GenAI is a generative AI chat interface fully contained within the VA ecosystem that does not retain patient data. VA GenAI leverages GPT-4o, accessed through Azure OpenAI, with the temperature set to zero. Because of the closed nature of the VA healthcare informatics ecosystem and the lack of integration with VistA, relevant clinical notes were manually pulled and entered into the GenAI system. Time to complete the note harvesting and copy-paste effort was tracked for comparison with the original manual review process.

Based on prior experience with a GenAI prompt to identify CLABSIs and a previously developed CIED infection chart abstraction tool, an initial GenAI prompt to measure relevant variables for the CIED infection determination was created. [1] The GenAI-assisted chart review process began with aggregating clinical notes from the Joint Legacy Viewer into the GenAI platform as a single, continuous document. The GenAI process was designed to be two-phased: first, to organize the data elements and, second, to summarize the information and make a determination about whether a CIED infection was present and whether it occurred in the 90-day surveillance window (ie, was procedure-related). For the first phase (data extraction and organization), prompts were designed to provide a comprehensive summary of the key pieces of information in the chart review tool needed to determine the presence of CIED infection (eg, symptoms, microbiologic evidence). Then, the GenAI tool was asked to apply the same CIED infection definitions and National Healthcare Safety Network (NHSN) criteria for infection timing used in the manual review to make a final determination about CIED infection presence and HAI attribution.

The GenAI prompts were iteratively updated until they reliably organized the relevant information needed to populate the chart review tool and make an infection determination (Table 1, Supplementary Material 2). Following prompt development, clinical data copied and pasted from EHR notes were provided to the VA GenAI tool along with the iteratively refined prompts.
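
As an illustration of the two-phase structure, the sketch below builds chat messages for each phase. The prompt wording is our paraphrase, not the study's actual prompts (those appear in Table 1 and Supplementary Material 2), and the commented-out Azure OpenAI call shows how such messages would typically be submitted with temperature zero.

```python
# Illustrative two-phase prompt scaffolding; wording is hypothetical.

EXTRACTION_PROMPT = (
    "From the clinical notes below, extract with dates: CIED procedures, "
    "infection symptoms (erythema, pain, swelling, drainage, fever, chills, "
    "sweats), blood and wound culture results, echocardiographic findings, "
    "antibiotic treatment, and clinician diagnoses. Quote the source note "
    "for each item and answer 'not documented' when an element is absent."
)

DETERMINATION_PROMPT = (
    "Using only the extracted elements above, determine whether a CIED "
    "infection is present (>=2 symptoms plus treatment, or clinician "
    "diagnosis with treatment) and whether onset falls 0-90 days after the "
    "index procedure and was not present beforehand (procedure-related)."
)

def build_messages(notes: str, phase: int) -> list:
    """Assemble chat messages for phase 1 (extraction) or phase 2 (determination)."""
    prompt = EXTRACTION_PROMPT if phase == 1 else DETERMINATION_PROMPT
    return [
        {"role": "system",
         "content": "You are an infection-surveillance chart abstractor."},
        {"role": "user", "content": f"{prompt}\n\n---\n{notes}"},
    ]

# With the Azure OpenAI SDK, a call would typically look like:
# client.chat.completions.create(model=deployment, temperature=0,
#                                messages=build_messages(notes, phase=1))
```

Keeping the two phases as separate prompts mirrors the workflow described above: the phase-2 determination is made from the organized phase-1 output rather than from the raw note dump.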

Table 1. Prompt revisions for accurate cardiovascular implantable electronic device (CIED) infection date extraction and attribution as procedure-related from GPT-4o

Analysis: GenAI versus manual review

GenAI summary determination (infection yes/no) and accuracy of each of the individual elements were evaluated with manual review determination used as the standard of comparison. Sensitivity and specificity of the GenAI determination versus manual review were calculated and described using simple descriptive statistics; sensitivity and specificity are presented for final, refined prompts only. Additionally, the time taken for each stage, from EHR note extraction to Excel data entry, was quantitatively measured. The qualitative assessment included classifying errors, categorizing them by type, and evaluating prompt quality and updates. Variables extracted by GenAI were compared to manual determinations, with discrepancies and errors recorded and categorized for further evaluation.

Ethical considerations

This study was approved as exempt human subjects research by the VA Boston Research and Development Committee prior to data collection and analysis. VA GPT is Health Insurance Portability and Accountability Act (HIPAA)-compliant and does not store user data or retain data input into the interface. It is authorized to capture and handle Protected Health Information, Personally Identifiable Information, and other VA sensitive data securely.

Results

During the study period, 12,927 CIED procedures were performed, and 334 (2.58%) were flagged by the algorithm. The reviewed sample included confirmed infection cases (n = 50) and confirmed non-infection cases (n = 50) from 71 VA medical centers; among the 50 CIED infection cases, 46 occurred during the 90-day surveillance window following the index procedure, and 4 were preexisting infections present at the time of the procedure. Patients in the reviewed sample were predominantly male (98%) and non-Hispanic white (61%), with an average age of 74.2 years (standard deviation, 10.5 years). The CIED infections included 31 pocket infections, 9 cases of systemic infection and endocarditis, 7 cases of lead infection, 1 case of cellulitis, 1 case of left ventricular assist device infection, and 1 infection with evidence of both pocket involvement and lead infection (mixed case). Most CIED infection cases had positive blood cultures (37 cases) and/or wound cultures (4 cases). Blood culture isolates included Staphylococcus aureus (11), Escherichia coli (2), Streptococcal species (2), and others (22). Wound culture isolates included S. aureus (2), Serratia marcescens (1), and Proteus mirabilis (1). The accuracy of each element in the CIED infection review process is presented in Table 2. GenAI correctly identified CIED infection presence in 50 of 50 true infection cases. The most common errors related to the accurate ascertainment of dates and timing of events. Initially, GenAI correctly identified 43 of 50 procedure dates and 49 of 50 culture collection dates. Procedure date accuracy improved with iterative prompt updates (Table 1); the culture date error could not be corrected because the culture was collected outside the VA healthcare system, so that information was not available to the GenAI tool. Both methods accurately identified infection-related symptoms, clinical diagnoses, procedural interventions, and antibiotic treatments in all 50 infection cases. Through iterative prompt engineering focused on infection-related keywords and specific instructions about how to assess infection onset, GenAI's performance in identifying dates improved substantially (see Supplementary Material 3 for exemplar output).

Table 2. Accuracy of individual data elements and data organization process conducted by GenAI versus manual review

GenAI, generative artificial intelligence; CIED, cardiovascular implantable electronic device.

* Manual review process used as the standard of comparison.

GenAI was generally able to summarize clinical information to determine infection severity, correctly ascertaining local versus systemic infection in 49 of 50 cases; the one misclassification was due to a mixed infection with evidence of both pocket and systemic involvement. In this case, the GenAI classified the infection type in the following manner:

“CIED infection-present… While there are indications suggestive of a systemic infection, the specific involvement of the device pocket or definitive imaging results for endocarditis were not provided. Based on the clinical information provided it could either be a pocket infection or endocarditis. Further imaging studies and clinical evaluations are necessary to accurately determine the precise nature of the infection.”

After data extraction and organization, the GenAI system correctly identified all 50 CIED infection cases as having an infection and correctly ruled out infection in all 50 uninfected cases. Of the 46 procedure-related infections, 39 were correctly attributed. Most errors were due to failures to identify dates and the timeline of events (Table 2). GenAI correctly created timelines in 42 of 50 cases and correctly classified infection type in 49 of 50, yielding 82% sensitivity (95% CI: 68.6%-91.4%) and 100% specificity (95% CI: 92.9%-100%) (Table 3).
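
The reported intervals are consistent with exact binomial (Clopper-Pearson) confidence limits, which can be reproduced with a short stdlib-only computation. The sketch below assumes the 82% sensitivity corresponds to 41 of 50 cases with a fully correct determination; that numerator is our inference from the reported percentage, not stated in the text.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact (Clopper-Pearson) CI for a binomial proportion, via bisection."""
    def solve(f, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(60):          # bisection to ~1e-18, far beyond need
            mid = (lo + hi) / 2
            if (f(mid) < alpha / 2) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # Lower bound: solve P(X >= k | p) = alpha/2 (increasing in p).
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p), True)
    # Upper bound: solve P(X <= k | p) = alpha/2 (decreasing in p).
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), False)
    return lower, upper

# Sensitivity 41/50 -> 82%, CI near the reported 68.6%-91.4%;
# specificity 50/50 -> 100%, with lower bound 0.025**(1/50) ~ 92.9%.
sens_ci = clopper_pearson(41, 50)
spec_ci = clopper_pearson(50, 50)
```

For the 50/50 specificity, the lower limit has a closed form, (alpha/2)^(1/n), because only the all-successes outcome contributes to the tail; the bisection recovers the same value.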

Table 3. Accuracy of the generative artificial intelligence (GenAI) tool summary for measuring procedure-attributable cardiovascular implantable electronic device infections

Discussion

The current iterations of GenAI systems are more flexible and user-friendly than older models, both likely catalysts for their relatively rapid adoption. [6] Despite these promising features, a major ongoing challenge for the integration of GenAI systems in health care is the need for advanced computing infrastructure to process data and apply these computationally complex models. Training and deployment of GenAI systems require substantial computational resources and time due to the complexity and size of these models. [10,15] To address these barriers and support scaling, we first applied less computationally intensive strategies to flag CIED procedures with >10% probability of CIED infection, reducing the cohort to a manageable size for GenAI processing, and then provided the filtered unstructured clinical data, with iteratively developed prompts, to support the second phase of the review process for identifying true CIED infections (Figure 1). We then used GenAI to replicate the data abstraction steps of the manual review process and to summarize the information to streamline review for CIED infections. Overall, we found that the GenAI-augmented process had outstanding specificity and good sensitivity; the most common errors related to GenAI's ability to correctly sequence events and determine attribution, similar to the errors identified during a multisite deployment of GenAI to support CLABSI surveillance. [16]

HAIs are reported as a quality metric, and given the limitations of currently available GenAI tools, manual adjudication and review of output remain an important step for ensuring appropriate attribution, classification, and reporting of cases, particularly for SSIs, which, unlike CAUTI and CLABSI, include clinician diagnosis among the surveillance criteria. [14,17,18] A persistent source of error for GenAI tools continues to be the construction of event timelines, an essential element for determining attribution of possible HAIs. Some, but not all, of these errors can be addressed with iterative prompt updates, but it remains unclear if and when systems will evolve beyond AI-assisted review to fully automated processes. Additionally, manual reviews often provide critical insights for root cause analysis and patient safety interventions; thus, even if the process could be fully automated, human review is likely to remain necessary to use surveillance data to inform patient safety interventions. Despite these limitations of currently available tools and challenges with scaling, most systems lack mechanisms for comprehensive surveillance of CIED infections and other patient safety events. [18] Thus, even imperfectly accurate systems offer a framework for expanding these services beyond traditional inpatient settings to encompass a broader array of clinical care.

A major benefit of the GenAI-assisted process was its ability to rapidly process, review, and organize information from a large volume of clinical notes to facilitate manual determination. In our study, the GenAI-assisted process reduced case review time to 5-7 minutes per chart, compared with manual chart reviews, which typically required at least 20 minutes per chart and up to 25 minutes for complex charts. Additionally, the GenAI system offers the opportunity for a "second check" on the manual review process, which is itself imperfect: prior work found that corrections were required in up to 37% of cases with a single reviewer. [19] A structured chart review conducted by GenAI, followed by a brief secondary review by an expert, may be an approach that balances accuracy with human resource requirements. [19,20]

One of the most high-profile concerns about GenAI systems is their propensity to "hallucinate," or to provide answers that are not grounded in the data. The GenAI deployed in most healthcare systems, including VA GPT, is set to a temperature of zero to minimize this risk, and prior evaluations suggest that errors of omission may be more common than errors of commission for summarization tasks performed in the healthcare setting. [21,22] Although our dataset was limited, we reassuringly did not find any evidence of hallucinations in our study.

Our strategy of first limiting the cohort using less computationally intensive strategies and then applying the GenAI to the smaller dataset reduces, but does not eliminate, concerns about the costs of computing and scaling these tools. Text-note search tools, such as VA Voogle, a web-based keyword search engine available in the VA, can flag notes containing specific terms but have no organization or summarization capabilities. GenAI supports interactive querying, allowing users to ask follow-up questions and receive refined answers in real time. The system can also adjust its responses based on the evolving context of queries, but these systems can present information in an overly confident manner even when determinations are incorrect, and they have a tendency to be sycophantic. Ultimately, both chart-review support tools have limitations. An additional direction for future improvement would be to use a search tool such as Voogle to identify the most important clinical notes for review, followed by a GenAI-assisted process to support higher-level tasks, such as determining whether an infection was present at the time of admission and/or attributable to a specific procedure; this would partially replicate the filtering and note selection process applied in this study. Other model types may be better suited to tasks in which timelines are important for determining attribution, and application of these models should be an area of future investigation. [23]

Limitations

This study was performed within the VA Healthcare System, and although Community Care data were included in the flagging process, the findings primarily demonstrate the feasibility and effectiveness of natural language processing in a VA-specific national healthcare setting, as we did not have access to clinical note data from outside the VA for the purposes of this study. Results may not be directly generalizable to other healthcare systems. The analysis involved data filtering prior to input into the GenAI tool, which likely increases the apparent accuracy of the tool by directing the model to the most relevant data; we did not attempt to apply the GenAI tool directly to an unfiltered dataset. Given prior work suggesting high specificity of these tools, applying the GenAI only to less complex, pre-filtered cases may likewise inflate apparent accuracy.

The precise impact of these steps on GenAI performance remains unclear, although there may be bias toward improved accuracy relative to real-world performance or performance at scale. Future studies should evaluate the method's efficacy without extensive data filtering to understand its robustness and applicability to raw data. Our application of the GenAI tool was limited to a specific subset of clinical notes, and the inclusion of more complex and potentially contradictory clinical data may affect performance. Additionally, we did not perform a comparative analysis of time saved using the GPT-4o tool against other established tools such as Voogle, and it is possible that other, less computationally complex strategies could yield similar time savings.

Conclusions

HAI surveillance activities are a promising application of GenAI tools, such as the GPT-4o-based VA GenAI, with the potential to streamline the manual review process and save time. Careful attention is needed to ensure appropriate attribution and timing of HAI events. Filtering strategies that reduce the number of cases requiring the computationally resource-intensive GenAI tools can enhance feasibility and support scaling. A hybrid informatics approach using machine learning and structured data, followed by GenAI, may facilitate the expansion of HAI surveillance activities.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/ice.2025.10384.

Acknowledgments

We would like to thank Mikhail Kuprian, MD, for his assistance in addressing operational and scaling challenges and for assisting with references related to computing requirements.

Author contribution

Dr Branch-Elliman, Mr. Golenbock, and Ms. Basnet had full access to all of the data in the study and took responsibility for the integrity of the data and the accuracy of the data analysis.

  • Study concept and design: DB, DJM, and WBE.

  • Acquisition of data: SG, RL, DJM, and WBE.

  • Analysis and interpretation of data: DB, DJM, HM, JS, SG, DD, MLS, KY, and WBE.

  • Drafting of the manuscript: DB, HM, JS, and WBE.

  • Critical revision of the manuscript for important intellectual content: All authors.

  • Statistical analysis: SG, DB, RL, HM, and WBE.

  • Final approval of the version to be published: WBE.

  • Administrative, technical, or material support: All authors.

  • Study supervision: WBE.

Financial support

This study was supported by VA HSR&D IIR 20-076 (PI, WBE). WBE was supported by the VA Digital Health Office National Artificial Intelligence Institute during the conduct of this study.

Competing interests

All authors declare that they have no competing interests.

Disclaimer

The views expressed are those of the authors and do not necessarily represent those of the US Federal Government or the Department of Veterans Affairs.

References

Rodriguez-Nava, G, Egoryan, G, Goodman, KE, Morgan, DJ, Salinas, JL. Performance of a large language model for identifying central line-associated bloodstream infections (CLABSI) using real clinical notes. Infect Control Hosp Epidemiol 2024;46:305–308. doi: 10.1017/ice.2024.164.
Perret, J, Schmid, A. Application of OpenAI GPT-4 for the retrospective detection of catheter-associated urinary tract infections in a fictitious and curated patient data set. Infect Control Hosp Epidemiol 2024;45:96–99.
Mull, HJ, Golenbock, SW, Basnet Thapa, D, et al. Cardiac implantable device infection surveillance algorithm. JAMA Netw Open 2025;8:e2514079.
Alshanqeeti, S, Coffey, K, Mcdermott, K, et al. Comparing generative artificial intelligence vs. experts for detection of catheter-associated urinary tract infection (CAUTI). Clin Infect Dis 2025:ciaf486. Online ahead of print.
Wu, JT, Langford, BJ, Shenoy, ES, Carey, E, Branch-Elliman, W. Chatting new territory: large language models for infection surveillance from pilot to deployment. Infect Control Hosp Epidemiol 2025:1–3. doi: 10.1017/ice.2025.20.
Wu, JT, Shenoy, ES, Carey, EP, Alterovitz, G, Kim, MJ, Branch-Elliman, W. ChatGPT: increasing accessibility for natural language processing in healthcare quality measurement. Infect Control Hosp Epidemiol 2024;45:9–10.
Klang, E, Apakama, D, Abbott, EE, et al. A strategy for cost-effective large language model use at health system-scale. NPJ Digit Med 2024;7:320. doi: 10.1038/s41746-024-01315-1.
Singh, A, Patel, NP, Ehtesham, A, Kumar, S, Khoei, TT. A survey of sustainability in large language models: applications, economics, and challenges. In: 2025 IEEE 15th Annual Computing and Communication Workshop and Conference (CCWC). IEEE; 2025:00008–00014.
Shekhar, S, Dubey, T, Mukherjee, K, Saxena, A, Tyagi, A, Kotla, N. Towards optimizing the costs of LLM usage. arXiv preprint arXiv:2402.01742; 2024.
Sun, W, Wang, J, Guo, Q, Li, Z, Wang, W, Hai, R. CEBench: a benchmarking toolkit for the cost-effectiveness of LLM pipelines. arXiv preprint arXiv:2407.12797; 2024.
Yang, W, Some, L, Bain, M, Kang, B. A comprehensive survey on integrating large language models with knowledge-based methods. Knowledge-Based Systems 2025;318:113503.
Mull, HJ, Stolzmann, KL, Shin, MH, Kalver, E, Schweizer, ML, Branch-Elliman, W. Novel method to flag cardiac implantable device infections by integrating text mining with structured data in the Veterans Health Administration’s electronic medical record. JAMA Netw Open 2020;3:e2012264. doi: 10.1001/jamanetworkopen.2020.12264.
Branch-Elliman, W, Lamkin, R, Shin, M, et al. Promoting de-implementation of inappropriate antimicrobial use in cardiac device procedures by expanding audit and feedback: protocol for hybrid III type effectiveness/implementation quasi-experimental study. Implement Sci 2022;17:12.
US Centers for Disease Control and Prevention (CDC). Surgical Site Infection Event (SSI). Updated 01/2025. https://www.cdc.gov/nhsn/pdfs/pscmanual/9pscssicurrent.pdf. Accessed July 07, 2025.
Stanford University IT. AI Demystified: Introduction to large language models. Updated 12/13/2024. https://uit.stanford.edu/service/techtraining/ai-demystified/llm. Accessed October 02, 2025.
Morgan, DJ. Using an LLM for frontline providers to identify healthcare-associated infections across 11 hospitals. Presented at: Medicine and Imaging (AIMI) Conference 2025; Palo Alto, CA.
Shenoy, ES, Branch-Elliman, W. Automating surveillance for healthcare-associated infections: rationale and current realities (Part I/III). Antimicrob Steward Healthc Epidemiol 2023;3:e25.
Branch-Elliman, W, Sundermann, AJ, Wiens, J, Shenoy, ES. Leveraging electronic data to expand infection detection beyond traditional settings and definitions (Part II/III). Antimicrob Steward Healthc Epidemiol 2023;3:e27. doi: 10.1017/ash.2022.342.
Siems, A, Banks, R, Holubkov, R, et al. Structured chart review: assessment of a structured chart review methodology. Hosp Pediatr 2020;10:61–69.
Zhan, X, Humbert-Droz, M, Mukherjee, P, Gevaert, O. Structuring clinical text with AI: old versus new natural language processing techniques evaluated on eight common cardiovascular diseases. Patterns 2021;2.
Morgan, DJ, Branch-Elliman, W, Goodman, KE. Time to study implementation of AI-generated discharge summaries. JAMA Intern Med 2025;185:825–826. doi: 10.1001/jamainternmed.2025.0827.
Williams, CY, Subramanian, CR, Ali, SS, et al. Physician- and large language model-generated hospital discharge summaries. JAMA Intern Med 2025;185:818–825. doi: 10.1001/jamainternmed.2025.0821.
Uo, W. Dynamic LSTM-based memory encoder for long-term LLM interactions. https://arxiv.org/html/2507.03042v1#bib.

Figure 1. Hybrid process for directing GenAI tools to support sustainable scaling for healthcare-associated infection surveillance. GenAI systems have substantial computing infrastructure requirements that limit the feasibility of scaling in healthcare systems. A phased approach, in which less computationally complex algorithms are first applied to direct the GenAI tools to the highest-probability cases, and the GenAI tools are then applied to a smaller, more curated dataset, may provide a pathway to feasible and sustainable deployment of GenAI tools for healthcare applications such as healthcare-associated infection surveillance.
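The phased approach described in the Figure 1 caption can be sketched as a two-stage pipeline: an inexpensive, structured-data rule narrows the cohort, and only the flagged subset is escalated to the costly GenAI step. The case fields, rule logic, and prompt below are illustrative assumptions for exposition, not the study's actual algorithm or prompt.

```python
from dataclasses import dataclass

@dataclass
class Case:
    case_id: str
    has_infection_code: bool   # structured diagnosis-code flag
    positive_culture: bool     # structured microbiology result
    note_text: str             # free-text clinical note

def cheap_prefilter(case: Case) -> bool:
    """Stage 1: low-cost rule over structured data that flags
    high-probability cases for GenAI review."""
    return case.has_infection_code or case.positive_culture

def genai_review(case: Case) -> str:
    """Stage 2 placeholder: in a real deployment this would send a
    validated prompt plus the note to an LLM and parse its answer.
    Here we only build the prompt to keep the sketch self-contained."""
    return (
        "Does this note describe a procedure-attributable CIED "
        f"infection? Answer yes/no.\n\n{case.note_text}"
    )

def surveillance_pipeline(cases: list[Case]) -> list[str]:
    # Only the small, curated subset ever reaches the GenAI stage,
    # which is what makes system-wide scaling computationally feasible.
    return [c.case_id for c in cases if cheap_prefilter(c)]

cases = [
    Case("A1", True, False, "pocket erythema after generator change"),
    Case("A2", False, False, "routine device interrogation, no issues"),
]
# only "A1" is escalated to GenAI review
```

The design choice mirrors the figure: the deterministic prefilter is cheap enough to run across an entire healthcare system, while the LLM call, the expensive step, is reserved for the curated minority of cases.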


Table 1. Prompt revisions for accurate cardiovascular implantable electronic device (CIED) infection date extraction and attribution as procedure-related from GPT-4o


Table 2. Accuracy of individual data elements and data organization process conducted by GenAI versus manual review


Table 3. Accuracy of the generative artificial intelligence (GenAI) tool summary for measuring procedure-attributable cardiovascular implantable electronic device infections

Supplementary material

Basnet et al. supplementary material (File, 219.5 KB)