
Healthcare-associated infections in Italian long-term care facilities: a machine learning analysis of a 12-month cohort

Published online by Cambridge University Press: 08 April 2026

Anna Caterina Leucci*
Affiliation:
Department of Statistical Science, University of Padua, Padua, Italy
Emilia-Romagna Region: Regione Emilia-Romagna, Italy
Elena Sasdelli
Affiliation:
Emilia-Romagna Region: Regione Emilia-Romagna, Italy
Luana Caselli
Affiliation:
Emilia-Romagna Region: Regione Emilia-Romagna, Italy
Elisa Fabbri
Affiliation:
Emilia-Romagna Region: Regione Emilia-Romagna, Italy
Elena Berti
Affiliation:
Emilia-Romagna Region: Regione Emilia-Romagna, Italy
Costanza Vicentini
Affiliation:
University of Turin: Università degli Studi di Torino, Italy
Carla Maria Zotti
Affiliation:
University of Turin: Università degli Studi di Torino, Italy
Katrien Latour
Affiliation:
Sciensano, Belgium
Enrico Ricchizzi
Affiliation:
Emilia-Romagna Region: Regione Emilia-Romagna, Italy
*Corresponding author: Anna Caterina Leucci; Email: annacaterina.leucci@unipd.it

Abstract

Objectives:

To estimate the incidence of healthcare-associated infections (HAIs) in Italian long-term care facilities (LTCFs) and to evaluate whether an artificial intelligence (AI) approach, through unsupervised machine learning (ML), could stratify residents into clinically distinct groups with differing susceptibility to HAIs.

Design:

Prospective cohort study with 12-month follow-up.

Setting:

Twenty-four LTCFs in Italy participating in the European Centre for Disease Prevention and Control (ECDC) 12-month longitudinal study on HAIs in LTCFs (2022–2023).

Participants:

395 residents enrolled across the participating LTCFs.

Methods:

Incidence measures of HAIs (rate and ratio) were estimated using generalized estimating equations. A hierarchical cluster analysis based on residents' clinical and demographic characteristics was applied as an unsupervised ML approach.
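The abstract does not state which distance measure or linkage rule the hierarchical cluster analysis used. As a minimal sketch, assuming residents' characteristics are coded as binary indicators, agglomerative clustering with a simple matching distance and average linkage can be written in plain Python. The function names, the four example variables, and the resident profiles are all hypothetical.

```python
from itertools import combinations


def matching_distance(a, b):
    """Share of binary features on which two residents differ (simple matching)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(matching_distance(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(data, n_clusters):
    """Merge the two closest clusters until n_clusters remain; return one label per row."""
    clusters = [[i] for i in range(len(data))]  # start with every resident alone
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], data),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe because combinations() yields a < b
    labels = [0] * len(data)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical binary profiles: [ADL-dependent, dementia, urinary catheter, diabetes]
residents = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
]
labels = agglomerate(residents, n_clusters=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Average linkage is only one option. Ward's method on a suitable distance, or `scipy.cluster.hierarchy.linkage` followed by `fcluster` to cut the tree, are common alternatives in practice, and mixed-type variables usually call for a distance such as Gower's rather than simple matching.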

Results:

Overall, 75 HAIs per 100 residents (95% CI, 70.3–78.3) and 0.23 HAIs per 1,000 resident-days (95% CI, 0.11–0.76) were estimated. The most frequent HAIs were respiratory tract infections (29.5%; 95% CI, 24.2–31.1), COVID-19 (26.3%; 95% CI, 22.1–28.4), and urinary tract infections (15%; 95% CI, 11.0–35.4). Clustering identified two reproducible resident groups: Group 1 (39%), more independent and cognitively preserved, with fewer comorbidities and a lower incidence of HAIs; and Group 2 (61%), more dependent and clinically complex, with a higher incidence of HAIs. Cluster stability was high (mean adjusted Rand index [ARI] = 0.83).
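The stability figure is an average of adjusted Rand index values, which measure agreement between two partitions of the same residents after correcting for chance: 1 means identical groupings up to relabeling, and values near 0 mean chance-level agreement. The abstract does not describe the resampling scheme behind the mean. The sketch below, assuming two label vectors over the same residents, computes the index itself from the contingency table using only the standard library; the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label vectors must cover the same residents")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: nothing to disagree on
    # Pairs placed together in both partitions (cells of the contingency table).
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately (row and column sums).
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / total_pairs
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # both partitions trivial and identical (all singletons or one cluster)
    return (index - expected) / (maximum - expected)


# Same grouping with swapped labels: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
# Second partition splits one group in two: partial agreement (4/7).
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]), 3))  # → 0.571
```

Averaging such values over repeated re-clusterings (for example, on bootstrap resamples) yields a mean ARI; `sklearn.metrics.adjusted_rand_score` computes the same index if scikit-learn is available.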

Conclusions:

This study confirms the high burden of HAIs in Italian LTCFs and provides exploratory evidence that AI-based clustering can identify reproducible HAI susceptibility profiles in a setting where such approaches have rarely been applied.

Information

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of The Society for Healthcare Epidemiology of America

Table 1. Demographic, functional, clinical and comorbidity variables included in the machine-learning clustering analysis

Table 2. Demographic and clinical characteristics of LTCF residents, overall and by group, after cluster analysis

Table 3. Crude percentage, estimated ratio and rate by HAI type in the total sample (n = 395)
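Table 3 reports two incidence measures: the ratio (HAIs per 100 residents) and the rate (HAIs per 1,000 resident-days). The study estimated both with generalized estimating equations, which account for residents nested within facilities. The sketch below computes only the crude point estimates and the crude percentage by HAI type, with hypothetical counts and hypothetical function names, and produces no confidence intervals.

```python
from collections import Counter


def incidence_ratio(n_infections, n_residents, per=100):
    """Crude HAIs per `per` enrolled residents over the follow-up period."""
    return n_infections * per / n_residents


def incidence_rate(n_infections, resident_days, per=1000):
    """Crude HAIs per `per` resident-days of follow-up."""
    return n_infections * per / resident_days


def crude_percentages(hai_types):
    """Share (%) of all recorded HAIs that fall into each infection type."""
    counts = Counter(hai_types)
    total = sum(counts.values())
    return {hai_type: 100 * count / total for hai_type, count in counts.items()}


# Hypothetical cohort: 10 residents, 2,000 resident-days, 4 recorded HAIs.
hais = ["respiratory", "respiratory", "urinary", "COVID-19"]
print(incidence_ratio(len(hais), n_residents=10))     # → 40.0
print(incidence_rate(len(hais), resident_days=2000))  # → 2.0
print(crude_percentages(hais))
# → {'respiratory': 50.0, 'urinary': 25.0, 'COVID-19': 25.0}
```

The ratio counts every infection against the number of residents, so a resident with two HAIs contributes twice; the rate additionally scales by each resident's time under follow-up.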

Table 4. Percentage of HAIs by type and group

Figure 1. Visual representation of the clustering results (N = 395). Note: G1 and G2 are represented by ovals containing the corresponding clinical conditions. Conditions positioned at the intersection of the ovals do not differ significantly between the two groups; conditions placed within a single oval differ significantly and are shown in the group where their prevalence is higher. Font size is proportional to the percentage of residents with each clinical condition. The dashed circle indicates the HAIs that are more prevalent in each group.
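The caption separates conditions that differ significantly between G1 and G2 from those that do not, but the test used is not stated here. Assuming a prevalence comparison on a 2 × 2 table, Pearson's chi-square test (1 degree of freedom, no continuity correction) is one common choice. The sketch computes it with the standard library, using the fact that a 1-df chi-square variable is a squared standard normal, so its upper-tail probability is erfc(sqrt(x/2)). The function name and counts are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(cases_g1, n_g1, cases_g2, n_g2):
    """Pearson chi-square (1 df, no continuity correction) comparing a prevalence between two groups."""
    table = [[cases_g1, n_g1 - cases_g1], [cases_g2, n_g2 - cases_g2]]
    row_totals = [n_g1, n_g2]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = n_g1 + n_g2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("every row and column total must be positive")
            stat += (table[i][j] - expected) ** 2 / expected
    # A 1-df chi-square is a squared standard normal, so P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical: a condition present in 30/50 residents of one group and 20/50 of the other.
stat, p = chi_square_2x2(30, 50, 20, 50)
print(round(stat, 2), round(p, 4))  # → 4.0 0.0455
```

With small expected counts, Fisher's exact test (for example `scipy.stats.fisher_exact`) is usually preferred over the chi-square approximation.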

Supplementary material: Leucci et al. supplementary material (File, 16.1 KB)