Background
Antibiotic stewardship programs (ASPs) are successful in reducing inappropriate prescribing, improving patient outcomes, and curbing antibiotic resistance and are now required for hospitals by the Centers for Medicare and Medicaid Services. Reference Kimura, Uda, Sakaue, Yamashita, Nishioka and Nishimura1 ASP implementation may include a wide range of activities such as prospective audit and feedback, de-escalation, educating clinicians, tracking antibiotic use patterns, and reporting to leadership and government agencies. Reference Kimura, Uda, Sakaue, Yamashita, Nishioka and Nishimura1 ASPs involve complex interventions with multiple components including activities to support both individual patient health and population health, and effective communication with physicians and staff. Implementation of ASPs across diverse hospital settings provides crucial opportunities to compare experiences and also to identify determinants of successful ASP implementation.
Despite widespread recognition of the importance of ASPs, few accepted surveys exist to assess their implementation grounded in direct feedback from antibiotic stewards. Existing surveys of stewards have focused on other factors relating to antibiotic stewardship practice but have not specifically examined the implementation process. Reference Tebano, Dyar, Beovic, Claudot, Béraud and Thilly2,Reference Tebano, Dyar, Beovic, Béraud, Thilly and Pulcini3 While some ASP surveys have addressed specific implementation activities, none have used a theoretically derived implementation science determinant framework to identify facets of ASP implementation that may differentiate between more and less successful programs. To address the lack of validated surveys that assess implementation processes and identify determinants of successful ASP implementation, we developed a survey for antibiotic stewards using the Consolidated Framework for Implementation Research (CFIR).
Implementation is a complex endeavor characterized by social and contextual facets. Reference Sarkies, Francis-Auton, Long, Roberts, Westbrook and Levesque4–Reference Mody, Filiatreau, Goss, Powell and Geng7 The CFIR is rooted in knowledge from many disciplines, including organizational change and psychology. The CFIR provides a conceptual foundation for studying implementation by defining a “menu” of constructs potentially associated with implementation effectiveness and providing a systematic, comprehensive, and tailorable approach to uncovering drivers of variability in implementation outcomes prospectively. The CFIR is useful for determining pathways to sustained intervention success as each construct represents a theoretically based determinant. Reference Nevedal, Widerquist, Reardon, Arasim, Jackson and White8 Psychometric validation incorporates methods to assess measurement properties to determine whether a measure is assessing what it intends to measure. Psychometric validation of CFIR-consistent survey measures has been used to identify optimal measures of implementation for pediatric intensive care units and in behavioral health. Reference Dodds, Redsell, Timmons and Manning9,Reference Powell, Mettert, Dorsey, Weiner, Stanick and Lengnick-Hall10
The CFIR consists of five broad domains: 1. Characteristics of the Intervention, 2. Outer Setting, 3. Inner Setting, 4. Characteristics of Individuals, and 5. Process. Domains comprise constructs (39 in total) that describe more specific components of the domain. The CFIR has been used to assess implementation of many types of innovations across diverse settings. Reference Damschroder, Aron, Keith, Kirsh, Alexander and Lowery11–Reference Robey, Margolies, Sutherland, Rupp, Black and Hill13 Some constructs have not been quantitatively measured, limiting the opportunity for survey validation, and others have been measured only rarely. In 2016, Clinton-McHarg and colleagues conducted a review of survey measures aligned with CFIR constructs and administered in public health and community settings Reference Clinton-McHarg, Yoong, Tzelepis, Regan, Fielding and Skelton14 and found that 5 of the 39 CFIR constructs were not covered by any of the measures evaluated. Our objective was to develop survey measures of determinants of implementation success across the CFIR’s 5 domains and assess the psychometric validity of those measures in the context of ASP implementation. This validation will permit future work examining implementation of ASPs across facilities using these survey measures.
Methods
Study Settings and Approach: The current study is one component of a larger mixed-methods study of antibiotic stewardship at 20 Intermountain Healthcare hospitals and 134 Veterans Health Administration (VHA) Medical Centers across the United States. Reference Barlam, Childs, Zieminski, Meshesha, Jones and Butler15 Our study examines the psychometric properties of the CFIR-based survey of ASP stewards in VHA settings only. We evaluated the factor structure of the survey using confirmatory factor analysis (CFA), the appropriate technique when there is a theoretical foundation underlying the expectations for the data structure. Reference Millsap and Everson16,Reference Tavakol and Wetzel17 CFA is designed to identify latent constructs in a data structure. Latent constructs are not directly observable but can be inferred from survey items. General examples include a construct such as motivation. Motivation cannot be measured directly but could be inferred based on specific questions assessing interest in performing a task.
Implementation science concepts, such as engaging, are latent constructs. Assessing whether the data structure based on survey items is, in practice, consistent with the latent constructs expected from theory provides evidence for the validity of the survey. We also assessed face validity (whether the questions seem to represent the constructs), discriminant validity of the constructs (statistical evidence that the constructs were measuring distinct concepts), and internal consistency of the items within each construct (indicating that the items align with each other in measuring a similar construct). We reviewed site-level survey data to assess potential determinants (barriers and facilitators) of antibiotic stewardship implementation that can later be tied to implementation outcomes.
Survey Development and Characteristics: The study team developed initial survey items via team collaboration. Members of the study team who are antibiotic stewardship experts (MS, TB, MG, KMK, ES) and survey methodology, implementation science, or CFIR-specific experts (CR, LD, MLD, JB) used the online CFIR technical assistance website 18 to develop survey items consistent with antibiotic stewardship implementation in CFIR-recommended structure to assess constructs. Candidate questions were discussed by the entire study team at length and reviewed by key CFIR experts (LD, CR) and the modified survey was piloted with antibiotic stewards. Final revisions incorporated suggestions from all levels of review and piloting.
The administered survey comprised 72 items representing all five CFIR domains and 22 CFIR constructs considered relevant to antibiotic stewardship implementation. Survey items were rated on a Likert scale from 1 to 5, where 1 represented “strongly disagree,” 2 “disagree,” 3 “neither agree nor disagree,” 4 “agree,” and 5 “strongly agree,” with an additional “don’t know” option (survey items in results; Table 1).
Table 1. Survey descriptive summary

Note. * indicates reverse-coded items.
Recruitment and Participants: We identified 289 physician and pharmacist antibiotic stewards at VHA hospital sites based on a list of persons in those roles reported through VA surveys, identification of role on websites, or partners. In January 2018, we sent emails to each VA steward inviting them to complete the REDCap survey online. Reference Harris, Taylor, Thielke, Payne, Gonzalez and Conde19 At least one response was obtained from each of 110 VHA hospitals. Reference Burrowes, Drainoni, Tjilos, Butler, Damschroder and Goetz20 At the hospital level, the response rate was 81%, whereas the individual steward response rate was 52%. Our analysis was at the hospital level, and 81% is a high response rate. A comparison of the demographics of respondents and non-respondents demonstrated significant differences in role of respondent between groups (Table 2).
Psychometric evaluation of the antibiotic steward CFIR survey:
Although our survey measures assessed all 5 CFIR domains, we evaluated the psychometric properties of the 3 CFIR domains and 8 survey measures of constructs with three or more items. The methods used to evaluate the psychometric properties of the construct using CFA require a minimum of 3 items. For transparency and to support other research, all survey questions are included in Table 1. For hospitals with more than one survey respondent, responses were aggregated at the hospital level by averaging them. In the final analysis, 110 hospitals were included, among which 40 had more than 1 respondent. We reverse-scored items that were measuring the trait in the opposite direction (see asterisks by items in Table 1). All analyses were done at the hospital level.
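The two preparation steps described above, reverse-scoring and averaging within hospitals, can be sketched as follows. This is a minimal illustration and not the study's code: the function names, the record layout, and the choice to skip “don’t know” answers (represented here as `None`) are assumptions, since the text does not state how those answers were handled.

```python
from collections import defaultdict
from statistics import mean

LIKERT_MIN, LIKERT_MAX = 1, 5


def reverse_score(value):
    """Flip a 1-5 Likert rating so that 1 <-> 5, 2 <-> 4, and 3 stays 3."""
    return LIKERT_MIN + LIKERT_MAX - value


def aggregate_by_hospital(responses, reverse_items):
    """Average each item across the respondents of each hospital.

    responses: list of dicts with keys 'hospital' and 'ratings', where
    'ratings' maps an item name to a 1-5 rating or None.
    None stands for "don't know" and is skipped (an assumption).
    reverse_items: set of item names that are reverse-coded.
    """
    pooled = defaultdict(lambda: defaultdict(list))
    for resp in responses:
        for item, rating in resp["ratings"].items():
            if rating is None:
                continue
            if item in reverse_items:
                rating = reverse_score(rating)
            pooled[resp["hospital"]][item].append(rating)
    return {
        hospital: {item: mean(values) for item, values in items.items()}
        for hospital, items in pooled.items()
    }


# Hypothetical data: hospital A has two respondents, hospital B has one.
responses = [
    {"hospital": "A", "ratings": {"q1": 4, "q2": 2}},
    {"hospital": "A", "ratings": {"q1": 5, "q2": None}},
    {"hospital": "B", "ratings": {"q1": 3, "q2": 5}},
]
site_scores = aggregate_by_hospital(responses, reverse_items={"q2"})
```

Reverse-scoring is applied before averaging so that every item contributes in the same direction to the hospital-level score.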
We assessed the internal consistency and the unidimensional contribution of each construct using Cronbach’s alpha and McDonald’s omega. Omega uses a more conservative standard with purportedly less bias; thus, we present both. Reference Dunn, Baguley and Brunsden21 Internal consistency, an indicator that a group of questions is measuring the same underlying concept, was considered acceptable if >0.7. Reference Brown22,Reference Bentler23 Floor and ceiling rates were provided for each construct and individual item to demonstrate the percentage of the time respondents chose the lowest possible (floor) or highest possible (ceiling) rating for each item (or construct). For constructs, we considered floor and ceiling rates of <10% to be acceptable. To assess discriminant validity, we examined correlations between constructs. Correlations below 0.80 were considered to indicate good discriminant validity, whereas correlations above 0.80 suggest measurement overlap between constructs. Reference Campbell and Fiske24
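For readers unfamiliar with these statistics, the sketch below shows how Cronbach’s alpha and floor and ceiling rates can be computed from a table of scores. It uses the standard alpha formula, k/(k−1) × (1 − Σ item variances / variance of the summed score); the function names and example data are illustrative assumptions, and McDonald’s omega is omitted because it requires a fitted factor model.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one construct.

    rows: list of per-unit score lists (e.g., one list per hospital),
    each holding one score per item in the construct.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def floor_ceiling_rates(scores, low=1, high=5):
    """Share of scores at the lowest (floor) and highest (ceiling) value."""
    n = len(scores)
    floor = sum(score == low for score in scores) / n
    ceiling = sum(score == high for score in scores) / n
    return floor, ceiling


# Hypothetical construct with three items that move together perfectly.
rows = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
alpha = cronbach_alpha(rows)
rates = floor_ceiling_rates([1, 3, 5, 5])
```

With perfectly parallel items, as in this toy example, alpha reaches its maximum of 1; real survey data fall below that, and the text treats values above 0.7 as acceptable.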
We performed CFAs to assess whether the expected theoretical CFIR construct from our survey on antibiotic stewardship implementation was supported by the survey data. We used the lavaan package in R for analyses. 25
For constructs with ≥4 items, we fitted single-factor congeneric models. Constructs with only 3 items result in saturated congeneric models, which cannot be evaluated for goodness-of-fit. In such cases, we used the more restrictive tau-equivalent model, which assumes that the item loadings are equal. Item loadings represent a correlation between specific items (eg, survey questions) and the underlying factor. Thus, in tau-equivalent models, each item is constrained to contribute equally to the factor. For congeneric models with inadequate fit, we relaxed the assumption that item residuals were uncorrelated. Modifying models to allow for correlated item residuals is appropriate when justified both statistically and from theoretical models of the items. 26 In Leadership Engagement, items relating to drivers of the intervention (eg, mandates), authority, and structure (protected time) were allowed to have correlated residuals. In Knowledge and Beliefs About the Intervention, items related to receptivity to the intervention (receptivity and understanding) as well as concerns (limits on autonomy and delays) were allowed to have correlated residuals. For the construct Engaging, items related to perceived success in collaboration (with teams or other individuals) were allowed to have correlated residuals. For transparency, the fit indices for congeneric models without the relaxed assumption are available in supplementary materials.
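The reason a 3-item congeneric model cannot be tested for fit is a matter of counting degrees of freedom: the number of distinct variances and covariances among the items minus the number of freely estimated parameters. The sketch below is illustrative only; the function name is hypothetical, and it assumes a single factor whose scale is fixed by one identification constraint.

```python
def single_factor_df(n_items, tau_equivalent=False, n_correlated_residuals=0):
    """Degrees of freedom of a single-factor model fitted to a covariance matrix.

    Assumes one scaling constraint on the factor, so a congeneric model
    estimates n_items loadings (or the equivalent) plus n_items residual
    variances, and a tau-equivalent model estimates one shared loading.
    Each correlated residual adds one free parameter.
    """
    observed_moments = n_items * (n_items + 1) // 2
    loading_parameters = 1 if tau_equivalent else n_items
    free_parameters = loading_parameters + n_items + n_correlated_residuals
    return observed_moments - free_parameters


df_three_congeneric = single_factor_df(3)                           # saturated
df_three_tau = single_factor_df(3, tau_equivalent=True)             # testable
df_four_congeneric = single_factor_df(4)
df_four_relaxed = single_factor_df(4, n_correlated_residuals=1)
```

Three items give 6 observed moments and 6 free parameters under the congeneric model, leaving 0 degrees of freedom; constraining the loadings to be equal frees 2 degrees of freedom, which is what makes goodness-of-fit testing possible. Each correlated residual added to a model uses up one degree of freedom.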
To assess model fit, we used widely recommended indices. Reference Hu, Bentler and Hoyle27–Reference Hooper and Mullen29 In these models, we assessed chi-square (non-significant value = good fit), the comparative fit index (CFI >0.95 = good fit), the Tucker-Lewis index (TLI >0.95 = good fit), the root mean square error of approximation (RMSEA <0.08 = good fit), and the standardized root mean square residual (SRMR <0.05 = good fit, <0.08 = mediocre fit).
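The thresholds listed above can be applied mechanically once fit indices have been obtained from a fitted model. The following sketch is an illustration, not the authors’ procedure: the function name is hypothetical, the 0.05 significance level for the chi-square test is an assumption (the text says only “non-significant”), and the “poor” label for SRMR of 0.08 or above is likewise an assumption.

```python
def assess_fit(chi_square_p, cfi, tli, rmsea, srmr, alpha_level=0.05):
    """Compare fit indices against the cutoffs stated in the text.

    alpha_level for the chi-square test is an assumed 0.05.
    Returns a dict with one verdict per index.
    """
    if srmr < 0.05:
        srmr_label = "good"
    elif srmr < 0.08:
        srmr_label = "mediocre"
    else:
        srmr_label = "poor"  # label assumed; the text defines only good and mediocre
    return {
        "chi_square": chi_square_p > alpha_level,  # non-significant = good fit
        "CFI": cfi > 0.95,
        "TLI": tli > 0.95,
        "RMSEA": rmsea < 0.08,
        "SRMR": srmr_label,
    }


# Hypothetical index values for a single construct model.
verdict = assess_fit(chi_square_p=0.30, cfi=0.97, tli=0.96, rmsea=0.05, srmr=0.06)
```

The indices are reported separately rather than combined into one verdict, because the text does not specify how many indices had to meet their cutoff for a model to be called excellent, adequate, or mediocre.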
We used a multidimensional scaling plot to visualize relationships among items and constructs. Since there was a mixture of Likert and continuous items, relationships were quantified using Gower’s distance. Reference Gower30 We used uniform coloring for items within a construct (see Figure 1).
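Gower’s distance handles mixed data by scoring each variable on a 0 to 1 scale and averaging: numeric variables contribute the absolute difference divided by the variable’s range, and categorical variables contribute 0 for a match and 1 for a mismatch. The sketch below shows that general calculation; how the items in this study were coded as numeric or categorical is not stated, so the function name, arguments, and example values are assumptions.

```python
def gower_distance(a, b, ranges):
    """Gower's distance between two records with mixed variable types.

    a, b: equal-length sequences of values (None marks a missing value).
    ranges: for each variable, its numeric range (max - min) across the
    data set, or None if the variable is categorical.
    Variables with a missing value in either record are skipped.
    """
    total, compared = 0.0, 0
    for x, y, value_range in zip(a, b, ranges):
        if x is None or y is None:
            continue
        if value_range is None:
            total += 0.0 if x == y else 1.0
        elif value_range > 0:
            total += abs(x - y) / value_range
        compared += 1
    if compared == 0:
        raise ValueError("no variables could be compared")
    return total / compared


# Hypothetical records: a 1-5 rating, a proportion, and a category.
d = gower_distance([1, 0.5, "a"], [5, 0.5, "b"], ranges=[4, 1.0, None])
```

A matrix of such pairwise distances is the input that a multidimensional scaling routine would then project into two dimensions for plotting.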

Figure 1. Relationships among CFIR items by Construct.

Figure 2. Construct responses.

Figure 3. Factor loadings by CFIR construct.
Results
A total of 110 hospitals participated in this survey. Reference Burrowes, Drainoni, Tjilos, Butler, Damschroder and Goetz20 Survey item mean scores ranged from 3.0 to 3.9 for 28 of 43 items (65%), indicating that the average response at the hospital level for those items was between “neither agree nor disagree” and “agree” (Table 1). Construct means ranged between 3.2 (Access to Knowledge) and 4.0 (Knowledge and Beliefs About the Intervention) (Figure 2). High-ranked individual items exhibited ceiling effects. Ceiling effects were most pronounced within the construct Readiness for Implementation: Leadership Engagement. Internal consistency was acceptable to high for 6 of the 8 constructs and marginal (<0.70) for the remaining 2, construct 6, Access to Knowledge, and construct 7, Engaging, which had alpha values of 0.66 and 0.61, respectively.
Model Fit. The fit was excellent for 4 models, representing two constructs in the Inner Setting domain (Culture and Leadership Engagement) and two constructs in the Process domain (Planning/Executing and Engaging) (Table 3). For 3 models, the fit was adequate: one in Inner Setting (Access to Knowledge and Information), one in Characteristics of Individuals (Knowledge and Beliefs About the Intervention), and one in Process (Reflecting and Evaluating). The model for Self-Efficacy, in Characteristics of Individuals, had mediocre fit but high reliability.
Table 2. Comparison of respondents and nonrespondents

Table 3. Construct mean, models, internal consistency (α, ω), and indications of fit

*Used tau-equivalent model, constraining loadings to be equal.
^Allowing correlations among residuals.
Factor loadings were consistently high for construct 5, Self-Efficacy. For construct 4, Knowledge and Beliefs About the Intervention (see Figure 3), low loadings were found on items relating to overall staff receptivity, nursing burden, pharmacy burden, and pharmacist ratings of importance. For construct 7, Engaging, the item “I work well with individual clinicians” loaded poorly. High factor loadings represent a strong relationship between the individual item and the latent factor, whereas low loadings suggest complexity in the relationship between these items and the factor.
Discriminant validity: Pairwise correlations between constructs were below the threshold of 0.80, indicating acceptable discriminant validity, with one exception. Construct 5, Self-Efficacy, and construct 7, Engaging, had a correlation coefficient of 0.81, thus failing the test for discriminant validity (see Table 4).
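The screening rule used here, flagging any pair of constructs whose correlation exceeds 0.80, can be sketched as follows. The example is illustrative: the function names and toy scores are hypothetical, Pearson correlation is assumed because the text does not name the coefficient, and comparing the absolute value to the threshold is also an assumption.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def overlapping_pairs(construct_scores, threshold=0.80):
    """Return construct pairs whose correlation magnitude exceeds the threshold.

    construct_scores: dict mapping a construct name to its list of
    hospital-level scores, in the same hospital order for every construct.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(construct_scores.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


# Hypothetical scores for three constructs across five hospitals.
scores = {
    "A": [1, 2, 3, 4, 5],
    "B": [1.1, 2.0, 3.2, 3.9, 5.1],
    "C": [3, 1, 4, 1, 5],
}
flagged = overlapping_pairs(scores)
```

In this toy data only constructs A and B are flagged, which mirrors the single exception reported above.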
Table 4. Correlations between constructs

Multidimensional Scaling: We screened correlations between individual items of the two construct scales that had below-acceptable discriminant validity to understand the relationship between items. The highest correlation was 0.66, between the Engaging item “I work well with interdisciplinary teams” and the Self-Efficacy item “I offer clinicians options regarding antibiotic decision making at my facility.” Lower inter-item correlations were found for other single-scale items. The construct survey measures were highly correlated, and the specific inter-item correlations suggest similarity between the theoretical constructs (see supplement for full table). The multidimensional scaling plot indicated that items within the Self-Efficacy construct were also particularly correlated (see Figure 1).
Discussion
The goal of this analysis was to develop and psychometrically evaluate a CFIR-based survey instrument in the context of ASP implementation. A psychometric validation process for a survey is designed to evaluate whether a survey is measuring concepts reliably (in a consistent way) and validly (measuring the constructs it intends to measure) and is a key step in conducting research and quality improvement. We assessed the responses to the CFIR survey to determine whether the survey questions within the models met our expectations for structure and consistency and whether the individual models were independent of each other (ie, not highly correlated) and thus able to provide novel information.
Other surveys have described the development and components of antibiotic stewardship programs, but many have focused primarily on establishing stewardship across sites for comparison Reference Burgess, Miller, Cooper, Moody, Englebright and Septimus31 and exploring attitudes toward stewardship. Reference Zetts, Garcia, Doctor, Gerber, Linder and Hyun32 Although existing studies have demonstrated the validity of survey instruments intended to measure the Inner Setting domain and its component constructs, Reference Fernandez, Walker, Weiner, Calo, Liang and Risendal33,Reference Walker, Rodriguez, Vernon, Savas, Frost and Fernandez34 our study is the first to confirm construct validity of a measure in the context of antibiotic stewardship using 3 of the 5 CFIR domains. Our survey includes questions within the rarely measured Process domain, which encompasses the Champions construct applied to stewardship. Reference Clinton-McHarg, Yoong, Tzelepis, Regan, Fielding and Skelton14 Validated survey measures that include multiple CFIR constructs advance the field of implementation science and of ASP implementation in particular.
Our survey demonstrated multidimensional validity based on our theory-based survey measure and the results of our CFA. Most of our CFA models exhibited excellent or very good fit to the data. We also demonstrated internal consistency of the survey measures. Our results showed discriminant validity for most constructs, indicating that each construct is different from the other survey constructs. Where discriminant validity was marginal, between the constructs Engaging and Self-Efficacy, there are distinct similarities between the theoretical and behavioral concepts being measured. Namely, Engaging addresses visibility, support, and capabilities, whereas Self-Efficacy addresses beliefs that one can capably perform specific actions. These similarities suggest that further work is needed. There were negligible floor and ceiling effects at the construct level. As a result, our survey should perform well at discriminating among sites with both low and high performance, although this will need confirmation in future studies. Overall, our survey demonstrates psychometric validity and can be used as designed. Although we did not assess ASP implementation outcomes in this paper, validating these survey measures will allow our team and other teams to assess the relationship between these validated survey measures and implementation of stewardship programs and stewardship outcomes in future work. Reference Damschroder, Reardon, Opra Widerquist and Lowery35
However, our results also bring to light some key issues that should be considered by research teams examining determinants of antibiotic stewardship implementation. First, our work points to particular determinants that may be important to better understand or measure over time as possible harbingers of ASP success. Items in the Self-efficacy and Engaging constructs related particularly to individual characteristics, with most items beginning with “I” (11 of the 13 items across the survey measures). It will be important to investigate the relationship between individual sense of agency and ASP success in future work.
In some cases, models with correlated item variance may indicate that unmeasured variables remain. For example, in the Readiness for Implementation: Leadership Engagement model, we potentially identified evidence of an unmeasured variable representing the perception of a “compulsory” component of the ASP intervention, demonstrated by low factor loadings for the item relating to external mandates for ASP. It is possible that stewards associate a mandate with external pressure on leadership and that this may be different from other aspects of leadership engagement. These findings point to the complex interplay between individual beliefs, autonomy, and motivation, and to the question of how to identify individual versus collective forces for change. This seems particularly relevant to antibiotic stewardship environments, which must carefully weigh individual versus collective priorities and motivations. Reference Sutton and Ashley36
Our findings also point to Leadership Engagement as potentially more motivating than a mandate, which implies low autonomy. Better understanding of the interplay between a mandate and engagement could support efficient design of mature stewardship programs. Our results also suggest a potentially unmeasured construct relating pharmacist beliefs about importance of ASP interventions to general receptivity to intervention among the staff. Potentially pharmacist communication about beliefs (even non-verbal) may have an outsized influence on their colleagues. If confirmed, this finding could promote additional practical and theoretical contributions to ASP development. Reference Orbell, Hodgkins and Sheeran37,Reference Hruza, Velasquez, Madaras-Kelly, Fleming-Dutra, Samore and Butler38 Our work points to the importance of individual cognition, motivation, and social cognitive approaches. This is consistent with other work addressing individual cognition and social dynamics for ASP interventions. Reference Taber, Weir, Butler, Graber, Jones and Madaras-Kelly5,Reference Jones, Butler, Graber, Glassman, Samore and Pollack39
Healthcare environments are complex, rapid-paced, cognitively challenging environments. Sociotechnical systems models and methods are designed to elucidate and solve important challenges related to communication, human-computer interaction, cognition, and motivation. Reference Sittig and Singh40–Reference Weir, Taber, Taft, Reese, Jones and Del Fiol42 It is imperative that we continue to tackle complex problems with the deeper, interdisciplinary approaches that sociotechnical systems and implementation science are advancing in antibiotic stewardship.
Implications for future ASP research
This psychometrically validated survey can be used by antibiotic stewards, quality improvement staff, and researchers to assess and report ASP implementation within and/or across ASPs. This survey may be useful for exploratory assessment of implementation domains and constructs at a specific site, for example, for a prospective hospital site champion to assess leadership readiness prior to ASP implementation. It is important to note that our survey is validated for antibiotic stewards, and further work is needed to validate performance by reporters in different clinical roles.
Limitations: Our results should be understood in the context of the following limitations. First, we have a relatively small sample for a confirmatory factor analysis based on the hospital-level analysis, but it is important to note that for all 8 constructs we tested single-factor models; thus, power should be adequate. Conducting analyses at the hospital level may also have influenced some of the domains and constructs, particularly those within the Characteristics of Individuals domain. A site-level rating of Self-Efficacy may represent a combined “self” across respondents when answers by multiple individuals at a single site were different. Yet, despite these limitations, the data were a good fit for the models. Our sample of Intermountain Healthcare hospitals was too small to assess whether the construct validity for our survey was equally strong across different health systems. Further validation work with larger non-VHA systems will allow this type of comparison. In addition, there may be other concepts that need exploration, such as whether the “Readiness for Implementation: Leadership Engagement” reports could be related to social desirability influencing responses. Reference Schramm, Byrne and Sweetnam43 In addition, CFIR constructs are evolving. Not all constructs included in our survey may be applicable to survey users who would like to focus on the updated CFIR. Reference Damschroder, Reardon, Opra Widerquist and Lowery35,Reference Damschroder, Reardon, Widerquist and Lowery44 Finally, this work was conducted before many hospitals were challenged in 2020 and 2021 by the COVID-19 pandemic. Establishing a baseline is very important, but future work may be needed to understand subsequent changes in VHA.
Conclusions: Validated surveys are needed to assess the implementation of antibiotic stewardship across sites. In contrast to earlier work, this robust suite of CFIR survey questions specific to antibiotic stewardship can be used to complement other data collection methods addressing stewardship implementation and promote implementation science growth. In effect, use of our survey can address the contextual components of implementation in greater detail and its relationship to ASP outcomes than has previously been possible. Our findings may guide future scale modifications for teams interested in studying ASP implementation. Reference Dunn, Baguley and Brunsden21
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/ash.2025.65
Data availability statement
The datasets generated and/or analyzed during the current study are not publicly available due to data use restrictions but are available from the corresponding author on reasonable request with Institutional Review Board approval and a Data Use Agreement.
Acknowledgments
The authors wish to thank Pascal DeBrock and the University of Utah Consortium for Families and Health Research and Lynd Bacon for contributing psychometrics expertise throughout preparation of this manuscript. We also thank the antibiotic stewards who participated in this research, Jessica Cole and Catherine Loc-Carrillo for their able management of this project, and Carrie Milligan for her editorial review.
Author contribution
JB contributed to conception, analysis, and interpretation of this work and drafting all versions of the manuscript. EC, TB, MD, CR, LD, and MS contributed substantially to the conception of this work and the acquisition and interpretation of data. YZ, JS, and PT contributed to analysis of data. KMK, MG, and ES contributed substantially to the acquisition of data. All authors have reviewed, contributed to, and approved the final manuscript. All authors agree to be personally accountable for their own contributions.
Financial support
This work was funded by the Agency for Healthcare Research and Quality (Grant Number 5RO1HS025175-03).
Competing interests
The authors have no competing interests to be declared.
Ethical standard
The Institutional Review Boards at Boston University Medical Campus and University of Utah reviewed and approved all study activities (U of Utah IRB # 00099983). A waiver of documentation of consent was approved.
Consent for Publication
Not applicable.
Appendix
List of Abbreviations:
Antibiotic Stewardship Programs (ASPs)
Confirmatory Factor Analysis (CFA)
Comparative fit index (CFI)
Consolidated Framework for Implementation Research (CFIR)
Root mean square error of approximation (RMSEA)
Standardized root mean square residual (SRMR)
Tucker-Lewis index (TLI)
Veterans Affairs (VA)