A Physics-Informed Fingerprint for Generalizable Prediction of Supramolecular Stability

10 September 2025, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

The rational design of supramolecular systems is a central challenge in materials science, yet predictive modeling is often hampered by a trade-off between the black-box complexity of high-dimensional descriptors and the domain-specific limitations of specialist models. Brute-force featurization strategies tend to suffer from informational dilution, obscuring the critical, localized physics of non-covalent interactions. This study introduces the Kulkarni-NCI Fingerprint (KNF) to address this problem: a compact, 9-feature, physics-informed descriptor engineered to be both informationally dense and interpretable. On its native domain of 2,600 Deep Eutectic Solvent (DES) complexes, the KNF achieves R² = 0.793, a 47% relative improvement over state-of-the-art structural descriptors. A particularly notable outcome is the KNF’s capacity for generalization, which suggests promise for broader supramolecular applications. A single “Universal Model” trained on a pan-chemical dataset uniting DES complexes with the chemically diverse S66x8 benchmark captures the distinct physics of both hydrogen-bond- and dispersion-dominated domains simultaneously, with R² ≈ 0.69 for DES and R² ≈ 0.96 for S66. SHAP analysis suggests that the KNF provides a clear signature that lets the model align with domain-specific physical rules, supporting a form of implicit domain adaptation in chemical AI. Finally, the KNF’s applicability to large host–guest systems is tested on the S30L benchmark, where a domain-specific model achieves R² = 0.803. This research provides a framework for developing generalizable and interpretable models, positioning the KNF as a promising tool for rational materials design.
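
As an illustration of how the per-domain scores quoted in the abstract can be read, the following minimal Python sketch computes R² separately for each domain from the predictions of one pooled model, and back-calculates the baseline R² implied by a 47% relative improvement. It is not the authors' code. The function names (`r2_score`, `per_domain_r2`, `implied_baseline`) and the toy numbers are hypothetical, and reading "relative improvement" as a ratio of R² values is an assumption, since the abstract does not define it.

```python
from collections import defaultdict


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def per_domain_r2(domains, y_true, y_pred):
    """Score one pooled model separately on each domain label."""
    grouped = defaultdict(lambda: ([], []))
    for d, t, p in zip(domains, y_true, y_pred):
        grouped[d][0].append(t)
        grouped[d][1].append(p)
    return {d: r2_score(ts, ps) for d, (ts, ps) in grouped.items()}


def implied_baseline(r2_new, relative_improvement):
    """Baseline R2 under the ASSUMPTION r2_new = baseline * (1 + improvement)."""
    return r2_new / (1.0 + relative_improvement)


# Toy, made-up values purely for illustration (not data from the study).
domains = ["DES", "DES", "DES", "S66", "S66", "S66"]
y_true = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
y_pred = [1.5, 2.0, 2.5, 11.0, 20.0, 29.0]

scores = per_domain_r2(domains, y_true, y_pred)
baseline = implied_baseline(0.793, 0.47)

print(scores)
print(round(baseline, 3))
```

Under that ratio reading, R² = 0.793 with a 47% relative improvement would correspond to a baseline R² of roughly 0.54. Scoring each domain separately matters because a single pooled R² can be dominated by whichever domain has the larger spread in target values.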

Supplementary materials

Title: Supporting Information for: A Physics-Informed Fingerprint for Generalizable Prediction of Supramolecular Stability

Description: This Supporting Information provides a comprehensive technical appendix with exhaustive details, data, and validations that underpin the claims of the main manuscript. Section S1 presents the definitive performance benchmark of the 9-feature Kulkarni-NCI Fingerprint (KNF) against generic (Mordred), first-principles (Coulomb Matrix), and state-of-the-art structural (SOAP) descriptors on the ~2,600-complex DES dataset. Section S2 details the hierarchical feature engineering process, showing the methodical performance evolution of the KNF. Subsequent sections provide a comprehensive benchmark of fifteen machine learning model families (Section S3) and a full KNF ablation study (Section S4). Section S5 offers deeper model interpretability through global SHAP analyses and t-SNE visualisations of the learned chemical space. Finally, the document presents the full analysis of the KNF's performance on the S30L benchmark of large host-guest systems, including the data-driven analysis of the zero-shot prediction failure (Section S6) and the successful validation of the S30L-specialist model (Section S7).
