Abstract
The rational design of supramolecular systems is a central challenge in materials science, yet predictive modeling is often hampered by a trade-off between the black-box complexity of high-dimensional descriptors and the narrow applicability of specialist models. Brute-force featurization strategies tend to suffer from informational dilution, obscuring the critical, localized physics of non-covalent interactions. This study introduces the Kulkarni-NCI Fingerprint (KNF), a compact, 9-feature, physics-informed descriptor engineered to be both informationally dense and interpretable. On its native domain of 2,600 Deep Eutectic Solvent (DES) complexes, the KNF achieves R² = 0.793, a 47% relative improvement over state-of-the-art structural descriptors. A particularly notable outcome is the KNF’s capacity for generalization. A single “Universal Model” trained on a pan-chemical dataset uniting DES complexes with the chemically diverse S66x8 benchmark captures the distinct physics of both hydrogen-bond- and dispersion-dominated domains simultaneously, with R² ≈ 0.69 for DES and R² ≈ 0.96 for S66. SHAP analysis suggests that the KNF provides a clear signature that lets the model align with domain-specific physical rules, supporting a form of implicit domain adaptation in chemical AI. Finally, the KNF’s transferability is assessed on the S30L benchmark of large host–guest systems, where a domain-specific model achieves R² = 0.803. This research provides a framework for developing generalizable and interpretable models, positioning the KNF as a promising tool for rational materials design.
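The headline figures in the abstract are coefficients of determination and a relative improvement. The following minimal sketch shows how those two quantities are conventionally computed. It assumes that "relative improvement" means (new − baseline) / baseline; the abstract does not state the baseline R² itself, so the value computed here is only what that assumption implies, not a reported result. All function names are hypothetical and chosen for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_improvement(new, baseline):
    """Fractional gain of `new` over `baseline` (assumed definition)."""
    return (new - baseline) / baseline


def implied_baseline(new, rel_improvement):
    """Invert relative_improvement to recover the baseline score."""
    return new / (1.0 + rel_improvement)


# Figures quoted in the abstract.
knf_r2 = 0.793
quoted_gain = 0.47

# Back-calculated under the assumed definition; NOT a value from the paper.
baseline_r2 = implied_baseline(knf_r2, quoted_gain)
print(f"implied baseline R2: {baseline_r2:.3f}")
print(f"round-trip gain: {relative_improvement(knf_r2, baseline_r2):.2%}")
```

Under this reading, a 47% relative gain corresponds to a baseline R² of roughly 0.54; if the authors instead mean a difference in percentage points, the implied baseline would differ, so the benchmark in Section S1 of the Supporting Information remains the authoritative source.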
Supplementary materials
Title
Supporting Information for: A Physics-Informed Fingerprint for Generalizable Prediction of Supramolecular Stability
Description
This Supporting Information provides a technical appendix with the details, data, and validations that underpin the claims of the main manuscript.
Section S1 presents the definitive performance benchmark of the 9-feature Kulkarni-NCI Fingerprint (KNF) against generic (Mordred), first-principles (Coulomb Matrix), and state-of-the-art structural (SOAP) descriptors on the ~2,600-complex DES dataset. Section S2 details the hierarchical feature engineering process, showing the methodical performance evolution of the KNF.
Subsequent sections provide a comprehensive benchmark of fifteen machine learning model families (Section S3) and a full KNF ablation study (Section S4). Section S5 offers deeper model interpretability through global SHAP analyses and t-SNE visualisations of the learned chemical space.
Finally, the document presents the full analysis of the KNF's performance on the S30L benchmark of large host-guest systems, including the data-driven analysis of the zero-shot prediction failure (Section S6) and the successful validation of the S30L-specialist model (Section S7).