Abstract
Achieving true transferability remains the central challenge for Machine Learning Interatomic Potentials (ML-IAPs) in modeling complex bimetallic nanoclusters across their vast potential energy surfaces. We systematically investigate data selection strategies to optimize the Chebyshev Interaction Model for Efficient Simulation (ChIMES) potential for Bi-Pt nanoclusters by comparing three sampling methods: Principal Component Analysis (PCA)/k-means (structural diversity), t-distributed Stochastic Neighbor Embedding (t-SNE)/k-means (force-space diversity), and hierarchical clustering. Quantitatively, the PCA/k-means strategy proved most effective for global accuracy, yielding the lowest force errors and energy root mean square error (RMSE) values competitive with DFT (19.16 meV/atom). Structural validation on 34 unique DFT-optimized isomers further confirmed the potential's high fidelity, with the best model (PCA/k-means) reproducing structures with an average root mean square deviation (RMSD) of 0.10 Å. However, the t-SNE-based strategy, by maximizing diversity in the force space, demonstrated superior extrapolative power, leading to a more precise prediction of a novel stellated octadecagon Bi18Pt24 cage structure and showing the potential for exploring previously unseen morphologies. Our results establish a clear methodology for strategic data sampling that maximizes ML-IAP transferability, providing an accurate and computationally efficient tool that accelerates the theoretical discovery of complex bimetallic architectures.
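To make the PCA/k-means selection idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each configuration has already been encoded as a fixed-length descriptor vector; the descriptor, the number of principal components, the number of clusters, and the rule of keeping the configuration nearest each centroid are all assumptions, since the abstract does not specify them. The function names are hypothetical.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def assign(Z, centers):
    """Label each point with the index of its nearest center."""
    d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; initial centers are k random data points."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(Z, centers)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(Z, centers), centers


def select_representatives(X, k, n_components=2, seed=0):
    """Return indices of the configuration closest to each cluster centroid.

    Assumed selection rule: one representative per non-empty cluster.
    """
    Z = pca_project(X, n_components)
    labels, centers = kmeans(Z, k, seed=seed)
    picks = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        dist = np.linalg.norm(Z[members] - centers[j], axis=1)
        picks.append(int(members[dist.argmin()]))
    return sorted(picks)


# Synthetic stand-in for descriptor vectors (not real Bi-Pt data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 5)) for c in (0.0, 5.0, 10.0)])
picked = select_representatives(X, k=3)
print(picked)
```

The same selection routine could in principle be applied to a t-SNE embedding of force-derived features instead of the PCA projection; only the projection step would change.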


