Abstract
SMILES is the dominant molecular representation in AI-based chemical applications, but it has inherent limitations that stem from its internal structure.
Here, we exploit the idea that a set of structural fingerprints can serve as an efficient alternative to unique molecular representations. To this end, we trained neural machine translation (NMT) models that translate sets of various structural fingerprints into conventional text-based molecular representations, i.e., SMILES and SELFIES. An assessment of their conversion efficiency showed that our models successfully reconstructed molecules with a high level of accuracy. Our approach therefore brings structural fingerprints into play as strong representational tools in chemical natural language processing (NLP) applications by restoring the connectivity information that is lost during fingerprint transformation. This comprehensive study addresses the major limitation of structural fingerprints, which has precluded their use in NLP models. Our findings could facilitate the development of text- or fingerprint-based chemoinformatic models for generative and translational tasks.


