Abstract
Prediction of aqueous solubility for unseen organic molecules remains an outstanding and important challenge in computational drug design. In this work, we investigate various strategies for combining molecular representations and model architectures with macroscopic pKa calculations, with the goal of finding a generally applicable aqueous-solubility-prediction method. We find that a wide range of machine-learning approaches yield similar outcomes. We also show that the pH dependence of aqueous solubility can be accurately predicted by combining a single aqueous-solubility prediction with the pH-dependent microstate ensemble.
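
As an illustration of how a single solubility value might be combined with a microstate ensemble, the following minimal sketch uses the textbook ideal-solution relation S(pH) = S0 / f_neutral(pH). It assumes that only the neutral microstate limits solubility and that the single prediction is the intrinsic solubility S0. The function names and the monoprotic-acid example are hypothetical, chosen for illustration, and are not necessarily the method used in this work.

```python
import math


def neutral_fraction_monoprotic_acid(pH: float, pKa: float) -> float:
    """Fraction of a monoprotic acid present as the neutral microstate.

    Follows from the Henderson-Hasselbalch relation; a stand-in here for
    the neutral-state population of a full microstate ensemble.
    """
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


def log_solubility_at_pH(log_s0: float, neutral_fraction: float) -> float:
    """Return log10 S(pH) given log10 S0 and the neutral-state fraction.

    Assumes ideal behaviour: S(pH) = S0 / f_neutral(pH), i.e. charged
    microstates are treated as freely soluble (no salt precipitation).
    """
    if not 0.0 < neutral_fraction <= 1.0:
        raise ValueError("neutral_fraction must lie in (0, 1]")
    return log_s0 - math.log10(neutral_fraction)


if __name__ == "__main__":
    # Hypothetical acid with pKa 4.0 and intrinsic log S0 of -3.0 (mol/L).
    for pH in (2.0, 4.0, 7.4):
        f = neutral_fraction_monoprotic_acid(pH, pKa=4.0)
        print(f"pH {pH}: log S = {log_solubility_at_pH(-3.0, f):.2f}")
```

For a molecule with several ionizable sites, the neutral fraction would instead come from the populations of the full microstate ensemble at each pH; the monoprotic formula above is only the simplest special case.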


