An Ensemble Model for Molecular Solubility Prediction Across Aqueous and Organic Solvents

17 November 2025, Version 2
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Solubility underpins decisions in drug discovery, separations, crystallization, formulation, and materials synthesis, yet accurate prediction across diverse solute–solvent pairs and temperatures remains challenging. We present a supervised learning framework that unifies two high-quality resources—AqSolDB for aqueous solubility and BigSolDB v2.0 for solubility in 213 solvents—into a standardized schema with RDKit-derived 2D descriptors for both solute and solvent, plus temperature. The modeling stack couples gradient-boosted decision trees (XGBoost) and a lightweight one-dimensional convolutional neural network trained on tabular descriptors. A validation-driven weight optimization determines ensemble contributions. The approach is data-efficient, reproducible, and deployable: it requires only SMILES of solute and solvent and temperature to estimate log10 S (in mol/L). On held-out validation and test splits, the ensemble achieves R² = 0.945 and RMSE = 0.341 log units, consistently outperforming single learners. We provide a complete, reproducible workflow with explicit data lineage and source code.
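The validation-driven weight optimization mentioned above can be illustrated with a minimal sketch. The abstract does not say how the weights are searched, so this sketch assumes a single convex weight w between two models, chosen by grid search to minimize RMSE on a validation set. The function names, the grid resolution, and the toy numbers are all hypothetical and are not taken from the authors' code or data.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def blend(pred_a, pred_b, w):
    """Convex combination of two prediction lists: w * a + (1 - w) * b."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]

def optimize_weight(y_val, pred_a, pred_b, steps=100):
    """Grid-search w in [0, 1] for the lowest validation RMSE (assumed strategy)."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        err = rmse(y_val, blend(pred_a, pred_b, w))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

# Toy validation targets in log10 S units, invented for illustration only.
y_val = [-2.0, -1.0, 0.0, 1.0]
pred_xgb = [-1.8, -1.2, 0.2, 0.8]   # stand-in for tree-model predictions
pred_cnn = [-2.2, -0.8, -0.2, 1.2]  # stand-in for CNN predictions

w, err = optimize_weight(y_val, pred_xgb, pred_cnn)
print(f"weight on first model: {w:.2f}, validation RMSE: {err:.3f}")
```

In this toy case the two models err in opposite directions, so the blended prediction beats either model alone. The weight is fitted on validation data only, so the test split remains an unbiased check of the final ensemble.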

Keywords

solubility prediction
QSAR/QSPR
RDKit descriptors
ensemble learning
XGBoost
convolutional neural networks
aqueous and organic solvents
