Accurately predicting solubility curves via a thermodynamic cycle, machine learning, and solvent ensembles

06 August 2025, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Determining the solubilities of organic molecules is critical in fields such as pharmaceuticals, agrochemicals, and environmental science. Knowing how a solute dissolves in different solvents and at different temperatures is essential for drug formulation, synthesis, purification, and crystallization. Solubility limits are hard to estimate, which currently hinders the design of new processes and makes innovation more expensive. We propose a fast and general method for predicting the solubilities of neutral organic molecules across a wide range of solvents and temperatures. The method uses a thermodynamic fusion cycle to combine machine learning predictions of the activity coefficient, fusion enthalpy, and melting point. Tested on a combined dataset of more than 100,000 experimental solubility values, it performs better than or comparably to competing methods on many solubility benchmarks, even at elevated temperatures. We also introduce reference ensembling, which leverages all available experimental solubilities of a given solute to estimate its solubility in a different solvent. Reference ensembling is also shown to enhance the robustness of models trained directly on solubility data.
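To make the fusion cycle concrete, the following is a minimal sketch, not the authors' implementation. It assumes the textbook simplified relation ln x = -(ΔH_fus/R)(1/T - 1/T_m) - ln γ, which neglects heat-capacity corrections and treats ΔH_fus as temperature independent; the abstract does not state the exact form used. In the sketch, `T_m`, `dH_fus`, and `ln_gamma` stand in for the three machine-learned quantities. The function `ensemble_log_solubility` is one plausible reading of reference ensembling: under the same assumptions the solid-state term cancels between solvents at a fixed temperature, so each known solubility yields an estimate for the target solvent, and the estimates are averaged. The plain averaging rule and all function names are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def ideal_log_solubility(T, T_m, dH_fus):
    """ln of the ideal mole-fraction solubility at temperature T (K).

    Assumes a constant fusion enthalpy dH_fus (J/mol) and neglects
    heat-capacity terms; T_m is the melting point (K).
    """
    return -(dH_fus / R) * (1.0 / T - 1.0 / T_m)


def log_solubility(T, T_m, dH_fus, ln_gamma):
    """ln x in a real solvent: ideal term minus ln of the activity coefficient."""
    return ideal_log_solubility(T, T_m, dH_fus) - ln_gamma


def ensemble_log_solubility(ln_gamma_target, references):
    """Hypothetical reference ensemble at a single temperature.

    references: list of (ln_x_ref, ln_gamma_ref) pairs, one per solvent with
    a known solubility. Each pair gives an estimate of ln x in the target
    solvent because the solid-state term is shared; estimates are averaged.
    """
    estimates = [ln_x + ln_g - ln_gamma_target for ln_x, ln_g in references]
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the paper.
    ln_x = log_solubility(T=298.15, T_m=400.0, dH_fus=25_000.0, ln_gamma=0.5)
    print(f"mole-fraction solubility: {math.exp(ln_x):.4f}")
```

Evaluating `log_solubility` over a grid of temperatures gives a solubility curve. In practice the activity coefficient depends on temperature and composition, so a constant `ln_gamma` is a simplification.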

Keywords

Solubility
Pharmaceuticals
Machine Learning
Property prediction
Melting point
Enthalpy of fusion
Activity coefficients

Supplementary materials

Title
Supporting Information: Accurately predicting solubility curves via a thermodynamic cycle, machine learning, and solvent ensembles
Description
The supporting information includes derivations of key equations, details on data distributions, parity plots, and a discussion of error propagation.
