Automated QSAR — how good is it in practice?

16 January 2026, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Over the past two decades, quantitative structure–activity relationship (QSAR) modeling has evolved substantially, driven by improved data accessibility, open-source descriptor generation, mature machine learning libraries, and scalable cloud computing. Large-scale benchmarking studies using public datasets have demonstrated the feasibility of building predictive models across hundreds of endpoints. In parallel, automated machine learning (Auto-ML) approaches have emerged as a promising means to lower the barrier to QSAR model development, enabling competitive performance without extensive expert intervention. Here, we describe the design and implementation of an automated QSAR modeling system integrated into the CDD Vault platform, referred to as CDD Vault Inference Models. The system automatically trains, evaluates, and deploys regression models whenever new assay data become available, without requiring users to select endpoints, descriptors, or learning algorithms. Using public datasets from ChEMBL, we developed a fully automated workflow for model training and continuous evaluation. Models are released when a conservative performance threshold is achieved. The system is currently focused on building regression models. To give users a handle on model uncertainty, we also provide conformal prediction intervals. We discuss the implications of deploying fully automated QSAR models in a production environment and outline future extensions. Together, this work demonstrates that automated, continuously updated QSAR modeling can provide practical and scalable decision support for drug discovery, particularly in settings where dedicated modeling expertise is limited.
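The abstract does not say which conformal method the system uses. As a minimal sketch of the general idea, assuming split (inductive) conformal prediction with absolute-residual scores, the Python below turns point predictions from any regression model into intervals with a target coverage of 1 - alpha. The function name `split_conformal_interval`, its parameters, and the toy numbers are all hypothetical and are not taken from CDD Vault.

```python
import math

import numpy as np


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Split conformal intervals from absolute residuals (illustrative only).

    cal_true, cal_pred: measured and predicted values on a held-out
        calibration set that was not used to fit the model.
    new_pred: point predictions for new compounds.
    alpha: allowed miscoverage rate (0.1 targets 90% coverage).
    """
    cal_true = np.asarray(cal_true, dtype=float)
    cal_pred = np.asarray(cal_pred, dtype=float)
    new_pred = np.asarray(new_pred, dtype=float)

    # Nonconformity score: absolute error on each calibration compound.
    scores = np.sort(np.abs(cal_true - cal_pred))
    n = scores.size

    # Finite-sample rank of the conformal quantile: ceil((n + 1)(1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))

    # With too few calibration points the rank exceeds n, and the only
    # interval with guaranteed coverage is unbounded.
    q = scores[k - 1] if k <= n else np.inf

    return new_pred - q, new_pred + q


# Toy calibration set with absolute residuals 1, 2, ..., 9.
cal_true = [1, -2, 3, -4, 5, -6, 7, -8, 9]
cal_pred = [0] * 9
lower, upper = split_conformal_interval(cal_true, cal_pred, [10.0], alpha=0.2)
print(lower[0], upper[0])  # 2.0 18.0
```

In this sketch every prediction receives the same interval half-width. Normalized or locally adaptive variants scale the width per compound, and the abstract does not indicate which variant is deployed.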

Keywords

QSAR
Automated Machine Learning
Auto-ML
Conformal prediction intervals
