Abstract
Machine learning (ML) models play a crucial role in predicting properties essential to drug development, such as a drug’s log-scale acid-dissociation constant (pKa). Despite recent architectural advances, these models often generalize poorly to novel compounds because ground-truth data are scarce. They also lack interpretability, in part because they depend on explicit encodings of the input molecules’ substructures. Atomic-resolution information can instead be obtained by observing a model’s response to atomic perturbations of an input molecule; however, no existing method systematically uses this information for model and molecular analysis. Here, we present BCL-XpKa, a substructure-independent, deep neural network (DNN)-based pKa predictor that generalizes well to novel small molecules. BCL-XpKa recasts pKa prediction from a regression problem as a multitask-classification problem, which accumulates data for prediction at biologically relevant pH values and records the model’s uncertainty as a discrete distribution for each pKa prediction. BCL-XpKa outperforms modern ML pKa predictors and accurately models the effects of common molecular modifications on a molecule’s ionizability. We then leverage BCL-XpKa’s substructure independence to introduce atomic sensitivity analysis (ASA), which quickly decomposes a molecule’s predicted pKa value into its atomic contributions without model retraining. When paired with BCL-XpKa, ASA reveals that BCL-XpKa has implicitly learned high-resolution information about molecular substructures. We further demonstrate ASA’s utility in structure preparation for protein-ligand docking by identifying ionization sites in 97.8% and 83.4% of complex small-molecule acids and bases, respectively. Finally, we apply ASA with BCL-XpKa to understand the physicochemical liabilities, and guide the optimization, of a recently published KRAS-degrading PROTAC.
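To make the regression-to-classification idea concrete, the following is a minimal sketch of one way a scalar pKa could be encoded as a set of binary tasks and decoded back into a discrete distribution. It is not BCL-XpKa's implementation: the abstract does not specify the thresholds, the label definition, or the decoding rule. The integer pH thresholds from 0 to 14, the "pKa <= threshold" label, and the function names `encode_pka` and `decode_distribution` are all assumptions made for illustration.

```python
from typing import List

# Hypothetical thresholds: one binary task per integer pH value from 0 to 14.
# The abstract does not state which pH values BCL-XpKa actually uses.
THRESHOLDS: List[float] = [float(t) for t in range(15)]


def encode_pka(pka: float, thresholds: List[float] = THRESHOLDS) -> List[int]:
    """Encode a scalar pKa as one binary label per threshold.

    Assumed label definition: task k is 1 if pKa <= thresholds[k], else 0.
    """
    return [1 if pka <= t else 0 for t in thresholds]


def decode_distribution(probs: List[float]) -> List[float]:
    """Convert per-task probabilities P(pKa <= t_k) into a discrete distribution.

    Returns len(probs) + 1 bin masses for the bins
    (-inf, t_0], (t_0, t_1], ..., (t_{n-1}, +inf).
    Probabilities are clipped to [0, 1] and forced to be non-decreasing
    (running maximum) so that every bin mass is non-negative.
    """
    cumulative: List[float] = []
    running = 0.0
    for p in probs:
        clipped = min(max(p, 0.0), 1.0)
        running = max(running, clipped)
        cumulative.append(running)
    # Pad with 0 and 1 so that adjacent differences sum to exactly 1.
    edges = [0.0] + cumulative + [1.0]
    return [edges[i + 1] - edges[i] for i in range(len(edges) - 1)]


if __name__ == "__main__":
    labels = encode_pka(4.76)  # a pKa typical of a simple carboxylic acid
    masses = decode_distribution([float(y) for y in labels])
    print(labels)
    print(masses)
```

Under these assumptions, a spread of mass across several neighbouring bins would correspond to an uncertain prediction, while mass concentrated in a single bin would correspond to a confident one.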
Supplementary materials
Supplementary Tables: Hyperparameter Optimization for BCL-XpKa
Supplementary weblinks
The Biology and Chemistry Library: Open-source cheminformatics platform developed by the Meiler lab and used throughout this work.

