Abstract
Accurate prediction of micro-pKa values is crucial for understanding and modulating the acidity and basicity of organic molecules, with applications in drug discovery, materials science, and environmental chemistry. This work introduces QupKake, a novel workflow that combines graph neural network (GNN) models with semiempirical quantum mechanical (QM) features to achieve exceptional accuracy and generalization in micro-pKa prediction. QupKake outperforms state-of-the-art models on a variety of benchmark datasets, with root mean square errors (RMSEs) between 0.5 and 0.8 pKa units on five external test sets. Feature importance analysis reveals the crucial role of QM features in both the reaction site enumeration and the micro-pKa prediction models. QupKake represents a significant advancement in micro-pKa prediction, offering a powerful tool for various applications in chemistry and beyond.
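The abstract describes two ideas that a short sketch can make concrete: per-atom QM descriptors being combined with graph features inside a GNN, and model quality being reported as RMSE in pKa units. The toy example below is only an illustration under stated assumptions, not QupKake's actual architecture. The function names (`concat_features`, `message_pass`, `atom_readout`, `rmse`) are hypothetical, the "QM feature" is a placeholder number standing in for a semiempirical descriptor such as a partial charge, and the single sum-aggregation step and linear readout are a deliberately minimal stand-in for a real message-passing network.

```python
import math
from typing import Dict, List

Vector = List[float]


def concat_features(graph_feats: List[Vector], qm_feats: List[Vector]) -> List[Vector]:
    """Append per-atom QM descriptors (hypothetical) to per-atom graph features."""
    if len(graph_feats) != len(qm_feats):
        raise ValueError("one QM feature vector per atom is required")
    return [g + q for g, q in zip(graph_feats, qm_feats)]


def message_pass(feats: List[Vector], adjacency: Dict[int, List[int]]) -> List[Vector]:
    """One round of sum aggregation: h_i' = h_i + sum of neighbour feature vectors."""
    updated = []
    for i, h in enumerate(feats):
        new_h = list(h)
        for j in adjacency.get(i, []):
            new_h = [a + b for a, b in zip(new_h, feats[j])]
        updated.append(new_h)
    return updated


def atom_readout(feats: List[Vector], weights: Vector, bias: float) -> List[float]:
    """Linear per-atom readout, e.g. a score for each candidate (de)protonation site."""
    return [sum(w * x for w, x in zip(weights, h)) + bias for h in feats]


def rmse(predicted: List[float], observed: List[float]) -> float:
    """Root mean square error, in the same units as the inputs (here pKa units)."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))


if __name__ == "__main__":
    # Toy three-atom graph: atom 0 bonded to atoms 1 and 2 (illustrative only).
    graph = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # one-hot element features (assumed)
    qm = [[0.1], [-0.4], [-0.5]]                  # placeholder values, not computed charges
    adjacency = {0: [1, 2], 1: [0], 2: [0]}
    hidden = message_pass(concat_features(graph, qm), adjacency)
    print(atom_readout(hidden, weights=[1.0, 1.0, 1.0], bias=0.0))
    print(rmse([4.0, 5.0], [4.5, 4.5]))
```

Concatenating QM descriptors before message passing lets each atom's electronic information reach its neighbours during aggregation; the RMSE helper is the standard definition, so a value of 0.5 means predictions deviate from experiment by about half a pKa unit in the root-mean-square sense.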
Supplementary materials
Supplementary Information
Molecular descriptors for the initial training set, the experimental training set, and the test sets; protonation and deprotonation differences with SMARTS patterns; graph features used in the model; feature importance rankings; similarity scores; the best and worst predictions in the test sets; and the parallel performance of the model.