Abstract
Determining the pKa values of the C-H sites in an organic molecule helps synthetic chemists predict reaction sites, but the task becomes more challenging as molecular complexity increases. This paper introduces pKalculator, a quantum chemistry (QM)-based workflow for the automated computation of C-H pKa values, which is used to generate a training dataset for a machine learning (ML) model. The QM workflow is benchmarked against 695 experimentally determined C-H pKa values. The ML model is trained on a diverse dataset of 775 molecules with 3910 C-H sites. Our ML model predicts C-H pKa values with a mean absolute error (MAE) of 1.24 pKa units and a root mean squared error (RMSE) of 2.15 pKa units. Furthermore, we apply our model to 1043 pKa-dependent reactions (Aldol, Claisen, and Michael) and identify the reaction sites with a Matthews correlation coefficient (MCC) of 0.82.
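
For readers less familiar with the three metrics quoted above, the following is a minimal sketch of their standard definitions. It is not taken from the pKalculator code base. The function names and the toy numbers are assumptions chosen for illustration only and are not results from this work.

```python
import math

def mae(pred, ref):
    """Mean absolute error between predicted and reference pKa values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

def rmse(pred, ref):
    """Root mean squared error between predicted and reference pKa values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator vanishes (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy, made-up values (NOT data from the paper).
predicted = [20.0, 25.0, 31.0]
reference = [20.0, 25.0, 33.0]
print(mae(predicted, reference))   # average of |errors|
print(rmse(predicted, reference))  # penalizes the single large error more
print(mcc(tp=5, tn=5, fp=0, fn=0)) # perfect site classification
```

Because RMSE squares each error before averaging, an RMSE noticeably larger than the MAE, as reported here, typically indicates that a small number of sites carry comparatively large errors.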


