Abstract
This work investigates how chemical knowledge can be transferred through mechanism-based descriptors and how these descriptors affect reaction yield prediction. By encoding mechanistic insight into domain-specific features, the proposed approach introduces a beneficial inductive bias. Using a Mizoroki-Heck C-glycosylation reaction as a case study, we first provide an overview of the reaction mechanism, then describe the dataset construction and featurization strategy, and finally present classification model training and evaluation. Whereas regression predicts continuous outcomes such as reaction yields, a classification task can indicate whether a reaction is likely to proceed under standard conditions. At early stages, identifying feasible reactions is often more important than achieving high yields. By prioritizing feasibility, this approach provides a practical tool for rapidly screening reaction conditions. The results demonstrate that incorporating mechanism-guided features significantly improves predictive performance, with gains of up to +14% in balanced accuracy and +25% in F1-score for low-data classification. Feature importance analysis further highlights the relevance of these descriptors, linking model predictions to key mechanistic factors. Overall, this study shows that mechanism-based feature design is a promising and data-efficient strategy to enhance model performance, particularly in low-data regimes, and can complement quantum mechanical and physicochemical descriptors.
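Balanced accuracy and F1-score are the two metrics used to report the gains above. For readers less familiar with them, the sketch below computes both from binary predictions. It is a minimal illustration, assuming labels where 1 means "feasible" and 0 means "infeasible"; the function name and example labels are hypothetical and are not taken from the study's data or code. Balanced accuracy averages the true-positive and true-negative rates, so it is not inflated when one class dominates. F1 is the harmonic mean of precision and recall on the positive (feasible) class.

```python
def balanced_accuracy_and_f1(y_true, y_pred):
    """Return (balanced accuracy, F1) for binary labels.

    Hypothetical helper: 1 = feasible reaction, 0 = infeasible reaction.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # feasible, predicted feasible
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # infeasible, predicted infeasible
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # infeasible, predicted feasible
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # feasible, predicted infeasible

    recall = tp / (tp + fn) if tp + fn else 0.0          # true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0     # true-negative rate
    balanced_accuracy = (recall + specificity) / 2

    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return balanced_accuracy, f1


if __name__ == "__main__":
    # Illustrative, imbalanced labels: 3 feasible vs 5 infeasible reactions.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
    bal_acc, f1 = balanced_accuracy_and_f1(y_true, y_pred)
    print(f"balanced accuracy = {bal_acc:.3f}, F1 = {f1:.3f}")
```

In this toy example plain accuracy would be 5/8 = 0.625, while balanced accuracy is 17/30 (about 0.567) and F1 is 0.4, showing how both metrics penalize missed feasible reactions more than plain accuracy does. In practice, `sklearn.metrics.balanced_accuracy_score` and `sklearn.metrics.f1_score` compute the same quantities.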
Supplementary materials
Supplementary information: contains details about the dataset and the classification models.