Mechanism-based features: an inductive bias for knowledge transfer in low-data yield prediction

12 January 2026, Version 2
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

This work investigates the transfer of chemical knowledge through mechanism-based descriptors and their impact on reaction yield prediction. By encoding mechanistic insight into domain-specific features, the proposed approach introduces a beneficial inductive bias. Using a Mizoroki-Heck C-glycosylation reaction as a case study, we first provide an overview of the reaction mechanism, then describe the dataset construction and featurization strategy, and finally train and evaluate classification models. Whereas regression predicts continuous outcomes such as reaction yields, a classification task can indicate whether a reaction is likely to proceed under standard conditions. At early stages, identifying feasible reactions is often more important than achieving high yields. By prioritizing feasibility, this approach provides a practical tool for rapidly screening reaction conditions. The results demonstrate that incorporating mechanism-guided features significantly improves predictive performance, with gains of up to +14% in balanced accuracy and +25% in F1-score in a low-data classification setting. Feature importance analysis further highlights the relevance of these descriptors, linking model predictions to key mechanistic factors. Overall, this study shows that mechanism-based feature design is a promising and data-efficient strategy to enhance model performance, particularly in low-data regimes, and can serve as a complementary approach to quantum mechanical and physicochemical descriptors.
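The abstract frames feasibility prediction as a binary classification derived from yields and reports balanced accuracy and F1-score. The following is a minimal, illustrative sketch of how such labels and metrics can be computed. It assumes a hypothetical 20% yield cutoff for "the reaction proceeds"; the labeling criterion actually used in this work is not given here. The function names are illustrative and do not come from the authors' code.

```python
from typing import List, Sequence, Tuple


def to_feasibility_labels(yields: Sequence[float], threshold: float = 20.0) -> List[int]:
    """Map yields (%) to binary labels: 1 = feasible, 0 = not feasible.

    The 20% default threshold is a hypothetical choice for illustration only.
    """
    return [1 if y >= threshold else 0 for y in yields]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up yields (%) for eight reactions, used only to exercise the functions.
    labels = to_feasibility_labels([0, 2, 5, 8, 35, 60, 12, 80])
    # Trivial baseline that always predicts "not feasible".
    baseline = [0] * len(labels)
    print(labels, balanced_accuracy(labels, baseline), f1_score(labels, baseline))
```

In this toy example, five of the eight reactions fall below the cutoff. The always-negative baseline therefore reaches 0.625 plain accuracy but only 0.5 balanced accuracy and an F1-score of 0. This illustrates why metrics that account for class imbalance are preferred when feasible and infeasible reactions are unevenly represented in a small dataset.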

Keywords

C-glycosylation
yield prediction
machine learning
Heck cross-coupling
inductive bias

Supplementary materials

Supplementary information
This file contains details about the dataset and the classification models.

