Abstract
We present an end-to-end learning-based method for predicting possible human metabolites of small molecules, including drugs. We approach metabolite prediction as a sequence translation problem, with chemical compounds represented in SMILES notation. We apply transfer learning to a Seq2Seq Transformer model, originally trained on chemical reaction data, so that it predicts the outcome of human metabolic reactions. We further build an ensemble model to account for multiple and diverse metabolites.
Extensive evaluation reveals that the proposed method generalizes well to different enzyme families, as it can correctly predict metabolites for phase I and phase II drug metabolism reactions as well as for other enzymes.
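
To make the sequence-translation framing concrete, the following minimal sketch shows one way a SMILES string could be split into tokens before being passed to a Seq2Seq model. The regular expression is an atom-level pattern commonly used for SMILES in sequence models; it is an assumption for illustration and not necessarily the tokenizer used in this work. The names `SMILES_TOKEN_PATTERN` and `tokenize_smiles` are hypothetical.

```python
import re
from typing import List

# Assumed atom-level tokenization pattern (illustrative, not the authors' code).
# Bracket atoms such as [NH4+] are kept whole; two-letter halogens (Cl, Br)
# are matched as single tokens; everything else is one character per token.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens for a sequence model.

    Raises ValueError if some characters are not covered by the pattern,
    so that malformed input is not silently truncated.
    """
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Aspirin as an example source sequence; a trained model would be asked
    # to "translate" such a token sequence into metabolite token sequences.
    print(" ".join(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")))
```

Under this framing, the parent compound's token sequence plays the role of the source sentence and each predicted metabolite that of a target sentence.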


