Machine Learning in Chemistry: A Data Centred, Hands-on Introductory Machine Learning Course for Undergraduate Students

17 October 2025, Version 2
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Machine learning (ML) is rapidly reshaping the chemical sciences, with applications spanning molecular property prediction, chemical reaction design, molecular structure generation, and other forms of data-driven discovery. As ML becomes more deeply integrated into chemical research, undergraduate chemistry students increasingly need training that bridges traditional chemical education and ML methods. Here we present Machine Learning in Chemistry (MLChem), an undergraduate-level course designed from a chemistry-first perspective to lower the barriers to entry into ML while maintaining disciplinary relevance. The course introduces fundamental ML algorithms using chemical datasets, such as a small-molecule solubility dataset and a peptide activity dataset, and progresses from traditional ML algorithms to neural networks. Each chapter is accompanied by tutorial notebooks and homework assignments focused on chemistry-relevant tasks. The course materials are open-source and available at https://xuhuihuang.github.io/mlchem. The fundamental chapters are complemented by advanced modules on emerging topics such as reinforcement learning for retrosynthesis, ML-based force fields, and deep learning for the prediction of protein structure and dynamics. By combining chemical context with hands-on coding and exposure to frontier applications, MLChem equips undergraduate chemistry students with both conceptual foundations and practical skills, preparing them to participate in ML-driven chemical research.

Keywords

machine learning in chemistry
tutorials
