Abstract
We evaluate the effectiveness of pre-trained and fine-tuned large language models (LLMs) for predicting the synthesizability of inorganic compounds and for selecting the precursors needed to perform inorganic synthesis. The predictions of fine-tuned LLMs are comparable to—and sometimes better than—those of recent bespoke machine learning models for these tasks, yet require only minimal user expertise, cost, and time to develop. This strategy can therefore serve both as a strong baseline for future machine learning studies of various chemical applications and as a practical tool for experimental chemists.
Supplementary materials
Title
Supporting Information
Description
Description of data preparation. Plots of the distributions of the number of unique reactions and the number of precursors. Description of model construction and training. LLM prompts. Description of evaluation metrics. Tables of model performance for the synthesizability task. Description of methods and results for re-evaluating top-5 predictions using GPT-4, and code for associated statistical tests. Description of PU learning prompt modification experiments and table of results. Histogram of top-10 precursor occurrences. (PDF)