Accurate Property Prediction of Fluorescent Dyes with KPGT-Fluor

24 July 2025, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Fluorescent dyes play a crucial role in biology, chemistry, and materials science because of their distinctive optical properties, and accurately predicting these properties is essential for the rational design of high-performance dyes. Building on foundational models in molecular property prediction, we introduce KPGT-Fluor, an adaptation of the Knowledge-guided Pre-training of Graph Transformer (KPGT) framework tailored to fluorescent dye property prediction. Whereas the original KPGT framework was developed for general molecular property estimation, KPGT-Fluor extends it by integrating solvent representations, allowing the model to better capture the environmental effects that influence dye behavior. KPGT-Fluor achieves root mean square errors (RMSE) of 18.91 nm and 18.56 nm for absorption and emission wavelengths, respectively. For the logarithm of the extinction coefficient and quantum yield, the RMSE values are 0.159 and 0.126, respectively. The model also generalizes robustly across multiple downstream dye datasets. In a comparison on four representative fluorescent molecules, KPGT-Fluor predicted emission wavelengths with absolute errors of 7 to 10 nm, with a single prediction taking no more than 1 second. It significantly outperforms the Fluor-predictor and time-dependent density functional theory (TD-DFT) in accuracy, stability, and efficiency. These capabilities make KPGT-Fluor a powerful and versatile tool for the data-driven design of next-generation fluorescent dyes.
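
For readers unfamiliar with the error metrics quoted in the abstract, the following minimal Python sketch shows how an RMSE and per-molecule absolute errors are typically computed from predicted and measured values. It is illustrative only: the function names and the four wavelength values are hypothetical placeholders, not data or code from the paper, and the authors' actual evaluation pipeline may differ.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def absolute_errors(predicted, measured):
    """Per-sample absolute error, e.g. in nm for wavelengths."""
    return [abs(p - m) for p, m in zip(predicted, measured)]


# Hypothetical emission wavelengths in nm (made up for illustration,
# NOT taken from the paper or its supplementary tables).
measured_nm = [520.0, 580.0, 610.0, 650.0]
predicted_nm = [528.0, 573.0, 620.0, 641.0]

per_molecule_nm = absolute_errors(predicted_nm, measured_nm)
wavelength_rmse_nm = rmse(predicted_nm, measured_nm)

print("absolute errors (nm):", per_molecule_nm)
print("RMSE (nm):", round(wavelength_rmse_nm, 2))
```

The same rmse function applies to the dimensionless targets; for the extinction coefficient, the values would be log-transformed (for example with math.log10) before the error is computed, which is why that RMSE carries no unit.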

Keywords

Deep learning
Fluorescent dyes
Photophysical property prediction

Supplementary materials

Title: Accurate Property Prediction of Fluorescent Dyes with KPGT-Fluor
Description: Baseline machine learning models and Supplementary Tables
