Predicting Sequence Dependent Fluorescence with Classic Machine Learning Models

Micheal Reed; Reza Zadegan

doi:10.26434/chemrxiv-2025-hs62c

Physical Chemistry

23 December 2025, Version 1

Working Paper

This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Terminally labeled DNA oligonucleotides are widely used in modern biology and biotechnology. The fluorescence intensity of these labels has been observed to depend strongly on the terminal nucleotide sequence. Recent studies have assayed and published raw fluorescence values for Cy3 and Cy5 as a function of the five nucleotides nearest the label, resulting in 1024 (4^5) data points. While a sequence space of this size is experimentally tractable, enlarging it will sharply increase the experimental cost and time, since the number of sequences grows fourfold with each added nucleotide. Machine learning is well suited to this problem of experimental tractability; however, there is a wide design space in the choice of algorithm. In this work we use classic machine learning models such as Support Vector Machines, Multilayer Perceptrons, and Random Forests to both predict the raw intensity value and classify the intensity magnitude of the fluorophore using the sequence as input. We demonstrate that the performance of these models depends heavily on the numerical transformation of the sequence and that Random Forest consistently outperforms all other models in both regression and classification tasks, irrespective of the sequence transformation.
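
As an illustration of what a "numerical transformation of the sequence" can look like, the following minimal sketch enumerates the 4^5 = 1024 possible five-nucleotide sequences and encodes each one in two common ways. The abstract does not state which encodings the authors used, so the one-hot and ordinal schemes, the base ordering in `ALPHABET`, and all function names here are assumptions chosen for illustration only.

```python
from itertools import product

# Assumed base ordering; not taken from the paper.
ALPHABET = "ACGT"


def all_kmers(k=5):
    """Enumerate every DNA sequence of length k over ALPHABET."""
    return ["".join(bases) for bases in product(ALPHABET, repeat=k)]


def one_hot(seq):
    """Flatten a sequence into a binary vector with one block of 4 per position."""
    vec = []
    for base in seq:
        vec.extend(1 if base == letter else 0 for letter in ALPHABET)
    return vec


def ordinal(seq):
    """Map each base to its index in ALPHABET (A=0, C=1, G=2, T=3)."""
    return [ALPHABET.index(base) for base in seq]


if __name__ == "__main__":
    kmers = all_kmers(5)
    print(len(kmers))            # size of the 5-nucleotide sequence space
    print(one_hot(kmers[0]))     # 20-dimensional vector for "AAAAA"
    print(ordinal("ACGTA"))      # 5-dimensional vector
```

The two encodings give a model different information: the ordinal form implies an ordering among bases that has no chemical meaning, while the one-hot form treats the four bases as unordered categories at the cost of a longer input vector. Differences of this kind are one plausible reason model performance would depend on the chosen transformation.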

Keywords

Fluorescence

Biotechnology

Machine Learning

Supplementary materials

Supplementary Figures and Data: Contains all figures, tables, and plots that were not included in the main body of the text, including neural network architectures, confusion matrices, and validation error data.

License

The content is available under CC BY 4.0

Funding

NIH: 1R16GM145671

NSF: MCB 2027738

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
