Abstract
Polymers underpin critical technologies from medicine to energy, but their immense chemical and structural diversity makes rational design exceptionally difficult. Machine learning offers a way to navigate this space, yet prevailing approaches inherit small-molecule representations that fail to encode polymer-specific architecture; the distinction between statistical (e.g., random), block and other copolymer architectures is often collapsed into a categorical tag or ignored altogether. Here, we introduce SCALE (Statistical Copolymer Architecture with Learning Edges), which recasts a copolymer as a Markovian sequence over a monomer alphabet and embeds the transition probabilities P(j|i) as edge features within a graph attention network. Message passing thus computes contextualized monomer states, analogous to applying a transfer operator along the chain, while attention learns a data-driven kernel over paths that weighs sequence heterogeneity against block persistence. On a robotically synthesized, high-throughput fluorescence library, SCALE attained RMSE ≈ 228 and R² ≈ 0.84, surpassing polymer-adapted neural baselines and descriptor-based regressors (e.g., wDMPNN, RMSE ≈ 326; XGBoost, RMSE ≈ 254). The model is interpretable: edge features dominate predictions for statistical (random) copolymers, whereas node features prevail for block copolymers, consistent with 2D NOESY NMR. Beyond photophysics, SCALE generalized to antibacterial design across penta- and hexa-copolymer libraries, with experimental validation from fewer than 300 syntheses. By elevating sequence statistics to first-class learning variables, SCALE provides a generalizable, data-efficient route to closed-loop polymer discovery.
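The Markov-chain picture behind SCALE can be made concrete with a small sketch. It is not the authors' implementation: it assumes one graph node per monomer type, estimates P(j|i) from an explicit example sequence rather than from synthesis conditions, and uses a hypothetical attention logit e_ij = ⟨h_i, h_j⟩ + β·P(j|i) with no learned weights. The helper names (`transition_matrix`, `transfer_step`, `attention_message_pass`) and the parameter `beta` are illustrative only.

```python
import math


def transition_matrix(sequence, alphabet):
    """Maximum-likelihood estimate of P(j|i) from consecutive monomer pairs in one chain."""
    index = {m: k for k, m in enumerate(alphabet)}
    n = len(alphabet)
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(sequence, sequence[1:]):
        counts[index[a]][index[b]] += 1.0
    probs = []
    for row in counts:
        total = sum(row)
        # A monomer that never precedes another unit gets an all-zero row.
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def transfer_step(dist, P):
    """Propagate a monomer distribution one unit along the chain: p'(j) = sum_i p(i) P(j|i)."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def attention_message_pass(h, P, beta=1.0):
    """Toy attention update over the monomer graph with P(j|i) as an edge feature.

    Hypothetical logit (assumption, not SCALE's): e_ij = <h_i, h_j> + beta * P(j|i),
    softmax-normalised over the successors j of monomer i (edges with P(j|i) > 0).
    """
    n = len(h)
    updated = []
    for i in range(n):
        nbrs = [j for j in range(n) if P[i][j] > 0]
        if not nbrs:
            updated.append(list(h[i]))  # no outgoing edges: keep the state unchanged
            continue
        logits = [sum(x * y for x, y in zip(h[i], h[j])) + beta * P[i][j] for j in nbrs]
        top = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(v - top) for v in logits]
        z = sum(weights)
        alphas = [w / z for w in weights]
        # New state of monomer i: attention-weighted average of successor states.
        updated.append([sum(a * h[j][d] for a, j in zip(alphas, nbrs))
                        for d in range(len(h[i]))])
    return updated


if __name__ == "__main__":
    alphabet = ["A", "B"]
    P_block = transition_matrix("AAAAABBBBB", alphabet)   # long A and B runs
    P_random = transition_matrix("ABBABAABAB", alphabet)  # frequent A/B alternation
    print(P_block)   # [[0.8, 0.2], [0.0, 1.0]]
    print(P_random)  # [[0.2, 0.8], [0.75, 0.25]]
    print(transfer_step([1.0, 0.0], P_block))
    h = [[1.0, 0.0], [0.0, 1.0]]  # one-hot monomer embeddings (illustrative)
    print(attention_message_pass(h, P_block))
```

In this toy, the block-like chain yields large self-transition probabilities (block persistence), while the alternating chain concentrates probability on cross-transitions (sequence heterogeneity). Raising `beta` lets these edge statistics shift attention more strongly relative to node-state similarity.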
