Abstract
Three-dimensional molecular generative models have emerged that produce de novo molecules both unconditionally and conditionally, e.g., within protein pockets. However, steering these models toward a specific region of chemical space that satisfies a set of desired properties remains challenging. In this study, we introduce a flexible reinforcement learning method for flow-matching-based generative models, allowing the velocity field to be refined according to a user-defined reward function. In contrast to a purely conditional generation setup, where the set of conditions must be decided a priori, this framework allows fine-tuning of any unconditional or conditional model, reflecting the more realistic scenario in which the target properties to be optimized vary and are typically case-specific. It also enables, for the first time, joint optimization of continuous and discrete features in flow-matching models. Through extensive experiments across diverse optimization scenarios, we demonstrate that models trained with this strategy (agents) consistently outperform their baseline models (priors) when evaluated against the target design criteria.
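The abstract does not spell out the fine-tuning objective, so the following is a minimal, hypothetical sketch of one way such reward-driven refinement of a velocity field could look: sampling proceeds by Euler integration with small Gaussian perturbations so each trajectory carries a tractable log-probability, and a REINFORCE-style policy gradient then pushes the field toward higher-reward samples. All names here (`VelocityNet`, `sample_with_logprob`, `reward_fn`) and the specific update rule are illustrative assumptions, not the paper's actual method.

```python
import torch

class VelocityNet(torch.nn.Module):
    """Toy velocity field v_theta(x, t); stand-in for a 3D molecular model."""
    def __init__(self, dim: int = 3):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 128),
            torch.nn.SiLU(),
            torch.nn.Linear(128, dim),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Concatenate time onto each sample before predicting the velocity.
        return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))

def sample_with_logprob(v, n, dim, steps=50, sigma=0.1):
    """Euler-integrate x' = v(x, t) from noise toward data, perturbing each
    step with Gaussian noise so the trajectory has a tractable log-probability
    (constant terms dropped, as they carry no parameter gradient)."""
    x = torch.randn(n, dim)
    logp = torch.zeros(n)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.tensor([i * dt])
        mean = x + v(x, t) * dt                      # deterministic Euler step
        x = (mean + sigma * torch.randn_like(mean)).detach()
        logp = logp - ((x - mean) ** 2).sum(-1) / (2 * sigma ** 2)
    return x, logp

def reward_fn(x: torch.Tensor) -> torch.Tensor:
    """Placeholder user-defined reward, e.g. a docking or property score."""
    return -x.norm(dim=-1)  # toy reward: prefer samples near the origin

v = VelocityNet()
opt = torch.optim.Adam(v.parameters(), lr=1e-4)
for _ in range(100):
    samples, logp = sample_with_logprob(v, n=64, dim=3)
    r = reward_fn(samples)
    advantage = (r - r.mean()).detach()              # baseline reduces variance
    loss = -(advantage * logp).mean()                # REINFORCE-style objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In an actual molecular setting, the reward would score generated molecules (e.g., docking or property predictions), and the update would act jointly on continuous coordinates and discrete atom or bond types, as the abstract describes.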


