Molecular LEGION: Latent Enumeration, Generation, Integration, Optimization and Navigation. A case study of incalculably large chemical space coverage around the NLRP3 target

07 August 2025, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

The exploration and mapping of chemical space (the vast, multidimensional set of all possible small organic molecules) remains a central challenge in modern drug discovery. Traditional compound libraries and databases cover only a minute fraction of this space, which limits the discovery of novel, bioactive, and patentable chemotypes. Here, we present LEGION, a workflow within the Chemistry42 platform that integrates generative AI, AI-guided screening, and state-of-the-art cheminformatics tools to enable large-scale exploration of the chemical space around specific drug targets. In a case study on NLRP3, a clinically relevant but structurally complex inflammasome protein, LEGION combined ligand- and structure-based design strategies, in-house algorithms for 3D pharmacophore-aware scaffold extraction, and several distinct library enumeration methods to identify over 34,000 unique scaffolds and generate approximately 110 million molecular structures. Iterative stages of 2D and 3D filtering, including machine learning and structure-based pharmacophore scoring, were applied to prioritize the enumerated compounds for binding relevance and structural feasibility. The workflow proved effective for scaffold hopping, for navigating unexplored regions of chemical space, and for supporting intellectual property applications by generating structurally diverse and synthetically tractable structures. This work demonstrates, for the first time, an AI-enabled framework that supports both massive virtual screening and de novo compound generation and is specifically tailored to extensive chemical space coverage around a biological target. LEGION establishes a new paradigm for the intelligent, scalable, and practical exploration of chemical space in drug discovery.
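The iterative 2D and 3D filtering stages mentioned above can be pictured as a funnel: each stage is a pass/fail predicate applied to the molecules that survived the previous stage, and the survivor count is recorded after every step. The Python below is a minimal sketch of that general pattern under stated assumptions, not LEGION's implementation. The `Candidate` record, its fields, the stage names, all thresholds, and the toy score values are hypothetical placeholders; in a real pipeline the descriptors, ML scores, and pharmacophore fits would be computed by cheminformatics and modeling tools that are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    # Hypothetical per-molecule record; every field is a placeholder for a
    # value that a real pipeline would compute with dedicated tools.
    mol_id: str
    mol_weight: float   # 2D descriptor (g/mol)
    ml_score: float     # hypothetical ML activity score in [0, 1]
    pharm_score: float  # hypothetical 3D pharmacophore fit in [0, 1]


# A stage is a (label, predicate) pair; the predicate returns True to keep.
Stage = Tuple[str, Callable[[Candidate], bool]]


def run_funnel(library: List[Candidate], stages: List[Stage]):
    """Apply filter stages in order; return survivors and per-stage counts."""
    survivors = list(library)
    counts = [("input", len(survivors))]
    for label, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        counts.append((label, len(survivors)))
    return survivors, counts


# Hypothetical thresholds, ordered cheapest-first so that the (assumed) more
# expensive 3D scoring only runs on molecules that passed the 2D stages.
STAGES: List[Stage] = [
    ("2D: mol_weight <= 500", lambda c: c.mol_weight <= 500.0),
    ("2D: ML score >= 0.5", lambda c: c.ml_score >= 0.5),
    ("3D: pharmacophore fit >= 0.7", lambda c: c.pharm_score >= 0.7),
]

# Toy library: molecular weights are real values, all scores are invented.
library = [
    Candidate("ethanol", 46.07, 0.2, 0.9),
    Candidate("phenol", 94.11, 0.8, 0.75),
    Candidate("aniline", 93.13, 0.6, 0.4),
    Candidate("hypothetical_large", 612.0, 0.9, 0.95),
]

survivors, counts = run_funnel(library, STAGES)
for label, n in counts:
    print(f"{label}: {n}")
print("survivors:", [c.mol_id for c in survivors])
```

Ordering stages from cheap to expensive is the main design choice in such a funnel: at the scale of roughly 110 million structures, the cost of 3D scoring depends on how many molecules the earlier 2D stages remove.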

Keywords

AIDD
CADD
AI Screening
Generative Chemistry
Drug Discovery
