In Silico Isomerization Produces Apt Negative Data for VHTS Validation

15 January 2026, Version 2
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Early in the emergence of virtual high-throughput screening (VHTS), it was recognized that robust and reliable validation requires decoys that match actives as closely as possible in as many respects as possible. This has given rise to several generations of validation sets, each addressing reported shortcomings of earlier collections. Curating validation sets in this way is iterative and expensive, and it leaves ample scope for discrepancies between actives and decoys to creep in. It has previously been conjectured that in silico isomerization offers an attractive alternative for generating decoys for drug-like compounds, one that naturally mitigates many of these discrepancies. Here, we explore this proposition and prove the conjecture. We show that isomerization can produce molecules whose hydrogen bond acceptor count, hydrogen bond donor count, rotatable bond count, charge, and surface area distributions match those of experimental actives more closely than those of experimental decoys do. Beyond these properties, which receive a lot of attention in drug design, we also show that isomerization can produce decoys that lie closer to actives in property hyperspace than current experimental decoys, which tend to be highly dissimilar from the actives. The latter is a significant shortcoming that has thus far remained unreported and unaddressed. Herein, we build upon the methods, tools, and work of others to make the generation of new and better validation sets cheaper and more efficient, in the hope of moving the field of VHTS toward maturity. To that end, we make our code fully and freely available on GitHub (https://github.com/sivanovMU-Sofia/isomerization).
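
To make the notion of closeness in property hyperspace concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each molecule is represented by a vector of the five properties named above and that closeness is measured by cosine similarity, as the keywords suggest; the function names and the property ordering are hypothetical, and the numeric values are illustrative placeholders rather than results from the study.

```python
import math
from statistics import mean


def cosine_similarity(u, v):
    """Cosine of the angle between two property vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_nearest_active_similarity(decoys, actives):
    """Average, over all decoys, of the similarity to the most similar active."""
    return mean(max(cosine_similarity(d, a) for a in actives) for d in decoys)


# Assumed property order (hypothetical): HBA count, HBD count,
# rotatable bond count, formal charge, surface area.
# All values below are illustrative placeholders, not data from the study.
actives = [(5, 2, 6, 0, 90.0), (6, 1, 5, 0, 85.0)]
isomer_like_decoys = [(5, 2, 5, 0, 88.0)]
dissimilar_decoys = [(1, 0, 12, 1, 20.0)]

if __name__ == "__main__":
    print(mean_nearest_active_similarity(isomer_like_decoys, actives))
    print(mean_nearest_active_similarity(dissimilar_decoys, actives))
```

In this sketch the raw values are compared directly, so a property with a large numeric range, such as surface area, dominates the result; in practice each property would likely be scaled before comparison, and the descriptors themselves would be computed with a toolkit such as RDKit rather than typed in by hand.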

Keywords

cheminformatics
VHTS
virtual high-throughput screening
hit discovery
drug discovery
CADD
CAMD
cosine similarity
Tanimoto similarity
negative data
RDKit
computational chemistry
MAYGEN
Open Babel
DUD-E

Supplementary materials

Title: In Silico Isomerization Produces Apt Negative Data for VHTS Validation
Description: Molecular property distributions broken down by target
