Abstract
Quantitative Structure-Activity Relationship (QSAR) modeling is a pillar of computational drug discovery. However, standard machine learning (ML) models are often confounded by the high-dimensional and highly correlated nature of molecular descriptors. A model may identify a "bulk" property (e.g., molecular weight) as highly predictive when it is in fact merely a proxy for a true, specific pharmacophore feature (e.g., a hydrogen bond donor). Acting on such a correlational signal can misdirect costly synthesis efforts. We propose a statistical framework to move from correlational QSAR to causal QSAR. Our approach uses Double/Debiased Machine Learning (DML) to estimate the unconfounded causal effect of each of the p molecular descriptors on biological activity, treating the remaining p − 1 descriptors as potential confounders. We then apply the Benjamini–Hochberg procedure to the p-values of these p estimates to perform high-dimensional hypothesis testing and control the False Discovery Rate (FDR). We validate this framework in a simulation study that explicitly models the high-correlation and confounding structures endemic to chemoinformatics. We show that baseline models (Lasso, Random Forest) are easily misled, consistently ranking non-causal but confounded "bulk" descriptors as highly important. In contrast, our DML + FDR framework "sees through" the confounding: it correctly identifies the true causal descriptors and screens out the spurious ones while controlling the FDR at the target level. This causal inference framework provides a robust method for "deconfounding" the molecular descriptor space. By identifying features with a statistically significant causal link to activity, it can provide medicinal chemists with more reliable, interpretable, and actionable hypotheses for rational drug design.
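To make the two-stage pipeline concrete, here is a minimal NumPy sketch of the general technique; it is not the authors' implementation. It assumes a partially linear model and uses the cross-fitted partialling-out DML estimator, with closed-form ridge regression standing in for the flexible ML nuisance learners (a simplification so that only NumPy is needed). The Benjamini–Hochberg step-up procedure is then applied to the per-descriptor p-values. All helper names (`ridge_fit_predict`, `dml_effect`, `benjamini_hochberg`, `simulate`), the simulated data, and the effect sizes are hypothetical. Column 1 plays the role of a "bulk" descriptor that is a noisy proxy for the causal column 0.

```python
import math

import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression (intercept via centering); a stand-in nuisance learner."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc, yc = X_train - x_mean, y_train - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    beta = np.linalg.solve(gram, Xc.T @ yc)
    return (X_test - x_mean) @ beta + y_mean


def dml_effect(X, y, j, n_folds=5, seed=0):
    """Cross-fitted partialling-out DML estimate of descriptor j's effect on y.

    All other columns of X are treated as potential confounders.
    Returns (theta_hat, two-sided p-value from a normal approximation).
    """
    n = X.shape[0]
    d = X[:, j]                      # "treatment": descriptor j
    W = np.delete(X, j, axis=1)      # controls: the remaining p - 1 descriptors
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    y_res, d_res = np.empty(n), np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        # Nuisances are fit on the other folds and evaluated on the held-out fold.
        y_res[test_idx] = y[test_idx] - ridge_fit_predict(W[train_idx], y[train_idx], W[test_idx])
        d_res[test_idx] = d[test_idx] - ridge_fit_predict(W[train_idx], d[train_idx], W[test_idx])
    theta = (d_res @ y_res) / (d_res @ d_res)        # residual-on-residual slope
    psi = (y_res - theta * d_res) * d_res            # Neyman-orthogonal score
    se = math.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    p_value = math.erfc(abs(theta / se) / math.sqrt(2.0))
    return theta, p_value


def benjamini_hochberg(p_values, q=0.1):
    """Boolean mask of hypotheses rejected by the BH step-up procedure at level q."""
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest rank whose p-value meets its threshold
        reject[order[: k + 1]] = True
    return reject


def simulate(n=2000, p=10, seed=1):
    """Hypothetical data: column 1 is a 'bulk' proxy of the causal column 0."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    X[:, 1] = X[:, 0] + 0.5 * rng.normal(size=n)             # correlated but non-causal
    y = 1.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=n)   # only columns 0 and 2 are causal
    return X, y


X, y = simulate()
p = X.shape[1]
naive_corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
estimates = [dml_effect(X, y, j) for j in range(p)]
thetas = np.array([t for t, _ in estimates])
p_values = np.array([pv for _, pv in estimates])
reject = benjamini_hochberg(p_values, q=0.1)

for j in range(p):
    print(f"descriptor {j}: corr={naive_corr[j]:+.2f}  theta={thetas[j]:+.3f}  "
          f"p={p_values[j]:.2g}  BH-reject={reject[j]}")
```

In this setup the "bulk" column has a larger marginal correlation with activity than the genuinely causal column 2, yet its DML estimate is close to zero once column 0 is controlled for. In practice the ridge calls would be replaced by cross-fitted random forests, gradient boosting, or similar learners, which is what lets DML handle nonlinear confounding.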


