Hostname: page-component-6565fbc58-lkrg5 Total loading time: 0 Render date: 2026-03-12T05:57:28.388Z Has data issue: false hasContentIssue false

Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction

Published online by Cambridge University Press:  12 September 2025

Guna Gouru*
Affiliation:
Department of Health Sciences, Northeastern University – Boston Campus, Boston, MA, USA
*
Corresponding author: Guna Gouru; Email: gouru.g@northeastern.edu
Rights & Permissions [Opens in a new window]

Abstract

Cancer heterogeneity presents a major obstacle to effective drug treatment, emphasizing the need for personalized approaches that can accurately predict drug responses. Advances in high-throughput technologies have driven precision medicine initiatives toward integrating multi-omics data, enabling a more comprehensive understanding of tumor biology. However, integration of diverse omics layers poses challenges for computational modeling, as many traditional machine learning (ML) and statistical methods are not designed to capture complex, high-dimensional and multimodal data. This review examines the studies that integrate multi-omics datasets, aiming to enhance drug response prediction (DRP). Specifically, it outlines the most used omics types and computational approaches – classical ML models, as well as advanced deep learning and multimodal integration frameworks for improving DRP, detailing key methodologies and evaluation metrics, such as area under the dose–response curve, F1 score and mean square error, which assess model performance. By summarizing the integrated omics data, computational methods and challenges encountered, this review provides an in-depth overview of the existing landscape of precision medicine and future directions for advancing drug-response prediction.

Topics structure

Information

Type
Review
Creative Commons
Creative Common License - CCCreative Common License - BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Impact statement

Drug sensitivity assessment is critical for optimizing personalized cancer treatment and advancing precision oncology. This review aims to provide a comprehensive and in-depth examination of computational methods used for multi-omics data integration to predict drug response in cancer patients, highlighting the challenges in the integration of multi-omics. It further investigates how combining genomic, transcriptomic, epigenomic, proteomic and metabolomic data can improve the understanding of tumor heterogeneity and enhance predictive accuracy. This integrated approach may significantly contribute to the development of more effective, tailored therapies in cancer treatment by advancing precision medicine.

Introduction and background

Cancer is a highly complex and widespread disease, continuing to be a major global cause of mortality and a serious challenge to human health (Li et al., Reference Li, Guo, Gao and Wang2023). A key challenge in cancer treatment is cancer heterogeneity, the genetic and molecular variations among tumors, even within the same cancer type (Chen et al., Reference Chen, Cui, Lin and Chiang2024). This diversity complicates drug selection, as patients may respond differently to the same treatment (Chen et al., Reference Chen, Cui, Lin and Chiang2024).

Assessing drug sensitivity is essential for evaluating how well a drug works in individual patients, guiding treatment decisions. Therefore, developing predictive models for drug sensitivity plays a crucial role in advancing personalized medicine. Cancer is predominantly driven by genetic alterations, where variations in gene expression profiles and somatic mutations play a critical role in modulating therapeutic responses. Elucidating these molecular changes offers the potential to optimize treatment strategies and enhance efficacy at the individual patient level (Wang et al., Reference Wang, Zhang, Zhao, Li, Xiao and Zhang2023).

Precision Oncology, a branch of Precision Medicine, aims to deliver the most effective cancer treatment by tailoring the right therapy to the right patient, at the optimal dose and timing (Schwartzberg et al., Reference Schwartzberg, Kim, Liu and Schrag2017; Michele Araújo Pereira et al., Reference Michele Araújo Pereira, Couto, Penna, Alvim, Nani, Freire, Araúj, Masood and Malik2020). Various processes can disrupt genetic machinery at the DNA, RNA or protein levels, resulting in changes to the expression of the protein encoded by the gene (Schwartzberg et al., Reference Schwartzberg, Kim, Liu and Schrag2017). To address this complexity, Precision Oncology uses high-throughput “-omics” technologies to capture cellular, molecular and tissue-level variations, enabling more accurate predictions (Zhao et al., Reference Zhao, Jin, Chen, Dong, Liu, Guo, Song, Li, Chen, Han, Liang and Gu2023). Achieving the objectives of precision oncology requires more comprehensive profiling of tumors at multiple biological layers. “Omics” encompasses a wide range of biological data types, including genomic information, such as copy number variations (CNVs), mutations and single nucleotide polymorphisms (SNPs); epigenomic data, such as DNA methylation; transcriptomic data, such as messenger RNA (mRNA) sequencing; proteomic data, such as proteins; and lipid-related data (lipidomics) and metabolic compounds (metabolomics) (Llinas-Bertran et al., Reference Llinas-Bertran, Butjosa-Espin, Barberi and Seoane2025). However, relying on a single layer of omics data is insufficient to establish precise connections between molecular changes and their phenotypic effects. Thus, the integration of multiple omics is essential to achieve a more complete understanding of cancer heterogeneity (Llinas-Bertran et al., Reference Llinas-Bertran, Butjosa-Espin, Barberi and Seoane2025).

This literature review summarizes the type of omics used and available data integration (DI) methods utilized for improving drug response prediction (DRP), aiming to address the following questions:

  • What type of omics are used in selected studies for DRP?

  • What methods and categories are utilized to improve DRP?

  • What are some key challenges and future directions in improving DRP?

Methodology

This review aimed at a single objective: to evaluate computational methods used for multi-omics DI in DRP. The methodology was structured in two phases: (i) establishing a systematic search with a clearly defined initial query and (ii) refining and expanding the search to capture additional relevant studies. The original PubMed query was “multi-omics” OR “multiomics” OR “omics integration” AND “drug response” OR “drug sensitivity” OR “drug resistance” OR “drug efficacy” AND cancer OR neoplasm OR malignancy. This query was designed to identify studies published between 2021 and 2025 that combined at least two omics data types, or one omics type with other relevant modalities (e.g., clinical, imaging and chemical data), specifically in the context of predicting drug response in cancer.

Studies were excluded if they:

  • Did not include at least two distinct omics data types, or one omics data type combined with other relevant data (e.g., clinical, imaging or chemical data)

  • Did not apply an explicit computational or statistical method for integrating omics or other relevant data types.

The initial PubMed search returned 685 studies, of which 342 remained after applying the exclusion criteria. In the second phase, the focus was narrowed to studies describing or evaluating computational approaches for DRP. To ensure broader coverage of emerging methodologies, the search was refined using additional targeted queries

  1. 1. “multi-omics” AND “data integration” AND “cancer” AND “drug response”

  2. 2. “multi-omics” AND “machine learning” AND “drug sensitivity”

  3. 3. “multi-omics” AND “deep learning” AND “drug response”

These supplementary searches identified 27 additional studies, of which 9 met the inclusion criteria by explicitly employing a computational method for DRP.

The final selected studies meeting the inclusion criteria were assessed based on:

  • The study objective

  • The omics data types that are integrated (Genomics, Epigenomics, Transcriptomics, Proteomics, Metabolomics and Lipidomics).

Figure 1 provides an overview of the methodological framework employed in this literature review. Additionally, several relevant articles (Hasin et al., Reference Hasin, Seldin and Lusis2017; Zitnik et al., Reference Zitnik, Nguyen, Wang, Leskovec, Goldenberg and Hoffman2019; Chen and Zhang, Reference Chen and Zhang2022; Chakraborty et al., Reference Chakraborty, Sharma, Karmakar and Banerjee2024) were identified, although they did not align directly with the study’s primary objective.

Figure 1. Framework illustrating the methodological approach of the review.

Results

DRP aims to predict how a drug affects the biological system, such as patient cells or model cell lines. Individual responses can differ significantly between individuals due to multiple factors, often due to genetic and molecular variations (Xiao et al., Reference Xiao, Zheng, Popa, Mi, Hu, Zou and Zou2025). To gain a more precise understanding of these effects, researchers analyze multi-omics datasets from both patient samples and cell lines. In this context, “drug response” refers to the range of biological or clinical outcomes observed following drug administration, which may include changes in cell viability, proliferation or clinical efficacy, depending on the model system and study design. This definition highlights that drug response is a multifaceted, quantitative phenotype, shaped by numerous genetic, molecular and environmental factors, and can differ substantially when measured in cell lines versus patient samples (Zhang and Nebert, Reference Zhang and Nebert2017). A key aspect of advancing personalized medicine is the ability to predict drug effectiveness for a group of patients who share similar molecular profiles (Hernandez-Lemus and Ochoa, Reference Hernandez-Lemus and Ochoa2024). This approach aims to tailor treatments more accurately to individual patients’ requirements, potentially enhancing outcomes and reducing adverse effects (Athieniti and Spyrou, Reference Athieniti and Spyrou2023).

Predominant omics used for DRP

This analysis was performed using R software, which centers on determining the most utilized omic layers in selected studies. As shown in Figure 2a, transcriptomics emerged as the most frequently applied omic type, followed by genomics, epigenomics and proteomics. Next, the bilateral pairings of the omic types found in the selected studies are examined. Figure 2b illustrates prominent combinations of multiple omics types utilized within individual studies. The network diagram illustrates the most frequently combined omic layers. Each node represents an individual omic type, with the size of the node corresponding to how often that omic layer was utilized across the selected studies. The line between nodes indicates that the two omic types were used together in at least one study, and the number on each line represents the frequency of that specific combination.

Figure 2. (a) Representation of frequently used omics in selected studies. (b) Co-occurrence of omic pairs in selected studies.

Note: E: epigenomics; G: genomics; P: proteomics; T: transcriptomics.

Computational methods used for DRP

Computational methods used for DRP can be broadly categorized as under either classification or regression tasks, depending on the nature of the predicted outcome. Classification models assign samples to discrete categories, such as “sensitive” or “resistant,” based on thresholds applied to measured drug response values. Support vector machine (SVM) and neighborhood component analysis (NCA) fall under this category. In contrast, regression models predict continuous quantitative measures, such as IC50 (the concentration of drug needed to inhibit cell viability by 50%), AUC (area under the dose–response curve) or cell viability percentages. Multi-omics integrated collective variational autoencoders (MOICVAE), latent alignment and attention mechanism, DeepInsight-3D, novel drug sensitivity prediction (NDSP), multimodal contrastive learning for cancer drug responses, multimodal and omics machine learning integration (MOMLIN) and multistage multimodal drug representations (ModDRDSP) employ regression as their primary framework, directly modeling the continuous landscape of pharmacological response (Partin et al., Reference Partin, Brettin, Zhu, Narykov, Clyde, Overbeek and Stevens2023).

A key step in applying these computational methods is multi-omics DI, which can be performed using early, late or intermediate strategies. Early integration combines multiple omics datasets into a single table or graph format, which is then processed by a machine learning (ML) model. Late integration analyzes each omics layer independently and subsequently merges the individual predictions through an additional model. Intermediate integration allows the model to learn shared representations from multiple datasets simultaneously (Athieniti and Spyrou, Reference Athieniti and Spyrou2023).

Figure 3 displays a Sankey diagram illustrating the distribution of omics types across the reviewed studies, with the dots of each flow corresponding to the number of omics employed per study. Figure 4 illustrates the overall workflow for predicting drug response outcome. The effectiveness of these methods is evaluated based on their ability to accurately predict drug responses, with evaluation metrics for assessment (Jiang et al., Reference Jiang, Ye, Tan and Bao2025). Table 1 presents a summary of the selected studies, detailing the types of omics data integrated, the computational frameworks and algorithms employed, the datasets utilized and the evaluation metrics applied. To further assist researchers in selecting appropriate datasets, a comparative summary of commonly used benchmark datasets is provided below Table 2. A well-performing model should not only provide high prediction accuracy but also reveal biologically interpretable insights, such as the molecular mechanisms driving drug resistance or sensitivity.

Figure 3. Sankey chart representing the omics and associated computational methods used in different studies. Note: DL: deep learning; FS: feature selection; MMIA: multimodal integration approach; SL: supervised learning.

Figure 4. Multi-omics data integration workflow for drug response prediction.

Table 1. Computational methods used for multi-omics data integration identified studies published in 2021–2025, including their method category, application objectives, datasets employed, evaluation metric and whether a specific cancer type was investigated

Note: AUC/AUROC, area under curve/area under receiver operating characteristic curve; AUPR, area under precision recall curve; BC, breast cancer; CCLE, Cancer Cell Line Encyclopedia; CNA, copy number aberration; CNN, convolutional neural network; CNV, copy number variation; CVAE, conditional variational autoencoder; DMCN, deep message-crossing network; DMSZ, Leibniz Institute; DNA, deoxyribose nucleic acid; GC, gastric cancer; GCN, graph convolutional network; GDSC, Genomics of Drug Sensitivity in Cancer; GEO, Gene Expression Omnibus; KNN, K-nearest neighbor; MAE; mean absolute error; miRNA, micro ribonucleic acid; mRNA, ribonucleic acid; MSE, mean square error; RMSE, root mean square error; SMILES, simplified molecular input line entry system; SNP, single nucleotide polymorphism; TCGA, The Cancer Genome Atlas.

Table 2. Overview of benchmark datasets for drug response prediction

Supervised learning

SVMs classify patients based on the response to the treatment. In these, patients were classified into different groups, and an SVM model was trained using data from 153 patients for the feature extraction from gene expression and immunohistochemistry data, and classified new patients into response or resistance groups. A hyperplane was created to separate patients into two categories: responders (those who benefited from the treatment) and nonresponders (resistant) (Che et al., Reference Chen, Cui, Lin and Chiang2024).

NCA is a deep learning-based DRP model using multi-omics data to analyze breast cancer cell lines. After filtering poorly performing drugs, a dataset of 42 cell lines and 100 drug molecules was used. The model utilizes NCA for feature selection and a neural network regressor with Levenberg–Marquardt backpropagation for training, optimized with Bayesian optimization and fivefold cross-validation. Additionally, K-means clustering was applied to categorize drugs in Olaparib and nonresponders for the drugs such as Dabrafenib and Olaparib, and identifying outliers that negatively impacted model accuracy (Malik et al., Reference Malik, Kalakoti and Sundar2021; Ruiz-Ramos et al., Reference Ruiz-Ramos, Plaza-Diaz, Puig-Campmany, Sampol-Mayol, Blazquez-Andion, Serrano-Garcia-Calvo, Sanz-Lopez, Acebes-Roldan and Juanes-Borrego2025).

Deep learning

MOICVAE is designed to predict drug sensitivity in cancer patients by integrating multi-omics data. The datasets are processed into similar matrices to establish relationships between samples and omics features. The framework incorporates a multimodal deep autoencoder to fuse omics data, generating a fusion vector that encapsulates essential multi-omics patterns. This fusion vector is then passed into a conditional variational autoencoder (CVAE), which encodes and reconstructs data representations to classify samples as sensitive (S) or resistant (R) to a given drug. The trained MOICVAE model is applied to The Cancer Genome Atlas samples, where it utilizes mRNA + CNV data and combines with K-nearest neighbors to predict drug response. Samples are categorized into S or R groups, indicating their likelihood of responding to treatment. Further differential analysis is conducted to assess the biological significance of these predictions. Kaplan–Meier survival plots compare overall survival between S and R groups, while tumor inflammatory scores highlight differences in tumor immune responses. Additionally, immune checkpoint markers such as HAVCR2, LAG3, TIGIT and PDCD1 are analyzed to explore variations in the immune microenvironment, providing insights into the role of immune factors in drug resistance and sensitivity (Wang et al., Reference Wang, Zhang, Zhao, Li, Xiao and Zhang2023).

Latent alignment and attention mechanism is another deep learning model, which utilizes multiple biological data types to improve predictive accuracy. The framework begins with data preprocessing, where raw multi-omics data undergoes steps, such as overlapping, filtering and imputation to manage missing values and ensure consistency across datasets. Following this, feature extraction is performed separately for each omics data type to capture key biological patterns effectively. The extracted features are then aligned in a latent space, preserving relationships between different data modalities while reducing dimensionality. To enhance predictive performance, an attention module is incorporated, allowing the model to focus on the most informative features that contribute to drug response. Finally, a predictive model, likely utilizing a deep neural network, processes the refined feature representations to estimate drug viability across different concentrations. The model outputs a dose–response curve, identifying whether a cancer sample is sensitive or resistant to a particular drug (Chen et al., Reference Chen, Cui, Lin and Chiang2024).

DeepInsight-3D is an advanced deep learning model designed to enhance the analysis of multi-omics data, particularly for predicting anticancer drug responses. It transforms multilayered omics data into three-dimensional (3D) image formats, enabling convolutional neural networks (CNNs) to effectively process and extract features. Unlike its predecessor, which handled single-layer data, DeepInsight-3D can accommodate multiple omics layers, improving the complexity and depth of models. The method includes two image construction strategies, either prioritizing the most informative layer or combining all layers equally. CNNs are then applied to identify patterns and classify the data, with an element decoder used to provide biologically relevant insights. This model excels in handling small sample sizes and offers interpretability through class-activation maps, which pinpoint crucial genes or pathways, making it a powerful tool for understanding drug responses in cancer treatment (Sharma et al., Reference Sharma, Lysenko, Boroevich and Tsunoda2023).

Feature selection

NDSP is one model that integrates multi-omics data to enhance DRP by utilizing deep learning and similarity network fusion approaches. The method begins by extracting drug targets using an improved sparse principal component analysis (SPCA) for different omics data types, such as RNA sequencing, copy number aberration and methylation. These extracted features are then used to construct sample similarity networks, which are subsequently merged to create a comprehensive representation of the data. The merged similarity networks are input into a deep neural network for training, reducing dimensionality and mitigating overfitting issues, improving interpretability and accuracy in predicting drug sensitivity (Liu and Mei, Reference Liu and Mei2023).

Multimodal integration approach

Multimodal DI refers to the process of combining diverse data types from different sources to generate a comprehensive understanding of a subject (Hernandez-Lemus and Ochoa, Reference Hernandez-Lemus and Ochoa2024). In this, initially, data are collected from different sources, such as multi-omics data, medical imaging (histopathology, morphological and radiology), drug chemical structure and electronic health records. Then, data preprocessing is done before the integration. Advanced computational models, such as ML, artificial intelligence (AI), integrate these diverse datasets. This integration allows for advanced data analysis to achieve key objectives, ultimately aiding in personalized medicine and improved healthcare outcomes (Llinas-Bertran et al., Reference Llinas-Bertran, Butjosa-Espin, Barberi and Seoane2025).

MMCL-CDR is designed to enhance DRP by integrating multiple data modalities, including multi-omics, morphological images and the chemical structure of the drug. In this model, for multi-omics, an Encoder (a neural network) that transforms high-dimensional input into lower-dimensional features is used. For morphological images of cells, a CNN is specifically used to extract features of the morphological images. The extracted multi-omics and morphology images are passed through a projector for further transformation. An aggregation step combines both omics and image representations. For molecular drug data, a graph convolutional network processes these molecular graphs to learn meaningful drug representations then Max Pooling is applied to summarize the extracted features. All the omics representation and image representation are refined using contrastive learning, ensuring biologically relevant alignment. A fully connected neural network predicts whether a cancer cell line is resistant or sensitive to a particular drug based on the learned features (Li et al., Reference Li, Guo, Gao and Wang2023).

MOMLIN is a multimodal framework integrating clinical features, multi-omics and pathway activity to predict drug response and identify biomarker networks in cancer patients. It follows a three-stage process: first, weighted multiclass sparse canonical correlation analysis selects sparse latent components that capture key predictive features. Next, a logistic regression model is trained using these components by incorporating multi-omics data. Finally, biomarker networks are constructed through heatmap visualization and correlation analysis, revealing critical molecular signatures driving treatment response (Rashid and Selvarajoo, Reference Rashid and Selvarajoo2024).

ModDRDSP is another tool that integrates multi-omics data, drug molecular structure to predict drug sensitivity response in cancer cell lines, utilizing drug sensitivity, multi-omics and drug molecular structure data. Preprocessing involves KernelPCA for dimensionality reduction and ConvMolFeaturizer for molecular feature extraction. The framework employs deep learning models: a deep hierarchical bidirectional GRU network (DSBiGRU) processes simplified molecular input line entry system representations, a deep message-crossing network learns molecular graph embeddings and CellCNN extracts multi-omics features (Han et al., Reference Han, Fu, Wu, Zhao, Song, Huang, Zhang, Liu and Zhang2023). These features are fused into a multidimensional representation and analyzed using ML models to predict drug sensitivity (Song et al., Reference Song, Wei, Zhao, Zhai, Dai and Duan2025).

Discussion

Omics utilization in DRP

This review revealed distinct trends in omics usage and computational methodologies for DRP, with a predominant reliance on genomics, epigenomics and transcriptomics, while proteomics remains underrepresented despite its potential for providing functional and dynamic insights into tumor biology. This imbalance likely reflects differences in data availability and technological maturity; however, the sparing use of proteomic and metabolomic data limits the ability to capture post-translational modifications and metabolic shifts essential for accurate drug response modeling. Addressing this gap represents a key opportunity for future research.

Computational methodologies

In parallel with data considerations, computational strategies for DRP continue to evolve. By categorizing existing approaches into supervised learning, deep learning, feature selection with dimensionality reduction, and multimodal integration, this review highlights both their complementary strengths and shared limitations.

Traditional supervised learning methods such as SVMs and NCA have been widely applied. Yet, their effectiveness is constrained by the high dimensionality and small sample sizes typical of omics datasets, which predispose models to overfitting and poor generalizability. Moreover, their linear assumptions often fall short of capturing the complex nonlinear interactions that characterize cancer heterogeneity (Liu and Mei, Reference Liu and Mei2023).

Deep learning frameworks like MOICVAE, latent alignment with attention mechanisms and DeepInsight-3D address some of these challenges by enabling automatic feature extraction and nonlinear data fusion. These models have shown promise in capturing multimodal biological interactions, but their success depends heavily on access to large, high-quality datasets. Furthermore, their limited interpretability – the “black-box” problem – remains a barrier to clinical adoption, as clinicians require mechanistic insights alongside predictive accuracy (Liu and Mei, Reference Liu and Mei2023).

Feature selection and dimensionality reduction approaches, such as SPCA, used in models like NDSP, help mitigate issues of high dimensionality and noise. However, these approaches risk eliminating subtle but biologically meaningful signals, underscoring the ongoing challenge of balancing feature sparsity with biological relevance. Missing or incomplete data across omics layers further complicates integration, necessitating robust imputation and harmonization strategies to minimize bias and information loss.

Multimodal integration frameworks, such as MMCL-CDR, MOMLIN and ModDRDSP, represent an important step toward holistic modeling by combining diverse data types, including clinical variables, imaging features, chemical drug structures and multi-omics layers. While these approaches enhance the potential for comprehensive prediction, they face persistent hurdles in aligning heterogeneous data with different scales, distributions and levels of reliability. Their complex architecture also demands extensive parameter tuning and large sample sizes to ensure stability. Moreover, model interpretability and prospective clinical validation remain underdeveloped, slowing their translational impact.

Key challenges to clinical translation

Taken together, the models, integration techniques and methodological innovations reviewed here illustrate the transformative potential of informatics-driven approaches for optimizing therapeutic outcomes and advancing precision medicine (Shekhawat et al., Reference Shekhawat, Kashyap, Swain, Dhiman, Sharma, Kumar and Yadav2025). At the same time, several fundamental challenges continue to limit clinical translation.

Data heterogeneity

Despite considerable progress in multi-omics-based DRP, several critical challenges hinder the full translation of computational models into clinical practice. Foremost among these is data heterogeneity, arising from the integration of diverse omics layers generated by different platforms, protocols and laboratories. Variability in sample processing, measurement techniques and batch effects introduces noise and bias that complicate model training and reduce reproducibility. The complexity increases when combining multi-omics data with auxiliary modalities, such as clinical, imaging or drug molecular information, necessitating rigorous normalization and harmonization strategies that remain an ongoing challenge (Jiang et al., Reference Jiang, Ye, Tan and Bao2025).

Model interpretability

Another prominent limitation is the lack of model interpretability, particularly for advanced deep learning and multimodal integration frameworks. Although powerful in capturing complex nonlinearities and interactions, they often fail to provide transparent biological explanations for their predictions. This opacity poses a barrier to clinical adoption, as clinicians require not only accurate predictions but also mechanistic insights to make informed therapeutic decisions and to trust model recommendations. Developing explainable AI approaches and embedding biological prior knowledge are, therefore, critical future directions to enhance model transparency and trustworthiness (Ennab and McHeick, Reference Ennab and McHeick2024).

Generalizability across cohorts

Many established models exhibit poor generalizability across independent cohorts and diverse patient populations. Overfitting to small, homogeneous datasets undermines their robustness in real-world applications, where genetic backgrounds and environmental exposures vary widely. This limitation is exacerbated by the lack of standardized response metrics in preclinical studies and heterogeneity in experimental conditions, such as assay types, drug concentrations and cell line handling, which complicates comparisons across studies (Adam et al., Reference Adam, Rampasek, Safikhani, Smirnov, Haibe-Kains and Goldenberg2020). Addressing this limitation requires larger, multi-institutional datasets, rigorous cross-cohort validation and privacy-preserving strategies such as federated learning.

Future directions

The integration of multi-omics data continues to offer immense potential, with deep learning methods expected to play a central role in uncovering complex nonlinear patterns (Shekhawat et al., Reference Shekhawat, Kashyap, Swain, Dhiman, Sharma, Kumar and Yadav2025). A growing area of interest lies in single-cell multi-omics, which provides opportunities to dissect cellular heterogeneity and better understand molecular mechanisms in disease. Equally important are advances in visualization strategies, which can improve the accessibility and interpretation of results for a broader scientific audience. Establishing standardized repositories and collaborative platforms will be key to ensuring data availability, reproducibility and more effective cross-study comparisons (Zhang, Reference Zhang2024).

Another promising direction is the emphasis on model explainability. Tools such as Shapley Additive exPlanations (SHAP) values and attention mechanisms can help researchers interpret predictions, increasing trust and clinical relevance. Alongside this, harmonizing data across institutions and platforms remains a critical challenge for ensuring consistency and generalizability of findings. Emerging approaches like tabular-to-image conversion (e.g., DeepInsight) open new possibilities for applying CNN-based models in omics research, combining interpretability with predictive power (Sharma et al., Reference Sharma, Vans, Shigemizu, Boroevich and Tsunoda2019). Together, these advancements will help translate multi-omics research into more actionable insights for personalized medicine.

Conclusion

The integration of multi-omics data has significantly advanced DRP, offering deeper insights into tumor heterogeneity and enabling more precise therapeutic strategies. Current approaches demonstrate the value of genomics, epigenomics and transcriptomics in predictive modeling. Still, the limited use of proteomics and metabolomics restricts the ability to capture functional dynamics essential for understanding drug sensitivity and resistance. While computational methods from supervised learning to advanced deep learning and multimodal integration have shown considerable promise, they remain challenged by data heterogeneity, interpretability and limited generalizability across cohorts.

Addressing these challenges will be critical for translating computational advances into clinical practice. Future research should prioritize the development of explainable and generalizable models, the harmonization of multi-institutional datasets and the integration of underutilized omics layers. Combining methodological innovation with standardized pipelines, robust visualization tools and collaborative data-sharing infrastructures will accelerate the clinical applicability of multi-omics-driven DRP. These efforts will strengthen the foundations of precision oncology, enabling the delivery of more effective, personalized cancer treatments.

Open peer review

To view the open peer review materials for this article, please visit http://doi.org/10.1017/pcm.2025.10003.

Acknowledgements

This review was undertaken as part of the capstone project for the MSc in Health Informatics program at Northeastern University. The author would like to express sincere gratitude to Professor Jay Spitulnik, Program Director, for his valuable guidance and support throughout the development and writing of this review. The author also wishes to thank Lauri Fennell, Health Sciences Librarian, for her assistance in navigating reference management tools, and to Amy Lewontin, Collection Development Librarian, for her invaluable help in identifying Northeastern University’s open access publishing agreements.

Author contribution

Guna Gouru: Conceptualization, methodology, formal analysis, writing – original draft preparation, review and editing and project administration.

Competing interests

The author declares none.

References

Adam, G, Rampasek, L, Safikhani, Z, Smirnov, P, Haibe-Kains, B and Goldenberg, A (2020) Machine learning approaches to drug response prediction: Challenges and recent progress. npj Precision Oncology 4, 19. https://doi.org/10.1038/s41698-020-0122-1.Google Scholar
Athieniti, E and Spyrou, GM (2023) A guide to multi-omics data collection and integration for translational medicine. Computational and Structural Biotechnology Journal 21, 134149. https://doi.org/10.1016/j.csbj.2022.11.050.Google Scholar
Chakraborty, S, Sharma, G, Karmakar, S and Banerjee, S (2024) Multi-OMICS approaches in cancer biology: New era in cancer therapy. Biochimica et Biophysica Acta (BBA) Molecular Basis of Disease 1870(5), 167120. https://doi.org/10.1016/j.bbadis.2024.167120.Google Scholar
Che, G, Yin, J, Wang, W, Luo, Y, Chen, Y, Yu, X, Wang, H, Liu, X, Chen, Z, Wang, X, Chen, Y, Wang, X, Tang, K, Tang, J, Shao, W, Wu, , Sheng, J, Li, Q and Liu, J (2024) Circumventing drug resistance in gastric cancer: A spatial multi-omics exploration of chemo and immuno-therapeutic response dynamics. Drug Resistance Updates 74, 101080. https://doi.org/10.1016/j.drup.2024.101080.Google Scholar
Chen, HO, Cui, YC, Lin, PC and Chiang, JH (2024) An innovative multi-omics model integrating latent alignment and attention mechanism for drug response prediction. Journal of Personalized Medicine 14(7), 694. https://doi.org/10.3390/jpm14070694.Google Scholar
Chen, Y and Zhang, L (2022) How much can deep learning improve prediction of the responses to drugs in cancer cell lines? Briefings in Bioinformatics 23(1), 18. https://doi.org/10.1093/bib/bbab378.Google Scholar
Ennab, M and McHeick, H (2024) Enhancing interpretability and accuracy of AI models in healthcare: A comprehensive review on challenges and future directions. Frontiers in Robotics and AI 11, 1444763. https://doi.org/10.3389/frobt.2024.1444763.Google Scholar
Han, S, Fu, H, Wu, Y, Zhao, G, Song, Z, Huang, F, Zhang, Z, Liu, S and Zhang, W (2023) HimGNN: A novel hierarchical molecular graph representation learning framework for property prediction. Briefings in Bioinformatics 24(5), 111. https://doi.org/10.1093/bib/bbad305.Google Scholar
Hasin, Y, Seldin, M and Lusis, A (2017) Multi-omics approaches to disease. Genome Biology 18, 83. https://doi.org/10.1186/s13059-017-1215-1.Google Scholar
Hernandez-Lemus, E and Ochoa, S (2024) Methods for multi-omic data integration in cancer research. Frontiers in Genetics 15, 1425456. https://doi.org/10.3389/fgene.2024.1425456.Google Scholar
Jiang, W, Ye, W, Tan, X and Bao, YJ (2025) Network-based multi-omics integrative analysis methods in drug discovery: A systematic review. Biodata Mining 18, 27. https://doi.org/10.1186/s13040-025-00442-z.Google Scholar
Li, Y, Guo, Z, Gao, X and Wang, G (2023) MMCL-CDR: Enhancing cancer drug response prediction with multi-omics and morphology images contrastive representation learning. Bioinformatics 39(12). https://doi.org/10.1093/bioinformatics/btad734.Google Scholar
Liu, XY and Mei, XY (2023) Prediction of drug sensitivity based on multi-omics data using deep learning and similarity network fusion approaches. Frontiers in Bioengineering and Biotechnology 11, 1156372. https://doi.org/10.3389/fbioe.2023.1156372.Google Scholar
Llinas-Bertran, A, Butjosa-Espin, M, Barberi, V and Seoane, JA (2025) Multimodal data integration in early-stage breast cancer. The Breast 80, 103892. https://doi.org/10.1016/j.breast.2025.103892.Google Scholar
Malik, V, Kalakoti, Y and Sundar, D (2021) Deep learning assisted multi-omics integration for survival and drug-response prediction in breast cancer. BMC Genomics 22(1), 214. https://doi.org/10.1186/s12864-021-07524-2.Google Scholar
Michele Araújo Pereira, MKL, Couto, PGP, Penna, MG, Alvim, LB, Nani, TF, Freire, MCM and Araúj, LH (2020) Cancer genomics in precision oncology: Applications, challenges, and prospects. In Masood, N and Malik, SS (eds), Essentials of Cancer Genomic, Computational Approaches and Precision Medicine. Singapore: Springer Nature Singapore Pte Ltd, pp. 453454.Google Scholar
Partin, A, Brettin, TS, Zhu, Y, Narykov, O, Clyde, A, Overbeek, J and Stevens, RL (2023) Deep learning methods for drug response prediction in cancer: Predominant and emerging trends. Frontiers in Medicine 10, 1086097. https://doi.org/10.3389/fmed.2023.1086097.Google Scholar
Rashid, MM and Selvarajoo, K (2024) Advancing drug-response prediction using multi-modal and -omics machine learning integration (MOMLIN): A case study on breast cancer clinical data. Briefings in Bioinformatics 25(4). https://doi.org/10.1093/bib/bbae300.Google Scholar
Ruiz-Ramos, J, Plaza-Diaz, A, Puig-Campmany, M, Sampol-Mayol, C, Blazquez-Andion, M, Serrano-Garcia-Calvo, A, Sanz-Lopez, N, Acebes-Roldan, X and Juanes-Borrego, A (2025) K-means clustering to identify high risk of early revisits in patients with drug-related problems attending the emergency department. European Journal of Hospital Pharmacy, 16. https://doi.org/10.1136/ejhpharm-2024-004414.Google Scholar
Schwartzberg, L, Kim, ES, Liu, D and Schrag, D (2017) Precision oncology: Who, how, what, when, and when not? American Society of Clinical Oncology educational book. American Society of Clinical Oncology. Annual Meeting 37, 160179. https://doi.org/10.1200/EDBK_174176.Google Scholar
Sharma, A, Lysenko, A, Boroevich, KA and Tsunoda, T (2023) DeepInsight-3D architecture for anti-cancer drug response prediction with deep-learning on multi-omics. Scientific Reports 13(1), 2483. https://doi.org/10.1038/s41598-023-29644-3.Google Scholar
Sharma, A, Vans, E, Shigemizu, D, Boroevich, KA and Tsunoda, T (2019) DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture. Scientific Reports 9(1), 11399. https://doi.org/10.1038/s41598-019-47765-6.Google Scholar
Shekhawat, RS, Kashyap, P, Swain, AK, Dhiman, V, Sharma, J, Kumar, A and Yadav, P (2025) Bioinformatics for multi-omics data integration. Encyclopedia of Bioinformatics and Computational Biology (Second Edition) 4, 615627. https://doi.org/10.1016/B978-0-323-95502-7.00159-7.Google Scholar
Song, J, Wei, M, Zhao, S, Zhai, H, Dai, Q and Duan, X (2025) Drug sensitivity prediction based on multi-stage multi-modal drug representation learning. Interdisciplinary Sciences: Computational LIfe Sciences 17, 231243. https://doi.org/10.1007/s12539-024-00668-1.Google Scholar
Wang, C, Zhang, M, Zhao, J, Li, B, Xiao, X and Zhang, Y (2023) The prediction of drug sensitivity by multi-omics fusion reveals the heterogeneity of drug response in pan-cancer. Computers in Biology and Medicine 163, 107220. https://doi.org/10.1016/j.compbiomed.2023.107220.Google Scholar
Xiao, M, Zheng, Q, Popa, P, Mi, X, Hu, J, Zou, F and Zou, B (2025) Drug molecular representations for drug response predictions: A comprehensive investigation via machine learning methods. Scientific Reports 15(1), 20. https://doi.org/10.1038/s41598-024-84711-7.Google Scholar
Zhang, G and Nebert, DW (2017) Personalized medicine: Genetic risk prediction of drug response. Pharmacology & Therapeutics 175, 7590. https://doi.org/10.1016/j.pharmthera.2017.02.036.Google Scholar
Zhang, J (2024) Emerging trends in multi-omics data integration: Challenges and future directions. Computational Molecular Biology 14(2), 6475. https://doi.org/10.5376/cmb.2024.14.0008.Google Scholar
Zhao, Z, Jin, T, Chen, B, Dong, Q, Liu, M, Guo, J, Song, X, Li, Y, Chen, T, Han, H, Liang, H and Gu, Y (2023) Multi-omics integration analysis unveils heterogeneity in breast cancer at the individual level. Cell Cycle 22(20), 22292244. https://doi.org/10.1080/15384101.2023.2281816.Google Scholar
Zitnik, M, Nguyen, F, Wang, B, Leskovec, J, Goldenberg, A and Hoffman, MM (2019) Machine learning for integrating data in biology and medicine: Principles. Practice, and Opportunities. Information Fusion 50, 7191. https://doi.org/10.1016/j.inffus.2018.09.012.Google Scholar
Figure 0

Figure 1. Framework illustrating the methodological approach of the review.

Figure 1

Figure 2. (a) Representation of frequently used omics in selected studies. (b) Co-occurrence of omic pairs in selected studies.Note: E: epigenomics; G: genomics; P: proteomics; T: transcriptomics.

Figure 2

Figure 3. Sankey chart representing the omics and associated computational methods used in different studies. Note: DL: deep learning; FS: feature selection; MMIA: multimodal integration approach; SL: supervised learning.

Figure 3

Figure 4. Multi-omics data integration workflow for drug response prediction.

Figure 4

Table 1. Computational methods used for multi-omics data integration identified studies published in 2021–2025, including their method category, application objectives, datasets employed, evaluation metric and whether a specific cancer type was investigated

Figure 5

Table 2. Overview of benchmark datasets for drug response prediction

Author comment: Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction — R0/PR1

Comments

No accompanying comment.

Review: Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction — R0/PR2

Conflict of interest statement

Nil

Comments

The manuscript by Guoru deals on an important field: Precision Oncology with ML approaches. The paper is also structured well on choosing the types of models for review. It is also generally well-written. Nevertheless, the author need to improve on the following:

1. The paper is inadequate. Unless it is mini-review, a review article should be very comprehensive in its assessment to the latest trends in the field. Here, it is rather limited. Since the word-count is significantly lower than an average review article, the author is encouraged to add more recent works (even from preprint servers) for review.

2. Figure 2 is too simplistic and not very informative. A better version or a table can be provided instead.

3. The description of the short-listed methods/models in Table 1 can be extended to describe slightly better technical details. In the current form, it is not informative.

4. A box is recommended to be included to describe in brief, some common ML terms like SVM, SPCA, GCN etc. with common mathematical terms.

Review: Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

Precision Oncology: Computational Methods for Multi-Omics Data Integration to Improve Drug Response Prediction

Gouru, 2025

This review manuscript provides a useful overview of computational methods for integrating multi-omics data to improve drug response prediction (DRP) within the scope of precision oncology. It categorizes current approaches—such as supervised learning, deep learning, dimensionality reduction, and multimodal integration—and summarizes their application in recent studies. The topic is timely and relevant, and the paper has potential value for the field. However, several points should be addressed to improve its clarity, depth, and overall impact.

There is limited discussion on the use of standard datasets. Including a comparison of widely used benchmark datasets (e.g., GDSC, TCGA, CCLE) along with their limitations would enhance the review’s practical value for researchers in the field.

Writing Style: While the manuscript is generally well-written, certain phrases could be improved for a more professional tone. For instance, the search phrase “multi-omics AND data AND integration AND cancer OR neoplasm OR malignancies AND drug response” should be rephrased in a more scholarly manner.

First-Person Usage: The manuscript occasionally uses a first-person narrative (“I have examined...”). This should be revised to adopt a neutral academic tone (e.g., “This review examines...”) in accordance with standard review article conventions.

Challenges and Limitations: The review would benefit from a more detailed discussion of challenges in multi-omics DRP studies, such as: Data heterogeneity, Lack of model interpretability, Poor generalizability across cohorts

Future Directions: While the future directions section briefly touches on visualization and single-cell multi-omics, it could be strengthened by discussing: Model explainability (e.g., SHAP, attention mechanisms); Data harmonization across institutions and platforms; Tabular-to-image conversion using techniques like DeepInsight (PMID: 31388036), which is particularly relevant for CNN-based interpretability in omics

The review is comprehensive in scope but would benefit from more original synthesis. Consider including: A proposed taxonomy or framework for classifying DRP models; Identification of trends in integration strategies (e.g., early, late, hybrid integration); Discussion on bridging the gap between research models and real-world clinical applications

With the above revisions, this manuscript would make a meaningful contribution to the field of computational biology and precision oncology.

Review: Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction — R0/PR4

Conflict of interest statement

Reviewer declares none.

Comments

Disclosure: These comments are submitted as a co-review, where I have taken the supervisory role.

In the review manuscript titled “Precision Oncology: Computational methods for multi-omics data integration to improve drug response prediction” the author presents several examples of data integration algorithms applied to cancer molecular datasets.

However, the review is incomplete and does not align with the scope suggested in the title and abstract. The concept “drug response” is not clearly defined, which is problematic since itcan refer to different outcomes, particularly when mixing studies from patients and cell lines. There are several comprehensive reviews about the topic that the author ignores, and other reviews where the author seems to pick whole sentences or paragraphs with minimal changes (“Studies have shown that cancer is a genetic disease, and individuals have multiple genetic characteristics, such as gene expression and genome mutations… which affect the effects of different drug treatments, “ from Wang 2023 and “Studies have shown that cancer is fundamentally a genetic disease, with variations in gene expression patterns and mutations significantly influencing drug responses.” from the manuscript.)

I have several concerns about the quality and the originality of the manuscript:

1. The methodology resembles tha of a comprehensive review, but the author does not clearly define the original query and then include some other queries to enrich the body of works.

2. In the abstract, the phrases “existing computational methods or “associated category” are undefined or vague, it is unclear which the authors is referring to.

3. Despite the abstract implying a focus on patient data, many methods reviewed are based on cell line data.

4. On page 3 line 20, the paragraphs lacks coherence. While the core ideas are understandable, the sentence ‘To accomplish these goals, precision medicine…’ appears structurally lifted from Llinas-Bertran 2024, which may explain the lack of logical flow with the previous two sentences.

5. Not all models described at 3.2 are classification (Sensitivity or Resistance), most use a regression (prediction of IC50, AUC…)

6. The metrics from Table 1 are incomplete.

7. The first method reviewed is not multi-omic, as the SVM model only uses immunohistochemistry data and dones not integrate with other omics.

8. The author seems to confuse generic machine learning techniques (e.g. SVM, NCA) with the specific names of the actual methods that she cites (e.g. MOMLIN).

9. The categorization of supervised learning/deep learning is at least not accurate.

10. The discussion section lacks original ideas, it appears to be a compilation of existing discussion points from other literature rather than the author’s own analysis.

11. Similarly in section 5 the author mostly relies on Zhang 2024 paper, with little added from author’s perspective.

Recommendation: Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction — R0/PR5

Comments

Dear Authors,

Your paper while having a good core scaffold requires major revisions in both the breadth, and depth of methods and contemporary papers. Please examine the reviewers comments and resubmit with major revisions.

This review manuscript provides a useful overview of computational methods for integrating multi-omics data to improve drug response prediction (DRP) within the scope of precision oncology. It categorizes current approaches—such as supervised learning, deep learning, dimensionality reduction, and multimodal integration—and summarizes their application in recent studies. The topic is timely and relevant, and the paper has potential value for the field. However, several points should be addressed to improve its clarity, depth, and overall impact.

There is limited discussion on the use of standard datasets. Including a comparison of widely used benchmark datasets (e.g., GDSC, TCGA, CCLE) along with their limitations would enhance the review’s practical value for researchers in the field.

Writing Style: While the manuscript is generally well-written, certain phrases could be improved for a more professional tone. For instance, the search phrase “multi-omics AND data AND integration AND cancer OR neoplasm OR malignancies AND drug response” should be rephrased in a more scholarly manner.

First-Person Usage: The manuscript occasionally uses a first-person narrative (“I have examined...”). This should be revised to adopt a neutral academic tone (e.g., “This review examines...”) in accordance with standard review article conventions.

Challenges and Limitations: The review would benefit from a more detailed discussion of challenges in multi-omics DRP studies, such as: Data heterogeneity, Lack of model interpretability, Poor generalizability across cohorts

Future Directions: While the future directions section briefly touches on visualization and single-cell multi-omics, it could be strengthened by discussing: Model explainability (e.g., SHAP, attention mechanisms); Data harmonization across institutions and platforms; Tabular-to-image conversion using techniques like DeepInsight (PMID: 31388036), which is particularly relevant for CNN-based interpretability in omics

The review is comprehensive in scope but would benefit from more original synthesis. Consider including: A proposed taxonomy or framework for classifying DRP models; Identification of trends in integration strategies (e.g., early, late, hybrid integration); Discussion on bridging the gap between research models and real-world clinical applications

2. Figure 2 is too simplistic and not very informative. A better version or a table can be provided instead.

3. The description of the short-listed methods/models in Table 1 can be extended to describe slightly better technical details. In the current form, it is not informative.

4. A box is recommended to be included to describe in brief, some common ML terms like SVM, SPCA, GCN etc. with common mathematical terms.

In the review manuscript titled “Precision Oncology: Computational methods for multi-omics data integration to improve drug response prediction” the author presents several examples of data integration algorithms applied to cancer molecular datasets.

However, the review is incomplete and does not align with the scope suggested in the title and abstract. The concept “drug response” is not clearly defined, which is problematic since itcan refer to different outcomes, particularly when mixing studies from patients and cell lines. There are several comprehensive reviews about the topic that the author ignores, and other reviews where the author seems to pick whole sentences or paragraphs with minimal changes (“Studies have shown that cancer is a genetic disease, and individuals have multiple genetic characteristics, such as gene expression and genome mutations… which affect the effects of different drug treatments, “ from Wang 2023 and “Studies have shown that cancer is fundamentally a genetic disease, with variations in gene expression patterns and mutations significantly influencing drug responses.” from the manuscript.)

I have several concerns about the quality and the originality of the manuscript:

1. The methodology resembles tha of a comprehensive review, but the author does not clearly define the original query and then include some other queries to enrich the body of works.

2. In the abstract, the phrases “existing computational methods or “associated category” are undefined or vague, it is unclear which the authors is referring to.

3. Despite the abstract implying a focus on patient data, many methods reviewed are based on cell line data.

4. On page 3 line 20, the paragraphs lacks coherence. While the core ideas are understandable, the sentence ‘To accomplish these goals, precision medicine…’ appears structurally lifted from Llinas-Bertran 2024, which may explain the lack of logical flow with the previous two sentences.

5. Not all models described at 3.2 are classification (Sensitivity or Resistance), most use a regression (prediction of IC50, AUC…)

6. The metrics from Table 1 are incomplete.

7. The first method reviewed is not multi-omic, as the SVM model only uses immunohistochemistry data and dones not integrate with other omics.

8. The author seems to confuse generic machine learning techniques (e.g. SVM, NCA) with the specific names of the actual methods that she cites (e.g. MOMLIN).

9. The categorization of supervised learning/deep learning is at least not accurate.

10. The discussion section lacks original ideas, it appears to be a compilation of existing discussion points from other literature rather than the author’s own analysis.

11. Similarly in section 5 the author mostly relies on Zhang 2024 paper, with little added from author’s perspective.

Decision: Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction — R0/PR6

Comments

No accompanying comment.

Author comment: Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction — R1/PR7

Comments

No accompanying comment.

Review: Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction — R1/PR8

Conflict of interest statement

Reviewer declares none.

Comments

i have no further comments

Review: Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction — R1/PR9

Conflict of interest statement

Reviewer declares none.

Comments

The revised manuscript has improved.

Recommendation: Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction — R1/PR10

Comments

Both reviews have returned with an acceptance recommendation.

Decision: Precision oncology: Computational methods for multi-omics data integration to improve drug response prediction — R1/PR11

Comments

No accompanying comment.