
A Novel Approach for Pathway Analysis of GWAS Data Highlights Role of BMP Signaling and Muscle Cell Differentiation in Colorectal Cancer Susceptibility

Published online by Cambridge University Press:  20 January 2017

Aniket Mishra*
Affiliation:
Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) and the Colorectal Cancer Family Registry (CCFR), and Statistical Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
Stuart MacGregor
Affiliation:
Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) and the Colorectal Cancer Family Registry (CCFR), and Statistical Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
* Address for correspondence: Aniket Mishra, Statistical Genetics, QIMR Berghofer Medical Research Institute, 300 Herston Rd, Brisbane QLD 4006, Australia. E-mail: aniket.mishra@qimrberghofer.edu.au

Abstract

Genome-wide association studies (GWAS) have revolutionized the field of gene mapping. As the GWAS field matures, it is becoming clear that for many complex traits, a proportion of the missing heritability is attributable to common variants of individually small effect. Detecting these small effects individually can be difficult, and statistical power would be increased if relevant variants could be grouped together for testing. Here, we propose the VEGAS2Pathway approach, which aggregates the association strength of individual markers into pre-specified biological pathways. It accounts for gene size and linkage disequilibrium between markers using simulations from the multivariate normal distribution. Pathway size is taken into account via a resampling approach. Importantly, since the approach requires only summary data, it can easily be applied to all GWASs, including meta-analysis, singleton-based, family-based, and DNA-pooling-based designs. The approach is implemented as a user-friendly web page (https://vegas2.qimrberghofer.edu.au) and as a command line tool. The web implementation uses gene-sets from the Gene Ontology (GO), curated gene-sets from MSigDB (containing canonical pathways and gene-sets from the BIOCARTA, REACTOME, and KEGG databases), PANTHER, and the Pathway Commons databases, enabling analysis of a wide range of complex traits. We applied this method to a colorectal cancer GWAS meta-analysis data set (10,934 cases, 12,328 controls) from the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO). We report statistically significant enrichment of association signal for the ‘BMP signaling’ and ‘muscle cell differentiation’ pathways, suggesting a possible role for these pathways in the risk of colorectal cancer.
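The gene-level step described in the abstract (simulating from a multivariate normal distribution to account for gene size and linkage disequilibrium) can be illustrated with a minimal sketch. This is not the VEGAS2 source code: the function names `cholesky` and `gene_based_p`, the conversion of SNP p values to 1-df chi-square statistics, and the `(hits + 1) / (n_sims + 1)` estimator are assumptions made for illustration, and the LD matrix is assumed to be a positive-definite SNP correlation matrix supplied by the user.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def gene_based_p(snp_pvalues, ld_matrix, n_sims=2000, seed=0):
    """Empirical gene-level p value (illustrative sketch, not VEGAS2 itself).

    snp_pvalues: two-sided GWAS p values in (0, 1] for the SNPs in one gene.
    ld_matrix:   assumed SNP-by-SNP correlation (LD) matrix, same order.
    """
    rng = random.Random(seed)
    norm = NormalDist()
    # Convert each p value to a 1-df chi-square statistic (squared z score)
    # and sum over the gene's SNPs.
    observed = sum(norm.inv_cdf(p / 2.0) ** 2 for p in snp_pvalues)

    lower = cholesky(ld_matrix)
    n = len(snp_pvalues)
    hits = 0
    for _ in range(n_sims):
        # Independent standard normals, then impose the LD correlation.
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        correlated = [
            sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)
        ]
        # Null statistic: sum of squared correlated normals.
        if sum(v * v for v in correlated) >= observed:
            hits += 1
    # Add-one estimator so the empirical p value is never exactly zero.
    return (hits + 1) / (n_sims + 1)
```

Because the null replicates are drawn with the same number of SNPs and the same correlation structure as the observed gene, larger genes and genes with strongly correlated SNPs do not receive inflated statistics in this sketch.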

Type: Articles

Copyright © The Author(s) 2017

FIGURE 1 Schematic representation of the VEGAS2Pathway strategy: After reading in the two-column (SNP id and GWAS p value) text file, the gene-based test statistics are calculated using the VEGAS approach. These genes are then assigned to pathways. If a pathway contains a gene nested within another gene, the smaller gene is dropped. If a pathway contains overlapping genes, or genes less than 500 kb apart, only one gene (chosen randomly) is used to represent the association signal of the region and all other nearby genes are dropped. In this way, VEGAS2Pathway ensures that all genes contributing test statistics to a pathway are at least 500 kb away from each other. The gene-based test statistics are then summed to compute a statistic for each pathway. An empirical p value for each pathway is computed by comparing the summed set score against those of replicate gene-sets of the same size obtained by resampling gene-based test statistics.
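The pathway-level steps in the caption above (dropping nested genes, keeping one randomly chosen gene per region, summing, and resampling) can be sketched as follows. This is a hedged illustration rather than the tool's actual code: the `Gene` record, the function names, the single-linkage grouping of genes less than 500 kb apart, and the add-one p value estimator are all assumptions introduced here, and `stat` stands for whatever gene-based test statistic is being summed.

```python
import random
from collections import namedtuple

# Hypothetical record: gene name, chromosome, start/end (bp), gene-level statistic.
Gene = namedtuple("Gene", "name chrom start end stat")

MIN_GAP = 500_000  # the 500 kb separation stated in the caption


def drop_nested(genes):
    """Drop any gene lying entirely inside a larger gene on the same chromosome."""
    kept = []
    for g in genes:
        nested = any(
            o is not g
            and o.chrom == g.chrom
            and o.start <= g.start
            and g.end <= o.end
            and (o.end - o.start) > (g.end - g.start)
            for o in genes
        )
        if not nested:
            kept.append(g)
    return kept


def prune_nearby(genes, rng):
    """Group genes that overlap or lie < 500 kb apart; keep one at random per group."""
    chosen, cluster, cluster_end = [], [], None
    for g in sorted(genes, key=lambda x: (x.chrom, x.start)):
        same_region = (
            cluster
            and g.chrom == cluster[0].chrom
            and g.start - cluster_end < MIN_GAP
        )
        if same_region:
            cluster.append(g)
            cluster_end = max(cluster_end, g.end)
        else:
            if cluster:
                chosen.append(rng.choice(cluster))
            cluster, cluster_end = [g], g.end
    if cluster:
        chosen.append(rng.choice(cluster))
    return chosen


def pathway_p(pathway_genes, all_stats, n_resamples=10000, seed=0):
    """Return (number of genes kept, empirical pathway p value).

    all_stats: list of gene-level statistics for all genes, used as the
    resampling pool for replicate gene-sets of the same size.
    """
    rng = random.Random(seed)
    kept = prune_nearby(drop_nested(pathway_genes), rng)
    observed = sum(g.stat for g in kept)
    hits = sum(
        sum(rng.sample(all_stats, len(kept))) >= observed
        for _ in range(n_resamples)
    )
    return len(kept), (hits + 1) / (n_resamples + 1)
```

Drawing each replicate with the same number of genes as the pruned pathway is what lets the comparison account for pathway size: a large pathway is only compared with equally large random gene-sets.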

FIGURE 2 Simulation results. (A) The distribution of -log10 p values across 1,000 replicates. The line marks the top 5% of -log10 p values. (B) The distribution of Pearson's correlation between the gene-based p value and the number of SNPs in the gene. (C) The distribution of Pearson's correlation between the pathway-based p value and the number of SNPs in the pathway. (D) Pathway type I error rate versus pathway size.

TABLE 1 Top Five Pathways from Pathway Analysis of Colorectal Cancer GWAS Summary Data

TABLE 2 Top Five Pathways from VEGAS2Pathway Analysis of Colorectal Cancer GWAS Summary Data and their Test Statistics Using the INRICH and MAGENTA Approaches

Supplementary material

Mishra and MacGregor supplementary material S1: Supplementary Table (File, 49.4 KB)
Mishra and MacGregor supplementary material S2: Supplementary Table (File, 73.8 KB)
Mishra and MacGregor supplementary material S3: Supplementary Table (File, 56.8 KB)
Mishra and MacGregor supplementary material S4 (File, 55.5 KB)