
Gene network-based cancer prognosis analysis with sparse boosting

Published online by Cambridge University Press: 06 September 2012

SHUANGGE MA*
Affiliation:
School of Public Health, Yale University, New Haven, CT 06520, USA
YUAN HUANG
Affiliation:
Department of Statistics, Penn State University, University Park, PA 16802, USA
JIAN HUANG
Affiliation:
Departments of Statistics and Actuarial Science and Biostatistics, University of Iowa, Iowa City, IA 52242, USA
KUANGNAN FANG
Affiliation:
Department of Statistics, Xiamen University, Xiamen, China
*Corresponding author: School of Public Health, Yale University, New Haven, CT 06520, USA. Tel: 203-785-3119. Fax: 203-785-6912. E-mail: shuangge.ma@yale.edu

Summary

High-throughput gene profiling studies have been extensively conducted to search for markers associated with cancer development and progression. In this study, we analyse cancer prognosis studies with right censored survival responses. With gene expression data, we adopt weighted gene co-expression network analysis (WGCNA) to describe the interplay among genes. In the network, nodes represent genes, and subsets of tightly connected nodes are called modules. Genes within the same module tend to be co-regulated and to have related biological functions. For cancer prognosis data with gene expression measurements, our goal is to identify cancer markers while properly accounting for the network module structure. We propose a two-step sparse boosting approach, called Network Sparse Boosting (NSBoost), for marker selection. In the first step, we apply sparse boosting to each module separately for within-module marker selection and construct module-level ‘super markers’. In the second step, we use the super markers to represent the effects of all genes within their modules and conduct module-level selection using sparse boosting. A simulation study shows that NSBoost identifies cancer-associated genes and modules more accurately than alternative approaches. In the analysis of breast cancer and lymphoma prognosis studies, NSBoost identifies genes with important biological implications. It outperforms alternatives, including boosting and penalization approaches, by identifying fewer genes/modules and/or achieving better prediction performance.
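The two-step procedure summarised above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The response `y` is treated as an uncensored, centred continuous outcome (for example, log survival time), so right censoring is ignored. The base learner is componentwise least squares with step size `nu`. At each iteration the gene is chosen by minimising a corrected AIC (AICc) built from the residual sum of squares and the trace of the boosting operator, not by minimising the RSS alone. Bühlmann and Yu's sparse boosting is usually presented with a gMDL criterion; AICc is swapped in here for simplicity, and this section does not state which criterion the paper uses. The names `aicc`, `sparse_boost` and `nsboost` and the defaults `nu=0.1` and `m_max=100` are hypothetical.

```python
import numpy as np


def aicc(rss, df, n):
    """Corrected AIC for a linear smoother: log(RSS/n) + (1 + df/n) / (1 - (df + 2)/n)."""
    denom = 1.0 - (df + 2.0) / n
    if denom <= 0 or rss <= 0:
        return np.inf
    return np.log(rss / n) + (1.0 + df / n) / denom


def sparse_boost(X, y, nu=0.1, m_max=100):
    """Componentwise L2 sparse boosting; returns coefficients at the AICc-optimal iteration."""
    n, p = X.shape
    beta, fit = np.zeros(p), np.zeros(n)
    M = np.eye(n)                          # M = I - B_m, where B_m is the boosting operator
    trace_b = 0.0                          # trace(B_m): effective degrees of freedom
    norms = np.einsum("ij,ij->j", X, X)    # x_j' x_j for each column
    best_crit, best_beta = aicc(y @ y, 0.0, n), beta.copy()   # start from the empty model
    for _ in range(m_max):
        r = y - fit
        step_crit, step_j = np.inf, -1
        for j in np.flatnonzero(norms > 0):   # skip all-zero columns
            x = X[:, j]
            coef = x @ r / norms[j]
            rss = np.sum((r - nu * coef * x) ** 2)
            df = trace_b + nu * (x @ M @ x) / norms[j]
            crit = aicc(rss, df, n)        # penalised criterion, not RSS alone
            if crit < step_crit:
                step_crit, step_j = crit, j
        if step_j < 0:
            break
        x = X[:, step_j]
        coef = x @ r / norms[step_j]
        beta[step_j] += nu * coef
        fit += nu * coef * x
        trace_b += nu * (x @ M @ x) / norms[step_j]
        M -= nu * np.outer(x, x @ M) / norms[step_j]   # M <- (I - nu * H_j) M
        if step_crit < best_crit:
            best_crit, best_beta = step_crit, beta.copy()
    return best_beta


def nsboost(X, y, modules, nu=0.1, m_max=100):
    """Two-step sketch: within-module sparse boosting, then module-level sparse boosting."""
    betas, supers = [], []
    for idx in modules:                         # Step 1: one module at a time
        b = sparse_boost(X[:, idx], y, nu, m_max)
        betas.append(b)
        supers.append(X[:, idx] @ b)            # module-level 'super marker'
    gamma = sparse_boost(np.column_stack(supers), y, nu, m_max)   # Step 2
    coef = np.zeros(X.shape[1])
    for k, idx in enumerate(modules):
        coef[idx] = gamma[k] * betas[k]         # module weight x within-module estimate
    selected = [k for k in range(len(modules)) if gamma[k] != 0]
    return coef, selected


# Hypothetical synthetic data: 4 modules of 10 genes; genes 0, 1 and 20 are informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 40))
X -= X.mean(axis=0)
y = 2 * X[:, 0] + 2 * X[:, 1] - 2 * X[:, 20] + 0.5 * rng.standard_normal(100)
y -= y.mean()
modules = [np.arange(10 * k, 10 * (k + 1)) for k in range(4)]
coef, selected = nsboost(X, y, modules)
print("selected modules:", selected)
print("selected genes:", np.flatnonzero(coef))
```

In this sketch a gene's final coefficient is the product of its within-module estimate and its module's Step 2 weight. A gene is therefore kept only if both the gene and its module survive selection, which gives sparsity at both levels.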

Type: Research Papers
Copyright © Cambridge University Press 2012

Fig. 1. Parameter path of NSBoost: estimates as a function of the number of iterations. (a) The four panels correspond to the four modules in Step 1 of boosting. (b) The single panel corresponds to the four super markers in Step 2 of boosting. Vertical lines mark the selected number of iterations.


Fig. 2. Parameter path of NBoost: estimates as a function of the number of iterations. (a) The four panels correspond to the four modules in Step 1 of boosting. (b) The single panel corresponds to the four super markers in Step 2 of boosting. Vertical lines mark the selected number of iterations.


Table 1. Simulation study: median (sd) of the number of identified genes (T) and true positives (TP) computed over 200 replicates. Under each scenario, the first (second) row contains the summary statistics for gene (module) identification. Correlation structure: auto-regressive (auto), banded, and compound symmetry (comp)
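Table 1's caption names three correlation structures and two counts, T and TP. The NumPy sketch below shows one way to build each structure and compute the median (sd) summaries. The matrix forms are the textbook versions and are assumptions, not taken from the paper:

- auto-regressive: entry (j, k) equals ρ^|j−k|;
- banded: ρ on the first off-diagonals only, since the paper's bandwidth is not given here;
- compound symmetry: ρ for every pair j ≠ k.

The helper names `corr_matrix` and `identification_summary` are hypothetical. For the banded form, |ρ| ≤ 0.5 keeps the matrix positive definite for any number of genes.

```python
import numpy as np


def corr_matrix(p, rho, kind):
    """Correlation matrix for p genes (assumed textbook forms, not taken from the paper)."""
    lag = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # |j - k|
    if kind == "auto":      # auto-regressive: rho^|j-k|
        return rho ** lag.astype(float)
    if kind == "banded":    # banded: rho on the first off-diagonals, 0 beyond
        return np.where(lag == 0, 1.0, np.where(lag == 1, rho, 0.0))
    if kind == "comp":      # compound symmetry: rho for every pair j != k
        return np.where(lag == 0, 1.0, rho)
    raise ValueError(f"unknown structure: {kind}")


def identification_summary(selected_sets, true_genes):
    """Median (sd) of T (number identified) and TP (true positives) across replicates."""
    truth = set(true_genes)
    t = np.array([len(set(s)) for s in selected_sets])
    tp = np.array([len(set(s) & truth) for s in selected_sets])
    return {"T": (np.median(t), np.std(t, ddof=1)),
            "TP": (np.median(tp), np.std(tp, ddof=1))}


# Hypothetical usage: one replicate of 5 genes with AR(0.5) correlation,
# and a summary over three toy replicates whose true genes are {0, 1}.
rng = np.random.default_rng(1)
X = rng.multivariate_normal(np.zeros(5), corr_matrix(5, 0.5, "auto"), size=100)
summary = identification_summary([[0, 1, 5], [0, 2], [1]], true_genes=[0, 1])
print(summary)
```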


Table 2. Description of datasets. Gene: number of genes profiled


Table 3. Data analysis results


Fig. A.1. Module construction result for dataset D4.
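Fig. A.1 shows a module construction result, and the Summary attributes module construction to WGCNA. The NumPy sketch below follows the standard WGCNA quantities up to the clustering step:

- unsigned soft-thresholded adjacency: a_ij = |cor(x_i, x_j)|^β;
- connectivity: k_i = Σ_{u≠i} a_iu;
- shared neighbours: l_ij = Σ_{u≠i,j} a_iu a_uj;
- topological overlap: TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij).

WGCNA then clusters 1 − TOM with average-linkage hierarchical clustering and a dynamic tree cut. This sketch uses a plainly simpler substitute: connected components of the graph whose edges are TOM values above a threshold. That is essentially a single-linkage cut at height 1 − `tom_cut`.

Several settings are assumptions: the power `beta=6`, the threshold `tom_cut=0.1`, the minimum module size `min_size=3`, and the name `tom_modules`. WGCNA normally picks β by a scale-free topology criterion. Genes in components smaller than `min_size` are left unassigned, loosely like WGCNA's grey module.

```python
import numpy as np


def tom_modules(X, beta=6, tom_cut=0.1, min_size=3):
    """Group genes (columns of the n x p matrix X) into modules via a TOM graph."""
    A = np.abs(np.corrcoef(X, rowvar=False)) ** beta   # soft-thresholded adjacency
    np.fill_diagonal(A, 0.0)                            # so the sums below skip u = i, j
    k = A.sum(axis=1)                                   # connectivity k_i
    L = A @ A                                           # l_ij = sum_u a_iu a_uj
    tom = (L + A) / (np.minimum.outer(k, k) + 1.0 - A)
    np.fill_diagonal(tom, 1.0)

    p = X.shape[1]
    label = np.full(p, -1)
    modules = []
    for seed in range(p):                 # depth-first search for connected components
        if label[seed] >= 0:
            continue
        label[seed] = seed
        stack, members = [seed], [seed]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((tom[i] > tom_cut) & (label < 0)):
                label[j] = seed
                stack.append(j)
                members.append(j)
        if len(members) >= min_size:      # small components stay unassigned
            modules.append(np.sort(members))
    return modules


# Hypothetical data: genes 0-4 share factor f1, genes 5-9 share f2, genes 10-14 are noise.
rng = np.random.default_rng(2)
f1, f2 = rng.standard_normal((2, 200))
X = np.column_stack([f1 + 0.3 * rng.standard_normal(200) for _ in range(5)]
                    + [f2 + 0.3 * rng.standard_normal(200) for _ in range(5)]
                    + [rng.standard_normal(200) for _ in range(5)])
modules = tom_modules(X)
print([m.tolist() for m in modules])
```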