Sparse group penalized integrative analysis of multiple cancer prognosis datasets

Published online by Cambridge University Press:  12 August 2013

JIN LIU
Affiliation:
Division of Epidemiology and Biostatistics, UIC School of Public Health, 1603 W. Taylor Street (MC 923), Chicago, IL 60612-4394, USA
JIAN HUANG
Affiliation:
Department of Statistics and Actuarial Science, and Department of Biostatistics, University of Iowa, Iowa, USA
YANG XIE
Affiliation:
Department of Clinical Sciences, UT Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390, USA
SHUANGGE MA*
Affiliation:
Department of Biostatistics, Yale University; and VA Cooperative Studies Program Coordinating Center, West Haven, CT, USA
*Corresponding author: Department of Biostatistics, School of Public Health, Yale University, 60 College ST, LEPH 206, New Haven, CT 06520, USA. Tel: 203-785-3119. Fax: 203-785-6912. E-mail: shuangge.ma@yale.edu

Summary

In cancer research, high-throughput profiling studies have been extensively conducted to search for markers associated with prognosis. Owing to the ‘large d, small n’ characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyses multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. Most existing integrative analyses assume the homogeneity model, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the accelerated failure time model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group minimax concave penalty (SGMCP) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. A simulation study shows that it outperforms the existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach.
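
As a rough illustration of the penalty family named above, the following sketch evaluates the standard scalar minimax concave penalty (MCP) and compares it with the Lasso penalty. It is not the paper's sparse group formulation, which is not given in this summary. The function names `mcp_penalty` and `lasso_penalty` and the parameter names `lam` (tuning parameter λ) and `gamma` (regularisation parameter γ) are assumptions made for illustration only.

```python
import math


def lasso_penalty(t, lam):
    """Lasso (L1) penalty for a scalar coefficient t."""
    return lam * abs(t)


def mcp_penalty(t, lam, gamma):
    """Standard scalar MCP (hypothetical helper, not the paper's SGMCP).

    For |t| <= gamma * lam the penalty is lam*|t| - t^2 / (2*gamma);
    beyond that point it stays flat at gamma * lam^2 / 2.
    """
    a = abs(t)
    if math.isinf(gamma):
        # With gamma = infinity the quadratic correction vanishes,
        # so the penalty coincides with the Lasso penalty.
        return lam * a
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


if __name__ == "__main__":
    for t in (0.5, 2.0, 10.0):
        print(t, mcp_penalty(t, 1.0, 3.0), lasso_penalty(t, 1.0))
```

Because the penalty flattens once a coefficient is large, large effects are penalised less heavily than under the Lasso, while small coefficients are still shrunk towards zero.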

Information

Type
Research Papers
Copyright
Copyright © Cambridge University Press 2013 

Table 1. Simulation under the homogeneity model. In each cell, the first/second row is the mean number (sd) of true/false positives. When γ=∞, MCP simplifies to Lasso

Table 2. Simulation under the heterogeneity model. In each cell, the first/second row is the mean number (sd) of true/false positives. When γ=∞, MCP simplifies to Lasso

Table 3. Analysis of breast cancer data using SGMCP: identified genes and their estimates

Table 4. Analysis of lung cancer data using SGMCP: identified genes and their estimates

Supplementary material: File

Liu et al. supplementary material

Supplementary Appendix and tables
