
19 - Regularization Techniques for Highly Correlated Gene Expression Data with Unknown Group Structure

Published online by Cambridge University Press: 05 June 2013

Brent A. Johnson (Emory University)
Kim-Anh Do (University of Texas, MD Anderson Cancer Center)
Zhaohui Steve Qin (Emory University, Atlanta)
Marina Vannucci (Rice University, Houston)

Summary

Introduction

In the analysis of high-dimensional genomic data, the absolute correlation among predictors routinely exceeds 0.9, or even 0.95. In the presence of such high collinearity, special techniques are needed to achieve reliable estimation and variable selection in the linear model, because common techniques such as the Lasso (Tibshirani, 1996) fail in this setting. Although some authors have offered improvements over the Lasso when certain features of the design matrix, such as grouping (Yuan and Lin, 2006) or ordering (Tibshirani et al., 2005), can be exploited, a more challenging prediction problem arises when no additional design structure is known or assumed. Several authors have tackled regularization amidst highly correlated predictors, with the “elastic net” (Zou and Hastie, 2005) being the most popular and widely used among them. In this chapter, we examine deficiencies of the elastic net and argue in favor of a little-known competitor, the “Berhu” penalized least squares estimator (Owen, 2006), for high-dimensional regression analyses of genomic data.
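To make the comparison concrete, the two penalties mentioned above can be written down directly. The sketch below (not taken from the chapter; a minimal NumPy illustration, with the hypothetical threshold parameter `M` and mixing weight `alpha` chosen for display) contrasts the Berhu penalty of Owen (2006), which is ℓ1-like near zero and quadratic for large coefficients, with the elastic net of Zou and Hastie (2005), which mixes ℓ1 and ℓ2 terms everywhere:

```python
import numpy as np

def berhu(beta, M=1.0):
    """Berhu ("reverse Huber") penalty of Owen (2006).

    Linear (lasso-like) for |beta| <= M, so small coefficients can be
    zeroed; quadratic (ridge-like) for |beta| > M, so groups of large,
    highly correlated coefficients are shrunk together.  The two pieces
    join continuously and smoothly at |beta| = M.
    """
    a = np.abs(beta)
    return np.where(a <= M, a, (a ** 2 + M ** 2) / (2.0 * M))

def elastic_net_penalty(beta, alpha=0.5):
    """Elastic net penalty of Zou and Hastie (2005): a convex
    combination of an l1 term (sparsity) and an l2 term (grouping)."""
    return alpha * np.abs(beta) + (1.0 - alpha) * beta ** 2

# The penalties agree in being l1-like near zero but differ in the tails.
b = np.array([0.2, 1.0, 3.0])
print(berhu(b, M=1.0))            # [0.2, 1.0, 5.0]
print(elastic_net_penalty(b))     # [0.12, 1.0, 6.0]
```

Note the design choice the chapter emphasizes: unlike the elastic net, Berhu imposes no ℓ2 shrinkage on small coefficients, and no ℓ1 kink on large ones.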

Regularization describes a popular class of computational and statistical methods for estimation, prediction, or regression that can be applied to virtually any statistic. In regression, these methods may be summarized as biased regression tools: they accept some bias in exchange for reduced variance, and thus smaller prediction or classification error. These methods transcend Bayesian and frequentist paradigms and extend naturally to high-dimensional regression. Among all regularized regression estimators, ℓ1- and ℓ2-regularization are the most popular, but only the former results in a sparse solution. Despite its popularity, ℓ1-regularization suffers from drawbacks.
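The sparsity contrast between ℓ1- and ℓ2-regularization can be seen in their proximal (coordinate-wise shrinkage) operators. The following is a minimal NumPy sketch, not code from the chapter; the regularization weight `lam` and the example vector are illustrative:

```python
import numpy as np

def prox_l1(z, lam):
    """Proximal operator of lam * ||beta||_1 (soft-thresholding).
    Any coordinate with |z| <= lam is set exactly to zero, which is
    why l1-regularization (the Lasso) produces sparse solutions."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def prox_l2(z, lam):
    """Proximal operator of (lam/2) * ||beta||_2^2 (pure shrinkage).
    Every coordinate is scaled toward zero, but none becomes exactly
    zero, so l2-regularization (ridge) yields dense solutions."""
    return z / (1.0 + lam)

z = np.array([3.0, 0.4, -0.2, -2.5])
print(prox_l1(z, lam=0.5))  # [ 2.5  0.   0.  -2. ]   -> sparse
print(prox_l2(z, lam=0.5))  # all four entries shrunk -> dense
```

This is the mechanism behind the summary's claim: only ℓ1-type penalties perform variable selection, while ℓ2-type penalties only shrink.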

Type: Chapter
Information: Advances in Statistical Bioinformatics: Models and Integrative Inference for High-Throughput Data, pp. 382–397
Publisher: Cambridge University Press
Print publication year: 2013
