
Extreme value theory in analysis of differential expression in microarrays where either only up- or down-regulated genes are relevant or expected

Published online by Cambridge University Press: 08 October 2008

RENATA IVANEK*
Affiliation:
Department of Population Medicine and Diagnostic Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY 14853, USA
YRJÖ T. GRÖHN
Affiliation:
Department of Population Medicine and Diagnostic Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY 14853, USA
MARTIN T. WELLS
Affiliation:
Department of Biological Statistics and Computational Biology, 301 Malott Hall, Cornell University, Ithaca, NY 14853, USA
SARITA RAENGPRADUB
Affiliation:
Department of Food Science, 412 Stocking Hall, Cornell University, Ithaca, NY 14853, USA
MARK J. KAZMIERCZAK
Affiliation:
Channing Laboratory, Harvard Medical School, 181 Longwood Avenue, Boston, MA 02115, USA
MARTIN WIEDMANN
Affiliation:
Department of Food Science, 412 Stocking Hall, Cornell University, Ithaca, NY 14853, USA
*Corresponding author. Department of Population Medicine and Diagnostic Sciences, College of Veterinary Medicine, Cornell University, Ithaca, NY 14853, USA. Tel: +1 (607) 2533052. Fax: +1 (607) 2533083. E-mail: ri25@cornell.edu

Summary

We propose an empirical Bayes method based on extreme value theory (EVT), denoted BE, for the analysis of data from spotted microarrays in which the investigator's interest (e.g. identifying up-regulated gene markers of a disease) or the experimental design (e.g. certain ‘wild-type versus mutant’ experiments) limits the identification of differentially expressed genes to those regulated in a single direction (either up or down). In such experiments, unlike in genome-wide microarrays, the analysis is restricted to the tail (the extremes) of the distribution of all the genes in the genome. EVT provides a framework for modelling this extreme behaviour and is therefore a natural candidate for inference about differential expression. We compared the performance of the BE method with that of two other empirical Bayes methods on two real ‘wild-type versus mutant’ datasets, in which a single direction of regulation was expected from the experimental design, and in a simulation study. The BE method appears to fit the real data better than the other two methods. On simulated data, the BE method showed better accuracy and precision than the other two methods while remaining robust to different characteristics of microarray experiments. The BE method therefore seems promising and useful for inference about differential expression in microarrays where either only up- or only down-regulated genes are relevant or expected.
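To illustrate the general EVT idea the Summary relies on (the largest of many null statistics follows an extreme value distribution, not the distribution of a typical gene), the following Python sketch fits a Gumbel (type I extreme value) distribution by the method of moments to simulated block maxima. It is not the BE model: the BE mixture, its priors and its hyperparameters are not specified in this section. The standard-normal "null" statistics, the block size of 1000 and the names `fit_gumbel_moments` and `gumbel_sf` are hypothetical choices made only for illustration.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Method-of-moments fit of a Gumbel (type I extreme value) distribution.

    For a Gumbel(mu, beta): mean = mu + gamma * beta, sd = beta * pi / sqrt(6).
    """
    mean = statistics.fmean(maxima)
    sd = statistics.stdev(maxima)
    beta = sd * math.sqrt(6) / math.pi  # scale parameter
    mu = mean - EULER_GAMMA * beta      # location parameter
    return mu, beta


def gumbel_sf(x, mu, beta):
    """Upper-tail probability P(X > x) under a Gumbel(mu, beta) distribution."""
    return 1.0 - math.exp(-math.exp(-(x - mu) / beta))


random.seed(1)
# Hypothetical data: maxima of 200 blocks, each of 1000 standard-normal "null" statistics.
maxima = [max(random.gauss(0.0, 1.0) for _ in range(1000)) for _ in range(200)]
mu, beta = fit_gumbel_moments(maxima)
print(f"location={mu:.2f}, scale={beta:.2f}")
print(f"P(max > 4) under the fitted Gumbel: {gumbel_sf(4.0, mu, beta):.3f}")
```

The fitted probability that the block maximum exceeds 4 is orders of magnitude larger than the single-gene normal tail probability P(Z > 4) ≈ 3.2e-5, which is why modelling the extremes directly, rather than a typical gene, matters when only one tail is of interest.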

Type: Paper
Copyright © 2008 Cambridge University Press

Fig. 1. The BN (Lonnstedt & Speed, 2002; Smyth, 2004), BL (Bhowmick et al., 2006) and empirical Bayes EVD mixture model (BE) statistics plotted against the contrast estimators obtained by fitting gene-level linear models, ‘alpha_g’ (denoted $\hat{\alpha}_g$ in the text), also expressed as fold changes (FC), and against the results reported by Kazmierczak et al. (2003). ‘K’ and the associated right y-axis indicate whether Kazmierczak et al. (2003) reported a gene as differentially expressed (DE) (‘yes’) or not (‘no’). ‘NeBC’=normal-exponential convolution background correction method. ‘MBC’=multiplicative background correction method. Two horizontal dashed lines (enclosing a shaded area) indicate the 5th and 95th percentiles of the OT of the BE statistic estimated with the FDR fixed at 0. ‘FNR=(,)’ denotes the false negative rate (5th and 95th percentiles) associated with the OT.
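The translation of $\hat{\alpha}_g$ into fold changes mentioned in the caption can be sketched as below. This assumes that $\hat{\alpha}_g$ is a log2 ratio (the usual scale for two-colour spotted arrays, though the caption does not state it) and adopts the common signed convention in which a gene with $\hat{\alpha}_g = -1$ (a halving) is reported as FC = −2. The function name is hypothetical.

```python
def log2_ratio_to_fold_change(alpha_g: float) -> float:
    """Convert a log2-scale contrast estimate into a signed fold change.

    Assumes alpha_g is a log2 ratio of the two compared conditions.
    Non-negative values give FC = 2**alpha_g; negative values give
    FC = -(2**(-alpha_g)), so alpha_g = -1 is reported as -2.
    """
    fc = 2.0 ** abs(alpha_g)
    return fc if alpha_g >= 0 else -fc


for a in (2.0, 1.0, 0.0, -1.0, -1.5):
    print(a, round(log2_ratio_to_fold_change(a), 3))
```

The signed convention keeps up- and down-regulation symmetric around ±1, which the raw ratio 2**alpha_g (0.5 for a halving, 2 for a doubling) does not.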


Table 1. Definitions of model parameters and hyperparameters in the empirical Bayes EVD mixture model (BE), and models of Lonnstedt & Speed (2002) modified by Smyth (2004) (BN) and Bhowmick et al. (2006) (BL)


Fig. 2. FDR vs. FNR plots of 100 simulated datasets, overlaid with the horizontal average curve, and box plots showing the horizontal spread of the performance of the BN (Lonnstedt & Speed, 2002; Smyth, 2004), BL (Bhowmick et al., 2006) and empirical Bayes EVD mixture model (BE) statistics in the four simulated model-datasets. ‘NeBC’=normal-exponential convolution background correction method; ‘MBC’=multiplicative background correction method; ‘ne’=not estimated.
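The FDR and FNR plotted in Figs 2 to 6 can be computed for every cut-off on a one-sided statistic when, as in a simulation, the truly DE genes are known. The sketch below assumes that larger scores indicate stronger evidence of differential expression. It defines FDR as the proportion of false positives among genes called DE and FNR as the proportion of false negatives among genes not called DE; this is a common definition, assumed here because the captions do not state which one was used. Ties are ignored, and the horizontal averaging across the 100 simulated datasets is not implemented. The function name and example data are hypothetical.

```python
def fdr_fnr_curve(scores, is_de):
    """Return (FDR, FNR) for each cut-off k = 1..n on a one-sided statistic.

    At cut-off k, the k highest-scoring genes are called DE.
    FDR = false positives / genes called DE
    FNR = false negatives / genes not called DE (0 when every gene is called)
    """
    n = len(scores)
    total_de = sum(bool(d) for d in is_de)
    # Rank genes from strongest to weakest evidence of up-regulation.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    curve = []
    for k, i in enumerate(order, start=1):
        if is_de[i]:
            tp += 1
        else:
            fp += 1
        fn = total_de - tp      # truly DE genes not yet called
        not_called = n - k
        fdr = fp / k
        fnr = fn / not_called if not_called else 0.0
        curve.append((fdr, fnr))
    return curve


# Hypothetical example: four genes, two of which are truly DE.
curve = fdr_fnr_curve([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
for k, (fdr, fnr) in enumerate(curve, start=1):
    print(f"top {k}: FDR={fdr:.3f}, FNR={fnr:.3f}")
```

Sweeping the cut-off traces the whole FDR vs. FNR trade-off for one dataset; repeating this over simulated datasets gives the family of curves that the figures summarize.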


Fig. 3. FDR vs. FNR plots showing the horizontal average of the BN (Lonnstedt & Speed, 2002; Smyth, 2004), BL (Bhowmick et al., 2006) and empirical Bayes EVD mixture model (BE) statistics from 100 simulated datasets in the four simulated model-datasets, with the proportion of DE genes (pDE) reported in Kazmierczak et al. (2003) (25 and 20% in the osmotic stress and stationary-phase data, respectively) and with simulated deviations from pDE of −10% and +10%. ‘NeBC’=normal-exponential convolution background correction method; ‘MBC’=multiplicative background correction method.


Fig. 4. FDR vs. FNR plots showing the horizontal average of the BN (Lonnstedt & Speed, 2002; Smyth, 2004), BL (Bhowmick et al., 2006) and empirical Bayes EVD mixture model (BE) statistics from 100 simulated datasets in the four simulated model-datasets, with the number of biological replicates (n) set to 2, 4 and 8. ‘NeBC’=normal-exponential convolution background correction method; ‘MBC’=multiplicative background correction method.


Fig. 5. FDR vs. FNR plots showing the horizontal average of the BN (Lonnstedt & Speed, 2002; Smyth, 2004), BL (Bhowmick et al., 2006) and empirical Bayes EVD mixture model (BE) statistics from 100 simulated datasets in the four simulated model-datasets, with the number of technical replicates (m) set to 1, 3 and 6. ‘NeBC’=normal-exponential convolution background correction method; ‘MBC’=multiplicative background correction method.


Fig. 6. FDR vs. FNR plots showing the horizontal average of the BN (Lonnstedt & Speed, 2002; Smyth, 2004), BL (Bhowmick et al., 2006) and empirical Bayes EVD mixture model (BE) statistics from 100 simulated datasets in the four simulated model-datasets, simulating an experiment with a small number of DE genes (10). ‘NeBC’=normal-exponential convolution background correction method; ‘MBC’=multiplicative background correction method.

Supplementary material: Ivanek supplementary material (Appendix; file, 207.4 KB)