Recent rapid technical advances in genome sequencing (genomics) and protein identification (proteomics) have given rise to research problems that require combined expertise from statistics, biology, computer science, and other fields. The interdisciplinary nature of bioinformatics presents many research challenges related to integrating concepts, methods, software, and multiplatform data. Beyond the need for new tools to investigate biological systems via high-throughput genomic and proteomic measurements, statisticians face many novel methodological research questions generated by such data. The work in this book is dedicated to the development and application of Bayesian statistical methods in the analysis of high-throughput bioinformatics data that arise from problems in medical research, in particular cancer research, and in molecular and structural biology. This book does not aim to be comprehensive in all areas of bioinformatics. Rather, it presents a broad overview of statistical inference problems related to three main high-throughput platforms: microarray gene expression, serial analysis of gene expression (SAGE), and mass spectrometry proteomic profiles. The book's main focus is on experimental design, statistical inference, and data analysis, from a Bayesian perspective, for data sets arising from such high-throughput experiments.
Chapter 1 provides a detailed introduction to the three main data platforms and sets the scene for the subsequent methodology chapters. Aimed mainly at nonbiologists, it covers elementary biological concepts, describes the measurement technology of each platform and its associated idiosyncrasies, and outlines the issues that statistical methodology can address.
Subsequent chapters focus on specific methodological developments and are grouped approximately by main bioinformatics platform, with several chapters discussing the integration of at least two platforms.