Abstract
The rapid development of high-throughput biotechnology has made it possible to perform high-resolution genome profiling on several platforms simultaneously, resulting in an abundance of multidimensional genomic data. Such data provide unprecedented opportunities to explore the coordination and cooperation between regulatory mechanisms on multiple levels. In the field of computational cancer genomics, integrating multiple types of genomic data to discover combinatorial patterns is becoming a valuable and challenging problem. This chapter reviews recent progress in this direction, focusing on three methods developed by the authors. Specifically, we introduce a joint matrix factorization method, a network-regularized joint matrix factorization method, and a partial least squares regression method. The methods address the problem of integrating multiple data sets in an unsupervised, semi-supervised, or supervised manner. We also describe their applications in specific biological contexts. The methods described herein reveal biologically relevant patterns that would have been overlooked with only a single type of data, and uncover new associations between the different layers of cellular activity.
Introduction
Cellular systems are characterized by multiple levels of organization and complicated interactions between levels. The different levels (e.g., epigenetic status, transcription, and translation) must coordinate precisely to maintain the function and robustness of the cell. Gene expression, a crucial part of the cellular system, is a very complex process influenced by epigenetic, transcriptional, and posttranscriptional regulation, among other factors. In healthy cells, the dynamic interplay between these regulatory levels acts to maximize the efficiency and specificity of gene expression. Abundant studies support the view that gene regulation is governed by multiple, complex, and extensively coupled networks. However, studies of the coordination between cellular activities on different levels have been hindered by a lack of appropriate data. Consequently, most genomic research has focused on global profiling of only one level.
The rapid development of high-throughput genomics technologies in the past decade, especially sequencing technologies, has significantly facilitated the characterization of cellular systems at multiple levels simultaneously. Such data have enabled researchers to obtain a global view of the principles underlying gene regulation.