Clustering expressed genes on the basis of their association with a quantitative phenotype

Published online by Cambridge University Press:  25 November 2005

ZHENYU JIA
Affiliation:
Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
SHIZHONG XU
Affiliation:
Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA

Abstract

Cluster analyses of gene expression data are usually based on the association of gene expression with the phenotype of a particular disease. Many disease traits have a clearly defined binary phenotype (presence or absence), so genes can be clustered according to the differences in expression levels between the two contrasting phenotypic groups. For example, cluster analysis based on a binary phenotype has been used successfully in tumour research. Some complex diseases, however, have phenotypes that vary continuously, and the method developed for a binary trait is not immediately applicable to a continuous trait. Because understanding the role of gene expression in these complex traits is of fundamental importance, a new statistical method is needed to cluster expressed genes based on their association with a quantitative trait phenotype. We developed a model-based clustering method to classify genes based on their association with a continuous phenotype. We used a linear model to describe the relationship between gene expression and the phenotypic value. The effects of the linear model (the linear regression coefficients) represent the strength of the association. We assumed that the model effects of each gene follow a mixture of several multivariate Gaussian distributions. Parameter estimation and cluster assignment were accomplished via an Expectation-Maximization (EM) algorithm. The method was verified by analysing two simulated datasets and further demonstrated using real data from a microarray experiment studying gene expression associated with Alzheimer's disease.
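
The general idea described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes a single regression slope per gene (so the mixture is univariate rather than multivariate), regresses expression on phenotype by ordinary least squares, and fits the mixture with a textbook EM loop. The function names (`simulate`, `slope`, `em_mixture`), the simulated data, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def simulate(seed=0, n_ind=40, genes_per_cluster=20,
             true_slopes=(-1.0, 0.0, 1.0), noise_sd=0.3):
    """Toy data (assumed design): each gene's expression is linear in the phenotype."""
    rng = random.Random(seed)
    phenotype = [rng.gauss(0, 1) for _ in range(n_ind)]
    expression = []
    for b in true_slopes:
        for _ in range(genes_per_cluster):
            expression.append([b * z + rng.gauss(0, noise_sd) for z in phenotype])
    return phenotype, expression


def slope(x, y):
    """Least-squares slope of y regressed on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def normal_pdf(v, mean, var):
    return math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_mixture(values, k, n_iter=200):
    """EM for a k-component univariate Gaussian mixture over the slopes."""
    n = len(values)
    srt = sorted(values)
    # Initialise means at evenly spaced quantiles, variances at the overall variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    gmean = sum(values) / n
    gvar = sum((v - gmean) ** 2 for v in values) / n
    variances = [gvar] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each gene.
        resp = []
        for v in values:
            dens = [weights[j] * normal_pdf(v, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update mixing proportions, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2
                        for r, v in zip(resp, values)) / nj
            variances[j] = max(var_j, 1e-8)  # floor avoids a collapsing component
    # Assign each gene to the component with the highest posterior probability.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


phenotype, expression = simulate()
slopes = [slope(phenotype, y) for y in expression]
weights, means, variances, labels = em_mixture(slopes, k=3)
print("estimated cluster means:", [round(m, 2) for m in means])
```

In this toy setting the three simulated groups of genes have negative, null, and positive association with the phenotype, and the cluster labels are read off the posterior probabilities from the final E-step. Extending the sketch to several model effects per gene would require multivariate Gaussian components with covariance matrices, as the abstract describes.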

Type
Research Article
Copyright
© 2005 Cambridge University Press