Context-specific Independence Mixture Models for Cluster Analysis of Biological Data

B. Georgi

Ph.D. Thesis, Freie Universität Berlin, Jun 2009.

Clustering is a crucial first step in the exploratory analysis of biological data. This thesis is concerned with cluster analysis of biological data using mixture models. Mixture models are a class of powerful and versatile statistical models. We develop an extension to conventional mixtures in the form of the context-specific independence (CSI) framework. CSI mixtures are particularly suited for the analysis of biological data since they perform robustly in the presence of noise and uninformative features in the data. This is achieved by adapting the model complexity to the degree of variation observed in a given data set. We present a learning algorithm for CSI mixtures in a Bayesian framework. We apply CSI mixture clustering to data sets of transcription factor binding sites, protein sequences and complex disease phenotype data.

A reprint is available as PDF.

The publication includes results from the following projects or software tools: PyMix.

Further publications by Benjamin Georgi.