Partially-supervised context-specific independence mixture modeling

B. Georgi and A. Schliep

In Workshop on Data Mining in Functional Genomics and Proteomics, ECML 2007, 2007.

Partially supervised or semi-supervised learning refers to machine learning methods that fall between clustering and classification. In the context of clustering, labels can specify link and do-not-link constraints between data points in different ways, and these constraints restrict the resulting clustering solutions. This is a very natural framework for many biological applications, as some labels are often available and even very few labels greatly improve clustering results. Context-specific independence (CSI) models constitute a framework for simultaneous mixture estimation and model structure determination that yields meaningful models for high-dimensional data with many, possibly uninformative, variables. Here we present the first approach for partially supervised learning of CSI models and demonstrate that modest amounts of labels are effective, both for simulated data and for protein sub-family determination.
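
The general idea of letting a few labels steer a mixture clustering can be illustrated with a minimal sketch. It is not the method of the paper and does not use the PyMix API: it assumes a plain one-dimensional Gaussian mixture, omits the CSI structure learning entirely, and replaces link and do-not-link constraints with the simpler case of hard component labels. The function name `partially_supervised_em` and all parameters are hypothetical. Labeled points keep a fixed one-hot posterior in every E-step, while unlabeled points receive the usual soft posterior.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def partially_supervised_em(data, labels, n_components, n_iter=50, seed=0):
    """EM for a 1-D Gaussian mixture with some fixed labels (illustrative only).

    labels[i] is a component index for a labeled point and None otherwise.
    """
    rng = random.Random(seed)
    means = rng.sample(data, n_components)
    # Where labels exist, start the component mean at the labeled points.
    for k in range(n_components):
        pts = [x for x, lab in zip(data, labels) if lab == k]
        if pts:
            means[k] = sum(pts) / len(pts)
    stds = [1.0] * n_components
    weights = [1.0 / n_components] * n_components

    resp = []
    for _ in range(n_iter):
        # E-step: one-hot posterior for labeled points, soft posterior otherwise.
        resp = []
        for x, lab in zip(data, labels):
            if lab is not None:
                row = [1.0 if k == lab else 0.0 for k in range(n_components)]
            else:
                dens = [weights[k] * normal_pdf(x, means[k], stds[k])
                        for k in range(n_components)]
                total = sum(dens)
                if total > 0.0:
                    row = [d / total for d in dens]
                else:
                    # Numerical underflow: fall back to a uniform posterior.
                    row = [1.0 / n_components] * n_components
            resp.append(row)

        # M-step: weighted maximum-likelihood updates per component.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # empty component, keep previous parameters
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # floor avoids collapse

    assignments = [max(range(n_components), key=lambda k: r[k]) for r in resp]
    return means, stds, weights, assignments


# Simulated data: two well-separated groups, two labeled points per group.
rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(100)]
        + [rng.gauss(10.0, 1.0) for _ in range(100)])
labels = [None] * len(data)
labels[0], labels[1] = 0, 0
labels[100], labels[101] = 1, 1

means, stds, weights, assignments = partially_supervised_em(data, labels, 2)
print("estimated means:", means)
```

Besides constraining the posteriors, the labels here also fix which component index corresponds to which group, which is one simple way a handful of labels can make the clustering solution easier to interpret.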

A reprint is available as a PDF.

The publication includes results from the following projects or software tools: PyMix, CSIMixtures.

Further publications by Alexander Schliep, Benjamin Georgi.