
Mixture models for heterogeneous biological data

Invited talk presented on April 6, 2007 by Alexander Schliep at the Symposium on Bioinformatics and Biomathematics, Centrum voor Wiskunde en Informatica, Amsterdam.

Abstract: Recent years have seen many efforts to generate biological data on a very large scale, from sequences to transcripts and from genotypes to phenotypes. Integrating these heterogeneous data sources to arrive at conclusive information about a biological process is typically done manually. We revisit classical mixture models, that is, convex combinations of density functions, and show how they can effectively model high-dimensional data through sufficiently constrained component models. Context-specific independence (CSI) provides a framework for learning relevant variables while avoiding over-fitting. The analysis of heterogeneous data then becomes possible with either a naive Bayes approach or semi-supervised learning. In semi-supervised learning, primary mass data is augmented with labels from possibly sparse secondary data. We will show several case studies, for example the detection of groups of syn-expressed genes from in-situ images and gene expression time-courses during embryogenesis.
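To make the phrase "convex combinations of density functions" concrete, here is a minimal sketch of a mixture density with naive Bayes components, in which the features are treated as independent within each component. The Gaussian feature densities, the function names, and all parameter values are assumptions chosen for illustration; the abstract does not specify the component family, and this sketch does not implement CSI structure learning.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x (assumed component family)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def component_density(x, means, stds):
    """Naive Bayes component: product of independent per-feature densities."""
    p = 1.0
    for xi, m, s in zip(x, means, stds):
        p *= gaussian_pdf(xi, m, s)
    return p


def mixture_density(x, weights, components):
    """Convex combination of component densities.

    The weights must be non-negative and sum to one.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(
        w * component_density(x, means, stds)
        for w, (means, stds) in zip(weights, components)
    )


def posteriors(x, weights, components):
    """Posterior probability of each component given the observation x."""
    joint = [
        w * component_density(x, means, stds)
        for w, (means, stds) in zip(weights, components)
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-component model over two features.
WEIGHTS = [0.3, 0.7]
COMPONENTS = [
    ([0.0, 0.0], [1.0, 1.0]),  # (means, standard deviations) of component 0
    ([3.0, 3.0], [1.0, 0.5]),  # (means, standard deviations) of component 1
]

if __name__ == "__main__":
    x = [0.2, -0.1]
    print(mixture_density(x, WEIGHTS, COMPONENTS))
    print(posteriors(x, WEIGHTS, COMPONENTS))
```

In a semi-supervised setting of the kind the abstract describes, labels from secondary data could be used to fix the component assignment for the labeled observations while the posteriors are still estimated for the unlabeled ones; that step is omitted from this sketch.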