Mining Heterogeneous Data with Mixture Models

Invited talk presented on March 8, 2006 by Alexander Schliep at the Second German-Japanese Symposium on Classification, Berlin.

Abstract: In recent years we have observed an immense growth in the abundance and variety of biological data. The fundamental molecular processes we desire to understand are investigated at the sequence, transcriptional, protein, or metabolic level. While each type of data poses interesting novel problems for classification and clustering, the grand challenge remains the integration of these heterogeneous data sources for a joint analysis. We will present two approaches for integrating data, with an emphasis on gene expression time-courses. The core method is built on classical mixture models, which nicely reflect some aspects of biological reality. A case study indicates some of the difficulties inherent to integration. We will also demonstrate how complex phenotypes can be represented in our framework. This is joint work with Ivan G. Costa, Benjamin Georgi and Alexander Schönhuth.

A. Schliep, I. G. Costa, C. Steinhoff, A. A. Schönhuth. Analyzing Gene Expression Time-Courses. IEEE/ACM Transactions on Computational Biology and Bioinformatics (Special Issue on Machine Learning for Bioinformatics), 2005, 2(3):179-193.

Tilman Lange, Martin H. Law, Anil K. Jain and Joachim Buhmann. Learning With Constrained and Unlabeled Data. CVPR 2005.