Mining Heterogeneous Data with Mixture Models
Invited talk presented on Nov. 3, 2005 by Alexander Schliep at the Dagstuhl Seminar "Managing and Mining Genome Information", Schloß Dagstuhl.
Abstract: Mixture models are a statistical framework for dealing with data that cannot easily be partitioned into groups, as is frequently the case in biological applications. The heterogeneous, rich data now available can be represented in appropriately complex models that allow statistical queries. Time-course models based on Hidden Markov Models are one example. While the integration of heterogeneous data is formally quite easy, many important details have to be addressed. For instance, partially supervised learning makes it possible to integrate data of wildly different abundance, such as expression values for all genes versus experimentally confirmed transcription factor binding. We report on our method and on sample applications that stress the importance of judicious selection of the data sets to integrate.
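
To make the idea of partially supervised mixture estimation concrete, the following is a minimal sketch and not the method presented in the talk. It assumes a one-dimensional Gaussian mixture fitted by EM, in which a few observations carry a known component label (standing in for scarce, experimentally confirmed annotations) while the rest are unlabelled (standing in for abundant measurements such as expression values). The function name partially_supervised_em, the toy data, and the choice of Gaussian components are all illustrative assumptions; the abstract itself mentions Hidden Markov Models as components for time-course data.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def partially_supervised_em(values, labels, init_means, n_iter=50):
    """EM for a 1-D Gaussian mixture with a few labelled points (illustrative only).

    values:     list of floats
    labels:     list of the same length; a component index for labelled
                points, None for unlabelled points
    init_means: initial mean for each component
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: labelled points keep a fixed 0/1 assignment,
        # unlabelled points get posterior component probabilities.
        resp = []
        for x, lab in zip(values, labels):
            if lab is not None:
                resp.append([1.0 if j == lab else 0.0 for j in range(k)])
                continue
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            if total == 0.0:
                # Numerical underflow: fall back to a uniform assignment.
                resp.append([1.0 / k] * k)
            else:
                resp.append([d / total for d in dens])
        # M-step: weighted maximum-likelihood updates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # empty component: keep its previous parameters
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, 1e-6)  # guard against collapsing variance
            weights[j] = nj / len(values)
    return weights, means, variances, resp


# Toy data: two groups, with only one labelled point in each.
values = [0.1, -0.2, 0.3, 0.0, -0.1, 4.9, 5.2, 5.0, 4.8, 5.1]
labels = [0, None, None, None, None, None, None, None, None, 1]
weights, means, variances, resp = partially_supervised_em(values, labels, [1.0, 4.0])
```

In this sketch the labelled points never change their assignment, so even a handful of them anchors which component corresponds to which group, while the unlabelled majority determines the component parameters.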