The General Hidden Markov Model Library: Analyzing Systems with Unobservable States

A. Schliep, B. Georgi, W. Rungsarityotin, I.G. Costa and A. Schönhuth

In Forschung und wissenschaftliches Rechnen: Beiträge zum Heinz-Billing-Preis 2004, Gesellschaft für wissenschaftliche Datenverarbeitung, 121–135, 2005.

Hidden Markov Models (HMMs) are a class of statistical models widely used in a broad variety of disciplines, for problems as diverse as understanding speech and finding genes implicated in causing cancer. They are adapted to different problems by designing the models and, if necessary, extending the formalism. The General Hidden Markov Model (GHMM) C library provides production-quality implementations of basic and advanced aspects of HMMs. The architecture is built around this software library, adding wrappers for using the library interactively from Python and R, as well as applications with graphical user interfaces for specific analysis and modeling tasks. We have found that the GHMM can drastically reduce the effort needed to tackle novel research questions. We focus on the Graphical Query Language (GQL) application for analyzing experiments that measure the expression (or mRNA) levels of many genes simultaneously over time. Our approach combines HMMs in a statistical mixture model and uses partially supervised learning as the training paradigm. The result is a highly effective, robust analysis tool for finding groups of genes that share the same pattern of expression over time, even in the presence of high levels of noise.
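To make the mixture-of-HMMs idea concrete, here is a minimal sketch in plain Python. It is not the GHMM API, and it is not the exact GQL training procedure. It shows one E-step: each time course receives a posterior membership over the mixture components, computed from per-model likelihoods given by the forward algorithm. Partial supervision is modeled, as an assumption, by fixing the membership of labeled sequences to their known component. The function names (`forward_loglik`, `posterior_memberships`), the Gaussian-emission parameterization, the two toy "rising"/"falling" models, and the data are all hypothetical.

```python
import math

def log_gauss(x, mean, sd):
    """Log density of a univariate Gaussian N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def safe_log(p):
    """log(p), mapping probability 0 to -inf (forbidden starts/transitions)."""
    return math.log(p) if p > 0 else -math.inf

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_loglik(seq, init, trans, emis):
    """Forward algorithm in log space; returns log P(seq | HMM).

    init[i]:     initial probability of state i
    trans[i][j]: probability of moving from state i to state j
    emis[i]:     (mean, sd) of state i's Gaussian emission (hypothetical choice)
    """
    n = len(init)
    alpha = [safe_log(init[i]) + log_gauss(seq[0], *emis[i]) for i in range(n)]
    for x in seq[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + log_gauss(x, *emis[j])
            for j in range(n)
        ]
    return logsumexp(alpha)

def posterior_memberships(seqs, models, weights, labels=None):
    """E-step of a mixture of HMMs with optional partial supervision.

    labels (hypothetical) maps a sequence index to its known component;
    those sequences get a fixed one-hot membership instead of a posterior.
    """
    labels = labels or {}
    result = []
    for idx, seq in enumerate(seqs):
        if idx in labels:
            result.append([1.0 if k == labels[idx] else 0.0 for k in range(len(models))])
            continue
        # log(weight_k) + log P(seq | model_k), normalized over components
        logs = [math.log(w) + forward_loglik(seq, *m) for w, m in zip(weights, models)]
        total = logsumexp(logs)
        result.append([math.exp(v - total) for v in logs])
    return result

# Two toy left-to-right models: expression goes low->high or high->low.
trans = [[0.7, 0.3], [0.0, 1.0]]
rising = ([1.0, 0.0], trans, [(0.0, 0.5), (2.0, 0.5)])
falling = ([1.0, 0.0], trans, [(2.0, 0.5), (0.0, 0.5)])

seqs = [[0.1, 0.2, 1.9, 2.1], [2.0, 1.8, 0.1, -0.2], [0.0, 2.0, 2.0, 2.0]]
# Sequence 2 is "labeled" as belonging to the rising component (index 0).
memberships = posterior_memberships(seqs, [rising, falling], [0.5, 0.5], labels={2: 0})
for seq, post in zip(seqs, memberships):
    print(seq, [round(p, 3) for p in post])
```

In a full EM loop, these memberships would weight each sequence's contribution when re-estimating every component HMM and the mixture weights; labeled sequences then act as anchors that keep each component tied to a known expression pattern, which is one way partial supervision can make clustering more robust to noise.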


The publication includes results from the following projects or software tools: GQL, GHMM.
