BayesianHMM: Fast MCMC Sampling for Hidden Markov Models to Determine Copy Number Variations

Hidden Markov Models (HMMs) are often used to analyze Comparative Genomic Hybridization (CGH) data, identifying chromosomal aberrations or copy number variations by segmenting observation sequences. For efficiency, the parameters of an HMM are usually estimated by maximum likelihood and a segmentation is obtained with the Viterbi algorithm. Because this yields only a single parameter estimate and a single state path, it introduces considerable uncertainty in the segmentation, which can be avoided with Bayesian approaches using Markov Chain Monte Carlo (MCMC) sampling. Although the advantages of the Bayesian approach have been clearly demonstrated, likelihood-based approaches are preferred in practice for their lower running times, and data sets from high-density arrays and next-generation sequencing make running time an even greater concern.
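To make the contrast with a single Viterbi path concrete, the following is a minimal sketch of forward filtering followed by backward sampling, which draws one state path from its posterior for fixed parameters. It is an illustration under stated assumptions, not the project's implementation: the function names, the Gaussian emissions with a shared standard deviation, and the fixed parameters are all hypothetical, and a full MCMC sampler would also resample the parameters between path draws.

```python
import math
import random


def gauss_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def sample_path(obs, init, trans, means, sigma, rng):
    """Draw one state path from P(path | obs, parameters).

    Hypothetical helper: assumes a non-empty observation list, Gaussian
    emissions with one mean per state and a shared sigma.
    """
    k = len(init)
    fwd = []
    # Forward filtering: fwd[t][j] is P(state_t = j | obs[0..t]).
    for t, y in enumerate(obs):
        row = []
        for j in range(k):
            if t == 0:
                prior = init[j]
            else:
                prior = sum(fwd[-1][i] * trans[i][j] for i in range(k))
            row.append(prior * gauss_pdf(y, means[j], sigma))
        total = sum(row)
        fwd.append([v / total for v in row])  # normalise to avoid underflow
    # Backward sampling: draw the last state, then each earlier state
    # conditioned on the state already drawn to its right.
    path = [0] * len(obs)
    path[-1] = rng.choices(range(k), weights=fwd[-1])[0]
    for t in range(len(obs) - 2, -1, -1):
        weights = [fwd[t][i] * trans[i][path[t + 1]] for i in range(k)]
        path[t] = rng.choices(range(k), weights=weights)[0]
    return path
```

Repeating the draw many times and counting how often each position is assigned to each state gives per-position state frequencies, which expose the segmentation uncertainty that a single Viterbi path hides.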

We propose an approximate sampling technique, inspired by the compression of discrete sequences in HMM computations and by kd-trees, that leverages spatial relations between data points in typical data sets to speed up MCMC sampling.
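The compression idea can be sketched as follows, under loud assumptions: consecutive observations with similar values are grouped into blocks, each block is stored as sufficient statistics, and a sampler can then evaluate the emission likelihood of a whole block under one state in constant time. The greedy grouping by value range used here is a simple stand-in for a kd-tree partition, and the names `compress`, `block_loglik` and `width` are hypothetical, not taken from the project's software.

```python
import math


def compress(obs, width):
    """Group consecutive observations into blocks whose value range is at most
    `width`. Each block is returned as (count, sum, sum of squares)."""
    blocks = []
    lo = hi = None
    n = s = ss = 0
    for y in obs:
        # Close the current block if adding y would stretch its range too far.
        if n and max(hi, y) - min(lo, y) > width:
            blocks.append((n, s, ss))
            n = s = ss = 0
        if n == 0:
            lo = hi = y
        else:
            lo, hi = min(lo, y), max(hi, y)
        n += 1
        s += y
        ss += y * y
    if n:
        blocks.append((n, s, ss))
    return blocks


def block_loglik(block, mu, sigma2):
    """Gaussian log-likelihood of all observations in a block, assuming they
    share one state with mean mu and variance sigma2."""
    n, s, ss = block
    # sum_i (y_i - mu)^2 expanded in terms of the sufficient statistics
    sq = ss - 2.0 * mu * s + n * mu * mu
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - 0.5 * sq / sigma2
```

Given the blocks, the likelihood of a block under a single state is exact; the approximation lies in assuming that all observations in a block share one state, so that the forward recursion runs over blocks rather than over individual observations.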

For further information contact John Wiedenhoeft (john.wiedenhoeft@chalmers.se). This project is connected to the following projects: AlgoEngineering, HaMMLET.

Team

Members: John Wiedenhoeft, Alexander Schliep, Md P. Mahmud.

Publications

Bello et al. Compressed computations using wavelets for hidden Markov models with continuous observations. PLOS ONE 2023, 18:6, e0286074.

Wiedenhoeft et al. Bayesian localization of CNV candidates in WGS data within minutes. Algorithms for Molecular Biology 2019, 14:20.

Wiedenhoeft et al. Using HaMMLET for Bayesian Segmentation of WGS Read-Depth Data. Methods Mol Biol 2018, 1833, 83–93.

Wiedenhoeft. Dynamically Compressed Bayesian Hidden Markov Models using Haar Wavelets. Ph.D. Thesis, Rutgers, The State University of New Jersey, Oct 2018.

Wiedenhoeft et al. Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression. In Research in Computational Molecular Biology: 20th Annual Conference, RECOMB 2016, Santa Monica, CA, USA, April 17-21, 2016, Proceedings, Springer, 9649, 263, 2016.

Wiedenhoeft et al. Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression. PLoS Computational Biology 2016, 12:5, e1004871.

Wiedenhoeft et al. Fast Bayesian Inference of Copy Number Variants using Hidden Markov Models with Wavelet Compression. bioRxiv 2015.

Mahmud. Reduced representations for efficient analysis of genomic data; from microarray to high throughput sequencing. Ph.D. Thesis, Oct 2014.

Mahmud et al. Speeding Up Bayesian HMM by the Four Russians Method. In Algorithms in Bioinformatics, Springer Berlin / Heidelberg, 6833, 188–200, 2011.

Mahmud et al. Fast MCMC Sampling for Hidden Markov Models to Determine Copy Number Variations. BMC Bioinformatics 2011, 12:1, 428.
