Algorithms for Big Data

Data from high-throughput experimental platforms, such as high-throughput sequencing (HTS), pose computational challenges, in particular when advanced statistical approaches such as Bayesian methods are used in the analysis. In collaboration with groups at Rutgers, CINJ and CWI, we develop methods, analyze HTS data and design efficient algorithms for Bayesian approaches. At the core of our compressive genomics approach, recently funded by the Big Data to Knowledge Program (BD2K) of the NIH, are reduced representations: compressed transformations of the input, lossless or lossy, on which algorithms can operate directly and efficiently. Together with cache-efficiency and cache-obliviousness, these representations are important algorithm engineering aspects for real-world performance.
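To illustrate what operating directly on a reduced representation can mean, here is a minimal sketch that uses run-length encoding as a stand-in for a lossless compressed representation. It is not the compressive genomics method itself; the function names and the example coverage track are assumptions made only for illustration.

```python
from itertools import groupby


def rle_encode(values):
    """Losslessly compress a sequence into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_sum(runs):
    """Sum of the original sequence, computed without decompressing it."""
    return sum(value * length for value, length in runs)


def rle_count(runs, target):
    """Number of positions equal to target, computed on the runs only."""
    return sum(length for value, length in runs if value == target)


# Hypothetical per-base read coverage along a short region.
coverage = [0, 0, 0, 5, 5, 5, 5, 2, 2, 0]
runs = rle_encode(coverage)

total = rle_sum(runs)          # same result as sum(coverage)
uncovered = rle_count(runs, 0) # same result as coverage.count(0)
```

Both queries touch one entry per run rather than one entry per position, so their cost scales with the size of the compressed input and not with the size of the original data.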

Single-Cell Genomics

Single-Cell Genomics (SCG) is the union of high-throughput sequencing (HTS), FACS/microfluidics and WGA, powered by rapidly developing bioinformatics methods. SCG will fundamentally transform the speed and effectiveness of the treatment of microbial human pathogens in clinical settings and change our knowledge of microbial biology, biodiversity and phylogeny. We generated a draft eukaryotic genome from single cells, designed a fast, cache-efficient k-mer counter and provide advanced bioinformatics methods for SCG.
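For readers unfamiliar with k-mer counting, the following is a minimal sketch of the task, assuming a plain hash-table approach. It is not the cache-efficient counter mentioned above; the function name `count_kmers` and the choice to skip k-mers containing ambiguous bases are assumptions for illustration.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def count_kmers(sequence, k):
    """Count every length-k substring of a DNA sequence.

    K-mers containing characters other than A, C, G, T (for example N)
    are skipped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    counts = Counter()
    # Slide a window of width k over the sequence.
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


counts = count_kmers("ACGTACGTNA", 3)
```

A hash table gives a simple and correct counter, but its memory accesses are essentially random, which is the kind of behavior that cache-efficient designs aim to avoid on large sequencing data sets.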

Functional Genomics

The methods we develop advance the understanding of gene function through the analysis of complex, heterogeneous experimental data, including imaging data, for example from in situ gene expression experiments. Computer vision methods combined with statistical models and semi-supervised learning help to find functional modules in Drosophila development by their spatial and temporal co-expression patterns. Novel tree models elucidate regulatory mechanisms in the development of the lymphoid system.


Computational thinking is becoming a core requirement across disciplines. Teaching computational and algorithmic ideas can benefit greatly from software tools. We develop animation systems for graph algorithms, which are available on the desktop, as a web app and soon as an iOS app; CATBox is a Springer textbook using Gato. Learners can concentrate on tackling exciting bioinformatics problems with our Hidden Markov Model library.


Find out about the people in the lab, our research areas, our publications, the software tools we develop and maintain or how to contact us.