Comparison of eukaryotic single cell assembly tools
Contributed talk presented on Aug. 2, 2013, by Rajat Roy at ICOP XIV, the International Congress of Protistology (July 28 to August 2, 2013, Vancouver, Canada).
Abstract: Whole genome amplification (WGA) has made DNA sequencing possible for single-cell (SC) organisms that cannot be cultured in the lab. However, the high and uneven coverage of these libraries makes them difficult to assemble with traditional de Bruijn graph-based assemblers. This motivated the development of dedicated SC assemblers such as Velvet-SC, SPAdes, and IDBA-UD. Most of these apply read error correction as a pre-processing step to reduce the graph complexity introduced by sequencing errors. Because most published results benchmark these assemblers on E. coli, their performance on more complex genomes, such as those of protists, remains unknown. We benchmark a number of popular SC assemblers on a variety of eukaryotic WGA read libraries and present our observations. For some of these datasets, the available tools require so much memory that assembly is infeasible on machines with moderate configurations (≤ 128 GB of RAM). This motivated us to develop a novel method for error correction (based on k-mer counting and filtering) and assembly (using only reliable k-mers) that produces competitive assemblies while being fast and memory-efficient.
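The abstract does not describe the method in detail. The following minimal Python sketch illustrates only the general idea of k-mer counting and filtering followed by de Bruijn graph assembly from the retained k-mers. It is not the authors' tool. The function names (`count_kmers`, `reliable_kmers`, `compact_unitigs`), the fixed `min_count` threshold, and the decision to ignore reverse complements are hypothetical simplifications chosen for brevity.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in the reads (forward strand only, for simplicity)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def reliable_kmers(counts, min_count):
    """Keep k-mers seen at least min_count times.

    A sequencing error creates k-mers that occur rarely, so dropping
    low-count k-mers removes most erroneous ones. The fixed global
    threshold is an assumption of this sketch.
    """
    return {kmer for kmer, c in counts.items() if c >= min_count}


def compact_unitigs(kmers):
    """Build a de Bruijn graph from the k-mers and return its unitigs.

    Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its
    suffix. Maximal non-branching paths are merged into contigs. Isolated
    cycles, in which every node has exactly one in-edge and one out-edge,
    are skipped in this sketch.
    """
    out_edges = {}
    in_deg = Counter()
    nodes = set()
    for kmer in sorted(kmers):  # sorted for deterministic output
        u, v = kmer[:-1], kmer[1:]
        out_edges.setdefault(u, []).append(v)
        in_deg[v] += 1
        nodes.update((u, v))

    def is_simple(node):
        # A node inside a non-branching path: exactly one edge in and one out.
        return in_deg[node] == 1 and len(out_edges.get(node, [])) == 1

    contigs = []
    for start in sorted(nodes):
        if is_simple(start):
            continue  # unitigs begin only at branching or terminal nodes
        for nxt in out_edges.get(start, []):
            contig = start + nxt[-1]
            node = nxt
            while is_simple(node):
                node = out_edges[node][0]
                contig += node[-1]
            contigs.append(contig)
    return contigs


if __name__ == "__main__":
    # Toy data: three error-free reads and one read with a C->A error.
    reads = ["ATGGCTTAC", "ATGGCTTAC", "ATGGCTTAC", "ATGGATTAC"]
    counts = count_kmers(reads, k=4)
    solid = reliable_kmers(counts, min_count=2)
    print(compact_unitigs(solid))  # ['ATGGCTTAC']
```

On the toy data, filtering recovers the full sequence as one contig. If the unfiltered k-mers are used instead (`compact_unitigs(set(counts))`), the error k-mers form a bubble, and the same data fragments into four shorter unitigs. Storing only reliable k-mers also keeps the graph smaller, which is one plausible route to lower memory use. Real WGA data has highly uneven coverage, however, so a single global threshold like `min_count` can also discard true k-mers from poorly covered regions.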