CLEVER: Clique-Enumerating Variant Finder
T. Marshall, I.G. Costa, S. Canzar, M. Bauer, G. Klau, A. Schliep and A. Schönhuth
Bioinformatics 2012, 28:22, 2875–82.
Next-generation sequencing techniques have, for the first time, facilitated large-scale analysis of human genetic variation. However, despite advances in sequencing speed, achieved at ever lower cost, the computational discovery of structural variants is not yet standard. It is likely that a considerable number of variants have remained undiscovered in many sequenced individuals. Here we present a novel internal segment size-based approach, which organizes all reads, including concordant ones, into a read alignment graph where max-cliques represent maximal contradiction-free groups of alignments. A specifically engineered algorithm then enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions (indels). We achieve highly favorable performance rates, in particular on indels of sizes 30-99 bp. Beyond superior recall and precision, we are the only tool to predict nearly 25% of the annotations, whereas none of the other approaches makes more than 6% such unique and correct predictions. We also achieve favorable performance rates on larger indels (>=100 bp) and predict a non-negligible number of correct, but so far undiscovered, variants here as well. On very short indels (10-29 bp) we outperform all prior insert size approaches, while our unique predictions favorably complement the predictions of the split-read aligner considered. Our implementation is available from this http URL as open source software under the terms of the GNU General Public License.
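The read alignment graph idea from the abstract can be illustrated with a toy sketch. This is not the CLEVER implementation: the compatibility rule (internal segments overlap and their lengths differ by at most a hypothetical `max_len_diff`) is an assumption made for illustration, and the enumeration uses the textbook Bron-Kerbosch algorithm with pivoting rather than the specifically engineered algorithm of the paper. The function names `build_graph` and `maximal_cliques` are likewise hypothetical.

```python
from itertools import combinations


def build_graph(alignments, max_len_diff):
    """Build a toy read alignment graph.

    alignments: list of (start, end) internal segments, one per read pair.
    Two alignments are joined by an edge if their internal segments overlap
    and their lengths differ by at most max_len_diff (an assumed rule, not
    the edge definition used in the paper).
    """
    adj = {i: set() for i in range(len(alignments))}
    for i, j in combinations(range(len(alignments)), 2):
        (s1, e1), (s2, e2) = alignments[i], alignments[j]
        overlap = min(e1, e2) - max(s1, s2)
        if overlap > 0 and abs((e1 - s1) - (e2 - s2)) <= max_len_diff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with Bron-Kerbosch (pivoting variant)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Three alignments with internal segments of length 200 and two of length 400
toy = [(100, 300), (110, 310), (120, 320), (105, 505), (115, 515)]
print(maximal_cliques(build_graph(toy, max_len_diff=50)))
```

In this toy input the short and long internal segments fall into separate cliques, which mirrors the notion of contradiction-free groups of alignments; a real tool would then evaluate each clique statistically, a step this sketch omits.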
A reprint is available as PDF.
The publication includes results from the following projects or software tools: HTSMethods.