## SLIQ: Simple Linear Inequalities for Efficient Contig Scaffolding

**R.S. Roy, K. Chen, A. Sengupta and A. Schliep**

*arXiv, 2011.*

Scaffolding is an important subproblem in de novo genome assembly in which mate pair data
are used to construct a linear sequence of contigs separated by gaps. Here we present SLIQ,
a set of simple linear inequalities derived from the geometry of contigs on the line that can be
used to predict the relative positions and orientations of contigs from individual mate pair reads
and thus produce a contig digraph. The SLIQ inequalities can also filter out unreliable mate
pairs and can be used as a preprocessing step for any scaffolding algorithm. We tested the SLIQ
inequalities on five real data sets ranging in complexity from simple bacterial genomes to complex
mammalian genomes and compared the results to the majority voting procedure used by many
other scaffolding algorithms. SLIQ predicted the relative positions and orientations of the contigs
with high accuracy in all cases and gave more accurate position predictions than majority voting
for complex genomes, in particular the human genome. Finally, we present a simple scaffolding
algorithm that produces linear scaffolds given a contig digraph. We show that our algorithm is more
efficient than other scaffolding algorithms while still predicting both contig positions and
orientations with high accuracy on real data sets.
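The abstract names two steps that a short sketch can make concrete: collapsing many per-mate-pair predictions into a single contig digraph edge by majority voting (the baseline SLIQ is compared against), and turning a contig digraph into a linear, oriented scaffold. The sketch below is illustrative only, under stated assumptions. It does **not** implement the SLIQ inequalities, which the abstract does not spell out. The tuple format `(a, b, a_before_b, same_orientation)` and the functions `majority_vote_edges` and `linear_scaffold` are hypothetical. A plain topological sort (Python 3.9+ `graphlib`) stands in for the paper's scaffolding algorithm and assumes an acyclic digraph.

```python
from collections import Counter, defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def majority_vote_edges(predictions):
    """Collapse per-mate-pair predictions into one edge per contig pair.

    Each prediction is a hypothetical tuple (a, b, a_before_b, same_orientation).
    Returns {(u, v): same_orientation}, where u is placed before v.
    """
    votes = defaultdict(Counter)
    for a, b, a_first, same in predictions:
        # Normalize so that reports about (a, b) and (b, a) share one tally.
        if a <= b:
            key, first = (a, b), a_first
        else:
            key, first = (b, a), not a_first
        votes[key][(first, same)] += 1

    edges = {}
    for (a, b), tally in votes.items():
        # The most frequent (order, orientation) outcome wins; ties go to the
        # outcome that was counted first.
        (first, same), _ = tally.most_common(1)[0]
        edges[(a, b) if first else (b, a)] = same
    return edges


def linear_scaffold(edges):
    """Order contigs by a topological sort and propagate relative orientations.

    Stand-in only: assumes the digraph is acyclic (graphlib raises CycleError
    otherwise) and does not resolve branching or conflicting edges.
    """
    preds = defaultdict(set)
    succs = defaultdict(list)
    for (u, v), same in edges.items():
        preds[v].add(u)
        preds.setdefault(u, set())
        succs[u].append((v, same))

    orient = {}
    order = list(TopologicalSorter(preds).static_order())
    for node in order:
        # A contig not reached from an already oriented predecessor starts as "+".
        orient.setdefault(node, "+")
        for nxt, same in succs[node]:
            if nxt not in orient:
                flipped = "-" if orient[node] == "+" else "+"
                orient[nxt] = orient[node] if same else flipped
    return [(node, orient[node]) for node in order]


# Hypothetical per-mate-pair predictions for three contigs.
reads = [
    ("A", "B", True, True),
    ("A", "B", True, True),
    ("B", "A", False, True),   # same claim as above, reported from B's side
    ("A", "B", False, False),  # a conflicting, presumably unreliable pair
    ("B", "C", True, False),
    ("B", "C", True, False),
    ("C", "B", True, False),
]
print(linear_scaffold(majority_vote_edges(reads)))
# [('A', '+'), ('B', '+'), ('C', '-')]
```

Majority voting here simply outvotes the conflicting A/B pair; by contrast, the abstract describes SLIQ as judging each mate pair individually and filtering unreliable ones before any aggregation.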