SLiQ: Simple linear inequalities based Mate-Pair reads filtering and scaffolding

Scaffolding is an important sub-problem in de novo genome assembly, in which mate pair data are used to arrange contigs into a linear sequence separated by gaps. A set of simple linear inequalities (SLIQ), derived from the geometry of contigs on the line, can be used to predict the relative positions and orientations of contigs from individual mate pair reads and thus to produce a contig digraph. The SLIQ inequalities can also filter out unreliable mate pairs, so they can serve as a pre-processing step for any scaffolding algorithm. This tool filters mate pairs and then produces a directed contig graph (the contig digraph). We also provide a naive scaffolder that produces scaffolds from the contig digraph.
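The two-stage flow described above (per-mate-pair predictions collected into a contig digraph, then scaffolds read off that graph) can be illustrated with a minimal Python sketch. It is not the SLIQ code: the inequalities themselves are not stated on this page, so the sketch assumes each retained mate pair has already been reduced to a prediction "contig A lies before contig B" and substitutes a simple majority vote for the filtering step. Contig orientation is ignored, and the names `build_contig_digraph`, `naive_scaffolds` and `min_support` are hypothetical.

```python
from collections import Counter, defaultdict


def build_contig_digraph(predictions, min_support=2):
    """Build a digraph from (left_contig, right_contig) predictions.

    Assumption: one prediction per mate pair. An edge is kept only if it
    has at least `min_support` votes and more votes than its reverse.
    This majority vote is a stand-in, not the SLIQ filter.
    """
    votes = Counter(predictions)
    # Every contig that appears in a prediction becomes a node.
    graph = {contig: set() for pair in votes for contig in pair}
    for (left, right), count in votes.items():
        if count >= min_support and count > votes.get((right, left), 0):
            graph[left].add(right)
    return graph


def naive_scaffolds(graph):
    """Chain contigs along edges that are unambiguous at both ends."""
    preds = defaultdict(set)
    for left, succs in graph.items():
        for right in succs:
            preds[right].add(left)

    def unique_next(node):
        # Follow an edge only if it is the sole way out of `node`
        # and the sole way into its target.
        succs = graph.get(node, set())
        if len(succs) == 1:
            (target,) = succs
            if len(preds[target]) == 1:
                return target
        return None

    def has_unique_prev(node):
        if len(preds[node]) != 1:
            return False
        (source,) = preds[node]
        return unique_next(source) == node

    scaffolds, visited = [], set()

    def walk(start):
        chain, node = [], start
        while node is not None and node not in visited:
            chain.append(node)
            visited.add(node)
            node = unique_next(node)
        scaffolds.append(chain)

    # First pass: start chains at contigs with no unambiguous predecessor.
    for node in sorted(graph):
        if not has_unique_prev(node):
            walk(node)
    # Second pass: any contigs left over lie on simple cycles.
    for node in sorted(graph):
        if node not in visited:
            walk(node)
    return scaffolds
```

For example, three predictions of `("c1", "c2")`, two of `("c2", "c3")`, one conflicting `("c3", "c2")` and one weakly supported `("c4", "c1")` yield the edges c1 → c2 and c2 → c3, with c4 left as a singleton scaffold. Cutting chains at every branch point is a deliberately conservative choice: ambiguous joins are left unresolved rather than guessed.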

The Python scripts and a 'readme' file containing the instructions are available for download here.

For further information contact Rajat S Roy (rajatroy@cs.rutgers.edu). This project is connected to the following projects: SCG, HTSMethods, AlgoEngineering.

Team

Members: Rajat S Roy. Collaborators: Kevin Chen (Department of Genetics, Rutgers), Anirvan Sengupta (Department of Physics, Rutgers).