npScarf for a metagenome bin? #11

liuxianghui · 2017-09-26T08:01:44Z

I assume that npScarf is designed for single-species bacterial isolates. Still, I want to check how it works for a bacterium in my metagenomics sample. Using Illumina MiSeq data, I did the assembly with SPAdes and then binned the contigs with MetaBAT.
I tried to adapt the workflow for one of the good bins:
using bin1.fasta as spades.fasta
mapping the nanopore reads to bin1.fasta to create the SAM file
jsa.np.npscarf -input ONT.sam --spadesDir='spades' -format sam -seq bin1.fasta -prefix bin1_spades > a.log
However, unlike with the example dataset, this takes a very long time, and the output a.log grows so large that I had to kill the job. Could you advise whether it is OK to run npScarf this way on a metagenome bin?

hsnguyen · 2017-09-26T08:15:43Z

The assembly graph from metagenomics data might be too complicated to traverse for exhaustive gap-filling. I suggest running without the --spadesDir option and seeing how it goes.

Cheers,

liuxianghui · 2017-09-27T08:38:56Z

Could you kindly explain more about how the SPAdes graph is used in this process?
(Please also specify in which Java file it is implemented.)
It seems that without the SPAdes folder the scaffolding still runs but ends with more scaffolds. Are two contigs merged only if they map to the same nanopore read and meet certain criteria? Could you explain more about those criteria? Is that the reason it cannot be applied to a metagenomics bin?
To solve the problem, would it be OK to try the following steps?

1. For a MetaBAT bin, extract the MiSeq paired-end short reads mapped to the contigs in the bin.
1'. Also extract the contigs mapped by those paired-end short reads.
2. Map the nanopore long reads to the contigs in this bin and generate the SAM file.
3. Run jsa.np.npscarf to scaffold the contigs in the bin.
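
The first step above could be sketched roughly as follows. This is only an illustration in Python, assuming the short reads were already mapped to the full metagenome assembly and the alignments are available as plain SAM text; the function names `contig_names` and `reads_mapped_to_bin` are made up for this example and are not part of npScarf or japsa.

```python
def contig_names(fasta_lines):
    """Collect contig names (first word after '>') from FASTA lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def reads_mapped_to_bin(sam_lines, bin_contigs):
    """Return names of reads with at least one alignment to a bin contig.

    Both mates of a pair share the same QNAME in SAM, so a pair is kept
    as soon as either mate maps to a contig in the bin.
    """
    selected = set()
    for line in sam_lines:
        if line.startswith("@"):        # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # not a complete alignment record
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4:                  # 0x4 = read is unmapped
            continue
        if rname in bin_contigs:
            selected.add(qname)
    return selected
```

With real files one would pass open file handles, e.g. `contig_names(open("bin1.fasta"))`, and write the resulting names to a list that a read-extraction tool can consume. The selected pairs could then be reassembled on their own.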

It identifies the long reads that are aligned to two unique contigs, thereby establishing the relative
position (that is, distance and orientation) of these contigs. To minimize the effect
of false positives that can arise from aligning noisy long reads, npScarf groups reads
that consistently support a particular relative position into a bridge and assigns the
bridge a score based on the number of supporting reads and the alignment quality
of these reads. When two unique contigs are connected by a bridge, they are
merged into one larger unique contig. npScarf uses a greedy strategy based on
Kruskal’s algorithm [39], which merges contigs from the highest scoring bridges. In
the newly created contig, the gap is temporarily filled with the consensus sequence
of the reads forming the bridge. npScarf then identifies repetitive contigs that are
aligned to this consensus sequence, and uses these contigs to fill in the gap.
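
To make the greedy, Kruskal-style merging in the passage above concrete, here is a minimal Python sketch. It is not npScarf's code: the `Bridge` tuple, the precomputed `score` field, and the function `greedy_merge` are assumptions for illustration, and the sketch ignores contig orientation, distance, and gap filling. It only shows the core idea of taking bridges from the highest score down and skipping any bridge whose two contigs have already been joined.

```python
from collections import namedtuple

# Hypothetical bridge record: two unique contigs plus a support score.
Bridge = namedtuple("Bridge", "contig_a contig_b score")


def greedy_merge(contigs, bridges):
    """Accept bridges in descending score order, Kruskal-style."""
    parent = {c: c for c in contigs}   # union-find forest over contigs

    def find(c):
        # Walk up to the representative, halving the path on the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    accepted = []
    for bridge in sorted(bridges, key=lambda b: b.score, reverse=True):
        root_a, root_b = find(bridge.contig_a), find(bridge.contig_b)
        if root_a == root_b:           # already in the same scaffold
            continue
        parent[root_b] = root_a        # merge the two groups
        accepted.append(bridge)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return accepted, sorted(sorted(g) for g in groups.values())


accepted, scaffolds = greedy_merge(
    ["c1", "c2", "c3", "c4"],
    [Bridge("c1", "c2", 10), Bridge("c2", "c3", 7), Bridge("c1", "c3", 3)],
)
print(scaffolds)  # [['c1', 'c2', 'c3'], ['c4']]
```

The low-scoring c1-c3 bridge is dropped because c1 and c3 are already connected through higher-scoring bridges, which is how a greedy scheme limits the effect of spurious alignments.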

hsnguyen · 2017-09-28T02:34:34Z

npScarf currently uses the SPAdes assembly graph for the gap-filling step. Instead of using the consensus sequence, it now tries to find a path in the assembly graph that can practically connect the two contigs.
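
As a rough picture of why this search gets expensive, here is a simplified Python sketch of path finding between two contigs. It is an assumption-laden illustration, not the npScarf implementation: the graph is a plain adjacency dict, `paths_between` and its parameters are hypothetical names, overlaps between graph nodes are ignored, and each node is visited at most once per path.

```python
def paths_between(graph, lengths, start, end, gap, tolerance, max_paths=1000):
    """List paths start -> end whose intermediate contigs total about `gap` bases.

    graph:   dict mapping a contig name to the contigs that can follow it
    lengths: dict mapping a contig name to its length in bases
    """
    found = []

    def walk(node, filled, path):
        for nxt in graph.get(node, ()):
            if len(found) >= max_paths:     # stop once enough paths are found
                return
            if nxt == end:
                if abs(filled - gap) <= tolerance:
                    found.append(path + [end])
                continue
            if nxt in path:                 # keep the path simple (no revisits)
                continue
            new_filled = filled + lengths[nxt]
            if new_filled > gap + tolerance:
                continue                    # already longer than the gap allows
            walk(nxt, new_filled, path + [nxt])

    walk(start, 0, [start])
    return found


graph = {"A": ["r1", "r2"], "r1": ["B"], "r2": ["r1", "B"]}
lengths = {"r1": 100, "r2": 500}
print(paths_between(graph, lengths, "A", "B", gap=600, tolerance=50))
# [['A', 'r2', 'r1', 'B']]
```

The number of candidate paths can grow very quickly with the number of branching nodes, which is consistent with a tangled metagenome graph producing very long run times and a huge log.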

We also have a version that works from the assembly graph from the beginning, but it's still experimental. If you clone the current git repository, you can play with jsa.dev.newScarf. It has a GUI to visualize how the assembly graph is resolved when long reads are used as bridges. The red vertices represent putatively unique contigs, while the black ones are repetitive or artifact contigs and can be ignored. It worked for a simple assembly graph (~500 nodes), but if you load metagenomics data, it would be too much to handle.

To sum up, the SPAdes assembly graph for metagenomics data is complicated and not yet supported by the tool. So for your problem, if you can bin the paired-end reads and run SPAdes on that subset (pretending we are assembling a single isolate), the resulting assembly graph would be simpler and could possibly be handled by jsa.np.npScarf (or jsa.dev.newScarf). But again, we can't guarantee anything, since metagenomics assembly is a difficult problem and you can get errors and chimeric contigs right from the Illumina assembly step.

liuxianghui closed this as completed Sep 27, 2017

liuxianghui reopened this Sep 28, 2017
