npScarf for a metagenome bin? #11
Comments
The assembly graph from metagenomics data might be too complicated to traverse for exhaustive gap-filling. I suggest running without the --spadesDir option and seeing how it goes. Cheers,
Could you kindly explain more about this process using the SPAdes graph?
It identifies the long reads that are aligned to two unique contigs, thereby establishing the relative |
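As a rough illustration of the idea in the comment above (long reads aligned to two unique contigs), here is a minimal Python sketch. It is not npScarf's actual code: the function name `bridging_reads` and the `unique_contigs` set are hypothetical, and it only reads the first three mandatory SAM columns (QNAME, FLAG, RNAME).

```python
from collections import defaultdict


def bridging_reads(sam_lines, unique_contigs):
    """Return {read_name: sorted contig names} for reads that align to
    at least two contigs from `unique_contigs` (a hypothetical input set)."""
    hits = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        read, flag, contig = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or contig == "*":  # read is unmapped
            continue
        if contig in unique_contigs:
            hits[read].add(contig)
    # A read is a candidate bridge only if it touches two or more unique contigs.
    return {read: sorted(c) for read, c in hits.items() if len(c) >= 2}
```

A read reported here is only a candidate bridge; deciding the order and orientation of the two contigs would also need the alignment positions and strand, which this sketch ignores.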
npScarf currently uses the SPAdes assembly graph for the gap-filling step. Instead of using a consensus sequence, it now tries to find a path in the assembly graph that connects the two contigs. We also have a version that works from the assembly graph from the beginning, but it is still experimental. If you clone the current git repository, you can play with jsa.dev.newScarf. It has a GUI that visualizes how the assembly graph is resolved when long reads serve as bridges. The red vertices represent putatively unique contigs, while the black ones are repetitive or artifact contigs and can be ignored. It works for a simple assembly graph (~500 nodes), but metagenomics data would be too much for it to handle.
To sum up, the SPAdes assembly graph for metagenomics data is complicated and not yet supported by the tool. So for your problem, if you can bin the paired-end reads and run SPAdes on that subset (pretending we are assembling a single isolate), the resulting assembly graph would be simpler and might be handled by jsa.np.npScarf (or jsa.dev.newScarf). But again, we can't guarantee anything, since metagenomic assembly is a difficult problem and you can get errors and chimeric contigs right from the Illumina assembly step.
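One way to picture the "bin the paired-end reads" suggestion is sketched below in Python. It rests on assumptions that are not in this thread: the Illumina reads have already been mapped to the full metagenome assembly and written as SAM, and the bin is a FASTA file whose header names match the SAM reference names. The helper names `fasta_names` and `reads_in_bin` are hypothetical.

```python
def fasta_names(fasta_lines):
    """Collect sequence names (first word after '>') from FASTA lines."""
    return {
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    }


def reads_in_bin(sam_lines, bin_contigs):
    """Return names of reads with at least one alignment to a bin contig.

    Both mates of a pair share the same name in SAM, so the returned set
    selects whole pairs even if only one mate maps to the bin.
    """
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        if fields[2] in bin_contigs:  # RNAME is a contig of this bin
            names.add(fields[0])
    return names
```

The resulting name set could then be used to pull the matching pairs out of the original FASTQ files before running SPAdes on that subset alone.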
I assume that npScarf is designed for single-species bacterial isolates. Still, I want to check how it works for a bacterium in my metagenomics sample. I assembled the Illumina MiSeq data with SPAdes and then binned the contigs with MetaBAT.
I tried to adapt the workflow for one of the good bins:
Use bin1.fasta as spades.fasta.
Map the nanopore reads to bin1.fasta to create a SAM file.
jsa.np.npscarf -input ONT.sam --spadesDir='spades' -format sam -seq bin1.fasta -prefix bin1_spades > a.log
However, unlike with the example dataset, this takes a very long time, and the output a.log becomes so large that I had to kill the job. Is it OK to run npScarf this way for a metagenome bin?