Very slow run times #70
Two thoughts jump to mind regarding slow performance:
If you didn't specify a thread count (with the […]

Racon will take longer with more long reads, and Pilon will take longer with more short reads, so subsampling your read sets can make those steps run faster. If either your short-read set or your long-read set exceeds 500 Mbp, that's probably overkill: subsampling will improve speed but probably not affect assembly quality. I wrote Filtlong for subsampling long reads, so take a look at that tool. For short reads, you could try Trim Galore with stringent settings for […]

Also, the last Pilon polish is probably the least important step, so you can turn that off with […]

Let me know if any of these help!
Ryan
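To put the 500 Mbp threshold in perspective: for a bacterial genome of roughly 5 Mbp (an assumed size; genomes vary), 500 Mbp of reads is about 100x depth. The sketch below is a hypothetical, length-only illustration of subsampling reads down to a base budget. It is not Filtlong's algorithm (Filtlong also weighs read quality), and the function names are made up for this example.

```python
from typing import Iterable, List, Tuple

# Hypothetical read representation for this sketch: (read_id, sequence).
Read = Tuple[str, str]


def total_bases(reads: Iterable[Read]) -> int:
    """Sum of sequence lengths across all reads."""
    return sum(len(seq) for _, seq in reads)


def expected_depth(bases: int, genome_size: int) -> float:
    """Approximate sequencing depth: total read bases / genome size."""
    return bases / genome_size


def subsample_to_budget(reads: Iterable[Read], target_bases: int) -> List[Read]:
    """Keep the longest reads until their combined length reaches target_bases.

    Length-only heuristic for illustration; it ignores read quality, which a
    real tool such as Filtlong also takes into account.
    """
    ranked = sorted(reads, key=lambda r: len(r[1]), reverse=True)
    kept: List[Read] = []
    kept_bases = 0
    for read in ranked:
        if kept_bases >= target_bases:
            break
        kept.append(read)
        kept_bases += len(read[1])
    return kept


if __name__ == "__main__":
    # Toy reads, not real data.
    reads = [("a", "A" * 10), ("b", "C" * 50), ("c", "G" * 30), ("d", "T" * 20)]
    subset = subsample_to_budget(reads, target_bases=60)
    print([read_id for read_id, _ in subset])  # ['b', 'c']
    print(expected_depth(500_000_000, 5_000_000))  # 100.0
```

Because whole reads are kept, the subset can overshoot the budget by at most one read; preferring longer reads is a simple design choice that suits assembly, where long reads help resolve repeats.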
Awesome, thanks for the suggestions! I ran these with 4 threads each, but on an HPC cluster, so I was favouring running more assemblies in parallel over giving one job a ton of CPUs. Actually, do you or anyone else have an analysis of thread count versus total run time? That would let me work out the optimal number of threads to pass. And yes, I hadn't considered subsampling; I definitely think that could improve run times. Have you ever considered making subsampling part of the Unicycler pipeline? I don't think I'd remove the polishing step: we've noticed that indels make a huge difference in our downstream analysis (gene finding/annotation), so reducing indels is a high priority.
I have considered integrating Filtlong (or something like it) into Unicycler. If nothing else, I should have Unicycler display a warning/suggestion if the long-read set is very large. It's on my never-ending to-do list 😄 Regarding threads, I'm not aware of any systematic analysis. Since performance rarely scales linearly with threads, I suspect that your approach (4 threads per assembly and many assemblies in parallel) is probably fine. Just keep an eye on memory, as nothing will make a computer slow down like running out of RAM. If memory is tight, then more threads and fewer simultaneous assemblies might be better. One final thought regarding the Racon polishing time: a little while ago I decided that hybrid assemblies didn't need so many rounds, so I limited it to five. If you're seeing more than five Racon rounds in your hybrid assemblies, you're probably using an older version of Unicycler, and the current version would be faster.
Ryan
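The trade-off between threads per assembly and assemblies in parallel can be explored with a toy model. The sketch below assumes Amdahl's law (a swapped-in model, not a measurement of Unicycler), an assumed parallel fraction `p`, and a fixed RAM footprint per assembly that does not grow with the thread count; the function names and node sizes are hypothetical. It caps concurrent jobs by both cores and RAM and picks the threads-per-job value with the highest modelled throughput.

```python
def amdahl_speedup(threads: int, parallel_fraction: float) -> float:
    """Speedup over a single thread under Amdahl's law: 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)


def throughput(total_cores: int, total_ram_gb: float, ram_per_job_gb: float,
               threads_per_job: int, parallel_fraction: float) -> float:
    """Modelled assemblies completed per single-thread assembly time.

    Concurrent jobs are capped by both cores and RAM; RAM per job is assumed
    not to depend on the thread count.
    """
    jobs_by_cores = total_cores // threads_per_job
    jobs_by_ram = int(total_ram_gb // ram_per_job_gb)
    jobs = min(jobs_by_cores, jobs_by_ram)
    return jobs * amdahl_speedup(threads_per_job, parallel_fraction)


def best_threads_per_job(total_cores: int, total_ram_gb: float,
                         ram_per_job_gb: float, parallel_fraction: float) -> int:
    """Threads per job (1..total_cores) that maximise modelled throughput."""
    return max(
        range(1, total_cores + 1),
        key=lambda t: throughput(total_cores, total_ram_gb, ram_per_job_gb,
                                 t, parallel_fraction),
    )


if __name__ == "__main__":
    # Assumed node: 32 cores, 256 GB RAM; assumed parallel fraction p = 0.9.
    print(best_threads_per_job(32, 256, 4, 0.9))   # plenty of RAM -> 1
    print(best_threads_per_job(32, 256, 16, 0.9))  # RAM-bound -> 2
```

Under these assumptions, ample RAM favours many single-threaded jobs, while a larger per-job memory footprint leaves cores idle unless each job gets more threads, which matches the advice above. Real values of `p` and memory use would need to be measured.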
I've been using Unicycler quite a bit over the past few months, and I've gotten really good assemblies out of it. However, I was surprised by how long the program takes to run. The original paper mentioned ~24 hours, but in my case each run took between 3 and 5 days to complete. I was curious about what was taking so long, so I recorded the timings for each of the 33 bacterial assemblies I've done.

By far the most time-consuming steps were polishing with Pilon and Racon. These were followed by read alignment (which I expected to be slow) and then by SPAdes read error correction. One thing that surprised me was that SPAdes read error correction took much longer than the actual SPAdes assembly.
I'm just wondering if these are patterns that you (or others) have noticed in the past, and whether there is anything we can do to make this pipeline a little quicker?
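A per-step comparison like the one described above can be produced with something along the lines of the sketch below. It assumes you either wrap each pipeline stage yourself (for example in a driver script) or have already extracted per-step durations, e.g. from log timestamps; the helper names and step labels are hypothetical, and this is not how the original timings were collected. `summarize` ranks steps by total time across runs and reports each step's share.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


@contextmanager
def timed(step: str, log: Dict[str, float]) -> Iterator[None]:
    """Add the wall-clock seconds spent inside the block to log[step]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log[step] = log.get(step, 0.0) + time.perf_counter() - start


def summarize(runs: List[Dict[str, float]]) -> List[Tuple[str, float, float]]:
    """Rank steps by total duration across runs: (step, total, share of grand total)."""
    totals: Dict[str, float] = defaultdict(float)
    for run in runs:
        for step, duration in run.items():
            totals[step] += duration
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(step, dur, dur / grand_total if grand_total else 0.0)
            for step, dur in ranked]


if __name__ == "__main__":
    log: Dict[str, float] = {}
    with timed("example step", log):
        time.sleep(0.01)  # stand-in for real work
    print(log)
    # Toy durations (any consistent unit), not real measurements.
    runs = [{"pilon": 30.0, "spades": 10.0}, {"pilon": 20.0, "racon": 40.0}]
    for step, duration, share in summarize(runs):
        print(f"{step}: {duration:.0f} ({share:.0%})")
```

Summing across runs before ranking shows which steps dominate overall rather than in any single assembly; per-run shares could be computed the same way if run-to-run variation matters.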