Adds an optimized path when creating a consensus from a single input read. #790
}
}
else {
Everything from here on down is identical, just indented under an else; GitHub's diff view has gotten confused.
…used it in consensus calling.
… of chunks on the input side which really ups utilization and throughput.
Codecov Report
Coverage Diff

| | master | #790 | +/- |
|---|---:|---:|---:|
| Coverage | 95.47% | 95.48% | |
| Files | 121 | 121 | |
| Lines | 6855 | 6866 | +11 |
| Branches | 463 | 452 | -11 |
| Hits | 6545 | 6556 | +11 |
| Misses | 310 | 310 | |
Flags with carried forward coverage won't be shown.
One minor suggestion (progress), one more potential optimization (min reads), and one tim-format change.
This probably needs a very careful review, and we may want to add more tests to make sure it does the right thing in all cases, but in my hands it cuts consensus-calling time in half for CallMolecularConsensusReads when many of the input molecules are supported by only a single read. On an example BAM file, these two commits reduce the runtime from 10.80 minutes on master to 5.00 minutes after the first commit and 4.45 minutes after the second.
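To illustrate why a single-read molecule admits a shortcut, here is a simplified sketch. It assumes a toy model (uniform prior over A/C/G/T, per-base error probability e = 10^(-q/10), errors spread evenly over the other three bases) rather than fgbio's actual consensus model, which has further parameters such as its pre- and post-UMI error rates. `SourceRead`, `ConsensusRead`, and `SimpleConsensus` are hypothetical names. Under this model the posterior of a lone read's observed base is exactly 1 - e, so the full per-position likelihood calculation only reproduces the read's own bases and qualities and can be skipped.

```scala
// Hypothetical, simplified types; NOT fgbio's actual classes.
final case class SourceRead(bases: String, quals: IndexedSeq[Int])
final case class ConsensusRead(bases: String, quals: IndexedSeq[Int], depth: Int)

object SimpleConsensus {
  private val Dna: IndexedSeq[Char] = "ACGT".toIndexedSeq

  // Phred-scaled quality -> probability that the base call is wrong.
  private def errorProb(q: Int): Double = math.pow(10.0, -q / 10.0)

  // Probability of error -> Phred-scaled quality, capped at maxQual.
  private def toPhred(pError: Double, maxQual: Int): Int =
    math.min(maxQual.toLong, math.round(-10.0 * math.log10(pError))).toInt

  // General path: posterior over A/C/G/T at each position under a uniform prior.
  def fullConsensus(reads: Seq[SourceRead], maxQual: Int): ConsensusRead = {
    val len = reads.map(_.bases.length).min
    val calls = (0 until len).map { i =>
      // Log-likelihood of all observed bases at position i for each candidate true base.
      val logLikes: IndexedSeq[Double] = Dna.map { truth =>
        reads.map { r =>
          val e = errorProb(r.quals(i))
          math.log(if (r.bases(i) == truth) 1.0 - e else e / 3.0)
        }.sum
      }
      val maxLl = logLikes.max
      val best = logLikes.indexOf(maxLl)
      // Likelihood mass of the other candidates relative to the best one.
      val others = logLikes.indices.filter(_ != best).map(j => math.exp(logLikes(j) - maxLl)).sum
      (Dna(best), toPhred(others / (1.0 + others), maxQual))
    }
    ConsensusRead(calls.map(_._1).mkString, calls.map(_._2), reads.size)
  }

  // Entry point: short-circuit when a molecule has exactly one read.
  def callConsensus(reads: Seq[SourceRead], maxQual: Int = 90): ConsensusRead = reads match {
    case Seq(only) =>
      // With one read the posterior error of the observed base is just e, so the
      // full calculation would return the read's own bases and (capped) qualities.
      ConsensusRead(only.bases, only.quals.map(q => math.min(q, maxQual)), depth = 1)
    case _ =>
      fullConsensus(reads, maxQual)
  }
}
```

In this toy model the equivalence only holds for A/C/G/T bases with quality 2 or higher (where the observed base is the most likely one). A real fast path has to match whatever the full model does for no-calls and very low qualities, which is exactly the kind of case that extra tests would need to cover.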
The third commit adds a new ParIterator, based on the one in iterata but cleaned up and extended a bit. Using it instead of the prior parallelization cuts the runtime to ~3.4 minutes with the same number of threads.
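The ParIterator itself isn't shown here. As a rough, hypothetical sketch of the general idea (an order-preserving parallel map over an iterator), the class below pulls elements in fixed-size chunks, maps each chunk's elements concurrently as Futures, and yields the results in input order. `ChunkedParMapIterator`, `ParMapDemo`, and their parameters are illustrative assumptions, not the iterata or fgbio API.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical sketch: maps `f` over `source` in parallel, one chunk at a time, preserving order.
final class ChunkedParMapIterator[A, B](source: Iterator[A], f: A => B, chunkSize: Int)
                                       (implicit ec: ExecutionContext) extends Iterator[B] {
  require(chunkSize > 0, "chunkSize must be positive")

  private val chunks = source.grouped(chunkSize)
  private var buffer: Iterator[B] = Iterator.empty

  override def hasNext: Boolean = {
    while (!buffer.hasNext && chunks.hasNext) {
      // Submit every element of the next chunk, then wait for all of them;
      // Future.sequence keeps the results in the same order as the inputs.
      val futures = chunks.next().map(a => Future(f(a)))
      buffer = Await.result(Future.sequence(futures), Duration.Inf).iterator
    }
    buffer.hasNext
  }

  override def next(): B = {
    if (!hasNext) throw new NoSuchElementException("next on empty iterator")
    buffer.next()
  }
}

object ParMapDemo {
  // Squares 1..n on a fixed-size thread pool and collects the results in order.
  def squares(n: Int, threads: Int): Vector[Int] = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try new ChunkedParMapIterator((1 to n).iterator, (i: Int) => i * i, chunkSize = 4).toVector
    finally pool.shutdown()
  }
}
```

Waiting for a whole chunk before starting the next one leaves threads idle behind the slowest element in each chunk; keeping more than one chunk in flight is one way to raise utilization.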
And with the fourth commit (adding an async iterator on the input side too) we're down to 2.14 minutes!
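Along the same lines, here is a minimal sketch of an async (read-ahead) iterator, assuming the input side is a plain Scala Iterator: a daemon thread drains the source into a bounded queue so that decoding the input overlaps with consensus calling on the consumer side. `ReadAheadIterator` and `bufferSize` are hypothetical names, not fgbio's class.

```scala
import java.util.concurrent.ArrayBlockingQueue

// Hypothetical sketch of a read-ahead iterator backed by a bounded queue and a producer thread.
final class ReadAheadIterator[A](source: Iterator[A], bufferSize: Int) extends Iterator[A] {
  require(bufferSize > 0, "bufferSize must be positive")

  // Right(Some(a)) = item, Right(None) = end of stream, Left(t) = producer failure.
  private val queue = new ArrayBlockingQueue[Either[Throwable, Option[A]]](bufferSize)
  private var lookahead: Option[A] = None
  private var done = false

  private val producer = new Thread(() => {
    try {
      source.foreach(a => queue.put(Right(Some(a)))) // blocks while the queue is full
      queue.put(Right(None))
    } catch { case t: Throwable => queue.put(Left(t)) }
  })
  producer.setDaemon(true)
  producer.start()

  override def hasNext: Boolean = {
    if (lookahead.isEmpty && !done) {
      queue.take() match {
        case Right(Some(a)) => lookahead = Some(a)
        case Right(None)    => done = true
        case Left(t)        => done = true; throw t // surface producer errors to the consumer
      }
    }
    lookahead.isDefined
  }

  override def next(): A = {
    if (!hasNext) throw new NoSuchElementException("next on empty iterator")
    val a = lookahead.get
    lookahead = None
    a
  }
}
```

The bounded queue caps how far the reader can run ahead. If the consumer abandons the iterator early, the producer stays blocked on put, which is why the thread is marked as a daemon.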