Adds an optimized path when creating a consensus from a single input read. #790

Merged
tfenne merged 7 commits into master from tf_optimize_single_read_consensus
Feb 26, 2022


@tfenne tfenne commented Feb 25, 2022

This probably needs a very careful review, and we may want to add more tests to make sure it is doing the right thing in all cases, but in my hands this cuts consensus calling time in half for CallMolecularConsensusReads when there are a lot of input molecules with only a single supporting read.
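As a rough illustration of the idea (not the code in this PR), a single-read fast path can skip the per-column voting entirely and still agree with the general path. The `Read` type, the cap of 90, and the majority-vote "general" caller below are all simplifying assumptions made up for this sketch; the real caller uses a likelihood model.

```scala
object SingleReadFastPath {
  /** Hypothetical simplified read: bases plus Phred-scaled qualities (not an fgbio type). */
  final case class Read(bases: String, quals: Vector[Int])

  /** Assumed cap on consensus qualities, chosen only for this sketch. */
  val MaxQual: Int = 90

  /** General path: per-column vote weighted by quality, with the winning sum capped. */
  def consensusGeneral(reads: Seq[Read]): Read = {
    require(reads.nonEmpty, "need at least one read")
    val len = reads.map(_.bases.length).min
    val cols = (0 until len).map { i =>
      // Sum the qualities of the reads supporting each base at column i.
      val votes = reads.groupBy(_.bases(i)).map { case (b, rs) => b -> rs.map(_.quals(i)).sum }
      val (base, qual) = votes.maxBy(_._2)
      (base, math.min(qual, MaxQual))
    }
    Read(cols.map(_._1).mkString, cols.map(_._2).toVector)
  }

  /** Dispatches to a cheap path when there is exactly one read: no grouping, no voting. */
  def consensus(reads: Seq[Read]): Read = reads match {
    case Seq(single) => Read(single.bases, single.quals.map(q => math.min(q, MaxQual)))
    case _           => consensusGeneral(reads)
  }
}
```

The property worth testing for any such shortcut is that it returns exactly what the general path would have returned for a one-read input.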

On an example BAM file, the first two commits take the runtime from 10.80 minutes on master to 5.00 minutes after the first commit and 4.45 minutes after the second.

The third commit adds a new ParIterator, based on the iterata one but cleaned up and extended a bit. Using it instead of the prior parallelization cuts the runtime to ~3.4 minutes with the same number of threads.
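For readers unfamiliar with the pattern, here is a minimal sketch of an order-preserving, chunked parallel map over an iterator. It is not the ParIterator added in this PR and does not mirror its API; `ChunkedParallelMap`, `parMap`, and `chunkSize` are names invented for illustration, and it uses only the Scala standard library.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ChunkedParallelMap {
  /** Maps `f` over `in` one chunk at a time, running the elements of each
    * chunk concurrently and emitting results in the original input order.
    * Hypothetical helper for illustration only.
    */
  def parMap[A, B](in: Iterator[A], chunkSize: Int)(f: A => B)
                  (implicit ec: ExecutionContext): Iterator[B] = {
    require(chunkSize > 0, "chunkSize must be positive")
    in.grouped(chunkSize).flatMap { chunk =>
      // Start one task per element, then wait for the whole chunk to finish.
      val futures = chunk.map(a => Future(f(a)))
      Await.result(Future.sequence(futures), Duration.Inf)
    }
  }
}
```

Because the result is still a lazy iterator, only one chunk is held in memory at a time. The cost is that each chunk waits for its slowest element, so very uneven work per element limits the speedup.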

And with the fourth commit (adding an async iterator on the input side too) we're down to 2.14 minutes!
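The general shape of an async input iterator can be sketched as a background thread that reads ahead into a bounded queue while the consumer works. This is an assumption-laden illustration, not the iterator used in this PR: `PrefetchIterator` and `bufferSize` are hypothetical names, and the sketch makes no attempt to stop the producer thread early if the consumer abandons the iterator.

```scala
import java.util.concurrent.ArrayBlockingQueue
import scala.util.{Success, Try}

/** Hypothetical read-ahead wrapper: a daemon thread drains `source` into a
  * bounded queue so that reading and downstream processing can overlap.
  */
final class PrefetchIterator[A](source: Iterator[A], bufferSize: Int = 1024) extends Iterator[A] {
  // Success(Some(a)) is an element, Success(None) marks end of input,
  // and a Failure carries an exception thrown while reading the source.
  private val queue = new ArrayBlockingQueue[Try[Option[A]]](bufferSize)
  private var pending: Option[Try[Option[A]]] = None

  private val producer = new Thread(new Runnable {
    def run(): Unit = {
      val end: Try[Option[A]] = Try { source.foreach(a => queue.put(Success(Some(a)))); None }
      queue.put(end)
    }
  })
  producer.setDaemon(true)
  producer.start()

  /** Blocks until the next queued item is available, caching it for reuse. */
  private def peek(): Try[Option[A]] = {
    if (pending.isEmpty) pending = Some(queue.take())
    pending.get
  }

  // Calling .get on the Try rethrows any failure from the producer thread.
  def hasNext: Boolean = peek().get.isDefined

  def next(): A = {
    val item = peek().get.getOrElse(throw new NoSuchElementException("next on empty iterator"))
    pending = None
    item
  }
}
```

The bounded queue is what keeps memory in check: when the consumer falls behind, `put` blocks the reader instead of buffering the whole input.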

@tfenne tfenne requested a review from nh13 February 25, 2022 13:44
}
}
else {
Member Author

Everything from here on down is identical, just indented under an else; GitHub's diff view has gotten confused.

@codecov-commenter
codecov-commenter commented Feb 25, 2022

Codecov Report

Merging #790 (bced5b0) into master (7512807) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #790   +/-   ##
=======================================
  Coverage   95.47%   95.48%           
=======================================
  Files         121      121           
  Lines        6855     6866   +11     
  Branches      463      452   -11     
=======================================
+ Hits         6545     6556   +11     
  Misses        310      310           
Flag Coverage Δ
unittests 95.48% <100.00%> (+<0.01%) ⬆️

Impacted Files Coverage Δ
...crumgenomics/umi/CallMolecularConsensusReads.scala 100.00% <100.00%> (ø)
...cala/com/fulcrumgenomics/umi/ConsensusCaller.scala 94.44% <100.00%> (ø)
...fulcrumgenomics/umi/ConsensusCallingIterator.scala 100.00% <100.00%> (ø)
...om/fulcrumgenomics/umi/SimpleConsensusCaller.scala 100.00% <100.00%> (ø)
...ulcrumgenomics/umi/VanillaUmiConsensusCaller.scala 91.58% <100.00%> (+1.70%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@nh13 nh13 left a comment

One minor suggestion (progress), one more potential optimization (min reads), and one tim-format change.

@tfenne tfenne merged commit 4b981f2 into master Feb 26, 2022
@tfenne tfenne deleted the tf_optimize_single_read_consensus branch February 26, 2022 16:26

3 participants