Adds an optimized path when creating a consensus from a single input read. #790

tfenne · 2022-02-25T13:44:37Z

This probably needs a very careful review, and we may want to add more tests to make sure it is doing the right thing in all cases, but in my hands this cuts consensus calling time in half for CallMolecularConsensusReads when there are a lot of input molecules with only a single supporting read.
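To illustrate the idea of a single-read fast path, here is a minimal sketch in Scala 2.13. It is not the code from this PR: `Read`, `Consensus`, and `ToyConsensus` are hypothetical names, and the "general" path uses a toy summed-quality vote rather than fgbio's likelihood model. The only point is the branch structure: when a molecule has exactly one supporting read, the per-base work can be skipped.

```scala
// Hypothetical, simplified types for illustration; not fgbio's actual classes.
final case class Read(bases: String, quals: Vector[Int])
final case class Consensus(bases: String, quals: Vector[Int], depth: Int)

object ToyConsensus {
  /** General path (toy model): at each position pick the base with the highest
    * summed quality, and report the margin over all other bases (floored at 2). */
  def general(reads: Seq[Read]): Consensus = {
    val len = reads.map(_.bases.length).min
    val calls = (0 until len).map { i =>
      // Sum the qualities per observed base at position i.
      val scores = reads.groupMapReduce(_.bases(i))(_.quals(i))(_ + _)
      val (base, score) = scores.maxBy(_._2)
      val others = scores.values.sum - score
      (base, math.max(2, score - others))
    }
    Consensus(calls.map(_._1).mkString, calls.map(_._2).toVector, reads.size)
  }

  /** Dispatches to the fast path when there is exactly one read. */
  def call(reads: Seq[Read]): Option[Consensus] = reads match {
    case Seq()     => None
    // Fast path: in this toy model a single read is its own consensus.
    case Seq(only) => Some(Consensus(only.bases, only.quals, depth = 1))
    case many      => Some(general(many))
  }
}
```

In this toy model the fast path returns the same answer as the general path for a single read (given qualities of at least 2), which is the property a real fast path would need tests for.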

On an example BAM file, the first two commits take the runtime from 10.80 minutes on master to 5.00 minutes after the first commit and 4.45 minutes after the second.

The third commit adds a new ParIterator, based on the one in iterata but cleaned up and extended a bit. Using that instead of the prior parallelization cuts runtime to ~3.4 minutes with the same number of threads.

And with the fourth commit (adding an async iterator on the input side too) we're down to 2.14 minutes!
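As a rough illustration of chunked parallel iteration with look-ahead, here is a sketch using only the Scala 2.13 standard library. It is not the ParIterator from this PR or from commons; `ChunkedParallel` and `parMap` are hypothetical names. It maps a function over chunks of an iterator in parallel, preserves input order, and starts work on the next chunk while the caller drains the current one. Unlike the async input iterator described above, reading from the input here still happens on the caller's thread.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ChunkedParallel {
  /** Applies `f` to every element of `in`, one chunk at a time, running the elements
    * of a chunk concurrently on `ec`. Results come back in input order. */
  def parMap[A, B](in: Iterator[A], chunkSize: Int)(f: A => B)
                  (implicit ec: ExecutionContext): Iterator[B] = {
    val chunks = in.grouped(chunkSize)

    // Pull the next chunk (on the calling thread) and schedule its elements on `ec`.
    def submit(): Option[Future[Seq[B]]] =
      if (chunks.hasNext) Some(Future.traverse(chunks.next())(a => Future(f(a)))) else None

    new Iterator[B] {
      private var pending: Option[Future[Seq[B]]] = submit()
      private var current: Iterator[B] = Iterator.empty

      def hasNext: Boolean = {
        while (!current.hasNext && pending.isDefined) {
          // Wait for the in-flight chunk, then immediately schedule the next one
          // so it computes while the caller consumes `current`.
          current = Await.result(pending.get, Duration.Inf).iterator
          pending = submit()
        }
        current.hasNext
      }

      def next(): B =
        if (hasNext) current.next() else throw new NoSuchElementException("next on empty iterator")
    }
  }
}
```

The degree of parallelism is whatever the supplied `ExecutionContext` provides; a caller wanting a fixed thread count would pass a context backed by a fixed-size pool.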


tfenne · 2022-02-25T13:46:18Z

src/main/scala/com/fulcrumgenomics/umi/VanillaUmiConsensusCaller.scala

        }
+      }
+      else {


Everything from here on down is identical, just indented under an else, and github has gotten confused.



codecov-commenter · 2022-02-25T23:12:07Z

Codecov Report

Merging #790 (bced5b0) into master (7512807) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #790   +/-   ##
=======================================
  Coverage   95.47%   95.48%           
=======================================
  Files         121      121           
  Lines        6855     6866   +11     
  Branches      463      452   -11     
=======================================
+ Hits         6545     6556   +11     
  Misses        310      310

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `95.48% <100.00%> (+<0.01%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...crumgenomics/umi/CallMolecularConsensusReads.scala | `100.00% <100.00%> (ø)` | |
| ...cala/com/fulcrumgenomics/umi/ConsensusCaller.scala | `94.44% <100.00%> (ø)` | |
| ...fulcrumgenomics/umi/ConsensusCallingIterator.scala | `100.00% <100.00%> (ø)` | |
| ...om/fulcrumgenomics/umi/SimpleConsensusCaller.scala | `100.00% <100.00%> (ø)` | |
| ...ulcrumgenomics/umi/VanillaUmiConsensusCaller.scala | `91.58% <100.00%> (+1.70%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7512807...bced5b0.

src/main/scala/com/fulcrumgenomics/util/ParIterator.scala

nh13

One minor suggestion (progress), one more potential optimization (min reads), and one tim-format change.

src/main/scala/com/fulcrumgenomics/umi/CallMolecularConsensusReads.scala

src/main/scala/com/fulcrumgenomics/umi/VanillaUmiConsensusCaller.scala

Adds an optimized path when creating a consensus from a single input read.

a7dc5eb

tfenne requested a review from nh13 February 25, 2022 13:44

tfenne commented Feb 25, 2022


tfenne added 3 commits February 25, 2022 10:24

More consensus speedups.

51f4bfd

Added a new ParIterator (that will go into commons after review) and used it in consensus calling.

80dfee7

One more small change to also use an async iterator to cache a couple of chunks on the input side which really ups utilization and throughput.

03414aa

tfenne commented Feb 25, 2022

src/main/scala/com/fulcrumgenomics/util/ParIterator.scala (outdated, resolved)

tfenne commented Feb 25, 2022

src/main/scala/com/fulcrumgenomics/util/ParIterator.scala (outdated, resolved)

tfenne mentioned this pull request Feb 26, 2022

A parallel iterator implementation. fulcrumgenomics/commons#76

Merged

tfenne added 2 commits February 26, 2022 08:54

Bumped to a snapshot of commons and removed ParIterator from fgbio.

b67a18d

A little cleanup.

0b5c661

nh13 approved these changes Feb 26, 2022


Review fixups

bced5b0

tfenne merged commit 4b981f2 into master Feb 26, 2022

tfenne deleted the tf_optimize_single_read_consensus branch February 26, 2022 16:26

nh13 mentioned this pull request Jul 20, 2022

Fix a bug where consensus reads are produced with zero depth #859

Merged
