Simplified KBestHaplotypeFinder by replacing recursion with Dijkstra's algorithm #5462

davidbenjamin · 2018-11-29T16:48:31Z

Closes #3561.

@vruano could you review this? Note that I ended up implementing Dijkstra's algorithm instead of using a library, but it's only a few lines of code. This PR does not affect the outputs of HC or M2 at all.
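For readers unfamiliar with the approach, here is a minimal sketch of Dijkstra-style best-first enumeration of the k best source-to-sink paths. It assumes edge weights are log10 probabilities (so a path's score never increases as it is extended). The graph representation (integer vertex ids, a map of successor weights) and the names `KBestPathsSketch`, `PartialPath`, and `findBestPaths` are hypothetical and are not the GATK `KBestHaplotypeFinder` API.

```java
import java.util.*;

// Hypothetical sketch, not GATK code: k best paths by best-first search over partial paths.
class KBestPathsSketch {

    /** A partial path from the source: the vertices visited so far and its summed log10 score. */
    record PartialPath(List<Integer> vertices, double score) {
        int last() {
            return vertices.get(vertices.size() - 1);
        }
    }

    /**
     * Returns up to maxPaths complete source-to-sink paths, best score first.
     * edges.get(v) maps each successor w of v to the log10 weight (<= 0) of edge v -> w.
     * Assumes the graph is acyclic.
     */
    static List<PartialPath> findBestPaths(Map<Integer, Map<Integer, Double>> edges,
                                           int source, Set<Integer> sinks, int maxPaths) {
        // Max-heap on score: the most probable partial path is expanded first.
        PriorityQueue<PartialPath> queue = new PriorityQueue<>(
                Comparator.comparingDouble((PartialPath p) -> p.score()).reversed());
        queue.add(new PartialPath(List.of(source), 0.0));
        List<PartialPath> result = new ArrayList<>();

        while (!queue.isEmpty() && result.size() < maxPaths) {
            PartialPath best = queue.poll();
            if (sinks.contains(best.last())) {
                // Scores only decrease when extended, so nothing left in the queue can beat this path.
                result.add(best);
                continue;
            }
            // No "visited" set, unlike single-source Dijkstra: a vertex may lie on many of the k best paths.
            for (Map.Entry<Integer, Double> e : edges.getOrDefault(best.last(), Map.of()).entrySet()) {
                List<Integer> extended = new ArrayList<>(best.vertices());
                extended.add(e.getKey());
                queue.add(new PartialPath(extended, best.score() + e.getValue()));
            }
        }
        return result;
    }
}
```

Because every edge weight is non-positive, a complete path popped from the queue cannot be outscored by any completion of a path still in the queue, which is what lets the loop stop after `maxPaths` results without exploring the rest of the graph.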

Also, @vruano, I recall your misgivings about the current haplotype enumeration (which this preserves):

> However, the current algorithm and the k-dijkstra still would show the same problems in terms of doing a suboptimal selection of haplotypes in terms of their coverage of plausible variation. I had implemented an alternative that fixed that issue . . . simulate haplotypes based on those same furcation likelihoods and stop when we have not discovered anything new for a while... the problem of such an approach is to make it deterministic.

Although this PR doesn't do that, it could easily be extended to do so just by running Dijkstra's algorithm until you have the amount of variation you want. That is, instead of terminating when the Dijkstra priority queue is empty or when we have discovered the maximum number of haplotypes, we could terminate based on some `Predicate<List<KBestHaplotype>>` evaluated on the list of haplotypes found so far. And it's deterministic, since Dijkstra's algorithm greedily extends the best-scoring path at each step rather than sampling.
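One way to express the "stop when we have not discovered anything new for a while" criterion from the quoted comment as such a predicate is sketched below. It is hypothetical: it uses plain vertex-id lists (`List<List<Integer>>`) in place of `List<KBestHaplotype>`, treats "new" as "uses an edge that no earlier haplotype used", and the names `StoppingRules`, `noNewEdgesInLast`, and `window` are made up for illustration.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical stopping rules for a best-first haplotype search; not GATK code.
class StoppingRules {

    /**
     * Returns true (stop) once the last `window` paths found contributed no edge
     * that the paths found before them had not already used.
     */
    static Predicate<List<List<Integer>>> noNewEdgesInLast(int window) {
        return paths -> {
            if (paths.size() <= window) {
                return false; // too few paths to judge yet
            }
            int firstRecent = paths.size() - window;
            // Edges used by every path found before the recent window.
            Set<List<Integer>> seen = new HashSet<>();
            for (List<Integer> path : paths.subList(0, firstRecent)) {
                seen.addAll(edgesOf(path));
            }
            // Keep searching if any recent path used an edge not seen before the window.
            for (List<Integer> path : paths.subList(firstRecent, paths.size())) {
                if (!seen.containsAll(edgesOf(path))) {
                    return false;
                }
            }
            return true;
        };
    }

    /** Consecutive vertex pairs of a path, each encoded as a two-element list. */
    private static List<List<Integer>> edgesOf(List<Integer> path) {
        List<List<Integer>> edges = new ArrayList<>();
        for (int i = 0; i + 1 < path.size(); i++) {
            edges.add(List.of(path.get(i), path.get(i + 1)));
        }
        return edges;
    }
}
```

In a best-first loop, such a check would replace (or sit alongside) the count limit, e.g. `while (!queue.isEmpty() && !stop.test(found))`. Because it looks only at the ordered list of paths already found, the rule is deterministic whenever the queue's ordering is.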

So basically, it's a nice refactoring for now but it also sets up some worthwhile extensions if we want.

vruano · 2018-12-13T21:58:47Z

...org/broadinstitute/hellbender/tools/walkers/haplotypecaller/graphs/KBestHaplotypeFinder.java

-        final List<KBestSubHaplotypeFinder> sourceFinders = new ArrayList<>(sources.size());
-        for (final SeqVertex source : sources) {
-            sourceFinders.add(createVertexFinder(source));
+    public List<KBestHaplotype> findBestHaplotypes(final int maxNumberOfHaplotypes) {


ok... perhaps you can mention that this is described in https://en.wikipedia.org/wiki/K_shortest_path_routing. I now wonder what the advantages are in O() terms versus the previous solution... I thought it would be roughly the same, but perhaps I was mistaken... That article says that this algorithm is O(m + n log n + k)... The dynamic programming one seems to be O(m + n x log x + k), where x is the number of outgoing edges from a vertex, ~ (m / n). I would argue that x is normally quite low, like 1-3, and so O(x log x) is effectively O(1)... That said, perhaps the constants are much larger in the previous solution and in practice this solution is faster; it is also clearly much simpler. I bet the contribution of this part to the overall HC cost is so low that the difference is not worth it anyway, so regardless of the O(.), your solution stands on the merits of simplicity alone.
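Taking the estimates above at face value (they are rough bounds from the review, not measurements), and writing x ≈ m/n for the average out-degree, the two costs compare as follows:

$$
T_{\text{Dijkstra}} = O(m + n\log n + k),
\qquad
T_{\text{DP}} = O(m + n\,x\log x + k),
\qquad
n\,x\log x \approx m\log\frac{m}{n}.
$$

When x is a small constant (the 1-3 range suggested above), n x log x = O(n); assuming the graph is connected, m ≥ n − 1, so this term is absorbed into O(m) and the previous approach is O(m + k). On these estimates the asymptotic difference is only the n log n priority-queue term, consistent with the conclusion that the choice rests on simplicity rather than big-O.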

good point. Done.

vruano · 2018-12-13T22:06:31Z

...dinstitute/hellbender/tools/walkers/haplotypecaller/graphs/KBestHaplotypeFinderUnitTest.java

-
-        final Path<SeqVertex,BaseEdge> refPath = bestPathFinder.get(0).path();
-        final Path<SeqVertex,BaseEdge> altPath = bestPathFinder.get(1).path();
+        final List<KBestHaplotype>bestPaths = new KBestHaplotypeFinder(graph,top,bot).findBestHaplotypes();


Missing a space just before `bestPaths =`?

vruano · 2018-12-24T05:02:42Z

No need to change anything really, go ahead and merge once conflicts are addressed and tests pass.

codecov-io · 2018-12-27T02:58:12Z

Codecov Report

Merging #5462 into master will increase coverage by 0.012%.
The diff coverage is 86.42%.

@@               Coverage Diff               @@
##              master     #5462       +/-   ##
===============================================
+ Coverage     87.075%   87.087%   +0.012%     
+ Complexity     31334     31225      -109     
===============================================
  Files           1921      1915        -6     
  Lines         144602    144079      -523     
  Branches       15951     15891       -60     
===============================================
- Hits          125912    125474      -438     
+ Misses         12896     12834       -62     
+ Partials        5794      5771       -23

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...s/walkers/haplotypecaller/graphs/PathUnitTest.java | `93.258% <ø> (-0.22%)` | `7 <0> (ø)` | |
| ...rs/haplotypecaller/graphs/AdaptiveChainPruner.java | `95.349% <100%> (ø)` | `16 <0> (ø)` | ⬇️ |
| ...ller/readthreading/ReadThreadingGraphUnitTest.java | `95.238% <100%> (+0.018%)` | `55 <0> (ø)` | ⬇️ |
| ...rs/haplotypecaller/graphs/ChainPrunerUnitTest.java | `99.194% <100%> (-0.006%)` | `40 <0> (ø)` | |
| ...der/tools/walkers/haplotypecaller/graphs/Path.java | `96.491% <100%> (+1.33%)` | `24 <1> (-2)` | ⬇️ |
| ...walkers/haplotypecaller/graphs/KBestHaplotype.java | `100% <100%> (+13.514%)` | `5 <5> (-8)` | ⬇️ |
| ...pecaller/readthreading/ReadThreadingAssembler.java | `67.578% <33.333%> (-0.961%)` | `51 <0> (ø)` | |
| ...r/graphs/SharedVertexSequenceSplitterUnitTest.java | `94.545% <66.667%> (-4.35%)` | `33 <3> (-5)` | |
| .../readthreading/ReadThreadingAssemblerUnitTest.java | `98.712% <66.667%> (ø)` | `38 <0> (ø)` | ⬇️ |
| ...otypecaller/graphs/CommonSuffixMergerUnitTest.java | `94.286% <85.714%> (+1.603%)` | `20 <3> (-1)` | ⬇️ |

... and 9 more

davidbenjamin assigned vruano Nov 29, 2018

davidbenjamin requested a review from vruano November 29, 2018 16:48

davidbenjamin mentioned this pull request Dec 12, 2018

HaplotypeCaller makes different variant calls depending on input padding #3697 (open)

vruano approved these changes Dec 24, 2018

View reviewed changes

vruano assigned davidbenjamin and unassigned vruano Dec 24, 2018

davidbenjamin force-pushed the db_dijkstra branch 2 times, most recently from afd69ab to 5e74f30, December 27, 2018 01:45

Simplified and sped KBestHaplotypeFinder by replacing recursion with Dijkstra's algorithm (2834e09)

davidbenjamin force-pushed the db_dijkstra branch from 5e74f30 to 2834e09, December 27, 2018 01:49

davidbenjamin merged commit fef36e3 into master Dec 27, 2018

davidbenjamin deleted the db_dijkstra branch December 27, 2018 03:13
