-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplified KBestHaplotypeFinder by replacing recursion with Dijkstra's algorithm #5462
Conversation
final List<KBestSubHaplotypeFinder> sourceFinders = new ArrayList<>(sources.size()); | ||
for (final SeqVertex source : sources) { | ||
sourceFinders.add(createVertexFinder(source)); | ||
public List<KBestHaplotype> findBestHaplotypes(final int maxNumberOfHaplotypes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok... perhaps you can mention that this is described in https://en.wikipedia.org/wiki/K_shortest_path_routing. I now wonder what is the advantages in O() vs the previous solution... though it would be roughly the same but perhaps I was mistaken... that article says that this alg is O(m + n log n + k)... The dinamamic programming one seems to be O(m + n x log x + k) where x is = the number of outgoing edges from a vertex.... ~ (m / n). I would argument hat x is normally quite low like 1-3 and so O(x log x) is effective O(1)... that said perhaps the constants are much larger in the previous solution and in practice this solution is faster also is clearly much simpler. I bet the contribution of this part to the overall HC cost is so low that its not worth the difference anyway so i think that regardless of the O(.) your solution stands just based on the merits of simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. Done.
|
||
final Path<SeqVertex,BaseEdge> refPath = bestPathFinder.get(0).path(); | ||
final Path<SeqVertex,BaseEdge> altPath = bestPathFinder.get(1).path(); | ||
final List<KBestHaplotype>bestPaths = new KBestHaplotypeFinder(graph,top,bot).findBestHaplotypes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing blank space just before bestPahts =
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
No need to change anything really, go ahead and merge once conflicts are addressed and tests pass. |
afd69ab
to
5e74f30
Compare
…Dijkstra's algorithm
5e74f30
to
2834e09
Compare
Codecov Report
@@ Coverage Diff @@
## master #5462 +/- ##
===============================================
+ Coverage 87.075% 87.087% +0.012%
+ Complexity 31334 31225 -109
===============================================
Files 1921 1915 -6
Lines 144602 144079 -523
Branches 15951 15891 -60
===============================================
- Hits 125912 125474 -438
+ Misses 12896 12834 -62
+ Partials 5794 5771 -23
|
Closes #3561.
@vruano could you review this? Note that I ended up implementing Dijkstra's algorithm instead of using a library, but it's only a few lines of code. This PR does not affect the outputs of HC or M2 at all.
Also, @vruano, I recall your misgivings about the current haplotype enumeration (which this preserves):
Although this PR doesn't do that, it could easily be extended to do so just by running Dijkstra's algorithm until you have the amount of variation you want. That is, instead of terminating when the Dijkstra priority queue is empty or when we have discovered the maximum number of haplotypes, we could terminate based on some
Predicate<List<KBestHaplotype>>
on the list of haplotypes found so far. And it's deterministic since Dijkstra's algorithm is greedy.So basically, it's a nice refactoring for now but it also sets up some worthwhile extensions if we want.