Fix an instance of non-determinism in HaplotypeCaller (#6104)
* Fixing a non-deterministic point in HaplotypeCaller's KBestHaplotypeFinder
 * It uses a priority queue to compare scores. If there are ties, the tie breaking is arbitrary and seems to differ depending on the circumstances of the run.
 * For some as-yet-unknown reason, reading from a gs:// path vs. a local path can trigger this.
 * Adding a tie breaker that compares the entirety of the bases in the Path when the score is tied; the bases are unique per path.
 * Also remove a pointless lookup of an edge's target vertex, whose result was thrown away.


Co-authored-by: jamesemery <[email protected]>
Co-authored-by: Louis Bergelson <[email protected]>
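
The tie-breaking idea described above can be sketched in isolation. This is a minimal illustration, not the GATK code: `TieBreakDemo`, `Candidate`, and `BASES` are hypothetical stand-ins for `KBestHaplotypeFinder`, `KBestHaplotype`, and `BaseUtils.BASES_COMPARATOR`, and the byte-wise lexicographic ordering used for `BASES` is an assumption about how such a comparator might behave. `java.util.PriorityQueue` documents that ties for the least element are broken arbitrarily, so the score-only ordering leaves the poll order of equal-score entries unspecified, while the chained comparator fixes it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TieBreakDemo {

    // Hypothetical stand-in for KBestHaplotype: only a score and the path's bases.
    static final class Candidate {
        private final double score;
        private final byte[] bases;

        Candidate(final double score, final String bases) {
            this.score = score;
            this.bases = bases.getBytes(StandardCharsets.US_ASCII);
        }

        double score() { return score; }
        byte[] bases() { return bases; }
    }

    // Assumed stand-in for BaseUtils.BASES_COMPARATOR: byte-wise lexicographic order,
    // with the shorter array first when one is a prefix of the other.
    static final Comparator<byte[]> BASES = (a, b) -> {
        final int shared = Math.min(a.length, b.length);
        for (int i = 0; i < shared; i++) {
            final int cmp = Byte.compare(a[i], b[i]);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Ordering before the fix: highest score first, ties left to the queue.
    static final Comparator<Candidate> SCORE_ONLY =
            Comparator.comparingDouble(Candidate::score).reversed();

    // Ordering after the fix: highest score first, then an arbitrary but
    // deterministic tie breaker on the bases.
    static final Comparator<Candidate> SCORE_THEN_BASES =
            Comparator.comparingDouble(Candidate::score)
                    .reversed()
                    .thenComparing(Candidate::bases, BASES.reversed());

    // Adds every candidate to a priority queue and returns the bases in poll order.
    static List<String> drain(final Comparator<Candidate> order, final List<Candidate> input) {
        final PriorityQueue<Candidate> queue = new PriorityQueue<>(order);
        queue.addAll(input);
        final List<String> polled = new ArrayList<>();
        while (!queue.isEmpty()) {
            polled.add(new String(queue.poll().bases(), StandardCharsets.US_ASCII));
        }
        return polled;
    }

    public static void main(final String[] args) {
        final List<Candidate> input = Arrays.asList(
                new Candidate(-1.0, "ACGA"),
                new Candidate(0.0, "GGGG"),
                new Candidate(-1.0, "TTTT"),
                new Candidate(-1.0, "ACGT"));
        // The relative order of the three -1.0 entries is unspecified here.
        System.out.println("score only:       " + drain(SCORE_ONLY, input));
        // Same order for any insertion order of the same candidates.
        System.out.println("score then bases: " + drain(SCORE_THEN_BASES, input));
    }
}
```

With the chained comparator the poll order is `[GGGG, TTTT, ACGT, ACGA]` regardless of how the candidates were inserted. The tie breaker only needs to be a total order over distinct paths; which direction it sorts in is arbitrary.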
lbergelson and jamesemery authored Aug 27, 2019
1 parent 9e8c93a commit 8f4efec
Showing 1 changed file with 5 additions and 2 deletions.
@@ -1,6 +1,7 @@
package org.broadinstitute.hellbender.tools.walkers.haplotypecaller.graphs;

import org.apache.commons.lang3.mutable.MutableInt;
+import org.broadinstitute.hellbender.utils.BaseUtils;
import org.broadinstitute.hellbender.utils.Utils;
import org.jgrapht.alg.CycleDetector;

@@ -14,6 +15,9 @@
*/
public final class KBestHaplotypeFinder {

+public static final Comparator<KBestHaplotype> K_BEST_HAPLOTYPE_COMPARATOR = Comparator.comparingDouble(KBestHaplotype::score)
+.reversed()
+.thenComparing(KBestHaplotype::getBases, BaseUtils.BASES_COMPARATOR.reversed()); // This is an arbitrary deterministic tie breaker.
private final SeqGraph graph;
final Set<SeqVertex> sinks;
final Set<SeqVertex> sources;
@@ -66,7 +70,7 @@ public KBestHaplotypeFinder(final SeqGraph graph) {
*/
public List<KBestHaplotype> findBestHaplotypes(final int maxNumberOfHaplotypes) {
final List<KBestHaplotype> result = new ArrayList<>();
-final PriorityQueue<KBestHaplotype> queue = new PriorityQueue<>(Comparator.comparingDouble(KBestHaplotype::score).reversed());
+final PriorityQueue<KBestHaplotype> queue = new PriorityQueue<>(K_BEST_HAPLOTYPE_COMPARATOR);
sources.forEach(source -> queue.add(new KBestHaplotype(source, graph)));

final Map<SeqVertex, MutableInt> vertexCounts = graph.vertexSet().stream()
@@ -86,7 +90,6 @@ public List<KBestHaplotype> findBestHaplotypes(final int maxNumberOfHaplotypes)
}

for (final BaseEdge edge : outgoingEdges) {
-final SeqVertex targetVertex = graph.getEdgeTarget(edge);
queue.add(new KBestHaplotype(pathToExtend, edge, totalOutgoingMultiplicity));
}
}