-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BIOIN-1570] Fixed edge case in variant annotation #8810
[BIOIN-1570] Fixed edge case in variant annotation #8810
Conversation
…s close to the edge of the reference
I'Il excuse myself out. Sorry for my late reply. I am not very much fond of this code to provide proper review. James will be back from vacation soon I guess. |
Sounds good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any conception of how often this should happen? This requires indexing errors for indels that are close to the edge of the reference context? Should that ever be a problem?
@@ -250,7 +250,11 @@ protected void isHmerIndel(final VariantContext vc, final LocalContext localCont | |||
// get byte before and after | |||
final byte before = getReferenceNucleotide(localContext, vc.getStart() - 1); | |||
final byte[] after = getReferenceHmerPlus(localContext, vc.getEnd() + 1, MOTIF_SIZE); | |||
|
|||
if (after.length==0){ | |||
logger.warn("Failed to get hmer from reference, isHmerIndel and RightMotif annotations will not be calculated. " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to log many many times for every variant. We have the concept of a OneShotLogger
in GATK that wraps the logging code and skips reporting if this error has already come up before. Wrap these messages in that at the very least so it doesn't inundate the user with millions of error lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. I actually used the logger on purpose (there is a oneshotlogger in this file already) because I thought that I would like the user to see that they are trying to call variants really close to the edge and not get this warning buried in the log. I don't expect to get millions messages like this (there are not that many edges of the contigs in a reasonable genome).
If this is convincing, let me know and I will revert this change.
logger.warn("Failed to get hmer from reference, isHmerIndel and RightMotif annotations will not be calculated. " + | ||
"Start index: " + vc.getEnd() + 1 + " Reference length: " + localContext.ref.getBases().length); | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking inside of establishReadGroupFlowOrder() and it looks like it tries to disable annotation if it can't establish what the flow order is for the local context (which is explicitly the case for the user that brought up this issue)... Can you disentangle what is supposed to happen for this annotation class in the event that no-flow order can be established? The annotation requires a flow order in order to determine what kind of indel it is no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a method called isActualFlowOrderRequired in this abstract class that returns false, the annotations that the user enabled all inherit from that abstract class and do not require flow order. The only annotation that does require it is CycleSkipStatus
.
@jamesemery - thanks for the review. This is obviously very rare issue that should not happen on hg38 chromosomes at all (there are long stretches of Ns on the edges), but can probably happen if one has a genome with short contigs where they want to call variants, like happened in #8788 (there was a contig of length 333 or so). In any case, it is probably a bug worth fixing, no harm that I can envision. |
@jamesemery - could you take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not fully understand what was being fixed in the previous review pass...
I still want the guard of one-shot loggers but i think you should update the messages to point out that it is likely a result of the variant context being too close to the edge of the reveference... i'm not sure its worth the trouble to do the ideal thing, which is only to warn once, per reference contig. Just update the messages to make it clear "motif not found, some info about it, probably you are too close to the reference Y annotation will not be computed at the edges of a contig" so its crystal clear what is going on. Otherwise this is good.
@@ -69,7 +69,25 @@ public Object[][] getTestData() { | |||
// ins hmer indel | |||
"TATCT C ATTGACCAA", "CA", | |||
"ins", "1", "2", "A", "ATCTC", "TTGAC", "0.3", "NA", "h-indel" | |||
}, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add to these comments a little more context for the nature of this error and what those are testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments
try { | ||
Utils.validIndex(index, bases.length); | ||
} catch (IllegalArgumentException e) { | ||
flowMissingOneShotLogger.warn("Failed to get hmer from reference. Start index: " + index + " Reference length: " + bases.length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this also tied to being toward the end of the reference? You aren't asking the reference context for its actual length but the lenght of the bases view which is arbitrary and going to be deeply confusing to users. This error message should either try to give the actual reference name and index or give a clue as to why this could happen since that bases.length view is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, could not find an easy way to get the length of the reference, localContext.ref.getContigLength
is somehow hidden
@jamesemery - thanks for the useful comments, good catch about the reference length. Back to you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments look good
Fixes #8788. Crash in calculating some annotations when the variant is very near the edge of a contig