-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Funcotator: add 5'/3' flank support, with tests #5403
Conversation
@jonn-smith and @LeeTL1220 please review. I still plan to add a few more tests (in particular, for overlapping genes and for all-transcripts mode), but I will add these during review rather than holding this branch up any longer. |
Codecov Report
@@ Coverage Diff @@
## master #5403 +/- ##
===============================================
+ Coverage 87.027% 87.085% +0.058%
- Complexity 30440 30468 +28
===============================================
Files 1852 1853 +1
Lines 140936 141335 +399
Branches 15508 15517 +9
===============================================
+ Hits 122652 123081 +429
+ Misses 12645 12614 -31
- Partials 5639 5640 +1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments.
@@ -85,6 +85,20 @@ | |||
) | |||
public List<String> annotationOverrides = new ArrayList<>(); | |||
|
|||
@Argument( | |||
fullName = FuncotatorArgumentDefinitions.FIVE_PRIME_FLANK_SIZE_NAME, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the documentation for these two inputs to refer to transcript
rather than gene
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
.setTumorSeqAllele2(altAllele.getBaseString()) | ||
.setGenomeChange(getGenomeChangeString(variant, altAllele)) | ||
.setAnnotationTranscript(transcript.getTranscriptId()) | ||
.setOtherTranscripts(gtfFeature.getTranscripts().stream().map(GencodeGtfTranscriptFeature::getTranscriptId).collect(Collectors.toList())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to set other transcripts here - it'll get populated in createGencodeFuncotationsByAllTranscripts
in the call to populateOtherTranscriptsMapForFuncotation
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
final GencodeFuncotation gencodeFuncotation = (GencodeFuncotation)funcotations.get(0); | ||
Assert.assertEquals(gencodeFuncotation.getVariantClassification(), GencodeFuncotation.VariantClassification.IGR); | ||
Assert.assertNull(gencodeFuncotation.getHugoSymbol()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed in person it would be beneficial to make sure testCreateFuncotations
is run with a non-zero flanking size for 5' and 3'.
See line 1475
(Putting this here because I can't put it lower in github.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turned on both 5' and 3' flanks for testCreateFuncotations(), and all test cases are passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@droazen Shockingly, I have no real comments other than what I told you offline (please make a test with multiple genes overlapping the variant).
9e6d89f
to
caf227d
Compare
Unfortunately I don't have time to implement the multi-gene test before I catch my flight, so I've spun it off into a separate issue to be addressed in a future PR: #5417 |
@jonn-smith All comments (except for #5417) addressed, and the branch has been rebased onto your latest changes. If you give me a final thumbs up, I'll merge once tests pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
When tests pass, feel free to merge.
Added support for annotating 5'/3' flanks via new FIVE_PRIME_FLANK and THREE_PRIME_FLANK funcotations.
Added --five-prime-flank-size and --three-prime-flank-size arguments to control the size of each flanking region.
Refactored datasource classes to allow for padded/custom queries to make this feature possible.
We now emit IGR funcotations in more cases (in particular, when a gene has no basic transcripts, and when the basic transcripts do not fully span a gene and the flank size is small).
Added comprehensive unit tests, and updated integration test data.
Resolves #4771