-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Export sites only vcf STEP 1-- 317 add AC, AN, AF to the final VCF #7279
Conversation
2a914f9
to
605fa54
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the tests we are still running Gnarly (which outputs these attributes) so I'm not sure which output we are seeing... Can you disable gnarly for the integration test and make sure things are still passing?
@@ -33,6 +33,8 @@ | |||
import java.util.*; | |||
import java.util.stream.Collectors; | |||
|
|||
import static htsjdk.variant.variantcontext.VariantContextUtils.calculateChromosomeCounts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find static imports a bit hard to follow in the code, not sure what the GATK style is though (but I haven't seen it very often)
// add AC/AF/AD annotations | ||
Set<String> founderIds = new HashSet<>(); | ||
Map<String, Object> attributes = new HashMap<>(vc.getAttributes()); | ||
Map<String, Object> chromosomeCounts = calculateChromosomeCounts(vc, attributes, true, founderIds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a form of this call that doesn't take the last parameter, might bel more clear
Set<String> founderIds = new HashSet<>(); | ||
Map<String, Object> attributes = new HashMap<>(vc.getAttributes()); | ||
Map<String, Object> chromosomeCounts = calculateChromosomeCounts(vc, attributes, true, founderIds); | ||
builder.putAttributes(chromosomeCounts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the docs on this are confusing (about removing stale attributes, why we pass things in and out, etc) so I'm not sure we need to call putAttribues here...
but the bigger question would be, why not just use the GATK annotation infrastructure?
8a50a8a
to
00996c3
Compare
So... AC, AN and AF annotations are back in the final VCF.
A few things to note:
These do not respect the FT tag---working with Louis and Laura to make that happen since Louis feels the change will have to happen in htsjdk itself (specifically in the getCalledChrCount method in VariantContext and its different versions)
This solution doesn't pass in any founderIds--likely I'm just missing where they should get pulled from? I assume there are people in AoU that are related? Or is that way outside the scope of any of this?
We may want to add a boolean to make the sites-only VCF require less cleanup from the user.
example is from manually running:
./gatk --java-options "-Xmx9g" ExtractCohort --mode GENOMES --ref-version 38 -R "/Users/aurora/projects/references/hg38/v0/Homo_sapiens_assembly38.fasta" -O "addannotations.vcf.gz" --local-sort-max-records-in-ram 10000000 --sample-table "spec-ops-aou.anvil_100_for_testing.sample_info" --project-id "spec-ops-aou" --cohort-extract-table "spec-ops-aou.anvil_100_for_testing.Rori_Test" -L chr20:1-1000000