-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BGE settings for Reblocking and JointCalling pipelines #1076
Conversation
Remember to squash merge! |
2 similar comments
Remember to squash merge! |
Remember to squash merge! |
@jessicaway @samuelklee This is ready for review. As discussed in the meeting this morning the two remaining issues are:
I don't think either of these should stop an initial review though. Thanks! |
@ekiernan This PR will need some documentation updates for the BGE pipeline. I also submitted a request through the portal. Happy to work with you on this. |
Great, thanks @meganshand! Should be able to get an initial review back to you in the next couple days. Would be great to also get broadinstitute/gatk#8131 in before the next GATK release. Let's chat about whether that necessitates any changes here---I don't think it should, but that PR has been on the back burner for a bit, so I'll have to double check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @meganshand, just a few minor comments/questions from me!
|
||
if (run_vets) { | ||
String allele_specific_extra_args = if allele_specific_annotations then " --use-allele-specific-annotations " else "" | ||
String resource_args = " --resource:hapmap,training=true,calibration=true " + hapmap_resource_vcf + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to expose this and demote the default resources from required arguments at some point, but this is fine for now.
if (num_gvcfs > snps_variant_recalibration_threshold) { | ||
call Tasks.SNPsVariantRecalibratorCreateModel { | ||
|
||
if (run_vets) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, we'd have separate VQSR and VETS subworkflows, and the latter would be a direct import of the JointVcfFiltering WDL in the GATK repo. Either we'll get there in the future or we'll remove VQSR, let's see which happens first!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I can take a stab at pulling out VQSR into it's own subworkflow if that's helpful, but it sounds like now might not be the right time if we're close to deprecating VQSR completely.
@@ -1,3 +1,9 @@ | |||
# 2.1.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to update some of these changelogs now that #1081 is in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I did this to myself :(
I'll fix this when I have to resolve conflicts.
"JointGenotyping.one_thousand_genomes_resource_vcf": "gs://gcp-public-data--broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz", | ||
"JointGenotyping.one_thousand_genomes_resource_vcf_index": "gs://gcp-public-data--broad-references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi", | ||
|
||
"JointGenotyping.snp_recalibration_annotation_values": ["QD", "MQRankSum", "ReadPosRankSum", "FS", "MQ", "SOR", "DP"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these not allele-specific annotations, when use_allele_specific_annotations = true
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooo good catch, I think dragen might not be able to generate allele specific annotations, or at the very least we're not currently set up to do that for BGE so we might have to run without allele specific annotations for now. I changed use_allele_specific_annotations
to false for now, but this will also be a non-issue when we move to the new version of GATK/VETS (based on your other PR)
"JointGenotyping.callset_name": "bge_joint_genotyping_plumbing", | ||
"JointGenotyping.run_vets": true, | ||
|
||
"JointGenotyping.unpadded_intervals_file": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were unpadded_intervals_file
, eval_interval_list
, and targets_interval_list
intentionally left empty for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I need to work with @jessicaway on copying the input files (the Twist targets interval list and the sample files) into the test bucket get these two new tests to run.
tasks/broad/JointGenotypingTasks.wdl
Outdated
@@ -273,6 +273,7 @@ task HardFilterAndMakeSitesOnlyVcf { | |||
|
|||
String variant_filtered_vcf_filename | |||
String sites_only_vcf_filename | |||
File? targets_interval_list # filters out all variants outside of the targets interval list for targetted sequencing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
targetted
-> targeted
.
I'm also unclear about whether such changes to tasks need to be reflected in a changelog somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put in a line in the JointGenotyping and UltimaJointGenotyping changelogs (which I think are the only pipelines that uses this task):
Added option to hard filter sites outside of provided interval list
Let me know if there's a clearer way to put that.
reblocking test docker fixing TDR index issue in reblock fix the rest of the bge changes adding bge inputs fixes comment renaming inputs fix
Remember to squash merge! |
Remember to squash merge! |
The failing smart tests here are due to a change in the VCF header when updating to GATK 4.5.0.0:
This change seems harmless to me, but I'm not sure who needs to sign off on this change. The PR with the rest of the GATK updates to 4.5.0.0 is still in progress, but should be coming soon. |
Remember to squash merge! |
Remember to squash merge! |
Failing Scientific Tests that can be updated:
|
Remember to squash merge! |
Remember to squash merge! |
after merging in the latest develop, we re-ran JG, variant calling, and reblocking against Megan's truth (which was previously validated): https://gotc-jenkins.dsp-techops.broadinstitute.org/job/warp-workflow-tests/65222/console |
Description
This is a draft of how we'd like to run joint calling on BGE with a few changes to the current pipeline:
--format-annotations-to-remove PRI
--add-site-filters-to-genotype
Making two paths in the JointCalling pipeline for two different filtering strategies was a bit messy, so I'm happy to hear if there are any recommendations on how to clean that up. I also tried to run the tests that already exist for these workflows, but ran into errors that seemed unrelated.
@jessicaway For now I left out any changes to the ByChromosome workflows and we can cross that bridge when we come to it.
Checklist
If you can answer "yes" to the following items, please add a checkmark next to the appropriate checklist item(s) and notify our WARP documentation team by tagging either @ekiernan or @kayleemathews in a comment on this PR.