How to run GIAB comparisons #7237
👍
continue

if (len(sys.argv) > 2 and sys.argv[2] == "loose"):
I didn't see this option in the readme. When should this be used?
documented
looks great to me on a read-through!
# extract each of the samples
INPUT_VCF=gvs.chr20.vcf.gz

gatk SelectVariants -V ${INPUT_VCF} --sample-name SM-G947Y --select-type-to-exclude NO_VARIATION -O NA12878.gvs.chr20.vcf.gz
this may be minor, but can we document explicitly what `--select-type-to-exclude NO_VARIATION` does? i'm guessing it means only return sites at which this sample has a variant, i.e. exclude ref sites?
👍

## script to add "AS_MAX_VQSLOD" to VCFs
```
for sample in NA12878 SYNDIP BI_HG002 BI_HG003 UW_HG002
```
wow 😍 for this block
tabix warp_tieout_acmg_cohort_v1.chr20.vcf.gz

INPUT_VCF=warp_tieout_acmg_cohort_v1.chr20.vcf.gz
SOURCE=warp
does this need to be `SOURCE="warp"`? and does INPUT_VCF assignment also need quotes? or is this one of those bash things that doesn't matter if there's no spaces?
right -- only needed for quoting special characters, but I changed it just to be consistent
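A minimal sketch of the quoting point above, for anyone who wants to check it in a shell. The names `QUOTED_SOURCE` and `LABEL` are illustrative only and do not appear in the README; the sketch assumes a POSIX-style shell such as bash.

```shell
# Both assignments store the same four-character string.
SOURCE=warp
QUOTED_SOURCE="warp"
[ "$SOURCE" = "$QUOTED_SOURCE" ] && echo "same value"

# Quotes start to matter once the value contains whitespace or shell
# metacharacters. Without them, `LABEL=warp tieout` would try to run a
# command named `tieout` with LABEL=warp set in its environment.
LABEL="warp tieout"
echo "$LABEL"
```

So for values like `warp` or a plain file name the quotes are optional, and adding them everywhere is purely a consistency choice.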
@@ -1,5 +1,5 @@
 PROJECT="spec-ops-aou"
-DATASET="gvs_tieout_acmg_v1"
+DATASET="gvs_tieout_acmg_v2"
i KNEW we'd need a v2!!!!
always...
- bcftools
- tabix
- python 3.7+
my immediate reaction was that I should pip install them, but that's clearly wrong.
I know I'm a noob with this project, but a little more context on the prereqs would be immensely helpful and would have saved me a lot of time googling
add samtools?
conda create --name gvs python=3.8
conda activate gvs
conda install -c bioconda samtools=1.9 --force-reinstall
conda install -c bioconda bcftools
conda install -c bioconda rtg-tools
also add GATK to prereqs -- should this be done through conda tho? isn't gatk technically part of this repository anyway?
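One possible way to make the prereqs easier to verify, sketched here as a suggestion rather than anything in this PR: a small check that each tool is on `PATH` before starting. `check_tools` is a hypothetical helper, and the executable names (`rtg` for rtg-tools, `gatk` for GATK, `python3` for Python) are assumptions about how the tools end up installed.

```shell
# Report which of the named command-line tools are missing from PATH.
# Returns 0 if every tool was found, 1 if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tool names taken from the prereq list and the comments in this thread.
check_tools bcftools tabix samtools rtg gatk python3 \
  || echo "install the missing tools before continuing"
```

Running this inside the activated conda environment would show at a glance which of the installs above still need to happen.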

## Obtain Truth sample VCFs

First, create a full cohort extract (as described in README.md) using the desired filter_set_name. Assuming this is in a single gathered VCF of `gvs.vcf.gz`
Would this be more like "once you have a full cohort extract you want to compare" ?
```
${BASE_CMD} -b truth/CHM.full.38.vcf.gz -e truth/CHM.gvs.evaluation.bed -c SYNDIP.${SOURCE}.chr20.maxas.vcf.gz -o syndip_${SOURCE}${SUFFIX}
```

The do the same thing but use all records
"then" ?
Addresses https://github.com/broadinstitute/dsp-spec-ops/issues/280
The analysis has been done and delivered; this is primarily documentation of how to do it in the future.
If someone wants to test-drive the instructions, there is a GVS VCF at