Using truvari to merge short variant calls with SVs #202
JosephLalli
started this conversation in
Ideas
Replies: 1 comment 2 replies
-
Hello, That isn't going to give you the results you're expecting. There might be ways to get truvari collapse to behave closer to what you're trying to accomplish, but even then you'll have unintended side effects. The best way to do this would be to make a new script. Truvari has a python package that can be leveraged for creating new tools (docs). So you could take advantage of that to make the coding easier. A rough sketch of what you'd want to build is below. Note that I haven't tested this so please use with caution. import sys
import truvari
import pysam
def variant_is_within_larger_variant(small, larges):
"""
Check if small variant is within a large variant's span
"""
for large in larges:
start, end = truvari.entry_boundaries(large)
if truvari.entry_within(small, start, end):
return True
return False
if __name__ == '__main__':
vcf_fn1 = sys.argv[1]
vcf_fn2 = sys.argv[2]
vcf1 = pysam.VariantFile(vcf_fn1)
vcf2 = pysam.VariantFile(vcf_fn2)
# If you have a bed file, use regions to subset the input vcfs
regions = truvari.build_region_tree(vcf1, vcf2, bed_filename)
# Only needed when there are potentially overlapping bed entries
truvari.merge_region_tree_overlaps(regions)
# Make iterators that will only return variants in regions of interest
# This also helps zip vcfs that have different chromosome orders
vcf1_iter = truvari.region_filter(vcf1, regions)
vcf2_iter = truvari.region_filter(vcf2, regions)
# Matchers control what variants are filtered from VCFs as well as thresholds for matching
# For this, you probably(?) won't need to worry about thresholds.
matcher = truvari.Matcher()
matcher.params.sizemin = 0
matcher.params.sizefilt = 0
matcher.params.passonly = True
# Make the chunker which will return a list of dictionaries
chunks = truvari.chunker(matcher, ('small', vcf1_iter), ('large', vcf2_iter))
# Make an output VCF
# Note that this assumes the two headers are compatible such that vcf1 and vcf2 entries
# can be written to the same file. If they aren't (e.g. vcf2 has an INFO header not in vcf1)
# you'll need to make a custom header
out = pysam.VariantFile("/dev/stdout", header=vcf1.header)
for chunk, m_id in chunks:
# At this point, a chunk holds all small and large variants within matcher.params.chunksize
# of one another
for small in chunk['small']:
# Return true if this is something that should be filtered
if not variant_is_within_larger_variant(small, chunk['large']):
out.write(small)
for large in chunk['large']:
out.write(large) |
Beta Was this translation helpful? Give feedback.
2 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Hi there! I'm trying to use truvari collapse, but not as intended. While this code appears to be working, I wanted to check in and make sure that there aren't any glaring problems that I am missing.
I am trying to combine Pangenie calls with deepvariant calls from the same sample. The issue I am encountering is that sometimes, deepvariant will call several low-coverage SNPs and short indels in the middle of a Pangenie deletion call. Eg:
These variants are generally of low quality and low coverage. However, Google suggests retaining all variants, even low quality ones, to maximize sensitivity. Meanwhile, I have a pangenie deletion (let's say a 100bp deletion) that is a confident call, and is too large to be called by Deepvariant.
I'd like to identify Deepvariant calls that are inside pangenie calls, and remove them where they are contradictory.
To do so, I am running this code. Are there any unintended side effects of these settings that I should be aware of?
Thanks,
Joe
Beta Was this translation helpful? Give feedback.
All reactions