Double entries from merge command #16
Comments
Hello there!
1 24257001 CNVnator_del_34 N
These two will not be merged, because their END positions differ too much. You may avoid a few of these cases by changing the --overlap and --bnd_distance parameters, but there will always be a risk of getting a few of these.
I just made a fix to SVDB: if your VCF files have "contig entries" in the VCF header, the chromosomes should be sorted correctly!
Thanks! Happy to hear that the tool works for you! It would be great to hear if you find any bugs or have ideas for new features! |
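For readers wondering what "sorted by contig entries" could look like in practice, here is a minimal sketch. It is not SVDB's actual code; the function names and the tuple layout of the records are assumptions made for illustration. It ranks chromosomes by the order of the ##contig lines in the header and puts any contig missing from the header last, in alphabetical order.

```python
import re

def contig_order(header_lines):
    """Map contig name -> rank, following the order of the ##contig header lines."""
    order = {}
    for line in header_lines:
        match = re.match(r"##contig=<ID=([^,>]+)", line)
        if match and match.group(1) not in order:
            order[match.group(1)] = len(order)
    return order

def sort_records(records, order):
    """Sort (chrom, pos, ...) tuples by header rank, then by position.

    Contigs that are absent from the header are placed last, alphabetically.
    """
    def key(record):
        chrom, pos = record[0], record[1]
        if chrom in order:
            return (0, order[chrom], "", pos)
        return (1, 0, chrom, pos)
    return sorted(records, key=key)

# Hypothetical header using the chromosome names mentioned in this thread.
header = ["##fileformat=VCFv4.2"] + [
    f"##contig=<ID={name}>" for name in ["I", "II", "III", "IV", "V", "X", "MtDNA"]
]
order = contig_order(header)
records = [("V", 10), ("MtDNA", 5), ("X", 1), ("I", 7), ("II", 3)]
sorted_chroms = [rec[0] for rec in sort_records(records, order)]
print(sorted_chroms)
```

A plain alphabetical sort would place MtDNA before V and X, which matches the reordering reported in the original post; sorting by header rank keeps the order the input files declare.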
Bump on this one due to a recent MIP issue, although the use case was somewhat unexpected. I could perhaps phrase this as a question: should the docs state that SVDB merge requires input to be split (and normalised)? I have a feeling it won't handle many-way merging of multi-allelic input entries reliably. |
Hi @J35P312, we are experiencing a similar problem. We observed that overlapping variants are not merged when trying to merge sequence-resolved and non-sequence-resolved variants from Truvari. I think that this is what happens
Intuitively, I would change |
Woopsiedaisy! Could you send me the variants you are trying to merge? //Jesper |
Thank you for looking into this! I have tried to make a minimal test set consisting of
I have been able to get the expected overlap by either
|
Hello!
posB=posA+abs(int(description["SVLEN"]))
and now the variants merge nicely!
1 95690510 MantaDEL:manta_95690510|DEL00000389:delly_95690510 AAAATAAATATAATTGGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACGAGGTCAGGAGATCGAGACCATCCCGGCTAAAAAACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAATTAGCCGGGCGTAGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAATAAAT A994 PASS SVTYPE=DEL;SVLEN=-319;END=95690830;VARID=DEL00000389:delly_95690510;set=Intersection GT 1/1
Feel free to give this a try! |
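The one-line fix above can be tried in isolation with a small sketch like the following. It assumes that description is a dict built from the VCF INFO column, as the snippet suggests; parse_info and end_from_svlen are hypothetical helper names and not SVDB functions. When to prefer SVLEN over END is a design choice the thread does not spell out, so here SVLEN simply takes precedence whenever it is present.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag entries without '=' map to True."""
    description = {}
    for field in info.split(";"):
        if not field:
            continue
        key, sep, value = field.partition("=")
        description[key] = value if sep else True
    return description

def end_from_svlen(posA, description):
    """Return the second breakpoint, preferring POS + |SVLEN| over END.

    Assumes SVLEN holds a single integer (no multi-allelic, comma-separated values).
    """
    if "SVLEN" in description:
        return posA + abs(int(description["SVLEN"]))
    return int(description.get("END", posA))

description = parse_info("SVTYPE=DEL;SVLEN=-319;END=95690830")
posB = end_from_svlen(95690510, description)
print(posB)  # 95690829
```

For the merged record shown above this gives 95690829, one base away from the stated END=95690830.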
Thank you! |
Hi again, it seems that INS is also affected by a similar issue
Attached are three INSes that only merge if I use the suggested fix. I have tested this
|
Hello! These changes are found in the branch "insertion_distance". Feel free to try it! To me, this solves the issues of END=POS+abs(SVLEN), insertions not being merged properly, etc. in a good enough way. According to these rules, all the insertions in the test data get merged.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG002 HG003 HG004
Do you agree with these rules? Feel free to comment! I will do some tests on mobile element insertions etc.; I will probably merge the insertion_distance branch by the end of the week! |
Thanks! |
Hi there,
I just tested this tool for combining the outputs of Delly, Manta, and TIDDIT. It seems to work great!
I'm not necessarily sure this is an issue and you probably already know -
Here is a command I ran to merge three VCFs for one sample :
svdb --merge --pass_only --no_var --vcf ${dellysv}:delly ${mantasv}:manta ${tidditsv}:tiddit --priority delly,manta,tiddit
Note that I added --pass_only and --no_var after I noticed that when two tools identify different variants at the same position, two lines are output. I'm not sure that I noticed any difference when these flags were there vs. when they weren't. From what I can tell, duplicate position lines break downstream processing with bcftools, which errors on the first occurrence of a duplicate record. I haven't tested what would happen if I ran bcftools norm -m +any on the svdb --merge output prior to additional processing, because that is typically used for multi-allelic sites, from what I understand.
I got around the issue with a "take first line if the next line has the same CHROM POS" awk command, but wanted to let you know. Maybe the --priority flag can have that built in if the user requests it?
Another minor thing I noticed is that my chromosome order got rearranged after the --merge command: from I,II,III,IV,V,X,MtDNA to I,II,III,IV,MtDNA,V,X. I am assuming this happens because the output sorts the chromosomes alphanumerically?
Apart from that, this tool seems great! I'll let you know if I notice anything else.
Best,
S
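For anyone wanting to reproduce the "take first line if the next line has the same CHROM POS" workaround without awk, a minimal Python sketch follows. It is not part of SVDB and the function name is hypothetical. It assumes the merged VCF is tab-separated and already sorted, so that duplicate positions sit on adjacent lines. Note that it discards the later records outright, even when they describe a genuinely different variant at the same position.

```python
def keep_first_per_position(lines):
    """Yield header lines unchanged and only the first record per (CHROM, POS).

    Only adjacent duplicates are dropped, mirroring the awk approach.
    """
    previous = None
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.split("\t")
        position = (fields[0], fields[1]) if len(fields) > 1 else None
        if position is not None and position == previous:
            continue  # same CHROM and POS as the record just emitted
        previous = position
        yield line

# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "I\t100\tdelly_1\tN\t<DEL>",
    "I\t100\tmanta_1\tN\t<DUP>",
    "I\t250\ttiddit_1\tN\t<DEL>",
    "II\t100\tmanta_2\tN\t<DEL>",
]
deduplicated = list(keep_first_per_position(example))
print("\n".join(deduplicated))
```

Because the first record at each position wins, the result depends on the order in which the merge step wrote the records.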