bcftools concat --ligate drops variants when overlap between two VCFs is empty #1567
The |
The use case was a phasing pipeline in which the overlapping windows for the phasing tasks are decided ahead of time, regardless of which markers pass quality control. In one case, the overlap between two consecutive windows turned out to be empty because of the sparsity of the array in that region of the genome. I don't know how difficult it would be to make it possible to concatenate without dropping variants when two consecutive windows have no overlap, by reverting to the non-phase-aware concatenation method. I would not be worried that the phase could not be reliably ligated in that case. Even in the canonical case of a consistent window overlap, there is always the possibility that a given sample is homozygous across the overlap, so the user already has to accept that the ligation can fail in a subset of samples. As I personally see the primary use of |
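To make the point about homozygous samples concrete, here is a minimal Python sketch of how phase could be ligated for one sample across an overlap. It is a toy model, not the logic in vcfconcat.c: the function name `needs_flip` and the tuple representation of phased genotypes are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# Phased genotype of one sample at one site: (allele on hap 1, allele on hap 2).
Genotype = Tuple[int, int]


def needs_flip(left: List[Genotype], right: List[Genotype]) -> Optional[bool]:
    """Decide whether the right chunk's haplotypes must be swapped.

    `left` and `right` hold the same sample's phased genotypes at the sites
    shared by two consecutive chunks, in the same order (hypothetical input
    format). Returns None when the phase cannot be determined.
    """
    same = flipped = 0
    for a, b in zip(left, right):
        if a[0] == a[1] or b[0] == b[1]:
            continue  # homozygous site: carries no phase information
        if a == b:
            same += 1
        elif a == (b[1], b[0]):
            flipped += 1
    if same == flipped:
        return None  # no informative site (or a tie): ligation fails
    return flipped > same


print(needs_flip([(0, 1), (1, 0)], [(1, 0), (0, 1)]))  # haplotypes swapped
print(needs_flip([(0, 0), (1, 1)], [(0, 0), (1, 1)]))  # homozygous overlap
print(needs_flip([], []))                              # empty overlap
```

In this model an empty overlap and an all-homozygous overlap are indistinguishable: in both cases there is simply no information for joining the phase, which is why falling back to plain concatenation seems no worse than what already happens for some samples.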
OK, I just pushed a commit which changes the default behavior to throw an error when non-overlapping chunks, or sites present in one chunk but absent in the other, are encountered. To drop such sites and proceed, one can use the new option. Please let me know if you spot anything odd; handling arbitrary overlaps between an arbitrary number of overlapping files can be tricky. |
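The behavior described above can be sketched on bare lists of positions. This is only an illustrative model, not the code from the commit: the function `merge_chunks`, the mode names `"error"` and `"drop"`, and the choice to simply concatenate non-overlapping chunks in `"drop"` mode are all assumptions. It also assumes every chunk is sorted, non-empty, and ends after the previous chunk ends.

```python
from typing import List


def merge_chunks(chunks: List[List[int]], mode: str = "error") -> List[int]:
    """Merge sorted site positions from consecutive, possibly overlapping chunks.

    mode="error": raise on non-overlapping chunks or on overlap sites that
                  are present in one chunk but absent in the other.
    mode="drop":  drop the unshared overlap sites and proceed (hypothetical
                  stand-in for the new option; non-overlapping chunks are
                  simply concatenated here, which is an assumption).
    """
    if mode not in ("error", "drop"):
        raise ValueError(f"unknown mode: {mode}")
    if not chunks:
        return []
    if any(not c for c in chunks):
        raise ValueError("chunks must be non-empty")
    for prev, nxt in zip(chunks, chunks[1:]):
        if nxt[-1] <= prev[-1]:
            raise ValueError("each chunk must end after the previous one")

    out: List[int] = []
    cur = list(chunks[0])
    for nxt in chunks[1:]:
        start, end = nxt[0], cur[-1]  # the overlap region is [start, end]
        if end < start:
            if mode == "error":
                raise ValueError(f"no overlap between chunks at {end} and {start}")
            out.extend(cur)
            cur = list(nxt)
            continue
        tail = [p for p in cur if p >= start]  # overlap sites in the left chunk
        head = [p for p in nxt if p <= end]    # overlap sites in the right chunk
        if tail != head and mode == "error":
            raise ValueError(f"sites differ in overlap {start}-{end}")
        shared = set(tail) & set(head)
        out.extend(p for p in cur if p < start)
        out.extend(p for p in tail if p in shared)
        cur = [p for p in nxt if p > end]
    out.extend(cur)
    return out


print(merge_chunks([[100, 200, 300], [300, 400]]))
print(merge_chunks([[100, 200], [300, 400]], mode="drop"))
```

The second call is the case reported in this issue: with an empty overlap the default mode raises instead of silently losing a variant, while the permissive mode keeps all four sites.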
@freeseek I just got this error running MoChA WDL. Should we be adding flag |
I have not looked at the vcfconcat.c code yet, but the following behavior seems puzzling. Generate the following VCFs:
All four variants are output when performing a simple concatenation:
One variant goes missing when the overlap between the first two VCFs is empty:
If the overlap between the first two VCFs is not empty, all variants are retained both with a simple concatenation:
And with a concatenation using phase ligation: