bcftools query 1.18 prints new lines when looping through samples #1969
Comments
This is an intentional change (lines 89 to 91 in f6a4ae6),
although I can see that it may need some fine-tuning. Why force the user to type the newline in the first place? In offline polls, not a single user said that the explicit newline was a good thing. I therefore decided to insert it automatically, as I could not think of a use case where printing the entire VCF as a single line would be beneficial. If you are aware of a use case where this is desired and preferred, I am happy to add a command-line option to override the default. As for the behavior in 1.18: whenever the newline is given explicitly, the program does not interfere with the user's formatting. When the newline is not given, it is inserted to avoid a very common error. Here the result can be somewhat unpredictable, as I just tested on this case
but
So the program decides on its own whether to place the newline per sample or per site, depending on the context, i.e. on whether the expression is site-oriented or sample-oriented. This may not be the best behavior, and I am open to discussion.
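The rule described for 1.18 can be illustrated with a small Python model. This is only a sketch and not bcftools code: the function name `with_auto_newline` is hypothetical, and it always appends the newline at the end of the format string, so it ignores the per-sample versus per-site choice that 1.18 makes from context.

```python
def with_auto_newline(fmt: str) -> str:
    """Return the format string, adding a newline escape if none is present.

    Hypothetical model of the rule described above, not bcftools source code.
    The format is assumed to contain the two-character escape (backslash + n)
    exactly as it is typed on the command line inside single quotes.
    """
    newline_escape = "\\n"
    if newline_escape in fmt:
        # An explicit newline anywhere in the format: leave user formatting alone.
        return fmt
    # No newline given: append one so that records do not run together.
    return fmt + newline_escape


if __name__ == "__main__":
    print(with_auto_newline("%CHROM\\t%POS"))     # newline escape gets appended
    print(with_auto_newline("%CHROM\\t%POS\\n"))  # returned unchanged
```

The second call shows the "no interference" half of the rule: a format that already carries a newline is returned as given.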
I must admit that having to write '\n' explicitly in the format was a slightly uncomfortable surprise when I started using bcftools query years ago, but I have also been taking advantage of it (or rather, of its absence when needed) for years when printing just a few genotypes from a VCF file. I understand the rationale behind the decision, but as a programmer I would not recommend letting the program decide whether to write a '\n' character depending on the site or sample orientation of the query. Instead, IMHO, I would definitely go for a program option. In fact, my suggestion would be to keep the old default, where '\n' has to be written explicitly, as this is the only way the code stays backward compatible with all previous bcftools versions (so older code would not need to be rewritten), and to add an option that writes newlines automatically, as 'perl -l' does. If, for any reason, you still want to write '\n' by default, then I would strongly consider adding an option that forces the older behaviour when needed. Right now, in the original example where I need to print only 3 genotypes in a row, the only thing I can do is pipe bcftools query's output to tr, as there is currently no way to get the previous output directly from bcftools query:
Thank you anyway for opening this issue for discussion.
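The tr workaround mentioned above amounts to collapsing the newlines in the command's output. The Python sketch below shows that post-processing step under stated assumptions: the function name `join_lines` and the sample genotype strings are made up for illustration, and the exact tr invocation used is not shown in this thread.

```python
def join_lines(text: str, sep: str = " ") -> str:
    """Collapse multi-line output into a single row that ends with one newline.

    Similar in spirit to replacing newlines with tr, except that empty lines
    are dropped and exactly one trailing newline is kept.
    """
    fields = [line.strip() for line in text.splitlines() if line.strip()]
    return sep.join(fields) + "\n"


if __name__ == "__main__":
    # Made-up output with one genotype per line, as printed per sample.
    sample_output = "0/1\n0/0\n1/1\n"
    print(join_lines(sample_output), end="")
```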
Backward compatibility was a big concern. However, the decision was to change the behavior anyway, as it is extremely unlikely that anyone is using expressions without a newline in automated pipelines; it is just too impractical for the vast majority of VCFs out there. There are two counter arguments against the However, I accept the proposal to add a backward compatibility option. Also, I agree that whenever the program inserts the newline automatically, the behavior must be very clear and understandable, which it currently is not. Therefore, I propose a new default behavior:
This sounds perfectly reasonable to me. Backwards compatibility would then only require adding '-N' if needed. Again, thank you for the discussion and for the great work.
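For scripts that call bcftools from Python, adding '-N' only when the old behaviour is wanted could look like the sketch below. It assumes that '-N' switches off the automatic newline, as proposed in this thread, so check the `bcftools query` help for your version. The function name, the `legacy_newlines` parameter and the file name are hypothetical, and the command is only built, not executed.

```python
from typing import List


def build_query_command(fmt: str, vcf_path: str, legacy_newlines: bool = False) -> List[str]:
    """Build an argument list for `bcftools query` without running it.

    Assumption: '-N' disables the automatic newline, as discussed in the thread.
    """
    cmd = ["bcftools", "query"]
    if legacy_newlines:
        # Request the older behaviour: no newline unless the format contains one.
        cmd.append("-N")
    cmd += ["-f", fmt, vcf_path]
    return cmd


if __name__ == "__main__":
    # 'input.vcf.gz' is a placeholder file name.
    print(build_query_command("[%GT ]", "input.vcf.gz", legacy_newlines=True))
```

The resulting list can be passed to `subprocess.run` unchanged, which avoids shell quoting problems with the brackets and percent signs in the format string.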
This is now modified in c7cbe0b, as discussed.
bcftools query 1.18 prints new lines when looping through samples.
bcftools query's previous versions:
bcftools query 1.18