Feature Request: Allow for matching on INFO fields using annotate #2151

Closed
ejgardner-insmed opened this issue Apr 8, 2024 · 1 comment

Comments

@ejgardner-insmed

Hello,

Currently, annotate only allows matching on additional columns with the '~' operator for the ID and POS columns. Would it be possible to allow matching on additional INFO fields as well? As an example, I have an annotation that is transcript-specific, so a single variant can have several scores, one per overlapping transcript (TSV format):

CHROM POS REF ALT SCORE ENST
chr1 10 A T 0.1 ENST1
chr1 10 A T 0.4 ENST2

and I have a variant that is annotated as intersecting the first transcript (VCF format):

#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 10 . A T . PASS ENST=ENST1

Thus, when running a command like (note the '~'):

bcftools annotate -o annotated.vcf -a score.tsv.gz -c 'CHROM,POS,REF,ALT,SCORE,~ENST' input.vcf

I would expect the annotation to be:

chr1 10 . A T . PASS ENST=ENST1;SCORE=0.1
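To make the requested behaviour concrete, here is a minimal sketch that emulates the matching with plain awk on the two toy records above. It is only an illustration of the semantics I have in mind, not bcftools code: the file names (`score.tsv`, `body.vcf`, `emulated.vcf`) are made up, the VCF line is written without a header, and INFO is assumed to hold nothing but `ENST=...`.

```shell
# Toy emulation of the requested matching with awk only; this is not how
# bcftools works internally. All file names are hypothetical.

# Annotation table, tab-separated: CHROM POS REF ALT SCORE ENST
printf 'chr1\t10\tA\tT\t0.1\tENST1\n'  > score.tsv
printf 'chr1\t10\tA\tT\t0.4\tENST2\n' >> score.tsv

# Headerless VCF body line: CHROM POS ID REF ALT QUAL FILTER INFO
printf 'chr1\t10\t.\tA\tT\t.\tPASS\tENST=ENST1\n' > body.vcf

# Pass 1 (score.tsv): store each score keyed by site, alleles and transcript.
# Pass 2 (body.vcf): append SCORE only if the record's INFO/ENST has a stored key.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { score[$1, $2, $3, $4, $6] = $5; next }
     {
         enst = $8; sub(/^ENST=/, "", enst)   # assumes INFO is exactly ENST=<id>
         key = $1 SUBSEP $2 SUBSEP $4 SUBSEP $5 SUBSEP enst
         if (key in score) $8 = $8 ";SCORE=" score[key]
         print
     }' score.tsv body.vcf > emulated.vcf

cat emulated.vcf
```

The ENST2 row shares the same position and alleles but is skipped, because its transcript ID differs from the one in the record's INFO field.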

I hope this makes sense!

@pd3 pd3 closed this as completed in e9bff3f Apr 11, 2024
@pd3
Member

pd3 commented Apr 11, 2024

I just added the feature. It should now be possible to run

bcftools annotate -o annotated.vcf -a score.tsv.gz \
      -c CHROM,POS,REF,ALT,SCORE,ENST -i'ENST={ENST}' -k input.vcf

The option -k is required if all sites should be printed, including the ones that did not match the expression and were therefore not modified.

The above command implicitly matches on REF and ALT as well. If that's not desired, replace those two columns with '-':

bcftools annotate -o annotated.vcf -a score.tsv.gz \
       -c CHROM,POS,-,-,SCORE,ENST -i'ENST={ENST}' -k input.vcf
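For anyone trying this on the toy data from the issue, the input files can be prepared roughly as sketched below. The file names and the header description text are assumptions for illustration. The bgzip and tabix steps only run if the tools are installed, and the annotate step itself is left as the command quoted above, since it needs a bcftools build that includes commit e9bff3f.

```shell
# Hypothetical file names; the records are the toy data from this issue.

# Sorted, tab-separated annotation table: CHROM POS REF ALT SCORE ENST
printf 'chr1\t10\tA\tT\t0.1\tENST1\nchr1\t10\tA\tT\t0.4\tENST2\n' > score.tsv

# Header line declaring the new INFO tag; passed to bcftools annotate via -h
printf '##INFO=<ID=SCORE,Number=1,Type=Float,Description="Transcript-specific score">\n' > score.hdr

# Minimal VCF holding the single record from the issue
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chr1>\n'
  printf '##INFO=<ID=ENST,Number=1,Type=String,Description="Transcript ID">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t10\t.\tA\tT\t.\tPASS\tENST=ENST1\n'
} > input.vcf

# Compress and index the table (requires htslib's bgzip and tabix)
if command -v bgzip >/dev/null && command -v tabix >/dev/null; then
  bgzip -f score.tsv                  # writes score.tsv.gz
  tabix -f -s1 -b2 -e2 score.tsv.gz   # sequence in column 1, position in column 2
fi

# Then run the annotate command quoted above, adding: -h score.hdr
```

The `-h score.hdr` part is needed because SCORE is not yet defined in the VCF header. With `-k`, records that fail the `-i` expression are still written, just without SCORE.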

Please try it out.

pd3 added a commit that referenced this issue Apr 19, 2024
For example, in the two cases below the field 'STR' from the -a file is required to match
the INFO/TAG in the VCF. In the first example the alleles REF,ALT must match; in the second
example they are ignored. The option -k is required to also output records that were not
annotated:

    bcftools annotate -a ann.tsv.gz -c CHROM,POS,REF,ALT,SCORE,~STR -i'TAG={STR}' -k in.vcf
    bcftools annotate -a ann.tsv.gz -c CHROM,POS,-,-,SCORE,~STR     -i'TAG={STR}' -k in.vcf

Resolves #2151