NC_045512 shows up in MAT when extracting from a WNV auspice tree #378

Open
jcw349 opened this issue Jul 1, 2024 · 5 comments
Comments

@jcw349

jcw349 commented Jul 1, 2024

Hi,

I am trying to convert a West Nile virus auspice tree to a MAT. The matUtils extract keeps including "NC_045512" in the output file, but it's not in my input auspice file.

matUtils extract -i auspice/WNV-global.json -o usher/WNV-global.pb

I tried specifying the reference files and metadata as well, but it's still creating the same MAT.
matUtils extract -i auspice/WNV-global.json -g config/reference.gtf -f config/reference.fasta -o usher/WNV-global.pb

reference: NC_009942

VCF from the output MAT:
matUtils extract -i usher/WNV-global.pb -g config/reference.gtf -f config/reference.fasta -v usher/mutations.vcf

First 5 rows and 12 columns of the vcf:

##fileformat=VCFv4.2
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  AY765264        KJ831223        FJ159130
NC_045512       1       A1C,A1G,A1T     A       C,G,T   .       .       AC=16,67,3;AN=4560      GT      0       0       0
NC_045512       2       G2A,G2C,G2T     G       A,C,T   .       .       AC=12,4,20;AN=4560      GT      0       0       0
NC_045512       3       T3A,T3C,T3G     T       A,C,G   .       .       AC=89,25,4;AN=4560      GT      1       0       0

Not sure what I'm doing wrong.

Thank you,
Jade W

@AngieHinrichs
Contributor

Sorry about that @jcw349! When writing VCF, "NC_045512" is hardcoded! We should be able to do better than that.

Hopefully something like this will work for you in the meantime:

sed -e 's/^NC_045512/NC_009942/;' usher/mutations.vcf > usher/mutations.renamed.vcf
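
To confirm the rename took effect, one option is to tally the contig names in the data lines before and after. This is only a minimal sketch with standard tools: the `demo/` directory and its tiny stand-in VCF are hypothetical, created here so the snippet runs on its own, and the `chroms` helper is just an illustrative name. For the real files, point it at usher/mutations.vcf and usher/mutations.renamed.vcf.

```shell
# Build a tiny stand-in VCF (hypothetical demo data, tab-separated).
mkdir -p demo
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\nNC_045512\t1\tA1C\tA\tC\nNC_045512\t2\tG2A\tG\tA\n' > demo/mutations.vcf

# Same substitution as above: only lines that start with the old name change,
# so the '#'-prefixed header lines are left alone.
sed -e 's/^NC_045512/NC_009942/' demo/mutations.vcf > demo/mutations.renamed.vcf

# Tally the CHROM column of the data lines, printed as "name count".
chroms() { grep -v '^#' "$1" | cut -f1 | sort | uniq -c | awk '{print $2, $1}'; }
chroms demo/mutations.vcf
chroms demo/mutations.renamed.vcf
```

If the renamed file still lists NC_045512 in that tally, the substitution did not match, for example because the data lines do not start with the name.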

@jcw349
Author

jcw349 commented Jul 2, 2024

No worries!! Thank you for looking into this and sharing a solution to fix the VCF.

The matUtils extract -i <json_file> -o <mat.pb> command also seems to label non-COVID trees with the same reference, NC_045512. It was in the output MAT file too. Not sure if that'll have a big impact on doing other things with the file?

For now I remade the MAT.pb using UShER, which did end up using NC_009942, but it didn't keep everything I had initially put into the Nextstrain tree, such as filters and colors.

@AngieHinrichs
Contributor

Yes, the MAT protobuf contains only the mutation-annotated tree, not the many other things that can be layered onto Nextstrain's Auspice JSON format. The matUtils extract options for adding in a reference, metadata, etc. are only used when generating JSON output AFAIK.

@jcw349
Author

jcw349 commented Jul 3, 2024

Sorry, what I mean is, when I tried to use matUtils extract to convert the Auspice JSON to MAT protobuf, for some reason the MAT.pb ended up having NC_045512 in it too, even though that's not in the input JSON.


@AngieHinrichs
Contributor

Yes, NC_045512 is hardcoded when importing JSON, sorry. Does auspice/WNV-global.json contain "NC_009942" anywhere in it? (If you're able to share auspice/WNV-global.json privately then I can take a look at how matUtils might figure out what the reference name should be.)
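
A quick way to check whether the accession appears in the JSON is to count matching lines with grep. This is only a sketch: the stand-in JSON below is hypothetical, made up so the snippet runs by itself, and is not meant to reflect the real structure of WNV-global.json; `count_hits` is just an illustrative helper name. For the real file, run the same grep against auspice/WNV-global.json.

```shell
# Hypothetical two-line stand-in for an Auspice JSON file.
mkdir -p demo
printf '{"meta": {"title": "WNV demo"},\n "tree": {"name": "NODE_0000000"}}\n' > demo/tree.json

# Count lines containing a fixed string. grep -c prints 0 and exits non-zero
# when nothing matches, so '|| true' keeps the helper from failing.
count_hits() { grep -c -F "$1" "$2" || true; }
count_hits 'NC_009942' demo/tree.json
count_hits 'NODE_0000000' demo/tree.json
```

Note that grep -c counts matching lines rather than occurrences, so a minified single-line JSON will report at most 1.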
