Greengenes2 Database Use with VSEARCH #12

Open
Oceazh opened this issue Nov 22, 2024 · 2 comments
Comments

Oceazh commented Nov 22, 2024

Hello, the release of the Greengenes2 database has been incredibly beneficial to my work.
Recently, I’ve been attempting to use VSEARCH in combination with the Greengenes2 database to annotate some 16S V4 region sequences. Since VSEARCH requires sequence identifiers in the FASTA file to include annotation information during database construction, I utilized 2024.09.backbone.v4.fna.qza and 2024.09.backbone.tax.qza, exporting them to FASTA and TXT formats, respectively. However, I’ve encountered an issue where the feature IDs in the TXT file do not correspond to the sequence identifiers in the FASTA file, which has left me quite perplexed.
Can you give some suggestions? Thank you!
Example:

X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,g:Escherichia/Shigella,s:Escherichia_coli,t:str._K-12_substr._MG1655
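
A minimal Python sketch of turning a Greengenes2 lineage into the `;tax=` style shown in the example above. It assumes the lineage uses `rank__name` fields separated by semicolons, as in the taxonomy strings quoted elsewhere in this thread. The function names `lineage_to_sintax` and `sintax_header` are hypothetical, and replacing spaces, commas, and semicolons with underscores is an assumption made so the header stays parseable, not something prescribed by the Greengenes2 maintainers.

```python
# Ranks expected in a Greengenes2 lineage (assumption: seven levels, d through s).
RANKS = ("d", "p", "c", "o", "f", "g", "s")


def lineage_to_sintax(lineage: str) -> str:
    """Convert 'd__Bacteria; p__X; ...' into 'd:Bacteria,p:X,...'."""
    parts = []
    for field in lineage.split(";"):
        field = field.strip()
        if "__" not in field:
            continue
        rank, name = field.split("__", 1)
        name = name.strip()
        if rank not in RANKS or not name:
            continue  # skip unknown ranks and empty levels such as 's__'
        # Spaces, commas, and semicolons would break the header, so replace them.
        for ch in " ,;":
            name = name.replace(ch, "_")
        parts.append(f"{rank}:{name}")
    return ",".join(parts)


def sintax_header(seq_id: str, lineage: str) -> str:
    """Build a FASTA header line in the ID;tax=... shape from the example."""
    return f">{seq_id};tax={lineage_to_sintax(lineage)}"
```

Calling `sintax_header("X80725", lineage)` with a lineage string from the taxonomy file would give a header of the same shape as the example, with the Greengenes2 names substituted.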

Member

wasade commented Nov 22, 2024

Hi @Oceazh, thank you for reaching out and for the kind words. I'm not sure I understand the issue. The IDs seem consistent between those two artifacts, and with the broader release data:

$ zgrep X80725 2024.09.taxonomy.id.tsv.gz
X80725	d__Bacteria; p__Pseudomonadota; c__Gammaproteobacteria; o__Enterobacterales_737866; f__Enterobacteriaceae_A_725029; g__Escherichia; s__Escherichia ruysiae	0.3
$ zgrep X80725 2024.09.seqs.fna.gz
>X80725
$ grep X80725 df7224e0-bc4e-43f8-a489-514bf5273bbf/data/dna-sequences.fasta
>X80725
$ grep X80725 b7c3e691-ea51-4547-94dd-f79f49e41a36/data/taxonomy.tsv 
X80725	d__Bacteria; p__Pseudomonadota; c__Gammaproteobacteria; o__Enterobacterales_737866; f__Enterobacteriaceae_A_725029; g__Escherichia; s__Escherichia ruysiae
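
The spot check above looks up a single ID. A rough Python sketch of the same idea applied to every record is shown here: it compares the full set of FASTA IDs with the full set of taxonomy IDs. The helper names (`fasta_ids`, `taxonomy_ids`, `compare_ids`) are hypothetical, and the code assumes an uncompressed FASTA file and a tab-separated taxonomy file whose header row starts with `Feature ID`.

```python
def fasta_ids(path):
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                tokens = line[1:].split()
                if tokens:
                    ids.add(tokens[0])
    return ids


def taxonomy_ids(path):
    """Collect the first column of a tab-separated taxonomy file."""
    ids = set()
    with open(path) as fh:
        for line in fh:
            feature_id = line.split("\t", 1)[0].strip()
            # Skip blank lines and the 'Feature ID' header row.
            if feature_id and feature_id != "Feature ID":
                ids.add(feature_id)
    return ids


def compare_ids(fasta_path, taxonomy_path):
    """Count IDs shared by both files and IDs unique to each."""
    seqs = fasta_ids(fasta_path)
    taxa = taxonomy_ids(taxonomy_path)
    return {
        "shared": len(seqs & taxa),
        "fasta_only": len(seqs - taxa),
        "taxonomy_only": len(taxa - seqs),
    }
```

If `fasta_only` comes back as zero, every sequence has a taxonomy entry, even when the two files list their records in a different order.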

Author

Oceazh commented Nov 23, 2024

Thank you for your reply!
I used the files 2024.09.backbone.v4.fna.qza and 2024.09.backbone.tax.qza, and exported them with:
qiime tools export --input-path 2024.09.backbone.tax.qza --output-path table
qiime tools export --input-path 2024.09.backbone.v4.fna.qza --output-path table
I got two files: taxonomy.tsv and dna-sequences.fasta

taxonomy.tsv:
Feature ID Taxon
MJ006-1-barcode39-umi49105bins-ubs-7 d__Bacteria; p__Bacillota_A_368345; c__Clostridia_258483; o__Lachnospirales; f__Lachnospiraceae; g__Eubacterium_G_180878; s__Eubacterium_G_180878 ventriosum

dna-sequences.fasta:

>KJ398158.24385.25772
ATTGTAATAAAAGAGTTTGATCCTGGCTCAGAATGAACGTTAATGGTTAGCTTAATACATGCAAGTTGGATTAATTTTATTTTAAAAATTAATAGCGAACGGGTGAGTAAGATACAGAAAAAAACCTTAGAAAATTGTTTAATTCATGAAAAAATTTATTTTGTTCTAAGAAAAG ......

Did I use the wrong files? I used the files with "backbone" in their names.
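
The first rows of the two exported files need not refer to the same record, so one way to pair them is by ID rather than by position. The following is a minimal Python sketch under that assumption. `load_taxonomy` and `annotate_fasta` are hypothetical helpers, and `make_header` is a caller-supplied function that decides the final header format, since the exact format VSEARCH should receive is not settled in this thread.

```python
def load_taxonomy(path):
    """Map feature ID -> lineage from a two-column, tab-separated file."""
    taxonomy = {}
    with open(path) as fh:
        for line in fh:
            cols = line.rstrip("\n").split("\t")
            # Skip short lines and the 'Feature ID' header row.
            if len(cols) < 2 or cols[0] == "Feature ID":
                continue
            taxonomy[cols[0]] = cols[1]
    return taxonomy


def annotate_fasta(fasta_in, taxonomy, fasta_out, make_header):
    """Rewrite headers by ID lookup; drop records that have no taxonomy."""
    written = skipped = 0
    keep = False
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                tokens = line[1:].split()
                seq_id = tokens[0] if tokens else ""
                keep = seq_id in taxonomy
                if keep:
                    dst.write(make_header(seq_id, taxonomy[seq_id]) + "\n")
                    written += 1
                else:
                    skipped += 1
            elif keep:
                dst.write(line)  # sequence lines of a kept record
    return written, skipped
```

A large `skipped` count would suggest that the two files really do describe different sets of records, whereas a count of zero would suggest that only the ordering differs.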
