Greengenes2 Database Use with VSEARCH #12

Oceazh · 2024-11-22T03:24:22Z

Hello, the release of the Greengenes2 database has been incredibly beneficial to my work.
Recently, I’ve been attempting to use VSEARCH in combination with the Greengenes2 database to annotate some 16S V4 region sequences. Since VSEARCH requires sequence identifiers in the FASTA file to include annotation information during database construction, I utilized 2024.09.backbone.v4.fna.qza and 2024.09.backbone.tax.qza, exporting them to FASTA and TXT formats, respectively. However, I’ve encountered an issue where the feature IDs in the TXT file do not correspond to the sequence identifiers in the FASTA file, which has left me quite perplexed.
Can you give some suggestions? Thank you!
Example:

X80725_S000004313;tax=d:Bacteria,p:Proteobacteria,c:Gammaproteobacteria,o:Enterobacteriales,f:Enterobacteriaceae,g:Escherichia/Shigella,s:Escherichia_coli,t:str._K-12_substr._MG1655
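
As an illustration of the header style in the example above, here is a minimal Python sketch that rewrites a Greengenes2-style lineage string (`d__Bacteria; p__...`) into a `;tax=` annotation. The function names are hypothetical. Skipping empty ranks and replacing spaces with underscores are assumptions made for this sketch, not something prescribed by VSEARCH or Greengenes2.

```python
def lineage_to_sintax(lineage: str) -> str:
    """Turn 'd__Bacteria; p__X' into 'd:Bacteria,p:X' (assumed mapping)."""
    parts = []
    for rank in lineage.split(";"):
        rank = rank.strip()
        prefix, sep, name = rank.partition("__")
        if not sep or not name:
            continue  # skip malformed or unassigned ranks such as 's__'
        # Assumption: spaces inside a name are replaced with underscores.
        parts.append(f"{prefix}:{name.replace(' ', '_')}")
    return ",".join(parts)


def annotate_header(seq_id: str, lineage: str) -> str:
    """Build a FASTA header line of the form '>ID;tax=...'."""
    return f">{seq_id};tax={lineage_to_sintax(lineage)}"


example = annotate_header(
    "X80725",
    "d__Bacteria; p__Pseudomonadota; c__Gammaproteobacteria; "
    "o__Enterobacterales_737866; f__Enterobacteriaceae_A_725029; "
    "g__Escherichia; s__Escherichia ruysiae",
)
print(example)
```

This only works when the ID in the taxonomy file matches the ID in the FASTA header, so the lookup between the two files has to succeed first.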

wasade · 2024-11-22T18:58:16Z

Hi @Oceazh, thank you for reaching out and for the kind words. I'm not sure I understand the issue. The IDs seem consistent between those two artifacts, and with the broader data:

$ zgrep X80725 2024.09.taxonomy.id.tsv.gz
X80725	d__Bacteria; p__Pseudomonadota; c__Gammaproteobacteria; o__Enterobacterales_737866; f__Enterobacteriaceae_A_725029; g__Escherichia; s__Escherichia ruysiae	0.3
$ zgrep X80725 2024.09.seqs.fna.gz
>X80725
$ grep X80725 df7224e0-bc4e-43f8-a489-514bf5273bbf/data/dna-sequences.fasta
>X80725
$ grep X80725 b7c3e691-ea51-4547-94dd-f79f49e41a36/data/taxonomy.tsv 
X80725	d__Bacteria; p__Pseudomonadota; c__Gammaproteobacteria; o__Enterobacterales_737866; f__Enterobacteriaceae_A_725029; g__Escherichia; s__Escherichia ruysiae

Oceazh · 2024-11-23T03:19:22Z

Thank you for your reply!
I used the files 2024.09.backbone.v4.fna.qza and 2024.09.backbone.tax.qza. The export commands were:

$ qiime tools export --input-path 2024.09.backbone.tax.qza --output-path table
$ qiime tools export --input-path 2024.09.backbone.v4.fna.qza --output-path table

This gave me two files: taxonomy.tsv and dna-sequences.fasta.

taxonomy.tsv:
Feature ID Taxon
MJ006-1-barcode39-umi49105bins-ubs-7 d__Bacteria; p__Bacillota_A_368345; c__Clostridia_258483; o__Lachnospirales; f__Lachnospiraceae; g__Eubacterium_G_180878; s__Eubacterium_G_180878 ventriosum

dna-sequences.fasta:

KJ398158.24385.25772
ATTGTAATAAAAGAGTTTGATCCTGGCTCAGAATGAACGTTAATGGTTAGCTTAATACATGCAAGTTGGATTAATTTTATTTTAAAAATTAATAGCGAACGGGTGAGTAAGATACAGAAAAAAACCTTAGAAAATTGTTTAATTCATGAAAAAATTTATTTTGTTCTAAGAAAAG ......

Did I use the wrong files? I used the files with "backbone" in their names.
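
One way to measure how far the two exported files agree is to compare their ID sets directly. The following is a minimal Python sketch with hypothetical function names. The in-memory data are tiny stand-ins built from IDs quoted in this thread, and the sequences are placeholders, not real records.

```python
import io


def fasta_ids(handle):
    """Collect the first whitespace-delimited token of each '>' header line."""
    return {
        line[1:].split()[0]
        for line in handle
        if line.startswith(">") and line[1:].strip()
    }


def taxonomy_ids(handle):
    """Collect the first tab-separated column, skipping the 'Feature ID' header row."""
    ids = set()
    for line in handle:
        first = line.rstrip("\n").split("\t")[0].strip()
        if first and first != "Feature ID":
            ids.add(first)
    return ids


# Stand-in data for illustration only (placeholder sequences and lineages).
fasta = io.StringIO(">X80725\nACGT\n>KJ398158.24385.25772\nATTG\n")
taxonomy = io.StringIO(
    "Feature ID\tTaxon\n"
    "X80725\td__Bacteria\n"
    "MJ006-1-barcode39-umi49105bins-ubs-7\td__Bacteria\n"
)

seq_ids = fasta_ids(fasta)
tax_ids = taxonomy_ids(taxonomy)

shared = seq_ids & tax_ids            # IDs present in both files
only_in_fasta = seq_ids - tax_ids     # sequences with no taxonomy row
only_in_taxonomy = tax_ids - seq_ids  # taxonomy rows with no sequence

print(len(shared), sorted(only_in_fasta), sorted(only_in_taxonomy))
```

To run it on the real exports, replace the `io.StringIO` objects with `open("table/dna-sequences.fasta")` and `open("table/taxonomy.tsv")`, assuming the `table` output directory used in the export commands. The first lines of each file need not match, since the files may be ordered differently. The size of `shared` shows whether the IDs actually overlap.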
