Converting counts table to biom or anndata format

This walkthrough covers converting your counts table (with or without metadata) to anndata or biom format.

By the end, you'll have either a .h5ad file (anndata) or a .biom file (biom).

This assumes you've completed either the end-to-end metagenomics walkthrough or the recovering viruses from metatranscriptomics walkthrough, as well as the read mapping and counts tables walkthrough.

anndata was developed for single-cell genomics, but I've been using it in my metagenomics workflows and recommend trying it out.

Steps:

Provide just the counts table
Provide a counts table and sample metadata
Provide a counts table, sample metadata, and feature metadata

Conda Environment: conda activate VEBA. Use this for intermediate scripts.

1. Let's convert to a Python pickle object without any metadata

convert_counts_table.py -i veba_output/counts/X_slcs.tsv.gz -f pickle -o veba_output/counts/X_slcs.no_metadata.pkl

Now in Python we can load it with pandas

import pandas as pd
X_slcs = pd.read_pickle("veba_output/counts/X_slcs.no_metadata.pkl")
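As a quick sanity check after loading, the sketch below computes per-sample library sizes and relative abundances (closure). It assumes the pickled object is a pandas DataFrame with samples as rows and features (SLCs) as columns, which matches the 4 × 16 AnnData shown in step 3; the toy table, its IDs, and the helper name summarize_counts are hypothetical.

```python
import os
import tempfile

import pandas as pd


def summarize_counts(X: pd.DataFrame) -> tuple:
    """Return (library sizes, relative abundances) for a counts table.

    Assumes rows are samples and columns are features (e.g., SLCs).
    """
    library_sizes = X.sum(axis=1)
    # Divide each row by its total so every sample sums to 1 (closure)
    relative = X.div(library_sizes, axis=0)
    return library_sizes, relative


# Hypothetical toy counts table standing in for X_slcs
X_toy = pd.DataFrame(
    {"SLC-1": [10, 0, 5], "SLC-2": [30, 20, 5]},
    index=["S1", "S2", "S3"],
)

# Round-trip through a pickle file, mirroring the conversion above
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "X_toy.pkl")
    X_toy.to_pickle(path)
    X_loaded = pd.read_pickle(path)

library_sizes, relative = summarize_counts(X_loaded)
print(library_sizes.tolist())  # [40, 20, 10]
```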

2. Let's convert to a biom object with sample metadata

You probably have your own sample metadata, so use that instead of the toy table below. Make sure that every sample identifier in your counts table is present in your metadata table.
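One way to do that check in pandas is sketched below. It assumes samples are the row index of both the counts table and the metadata table; the toy tables and the helper name missing_sample_ids are hypothetical.

```python
import pandas as pd


def missing_sample_ids(counts: pd.DataFrame, metadata: pd.DataFrame) -> list:
    """List sample IDs present in the counts table but absent from the metadata.

    Assumes samples are the row index of both tables.
    """
    return sorted(set(counts.index) - set(metadata.index))


# Hypothetical toy tables; replace with your own counts and metadata
counts = pd.DataFrame({"SLC-1": [3, 1, 0]}, index=["S1", "S2", "S3"])
metadata = pd.DataFrame({"treatment": ["A", "B"]}, index=["S1", "S2"])

missing = missing_sample_ids(counts, metadata)
if missing:
    print(f"Samples missing from metadata: {missing}")  # Samples missing from metadata: ['S3']
```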

Here we don't have any, so I'll use the reads manifest as a toy sample metadata table.

# Generate a sample metadata for example but use your own instead
# Fastq/
# Fastq/S1_1.fastq.gz
# Fastq/S1_2.fastq.gz
# Fastq/etc.

compile_reads_table.py -f Fastq/ -n [ID]_[DIRECTION].fastq.gz --header > veba_output/misc/reads_table.tsv
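To show what a sample metadata table built from a reads manifest looks like, here is a rough pandas equivalent of the step above. It is not how compile_reads_table.py is implemented: it assumes DIRECTION is 1 or 2, borrows the two column names that appear in the AnnData obs output in step 3, and uses a hypothetical sample_id index name and build_reads_table helper.

```python
import os
import re

import pandas as pd

# Matches the [ID]_[DIRECTION].fastq.gz naming used above (DIRECTION assumed to be 1 or 2)
PATTERN = re.compile(r"^(?P<id>.+)_(?P<direction>[12])\.fastq\.gz$")


def build_reads_table(filepaths):
    """Pair forward/reverse FASTQ files by sample ID (illustrative only)."""
    rows = {}
    for path in filepaths:
        match = PATTERN.match(os.path.basename(path))
        if match is None:
            continue  # skip files that don't follow the naming scheme
        column = (
            "forward-absolute-filepath"
            if match["direction"] == "1"
            else "reverse-absolute-filepath"
        )
        rows.setdefault(match["id"], {})[column] = os.path.abspath(path)
    table = pd.DataFrame.from_dict(rows, orient="index").sort_index()
    table.index.name = "sample_id"  # hypothetical index name
    return table


reads_table = build_reads_table(
    ["Fastq/S1_1.fastq.gz", "Fastq/S1_2.fastq.gz", "Fastq/S2_1.fastq.gz", "Fastq/S2_2.fastq.gz"]
)
print(reads_table)
```

The files don't need to exist for this sketch, since paths are only matched and made absolute.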

Convert to biom format using the sample metadata

# Convert table
convert_counts_table.py -i veba_output/counts/X_slcs.tsv.gz -f biom -o veba_output/counts/X_slcs.with_sample_metadata.biom --sample_metadata veba_output/misc/reads_table.tsv

Now in Python we can load it with biom

import biom
table = biom.load_table("veba_output/counts/X_slcs.with_sample_metadata.biom")
# 16 x 4 <class 'biom.table.Table'> with 59 nonzero entries (92% dense)

3. Let's convert to an anndata object with sample and feature metadata

We'll use the sample metadata generated in the previous step, and the taxonomy classifications from the classify.py module as the feature metadata.

convert_counts_table.py -i veba_output/counts/X_slcs.tsv.gz -f anndata -o veba_output/counts/X_slcs.with_sample_and_feature.metadata.h5ad --sample_metadata veba_output/misc/reads_table.tsv --feature_metadata veba_output/classify/taxonomy_classifications.clusters.tsv

Now in Python we can load it with anndata

import anndata as ad
adata = ad.read_h5ad("veba_output/counts/X_slcs.with_sample_and_feature.metadata.h5ad")
# AnnData object with n_obs × n_vars = 4 × 16
#     obs: 'forward-absolute-filepath', 'reverse-absolute-filepath'
#     var: 'domain', 'consensus_classification', 'homogeneity', 'number_of_unique_classifications', 'number_of_genomes', 'genomes', 'classifications', 'weights', 'score', 'number_of_unique_classification', 'consensus_taxon_id'
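In the AnnData above, obs rows line up with the samples and var rows line up with the features of the counts table. The sketch below shows that alignment in plain pandas with toy data. It assumes samples are rows and features are columns of the counts table, and it illustrates the idea rather than how convert_counts_table.py handles it internally; align_metadata and all IDs are hypothetical.

```python
import pandas as pd


def align_metadata(X, sample_metadata, feature_metadata):
    """Reorder metadata so rows match X's samples (rows) and features (columns).

    IDs missing from a metadata table become all-NaN rows rather than raising.
    """
    obs = sample_metadata.reindex(X.index)
    var = feature_metadata.reindex(X.columns)
    return obs, var


# Hypothetical toy inputs
X = pd.DataFrame([[5, 0], [2, 7]], index=["S1", "S2"], columns=["SLC-1", "SLC-2"])
sample_metadata = pd.DataFrame({"treatment": ["B", "A"]}, index=["S2", "S1"])
feature_metadata = pd.DataFrame({"domain": ["Viruses"]}, index=["SLC-2"])

obs, var = align_metadata(X, sample_metadata, feature_metadata)
print(obs["treatment"].tolist())  # ['A', 'B']
print(var["domain"].tolist())  # [nan, 'Viruses']
```

Reindexing (rather than joining) keeps the counts table's order as the reference, which is what a samples × features matrix needs.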

Next steps:

Now it's time to analyze the data. I recommend a compositional data analysis (CoDA) approach.
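As a minimal example of a CoDA-style transform, the sketch below applies the centered log-ratio (CLR) to a samples × features DataFrame such as the pickled table from step 1. The pseudocount used to handle zeros is an assumption (zero handling is a modeling choice in its own right), and the toy table is hypothetical.

```python
import numpy as np
import pandas as pd


def clr(X: pd.DataFrame, pseudocount: float = 1.0) -> pd.DataFrame:
    """Centered log-ratio transform of each sample (row).

    A pseudocount is added first because log(0) is undefined; 1.0 is an
    arbitrary default, not a recommendation.
    """
    logged = np.log(X + pseudocount)
    # Subtracting the row mean of the logs is the same as dividing by the geometric mean
    return logged.sub(logged.mean(axis=1), axis=0)


# Hypothetical toy counts (rows = samples, columns = features)
X = pd.DataFrame(
    [[9, 0, 99], [4, 4, 4]],
    index=["S1", "S2"],
    columns=["SLC-1", "SLC-2", "SLC-3"],
)
X_clr = clr(X)
```

Each CLR-transformed row sums to zero, and a sample with identical counts across features maps to all zeros.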

If you have a pickle file, you probably already know what to do.

If you have a biom file, look into the QIIME2 ecosystem, or BIRDMAn for CoDA-aware Bayesian differential abundance analysis.

If you have an anndata file, look into the Scanpy ecosystem. Scanpy does not natively support CoDA-aware methods, but they are easy to adapt.
