WARNING: 10X Multiome (GEX + ARC) should not use bustools 10xv3 internal barcode whitelist #284

Open
davemcg opened this issue Jan 30, 2025 · 4 comments


davemcg commented Jan 30, 2025

tldr: If you want to quantify a 10x Multiome experiment (just the Gene / GEX part) you MUST provide the proper onlist to kb count.

This is not a bug per se, but rather an oversight for an edge case. The 10X Multiome GEX+ATAC chemistry (at least for the GEX part) uses the same barcode setup (16 + 12) as 10xv3.
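To make the "16 + 12" layout concrete, here is a minimal sketch that splits a barcode read into its cell barcode and UMI. It assumes the barcode occupies the first 16 bases and the UMI the next 12, as in 10xv3; `split_barcode_read` is a hypothetical helper for illustration, not part of kb-python or bustools.

```python
BARCODE_LEN = 16  # cell barcode length (assumed to come first in the read)
UMI_LEN = 12      # UMI length (assumed to follow the barcode directly)


def split_barcode_read(seq):
    """Split a barcode read into (cell_barcode, umi) using the 16 + 12 layout."""
    seq = seq.strip()
    if len(seq) < BARCODE_LEN + UMI_LEN:
        raise ValueError("read is shorter than barcode + UMI (28 bases)")
    barcode = seq[:BARCODE_LEN]
    umi = seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    return barcode, umi
```

Because the read layout is identical, parsing succeeds under -x 10xv3 either way; only the list of accepted barcodes differs between the two chemistries.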

BUT

If you run the kb count workflow with -x 10xv3 you may (as in my case) get far too few cells. I was very miffed until I realized that 10x uses a different barcode whitelist for the multiome than for 10xv3.

If you pass kb count the 10x multiome barcode whitelist with -w gex_737K-arc-v1.txt, you get the proper result.
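One way to spot a whitelist mismatch before running the full workflow is to check what fraction of raw barcodes match a candidate onlist exactly. The sketch below is illustrative only: it assumes the cell barcode is the first 16 bases of each read in the barcode FASTQ, and `load_onlist` and `onlist_hit_fraction` are hypothetical helpers, not part of kb-python or bustools.

```python
import gzip
from itertools import islice

BARCODE_LEN = 16  # assumption: the cell barcode is the first 16 bases of the read


def _open_text(path):
    """Open a plain-text or gzip-compressed file for reading as text."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def load_onlist(path):
    """Read an onlist (one barcode per line) into a set."""
    with _open_text(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def onlist_hit_fraction(fastq_path, onlist, max_reads=100_000):
    """Fraction of the first max_reads reads whose barcode is an exact onlist match."""
    hits = total = 0
    with _open_text(fastq_path) as handle:
        # A FASTQ record is 4 lines; the sequence is the 2nd line of each record.
        for seq in islice(handle, 1, max_reads * 4, 4):
            total += 1
            if seq[:BARCODE_LEN] in onlist:
                hits += 1
    return hits / total if total else 0.0
```

For example, calling `onlist_hit_fraction("SRR29226057_1.fastq.gz", load_onlist("gex_737K-arc-v1.txt"))` would report the exact-match rate against the multiome onlist, assuming the `_1` file is the barcode read. A much lower rate against a different onlist would point to the mismatch described here.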

Versions

I noticed this error on 0.28.2 and confirmed it on 0.29.1 (though I did use an index built with 0.28.2, which I don't think matters).

Retrieve data

# get a 10x multiome gene expression (GEX) fastq pair
fasterq-dump SRR29226057; pigz -p 8 SRR29226057*
wget https://teichlab.github.io/scg_lib_structs/data/10X-Genomics/gex_737K-arc-v1.txt.gz
gunzip gex_737K-arc-v1.txt.gz

Run kb count the "default" way

This gives you about 679 cells after sc.pp.filter_cells(adata, min_genes = 300)

# not the exact command, but no one wants to see the full paths
kb count --workflow nac --sum total -g t2g.txt -t 12 -x 10xv3 -i index.idx -c1 t2c.cdna.txt -c2 t2c.unprocessed.txt -o SRR29226057_kb_provided_whitelist --h5ad SRR29226057_1.fastq.gz SRR29226057_2.fastq.gz

# now in python
import scanpy as sc
adata = sc.read_h5ad('SRR29226057_kb_provided_whitelist/counts_unfiltered/adata.h5ad')
sc.pp.calculate_qc_metrics(adata, percent_top=None, log1p=False, inplace=True)
sc.pp.filter_cells(adata, min_genes=300)
adata 

Give kb count the correct onlist

Same, but adding the -w flag -> now you get 12582 cells

kb count -w gex_737K-arc-v1.txt --workflow nac --sum total -g t2g.txt -t 12 -x 10xv3 -i index.idx -c1 t2c.cdna.txt -c2 t2c.unprocessed.txt -o SRR29226057_arc --h5ad SRR29226057_1.fastq.gz SRR29226057_2.fastq.gz

# python
import scanpy as sc
adata = sc.read_h5ad('SRR29226057_arc/counts_unfiltered/adata.h5ad')
sc.pp.calculate_qc_metrics(adata, percent_top=None, log1p=False, inplace=True)
sc.pp.filter_cells(adata, min_genes=300)
adata

davemcg changed the title from "WARNING: 10X Multiome (GEX + ARC) should not use bustools 10xv3 pre-built barcode whitelist" to "WARNING: 10X Multiome (GEX + ARC) should not use bustools 10xv3 internal barcode whitelist" on Jan 30, 2025

Author

davemcg commented Jan 30, 2025

Related - you may want to add "10xv4" chemistry as a "-x" option as the barcodes are different

Collaborator

Yenaled commented Jan 30, 2025

Thanks! I think we need a new technology string for that.

As for 10xv4, I believe that is already added on the latest version of kb-python.

Author

davemcg commented Jan 30, 2025

I don't see 10xv4 on 0.29.1

I also don't see "10xv4" in this repo (though I don't know whether GitHub only searches the main branch when you use the web GUI).

Anyways, have fun adding a string that conveys "10x multiome v1 but only gex"

Collaborator

Yenaled commented Jan 30, 2025

Ah you are correct; I put the 10xv4 into one of the dependencies but forgot to update the kb-python code.
