subset bam to one cell per bam file #17

Open
Jiayi-Zheng opened this issue Oct 19, 2022 · 1 comment

Jiayi-Zheng commented Oct 19, 2022

Hello, I am trying to split my 10X output to one bam file per cell for some downstream analysis.

I guess one way to do it is to put each cell barcode in its own tsv file and run:
subset-bam -b /usersdata/user/GW15_Trachea/GW15-Trachea/outs/possorted_genome_bam.bam -c barcode1.tsv -o /usersdata/user/GW15_Trachea/GW15-Trachea/outs/barcode1.bam
subset-bam -b /usersdata/user/GW15_Trachea/GW15-Trachea/outs/possorted_genome_bam.bam -c barcode2.tsv -o /usersdata/user/GW15_Trachea/GW15-Trachea/outs/barcode2.bam
......
Until I get my three thousand bam files for three thousand cells.
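
The repetition above could be scripted as a loop. The following is only a minimal sketch: the function name `subset_each_barcode`, the `SUBSET_BAM` override variable, and the output naming are hypothetical, and only the `-b`, `-c` and `-o` flags shown above are assumed. It still reads the whole input BAM once per barcode, so it saves typing rather than run time.

```shell
#!/bin/sh
# Hypothetical helper: run subset-bam once per barcode listed in a file.
# SUBSET_BAM is an illustrative override (e.g. SUBSET_BAM=echo for a dry run);
# it defaults to the real subset-bam command.
subset_each_barcode() {
    bam=$1          # input BAM, e.g. possorted_genome_bam.bam
    barcodes=$2     # text file with one cell barcode per line
    outdir=$3       # directory for the per-barcode .tsv and .bam files
    mkdir -p "$outdir"
    while IFS= read -r cb; do
        [ -n "$cb" ] || continue                  # skip blank lines
        printf '%s\n' "$cb" > "$outdir/$cb.tsv"   # single-barcode list for -c
        # stdin is redirected so the command cannot consume the barcode list.
        ${SUBSET_BAM:-subset-bam} -b "$bam" -c "$outdir/$cb.tsv" \
            -o "$outdir/$cb.bam" < /dev/null
    done < "$barcodes"
}
```

Example call (paths are placeholders): `subset_each_barcode possorted_genome_bam.bam barcodes.tsv per_cell_bam`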

However, I wonder if there are faster ways of doing this?

Thank you!

ghuls commented Jan 30, 2024

You can use: https://github.com/aertslab/single_cell_toolkit/blob/master/subset_bam_per_cb.sh .
You need to have samtools and mawk installed.

# 
$ ./subset_bam_per_cb.sh
Usage:
  subset_bam_file_per_cb bam_file barcodes_file bam_output_prefix [chunk_size]

Arguments:
  bam_file:          BAM file to subset per provided cell barcode.
  barcodes_file:     File with cell barcodes to subset input BAM file.
  bam_output_prefix: Prefix used for output per CB BAM files.
  chunk_size:        Number of cell barcodes to process simultaneously.
                     If more cell barcodes are provided, the input BAM file
                     will be read multiple times. It is recommended to not
                     set this value too high as the same number of parallel
                     spawned samtools processes will be created for writing
                     the per CB BAM files.
                     Default: 1000.


$ mkdir /tmp/per_cb_bam

$ ./subset_bam_per_cb.sh possorted_genome_bam.bam cell_barcodes.txt /tmp/per_cb_bam/subset

==> took 8 minutes and 10 seconds to process a 3.9 GB BAM file and write 23043 per CB BAM files.
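
For intuition about why a single pass is faster than one run per barcode, the idea can be sketched on SAM text with plain awk: read each record once, find its `CB:Z:` tag, and append the record to that barcode's file. This is an illustrative sketch only, not the contents of `subset_bam_per_cb.sh`. The function name `split_sam_by_cb` and the `<prefix>.<barcode>.sam` naming are hypothetical, it writes uncompressed SAM rather than BAM, and it reopens the output file for every record, which keeps the number of open files low but is slow.

```shell
#!/bin/sh
# Hypothetical helper: split SAM text on stdin into one SAM file per barcode.
# Usage: split_sam_by_cb barcodes_file output_prefix < input.sam
split_sam_by_cb() {
    barcodes_file=$1
    output_prefix=$2
    awk -F '\t' -v prefix="$output_prefix" '
        # First input (barcodes file): remember the wanted barcodes.
        NR == FNR { if ($1 != "") wanted[$1] = 1; next }
        # SAM header lines: collect them to prepend to every output file.
        /^@/ { header = header $0 "\n"; next }
        {
            # Optional tags start at field 12; look for CB:Z:<barcode>.
            for (i = 12; i <= NF; i++) {
                if (substr($i, 1, 5) == "CB:Z:") {
                    cb = substr($i, 6)
                    if (cb in wanted) {
                        out = prefix "." cb ".sam"
                        if (!(cb in seen)) {
                            # First record for this barcode: start the file
                            # with the header (truncates any old file).
                            printf "%s", header > out
                            close(out)
                            seen[cb] = 1
                        }
                        print >> out   # append the alignment record
                        close(out)     # avoid hitting open-file limits
                    }
                    break
                }
            }
        }
    ' "$barcodes_file" -
}
```

A possible invocation, assuming samtools is installed: `samtools view -h possorted_genome_bam.bam | split_sam_by_cb cell_barcodes.txt /tmp/per_cb_sam/subset`. Each resulting SAM file could then be converted with `samtools view -b`.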
