subset bam to one cell per bam file #17

Jiayi-Zheng · 2022-10-19T11:19:59Z

Hello, I am trying to split my 10x output into one BAM file per cell for some downstream analysis.

I guess one way to do it is to put each cell barcode in its own TSV file and run subset-bam once per barcode:
subset-bam -b /usersdata/user/GW15_Trachea/GW15-Trachea/outs/possorted_genome_bam.bam -c barcode1.tsv -o /usersdata/user/GW15_Trachea/GW15-Trachea/outs/barcode1.bam
subset-bam -b /usersdata/user/GW15_Trachea/GW15-Trachea/outs/possorted_genome_bam.bam -c barcode2.tsv -o /usersdata/user/GW15_Trachea/GW15-Trachea/outs/barcode2.bam
......
and so on, until I have three thousand BAM files for my three thousand cells.

However, is there a faster way of doing this?

Thank you!

ghuls · 2024-01-30T15:02:06Z

You can use: https://github.com/aertslab/single_cell_toolkit/blob/master/subset_bam_per_cb.sh .
You need to have samtools and mawk installed.

$ ./subset_bam_per_cb.sh
Usage:
  subset_bam_file_per_cb bam_file barcodes_file bam_output_prefix [chunk_size]

Arguments:
  bam_file:          BAM file to subset per provided cell barcode.
  barcodes_file:     File with cell barcodes to subset input BAM file.
  bam_output_prefix: Prefix used for output per CB BAM files.
  chunk_size:        Number of cell barcodes to process simultaniously.
                     If more cell barcodes are provided, the input BAM file
                     will be read multiple times. It is recommended to not
                     set this value too high as the same number of parallel
                     spawned samtools processes will be created for writing
                     the per CB BAM files.
                     Default: 1000.


$ mkdir /tmp/per_cb_bam

$ ./subset_bam_per_cb.sh possorted_genome_bam.bam cell_barcodes.txt /tmp/per_cb_bam/subset

==> took 8 minutes and 10 seconds to process a 3.9 GB BAM file and write 23043 per CB BAM files.
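
To illustrate why this can be so much faster than running subset-bam once per barcode, here is a minimal sketch of the general single-pass idea: read the alignments once and route each record to an output file chosen by its `CB:Z:` tag. This is not the code of subset_bam_per_cb.sh. The function name `split_sam_by_cb` and the `<prefix>_<barcode>.sam` naming are assumptions made for the example, and it works on SAM text rather than writing BAM directly.

```shell
# Hypothetical helper (not part of subset-bam or single_cell_toolkit):
# reads SAM text on stdin and writes one SAM file per wanted cell barcode.
#   $1: file with one cell barcode per line
#   $2: output prefix; files are named <prefix>_<barcode>.sam
split_sam_by_cb() {
    local barcodes_file="$1" out_prefix="$2"
    awk -F '\t' -v bcfile="$barcodes_file" -v prefix="$out_prefix" '
        # Load the wanted barcodes before reading any alignments.
        BEGIN { while ((getline line < bcfile) > 0) wanted[line] = 1 }

        # Collect SAM header lines so every output file gets a copy.
        /^@/ { header = header $0 "\n"; next }

        {
            # Optional SAM fields start at column 12; look for the CB tag.
            cb = ""
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^CB:Z:/) { cb = substr($i, 6); break }
            }
            # Skip reads without a barcode or with an unwanted one.
            if (cb == "" || !(cb in wanted)) next

            out = prefix "_" cb ".sam"
            if (!(cb in seen)) {
                # First read for this barcode: create the file with the header.
                printf "%s", header > out
                close(out)
                seen[cb] = 1
            }
            # Append and close again so thousands of barcodes do not
            # exhaust the open-file limit (simple, but not the fastest way).
            print >> out
            close(out)
        }
    '
}
```

Assuming samtools is installed, it could be fed with `samtools view -h possorted_genome_bam.bam | split_sam_by_cb cell_barcodes.txt /tmp/per_cb_bam/subset`, and each resulting SAM file converted afterwards with `samtools view -b -o file.bam file.sam`. Because the input is read in order, a position-sorted BAM yields position-sorted per-cell files.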

ghuls mentioned this issue Jan 30, 2024

Is there a way to run subset-bam faster #34

Open
