Hello, I am trying to split my 10X output to one bam file per cell for some downstream analysis.
I guess one way to do it is to put each cell barcode in its own tsv file and run:
subset-bam -b /usersdata/user/GW15_Trachea/GW15-Trachea/outs/possorted_genome_bam.bam -c barcode1.tsv -o /usersdata/user/GW15_Trachea/GW15-Trachea/outs/barcode1.bam
subset-bam -b /usersdata/user/GW15_Trachea/GW15-Trachea/outs/possorted_genome_bam.bam -c barcode2.tsv -o /usersdata/user/GW15_Trachea/GW15-Trachea/outs/barcode2.bam
......
and so on, until I have three thousand bam files for three thousand cells.
However, is there a faster way of doing this?
Thank you!
$ ./subset_bam_per_cb.sh
Usage:
subset_bam_file_per_cb bam_file barcodes_file bam_output_prefix [chunk_size]
Arguments:
bam_file: BAM file to subset per provided cell barcode.
barcodes_file: File with cell barcodes to subset input BAM file.
bam_output_prefix: Prefix used for output per CB BAM files.
chunk_size: Number of cell barcodes to process simultaneously.
If more cell barcodes are provided, the input BAM file
will be read multiple times. It is recommended to not
set this value too high as the same number of parallel
spawned samtools processes will be created for writing
the per CB BAM files.
Default: 1000.
$ mkdir /tmp/per_cb_bam
$ ./subset_bam_per_cb.sh possorted_genome_bam.bam cell_barcodes.txt /tmp/per_cb_bam/subset
==> took 8 minutes and 10 seconds to process a 3.9 GB BAM file and write 23043 per CB BAM files.
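For readers who want to see the general idea behind a single-pass split, here is a minimal sketch. It is not the subset_bam_per_cb.sh script above (whose source is not shown in this thread). It assumes the reads carry the cell barcode in a `CB:Z:` tag, as in 10X possorted_genome_bam.bam files, and that the barcodes file lists one barcode per line. The function name `split_sam_by_cb` and the `<prefix>_<barcode>.sam` output naming are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of subset-bam or samtools):
# read SAM text on stdin and write one SAM file per wanted cell barcode.
#   $1: file with one cell barcode per line
#   $2: output prefix; files are named <prefix>_<barcode>.sam
split_sam_by_cb() {
    local barcodes_file="$1" out_prefix="$2"
    awk -F '\t' -v bcfile="$barcodes_file" -v prefix="$out_prefix" '
        BEGIN {
            # Load the wanted barcodes into a lookup table.
            while ((getline line < bcfile) > 0) wanted[line] = 1
            close(bcfile)
        }
        # Collect header lines; they precede all reads in SAM output.
        /^@/ { header = header $0 "\n"; next }
        {
            # Optional tags start at field 12; look for the CB tag.
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^CB:Z:/) {
                    cb = substr($i, 6)
                    if (cb in wanted) {
                        out = prefix "_" cb ".sam"
                        # Write the header once per output file.
                        if (!(cb in seen)) { printf "%s", header > out; seen[cb] = 1 }
                        print > out
                    }
                    break
                }
            }
        }
    '
}
```

One possible use is `samtools view -h possorted_genome_bam.bam | split_sam_by_cb cell_barcodes.txt /tmp/per_cb_bam/subset`, followed by converting each SAM file to BAM with `samtools view -b`. The sketch keeps one output file open per barcode, and some awk implementations limit the number of simultaneously open files, so with thousands of barcodes it may be necessary to process them in chunks, similar in spirit to the chunk_size argument described above.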