v4 chemistry #250

Open
diazlab opened this issue Sep 25, 2024 · 5 comments

Comments

@diazlab

diazlab commented Sep 25, 2024

Hi,
I would like to run Cassiopeia preprocessing on a 10X v4 chemistry library. Based on my understanding of your code and of the v4 vs. v3 library structure, it seems that I could run cas.pp.convert_fastqs_to_unmapped_bam with chemistry='10xv3' to generate the needed BAMs, and then provide the v4 cell barcode whitelist to cas.pp.error_correct_cellbcs_to_whitelist. Otherwise, I could execute the pipeline as described in the tutorials without further modification. Is that right?

Also, I don't quite understand the difference in the code between the chemistry='10xv3' and '10xv2' invocations of convert_fastqs_to_unmapped_bam. I can't seem to find the instantiation of the ngs.chemistry object. Can you point me to that so I can implement code for v4 if necessary?

thanks

@tzeitim

tzeitim commented Sep 26, 2024

Also, I don't quite understand the difference in the code between the chemistry='10xv3' and '10xv2' invocations of convert_fastqs_to_unmapped_bam. I can't seem to find the instantiation of the ngs.chemistry object. Can you point me to that so I can implement code for v4 if necessary?

https://github.com/Lioscro/ngs-tools/blob/aa3e864e59ae78467a331f671967c93d62a6e2ad/ngs_tools/chemistry/SingleCellChemistry.py#L125

@mattjones315
Collaborator

Hi @diazlab ,

Thanks so much for using Cassiopeia and posting this issue!

The major difference between v2 and v3 chemistry, for the purpose of processing libraries in Cassiopeia, is the extension of the UMI sequence from 10 nt to 12 nt. It sounds like the v4 chemistry has an R1 structure identical to v3 (judging from this resource from the ever-helpful Teichmann Lab). So @diazlab, I believe you are correct that you can run Cassiopeia here using the '10xv3' chemistry setting while passing in the v4 cellBC whitelist for cellBC error correction.
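
To make the R1 difference concrete, here is a minimal sketch of how a read would be split under each setting. The 16 nt cell barcode length is an assumption (it is the standard 10x 3' layout but is not stated in this thread), and split_r1 is a hypothetical helper for illustration, not a function from Cassiopeia or ngs-tools.

```python
from typing import Tuple

# Assumption: the cell barcode occupies the first 16 nt of R1.
CELL_BARCODE_LENGTH = 16

# UMI lengths as described above: 10 nt for v2, 12 nt for v3.
# v4 is assumed to share the v3 layout, so it would use "10xv3".
UMI_LENGTH = {"10xv2": 10, "10xv3": 12}


def split_r1(sequence: str, chemistry: str = "10xv3") -> Tuple[str, str]:
    """Split an R1 read into (cell barcode, UMI) for the given chemistry."""
    umi_length = UMI_LENGTH[chemistry]
    needed = CELL_BARCODE_LENGTH + umi_length
    if len(sequence) < needed:
        raise ValueError(
            f"R1 is {len(sequence)} nt, but {chemistry} needs at least {needed} nt"
        )
    cell_barcode = sequence[:CELL_BARCODE_LENGTH]
    umi = sequence[CELL_BARCODE_LENGTH:needed]
    return cell_barcode, umi
```

Running a v3 or v4 library under '10xv2' would therefore drop the last 2 nt of every UMI, which is why the chemistry setting matters even though the barcode position is unchanged.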

As @tzeitim pointed out, the ngs.chemistry object is implemented in a separate codebase and linked above. Thanks for linking that @tzeitim !

Please let me know how this works, and if there are any unanticipated issues you run into that I can help with.
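
For reference, the approach discussed in this thread could be sketched roughly as follows. The keyword arguments reflect the Cassiopeia tutorials as best recalled and should be checked against the installed version. preprocess_v4_as_v3, its parameter names, and the optional cas argument are hypothetical, introduced only for this illustration. The remaining tutorial steps are omitted.

```python
from typing import Any, Optional


def preprocess_v4_as_v3(
    r1_fastq: str,
    r2_fastq: str,
    v4_whitelist: str,
    output_directory: str,
    name: str = "v4_library",
    n_threads: int = 1,
    cas: Optional[Any] = None,
) -> str:
    """Convert v4 FASTQs with the '10xv3' setting, then correct against a v4 whitelist.

    `cas` lets a caller pass in the cassiopeia module (or a stand-in);
    by default it is imported lazily.
    """
    if cas is None:
        import cassiopeia as cas  # requires Cassiopeia to be installed

    # v4 is assumed to share the v3 R1 layout, so the v3 setting is used.
    bam_fp = cas.pp.convert_fastqs_to_unmapped_bam(
        [r1_fastq, r2_fastq],
        chemistry="10xv3",
        output_directory=output_directory,
        name=name,
        n_threads=n_threads,
    )

    # The only v4-specific input: the v4 cell barcode whitelist.
    bam_fp = cas.pp.error_correct_cellbcs_to_whitelist(
        bam_fp,
        whitelist=v4_whitelist,
        output_directory=output_directory,
        n_threads=n_threads,
    )
    return bam_fp
```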

Best,
Matt

@Zejian-Wang

Dear @mattjones315 ,

I’m trying to use Cassiopeia with my files from a 10X v4 chemistry library. After sequencing, we have two subfolders with the following structure:

-MJZ001
--MJZ001_S36_L005_I1_001.fastq.gz
--MJZ001_S36_L005_I2_001.fastq.gz
--MJZ001_S36_L005_R1_001.fastq.gz
--MJZ001_S36_L005_R2_001.fastq.gz
-MJZ001F
--MJZ001F_S1_L005_I1_001.fastq.gz
--MJZ001F_S1_L005_R1_001.fastq.gz
--MJZ001F_S1_L005_R2_001.fastq.gz

My question is: which files should I use as input for the cas.pp.convert_fastqs_to_unmapped_bam() function, given that the '10xv3' mode only supports two FASTQ files as input? Does this mean I need to run Cell Ranger first and then convert the BAM file into two FASTQ files containing the R1 and R2 reads? Or should I just input MJZ001_S36_L005_R1_001.fastq.gz and MJZ001_S36_L005_R2_001.fastq.gz directly?

Thank you for your help!

@mattjones315
Collaborator

@Zejian-Wang,

Apologies for the delay and thanks for using Cassiopeia. It's unclear to me what these files are -- are these all lineage-tracing amplicon libraries? What exactly is the difference between MJZ001 and MJZ001F? From my understanding, v4 chemistry should behave similarly to v3 chemistry in terms of read structure, so you should proceed as instructed above in my previous reply.

Hope that helps, and happy to answer any other questions you might have.

Best,
Matt

@Zejian-Wang

Hi Matt,

Thanks for your reply! After discussing with my collaborator, it seems that the MJZ001F/ folder contains the targeted sequencing data for the barcoding region, while MJZ001/ is the standard scRNA-seq data. I ran Cassiopeia on the FASTQ files in the MJZ001F folder and successfully obtained results, but, as expected, MJZ001/ alone didn’t yield sufficient counts.
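
For anyone with a similar folder layout, here is a minimal sketch of picking the two read files out of a listing and leaving the index files (I1/I2) aside, since the '10xv3' mode takes two FASTQ files. select_read_fastqs is a hypothetical helper, and the filename pattern assumes the bcl2fastq-style names shown earlier in this thread.

```python
import re
from typing import Dict, Iterable, List

# Matches names such as MJZ001F_S1_L005_R1_001.fastq.gz and captures I1/I2/R1/R2.
_READ_PATTERN = re.compile(r"_(?P<read>[IR][12])_\d{3}\.fastq\.gz$")


def select_read_fastqs(filenames: Iterable[str]) -> List[str]:
    """Return [R1, R2] from a list of FASTQ names, ignoring index reads."""
    by_read: Dict[str, List[str]] = {}
    for filename in filenames:
        match = _READ_PATTERN.search(filename)
        if match and match.group("read") in ("R1", "R2"):
            by_read.setdefault(match.group("read"), []).append(filename)

    # This sketch handles a single lane only: exactly one R1 and one R2.
    if len(by_read.get("R1", [])) != 1 or len(by_read.get("R2", [])) != 1:
        raise ValueError("expected exactly one R1 and one R2 FASTQ file")
    return [by_read["R1"][0], by_read["R2"][0]]
```

The returned list is in R1, R2 order, which is the order in which the two files would be passed as the FASTQ input.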

Thank you for your assistance and for providing such useful tools!

Best,

Zejian


4 participants