Is MAESTRO compatible with 10X data derived from nuclear RNA? #98

Closed
Dazcam opened this issue Jan 7, 2021 · 8 comments
Labels
enhancement New feature or request


@Dazcam

Dazcam commented Jan 7, 2021

Hello,

I'm currently installing the MAESTRO prerequisites and, having read the paper, I'd like to ask whether MAESTRO is compatible with 10X data derived from nuclear RNA, particularly as I'm looking to integrate single-modal snRNA-seq and snATAC-seq data.

And more specifically, could the use of a pre-mRNA reference and GTF files for alignment, as opposed to standard reference/annotation files, impact a MAESTRO analysis at all?

Until now I have been using Cell Ranger 4 for my analysis, which recommends using a pre-mRNA reference and GTF file for nuclear RNA. I had started creating STARsolo-compatible versions of these files for my MAESTRO analysis and wondered whether this is the best course of action, particularly as 10X has recently released Cell Ranger v5, which includes a new function for handling intronic reads without the need for a pre-mRNA reference, and STARsolo provides a similar function.
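For context, as far as I recall, the pre-mRNA GTF that 10x suggested for nuclear RNA before Cell Ranger 5 was built by relabelling every `transcript` record as an `exon` and keeping only those records, so each gene's single "exon" spans its introns and intronic reads are counted. A minimal Python sketch of that idea follows; the function name `to_premrna_gtf` is hypothetical, and it assumes a standard tab-separated GTF with the feature type in the third column.

```python
def to_premrna_gtf(gtf_lines):
    """Yield a pre-mRNA version of a GTF (hypothetical helper).

    Every 'transcript' record is relabelled as an 'exon'; all other feature
    records (gene, exon, CDS, ...) are dropped, so each remaining "exon"
    covers the full transcript span including introns. Header comment lines
    starting with '#' are passed through unchanged.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # GTF column 3 (index 2) holds the feature type.
        if len(fields) >= 9 and fields[2] == "transcript":
            fields[2] = "exon"
            yield "\t".join(fields) + "\n"
```

It could be applied to an open GTF file handle and the result written out with `writelines`; the converted GTF would then be used when building the STAR genome index in place of the original annotation.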

Regardless, it would be useful to hear if you have any recommendations or points of interest that I should consider when running MAESTRO using single-nuclear data.

Many Thanks,

Darren

@crazyhottommy
Collaborator

Hi,
MAESTRO uses STARsolo for scRNA-seq quantification. For single-nucleus data, you can add --soloFeatures GeneFull manually after initializing the Snakefile, at https://github.com/liulab-dfci/MAESTRO/blob/master/MAESTRO/Snakemake/scRNA/Snakefile#L48
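As a rough illustration of where the option goes, the sketch below assembles a STARsolo argument list in Python. Only `--soloFeatures` reflects the advice above; the helper name `starsolo_args`, its parameters, and the exact set of other flags are assumptions for illustration, not MAESTRO's actual `scrna_map` rule. STARsolo accepts several feature types at once, so `Gene GeneFull` counts both exon-only and full gene-body (exons plus introns) reads.

```python
def starsolo_args(genome_dir, cdna_fastq, barcode_fastq, whitelist, prefix,
                  single_nucleus=False):
    """Build a STARsolo command line as a list of arguments (hypothetical helper)."""
    # GeneFull counts reads over the full gene body (exons + introns),
    # which captures the intronic reads typical of nuclear RNA.
    features = ["Gene", "GeneFull"] if single_nucleus else ["Gene"]
    return [
        "STAR",
        "--genomeDir", genome_dir,
        # STARsolo expects the cDNA read first, then the barcode/UMI read.
        "--readFilesIn", cdna_fastq, barcode_fastq,
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", whitelist,
        "--soloFeatures", *features,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]
```

The list could be passed to `subprocess.run(args, check=True)`. Requesting `Gene GeneFull` rather than `GeneFull` alone should also keep the `Solo.out/Gene/` output directory that downstream rules expect.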

In the future, we should expose that as a parameter in the config.yaml file.
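A minimal sketch of how such a parameter might be read in a Snakefile, assuming Snakemake's usual behaviour of exposing `config.yaml` as the dict `config`. The key `solo_features`, the helper name, and the restricted set of accepted values are all hypothetical choices for illustration.

```python
# Hypothetical config handling; the key "solo_features" is an assumption.
# Only a subset of STARsolo feature types is accepted here, for illustration.
KNOWN_FEATURES = {"Gene", "GeneFull", "SJ", "Velocyto"}


def solo_features_from_config(config):
    """Return the STARsolo feature list from config, defaulting to ['Gene']."""
    value = config.get("solo_features", "Gene")
    # Accept either a YAML list or a space-separated string.
    features = value.split() if isinstance(value, str) else list(value)
    unknown = [f for f in features if f not in KNOWN_FEATURES]
    if unknown:
        raise ValueError(f"Unsupported soloFeatures: {unknown}")
    return features
```

In a rule, the joined list could be passed through `params` and interpolated into the shell command as `--soloFeatures {params.solo_features}`, so existing configs without the key keep the current `Gene` behaviour.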

Thanks!

@Dazcam
Author

Dazcam commented Jan 11, 2021

Many thanks for responding. I will add that parameter to the Snakefile today and see if it runs to completion. The pipeline failed after the scrna_rseqc_genecov rule: although that rule completed without error, the logs reported the following warning:

Cannot get coverage signal from 14510_PFC_RNAAligned.sortedByCoord.out.sample.bam ! Skip

	Sample	Skewness
@ 2021-01-09 00:14:17: Running R script ...

This is likely a mismatch between the BED and BAM files. It caused the pipeline to fail during the scrna_rseqc_plot rule, as the RNAGenebodyCoveragePlot could not be generated:

Error in `[.data.frame`(gene_cov, , 2) : undefined columns selected
Calls: RNAGenebodyCoveragePlot -> [ -> [.data.frame
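One common cause of a BED/BAM mismatch is inconsistent chromosome naming, e.g. `chr1` in the BED file versus `1` in the BAM header. The sketch below compares the two sets of contig names. To stay dependency-free it assumes the BAM header has already been dumped as SAM text (for example with `samtools view -H`); the function names are hypothetical.

```python
def bed_contigs(bed_lines):
    """Collect chromosome names from column 1 of a BED file."""
    contigs = set()
    for line in bed_lines:
        # Skip blank lines and comment/track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        contigs.add(line.split("\t", 1)[0])
    return contigs


def bam_header_contigs(header_lines):
    """Collect SN: names from the @SQ lines of a SAM-format header."""
    contigs = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    contigs.add(field[3:])
    return contigs


def contig_overlap_report(bed_lines, header_lines):
    """Report which contig names are shared or unique to each file."""
    bed = bed_contigs(bed_lines)
    bam = bam_header_contigs(header_lines)
    return {
        "shared": sorted(bed & bam),
        "only_in_bed": sorted(bed - bam),
        "only_in_bam": sorted(bam - bed),
    }
```

An empty `shared` list points to a naming mismatch, in which case renaming the contigs in one of the two files (or using a BED file built from the same annotation as the STAR index) would be the next thing to try.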

I also had a buffer size issue. I assume this is due to my samples being sequenced extremely deeply?

EXITING because of fatal error: buffer size for SJ output is too small
Solution: increase input parameter --limitOutSJcollapsed

I managed to solve it by adding the following parameter to the shell command of the scrna_map rule:

--limitOutSJcollapsed 5000000

Source here. It may be worth adding this somewhere in the config or docs?
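If this were exposed in the config, one option is to add the flag only when it is set, so STAR otherwise keeps its own default (1000000). A minimal sketch, where the config key `limit_out_sj_collapsed` and the helper name are hypothetical:

```python
def sj_limit_args(config):
    """Return extra STAR arguments for the collapsed splice-junction buffer.

    The config key 'limit_out_sj_collapsed' is hypothetical. If it is absent,
    no argument is added and STAR uses its built-in default.
    """
    value = config.get("limit_out_sj_collapsed")
    if value is None:
        return []
    value = int(value)  # YAML may give an int or a quoted string
    if value <= 0:
        raise ValueError("limit_out_sj_collapsed must be a positive integer")
    return ["--limitOutSJcollapsed", str(value)]
```

Joined with spaces, the result could be passed to the `scrna_map` shell command via `params`, leaving the rule unchanged for samples that don't need the larger buffer.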

Are you planning on adding scclusteval to the pipeline?

@Dazcam
Author

Dazcam commented Jan 13, 2021

UPDATE: 13th Jan 2021

When running with the --soloFeatures GeneFull parameter, some output files are written to a directory that does not match the one specified in the Snakefile.

Instead of: Result/STAR/%sSolo.out/Gene/raw/matrix.mtx

They are stored in Result/STAR/%sSolo.out/GeneFull/raw/matrix.mtx

I think this only affects the scrna_map and scrna_qc rules.
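One way to keep both rules consistent would be to derive the matrix paths from the chosen feature type rather than hard-coding `Gene`. A minimal sketch, assuming the `Result/STAR/<sample>Solo.out/<feature>/raw/` layout shown above; the helper name `solo_matrix_paths` is hypothetical.

```python
from pathlib import PurePosixPath


def solo_matrix_paths(sample, feature="Gene", result_dir="Result/STAR"):
    """Return the raw STARsolo matrix files for a sample and feature type.

    STARsolo writes one sub-directory per requested feature, e.g.
    <prefix>Solo.out/Gene/ or <prefix>Solo.out/GeneFull/.
    """
    raw = PurePosixPath(result_dir) / f"{sample}Solo.out" / feature / "raw"
    return [str(raw / name) for name in ("matrix.mtx", "features.tsv", "barcodes.tsv")]
```

Because the sample name is interpolated as plain text, calling it with `"{sample}"` should yield a Snakemake wildcard pattern usable in both the `scrna_map` outputs and the `scrna_qc` inputs.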

Error message:

MissingOutputException in line 21 of /scratch/c.c1477909/maestro_analysis/14510_PFC_RNAv2/Snakefile:
Job Missing files after 5 seconds:
Result/STAR/14510_PFC_RNASolo.out/Gene/raw/matrix.mtx
Result/STAR/14510_PFC_RNASolo.out/Gene/raw/features.tsv
Result/STAR/14510_PFC_RNASolo.out/Gene/raw/barcodes.tsv
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 0 completed successfully, but some output files are missing. 0
 
Removing output files of failed job scrna_map since they might be corrupted:
Result/STAR/14510_PFC_RNAAligned.sortedByCoord.out.bam, Result/STAR/14510_PFC_RNAAligned.sortedByCoord.out.bam.bai
Shutting down, this might take some time.

I have modified the Snakefile and am now running MAESTRO again.

@crazyhottommy
Collaborator

Thanks for reporting; we will keep this in mind and include it in our next release!

@crazyhottommy crazyhottommy added the enhancement New feature or request label Jan 13, 2021
@crazyhottommy crazyhottommy self-assigned this Jan 13, 2021
@crazyhottommy
Collaborator

Hi, we just made a new release, MAESTRO 1.5.1, which supports single-nucleus data. Can you please give it a try?
Thanks!

@Dazcam
Author

Dazcam commented Jul 27, 2021

Thanks for the update. Unfortunately, I had to abandon MAESTRO because of the issues I was having around the time I posted. I now have a well-developed pipeline of my own for my single-nucleus data, but I will keep an eye on MAESTRO's development and may consider using it in the future.

@crazyhottommy
Collaborator

Thanks for the feedback!

@njohnso6

njohnso6 commented Dec 7, 2022

I got the same error when running the newest version, 1.5.4 (only available on the macs3 fork), for the multiome pipeline:
EXITING because of fatal error: buffer size for SJ output is too small
Solution: increase input parameter --limitOutSJcollapsed
I have yet to try the previously proposed solution. I'll let you know.


4 participants