Check FASTQ files after each preprocessing step (Inconsistent sequence and quality lengths in FASTQ files created by SortMeRNA) #1456

Open
cihanerkut opened this issue Nov 21, 2024 · 1 comment

Comments

@cihanerkut

Description of the bug

This issue has already been discussed in the SortMeRNA repository (sortmerna/sortmerna#407).

STAR failed for one sample because the sequence and quality strings of one read differ in length. After TrimGalore, the read looks like this:

@ST-K00265:389:HMJW3BBXY:1:2223:9039:32244 2:N:0:ATCCACTG+ACGCACCT
ATAAAGTTGAAGGCTACAAGAAGACCAAGGAAGCTGTTTTGCTCCTTAAGAAACTTAAAGCCTGGAATGATATCAAAAAGGTCTATGCCTCTCAGCGAATG
+
<A-AFF<FJJFFF<AAFJJ<FAJFF<JFFF-JJJJJJJJJJJ7JJJJJFJJFJJJJJJJF<JJJF<JFJJJFAJJFFFFJFJJJJAAJJJJJJJJJJJFAA

which becomes this after SortMeRNA:

@ST-K00265:389:HMJW3BBXY:1:2223:9039:32244 2:N:0:ATCCACTG+ACGCACCT
ATAAAGTTGAAGGCTACAAGAAGACCAAGGAAGCTGTTTTGCTCCTTAAGAAACTTAAAGCCTGGAATGATATCAAAAAGGTCTATGCCTCTCAGCGAATG
+
<A

I had to deactivate the SortMeRNA step to make it work.

Would it be possible to add a FASTQ integrity check after each step that generates a FASTQ file and, if necessary, fix the FASTQ file on the fly? I suggest this as a general solution.
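
To illustrate the kind of check I have in mind, here is a minimal Python sketch. It is not part of nf-core/rnaseq or SortMeRNA: the names `check_fastq`, `open_fastq` and `FastqProblem` are hypothetical, and it assumes strict four-line FASTQ records (no wrapped sequence or quality lines). It reports records whose quality string length differs from the sequence length, which is the condition STAR rejected here.

```python
import gzip
from typing import Iterable, Iterator, List, NamedTuple, TextIO


class FastqProblem(NamedTuple):
    record_number: int  # 1-based position of the record in the file
    read_id: str        # first token of the header line (may be empty)
    reason: str


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def _read_id(header: str) -> str:
    tokens = header.split()
    return tokens[0] if tokens else ""


def _check_record(number: int, record: List[str]) -> Iterator[FastqProblem]:
    header, sequence, separator, quality = record
    read_id = _read_id(header)
    if not header.startswith("@"):
        yield FastqProblem(number, read_id, "header does not start with '@'")
    if not separator.startswith("+"):
        yield FastqProblem(number, read_id, "separator does not start with '+'")
    if len(sequence) != len(quality):
        yield FastqProblem(
            number,
            read_id,
            f"sequence length {len(sequence)} != quality length {len(quality)}",
        )


def check_fastq(lines: Iterable[str]) -> Iterator[FastqProblem]:
    """Yield one FastqProblem per defect found, assuming four-line records."""
    record: List[str] = []
    number = 0
    for line in lines:
        text = line.rstrip("\r\n")
        # Ignore blank lines between records (e.g. a trailing newline),
        # but keep empty lines inside a record so they are still checked.
        if not record and text.strip() == "":
            continue
        record.append(text)
        if len(record) == 4:
            number += 1
            yield from _check_record(number, record)
            record = []
    if record:
        # The file ended in the middle of a record.
        yield FastqProblem(
            number + 1,
            _read_id(record[0]),
            "truncated record: fewer than four lines",
        )
```

Calling `list(check_fastq(open_fastq(path)))` on a FASTQ file (where `path` is whatever file a given step produced) returns an empty list when every record is consistent. The record shown above would be flagged because its quality string `<A` is shorter than its sequence. This sketch only detects problems; repairing paired-end data on the fly would presumably also need to drop the mate of any discarded read so that the two files stay in sync.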

Command used and terminal output

Command used:

nextflow run nf-core/rnaseq \
  -r 3.17.0 \
  -profile dkfz \
  --input samples.csv \
  --outdir ${PWD} \
  --genome null \
  --fasta ${REFERENCE_GENOME} \
  --gtf ${REFERENCE_GTF} \
  --additional_fasta ${REFERENCE_PHIX} \
  --gencode \
  --seq_center DKFZ \
  --remove_ribo_rna \
  --save_merged_fastq \
  --save_reference \
  --save_trimmed \
  --save_align_intermeds \
  --save_unaligned \
  --save_non_ribo_reads \
  --igenomes_ignore

Terminal output:

STAR version: 2.7.11b   compiled: 2024-07-03T14:39:20+0000 :/opt/conda/conda-bld/star_1720017372352/work/source
  Nov 20 22:22:10 ..... started STAR run
  Nov 20 22:22:10 ..... loading genome
  Nov 20 22:24:17 ..... processing annotations GTF
  Nov 20 22:24:45 ..... inserting junctions into the genome indices
  Nov 20 22:25:59 ..... started 1st pass mapping

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred

  EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
  @ST-K00265:389:HMJW3BBXY:1:2223:9039:32244
  ATAAAGTTGAAGGCTACAAGAAGACCAAGGAAGCTGTTTTGCTCCTTAAGAAACTTAAAGCCTGGAATGATATCAAAAAGGTCTATGCCTCTCAGCGAATG
  <A
  SOLUTION: fix your fastq file

  Nov 20 22:31:36 ...... FATAL ERROR, exiting

Relevant files

No response

System information

Nextflow version: 24.10.1
Hardware: HPC
Executor: lsf
Container engine: Singularity
OS: CentOS 7
Version of nf-core/rnaseq: 3.17.0

cihanerkut added the "bug" (Something isn't working) label Nov 21, 2024
@pinin4fjords
Member

So this sounds like a SortMeRNA bug, and a feature request for the pipeline.

We could actually run FQ lint after each stage; see #1453. I'll look into it.

pinin4fjords added the "feature-request" label and removed the "bug" (Something isn't working) label Nov 29, 2024
pinin4fjords changed the title from "Inconsistent sequence and quality lengths in FASTQ files created by SortMeRNA" to "Check FASTQ files after each preprocessing step (Inconsistent sequence and quality lengths in FASTQ files created by SortMeRNA)" Nov 29, 2024