Check FASTQ files after each preprocessing step (Inconsistent sequence and quality lengths in FASTQ files created by SortMeRNA) #1456

cihanerkut · 2024-11-21T10:52:09Z

Description of the bug

This issue has already been discussed in the SortMeRNA repository (sortmerna/sortmerna#407).

STAR failed for one sample because the sequence and quality lengths did not match for one read. After TrimGalore, the read looks like this:

@ST-K00265:389:HMJW3BBXY:1:2223:9039:32244 2:N:0:ATCCACTG+ACGCACCT
ATAAAGTTGAAGGCTACAAGAAGACCAAGGAAGCTGTTTTGCTCCTTAAGAAACTTAAAGCCTGGAATGATATCAAAAAGGTCTATGCCTCTCAGCGAATG
+
<A-AFF<FJJFFF<AAFJJ<FAJFF<JFFF-JJJJJJJJJJJ7JJJJJFJJFJJJJJJJF<JJJF<JFJJJFAJJFFFFJFJJJJAAJJJJJJJJJJJFAA

which becomes this after SortMeRNA:

@ST-K00265:389:HMJW3BBXY:1:2223:9039:32244 2:N:0:ATCCACTG+ACGCACCT
ATAAAGTTGAAGGCTACAAGAAGACCAAGGAAGCTGTTTTGCTCCTTAAGAAACTTAAAGCCTGGAATGATATCAAAAAGGTCTATGCCTCTCAGCGAATG
+
<A

I had to deactivate the SortMeRNA step to make it work.

Would it be possible to add a FASTQ integrity check after each step that generates a FASTQ file and, if necessary, fix the file on the fly? I suggest this as a general solution.
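To make the kind of check meant here concrete, the following is a minimal sketch in Python, not what the pipeline or any existing tool actually does. It assumes strict four-line FASTQ records (no wrapped sequences), and the function names `open_fastq` and `find_bad_records` are hypothetical, chosen only for illustration. It reports records whose sequence and quality strings differ in length, which is the condition STAR complains about.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_bad_records(handle: TextIO) -> Iterator[Tuple[int, str, str]]:
    """Yield (record_number, name_line, reason) for each malformed record.

    Assumption: every record spans exactly four lines
    (name, sequence, '+' separator, quality).
    """
    number = 0
    while True:
        name = handle.readline()
        if not name:
            break  # end of file
        if not name.strip():
            continue  # tolerate blank lines between records or at EOF
        number += 1
        # A record cut short at EOF yields empty strings for the missing lines.
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        name = name.rstrip("\r\n")
        if not name.startswith("@"):
            yield number, name, "name line does not start with '@'"
        elif not plus.startswith("+"):
            yield number, name, "separator line does not start with '+'"
        elif len(seq) != len(qual):
            yield number, name, (
                f"sequence length {len(seq)} != quality length {len(qual)}"
            )


# Small in-memory demonstration: the second record has a truncated quality string.
demo = io.StringIO("@r1\nACGT\n+\nFFFF\n@r2\nACGT\n+\nFF\n")
for number, name, reason in find_bad_records(demo):
    print(f"record {number} ({name}): {reason}")
```

For a real file, the same generator could be fed from `open_fastq("sample_2.fastq.gz")` (an illustrative path). A check like this can only detect the problem; a truncated quality string cannot be reconstructed, so "fixing on the fly" would in practice mean dropping the affected read, and its mate for paired-end data.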

Command used and terminal output

Command used:

nextflow run nf-core/rnaseq \
  -r 3.17.0 \
  -profile dkfz \
  --input samples.csv \
  --outdir ${PWD} \
  --genome null \
  --fasta ${REFERENCE_GENOME} \
  --gtf ${REFERENCE_GTF} \
  --additional_fasta ${REFERENCE_PHIX} \
  --gencode \
  --seq_center DKFZ \
  --remove_ribo_rna \
  --save_merged_fastq \
  --save_reference \
  --save_trimmed \
  --save_align_intermeds \
  --save_unaligned \
  --save_non_ribo_reads \
  --igenomes_ignore

Terminal output:

STAR version: 2.7.11b   compiled: 2024-07-03T14:39:20+0000 :/opt/conda/conda-bld/star_1720017372352/work/source
  Nov 20 22:22:10 ..... started STAR run
  Nov 20 22:22:10 ..... loading genome
  Nov 20 22:24:17 ..... processing annotations GTF
  Nov 20 22:24:45 ..... inserting junctions into the genome indices
  Nov 20 22:25:59 ..... started 1st pass mapping

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred

  EXITING because of FATAL ERROR in reads input: quality string length is not equal to sequence length
  @ST-K00265:389:HMJW3BBXY:1:2223:9039:32244
  ATAAAGTTGAAGGCTACAAGAAGACCAAGGAAGCTGTTTTGCTCCTTAAGAAACTTAAAGCCTGGAATGATATCAAAAAGGTCTATGCCTCTCAGCGAATG
  <A
  SOLUTION: fix your fastq file

  Nov 20 22:31:36 ...... FATAL ERROR, exiting

Relevant files

No response

System information

Nextflow version: 24.10.1
Hardware: HPC
Executor: lsf
Container engine: Singularity
OS: CentOS 7
Version of nf-core/rnaseq: 3.17.0

pinin4fjords · 2024-11-29T16:04:29Z

So this sounds like a SortMeRNA bug, and a feature request for the pipeline.

We could actually run fq lint after each stage; see #1453. I'll look into it.

pinin4fjords · 2024-12-03T20:47:06Z

Addressed in #1461

cihanerkut added the bug Something isn't working label Nov 21, 2024

pinin4fjords added feature-request and removed bug Something isn't working labels Nov 29, 2024

pinin4fjords changed the title ~~Inconsistent sequence and quality lengths in FASTQ files created by SortMeRNA~~ Check FASTQ files after each preprocessing step (Inconsistent sequence and quality lengths in FASTQ files created by SortMeRNA) Nov 29, 2024

pinin4fjords mentioned this issue Nov 29, 2024

fq lint module update: exit on failed validation nf-core/modules#7000

Closed

17 tasks

pinin4fjords closed this as completed Dec 3, 2024

pinin4fjords linked a pull request Dec 3, 2024 that will close this issue

Add FASTQ linting during preprocessing #1461

Merged

11 tasks
