From 720313dfefc5be78d979413abddeef5dd75b2164 Mon Sep 17 00:00:00 2001
From: Lucille Delisle <lucille.delisle@epfl.ch>
Date: Wed, 13 Nov 2024 08:38:18 +0100
Subject: [PATCH] update README, CHANGELOG and version

---
 workflows/transcriptomics/rnaseq-pe/CHANGELOG.md | 11 +++++++----
 workflows/transcriptomics/rnaseq-pe/README.md    |  7 ++++---
 workflows/transcriptomics/rnaseq-pe/rnaseq-pe.ga |  2 +-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/workflows/transcriptomics/rnaseq-pe/CHANGELOG.md b/workflows/transcriptomics/rnaseq-pe/CHANGELOG.md
index 15fbc6ad2..02c637982 100644
--- a/workflows/transcriptomics/rnaseq-pe/CHANGELOG.md
+++ b/workflows/transcriptomics/rnaseq-pe/CHANGELOG.md
@@ -1,11 +1,14 @@
 # Changelog
 
-## [0.10] 2024-09-23
+## [1.0] 2024-09-23
 
-### Manual update
+### Changes in workflows
+- Add an optional subworkflow with more QC: FastQC, Picard, read distribution on genomic features, gene body coverage, reads per chromosome.
+- Add featureCounts as an alternative way to generate count files.
+- Use fastp instead of cutadapt; fastp uses the overlap of paired reads to trim adapters, which makes the adapter sequences optional.
+
+### Test dataset
 - Using a new subsampled Yeast test data from Zenodo record https://zenodo.org/records/13987631
-- Added a subworkflow with MultiQC on FastQC, Cutadapt, STAR, featureCounts and Picard reports
-- Added featureCounts as an alternative way to generate count files
 
 ## [0.9] 2024-09-23
 
diff --git a/workflows/transcriptomics/rnaseq-pe/README.md b/workflows/transcriptomics/rnaseq-pe/README.md
index 93489bf62..4ba99e13c 100644
--- a/workflows/transcriptomics/rnaseq-pe/README.md
+++ b/workflows/transcriptomics/rnaseq-pe/README.md
@@ -15,8 +15,8 @@ chrM	chrM_gene	exon	0	16299	.	-	.	gene_id "chrM_gene_minus"; transcript_id "chrM
 
 ## Inputs values
 
-- Forward and Reverse adapter: this depends on the library preparation. Usually classical Illumina RNA libraries are Truseq and ISML (relatively new Illumina library) is Nextera. If you don't know, use FastQC to determine if it is Truseq or Nextera. If the read length is relatively short (50bp), there is probably no adapter so it will not impact your results.
-- Generate QC reports: whether to generate an aggrigated MultiQC report from FastQC, Cutadapt, STAR, featureCounts and Picard.
+- Forward and Reverse adapter (optional): By default, fastp tries to overlap both reads and only uses these sequences when R1 and R2 do not overlap. The sequences depend on the library preparation: classical Illumina RNA libraries are usually TruSeq, and ISML (a relatively new Illumina library) is Nextera.
+- Generate additional QC reports: whether to compute additional QC (FastQC, Picard, read distribution on genomic features, gene body coverage, reads per chromosome).
 - Reference genome: this field will be adapted to the genomes available for STAR.
 - Strandedness: For stranded RNA, reverse means that the first read in a pair is complementary to the coding sequence, forward means that the first read in a pair is in the same orientation as the coding sequence. This will only count alignments that are compatible with your library preparation strategy. This is also used for the stranded coverage and for FPKM computation with cufflinks/StringTie.
 - Use featureCounts for generating count tables: Whether to use count tables from featureCounts instead of from STAR.
@@ -28,9 +28,10 @@ chrM	chrM_gene	exon	0	16299	.	-	.	gene_id "chrM_gene_minus"; transcript_id "chrM
 - The workflow will remove adapters and low quality bases and filter out any read smaller than 15bp.
 - The filtered reads are mapped with STAR with ENCODE parameters (for long RNA-seq but I use it for short also). STAR is also used to count reads per gene and generate strand-specific normalized coverage (on uniquely mapped reads).
 - Optionally featureCounts is used to generate count files when this option enabled.
+- Optionally, FastQC, Picard, read_distribution, geneBody_coverage and samtools idxstats are run to get additional QC.
 - A multiQC is run to have an overview of the QC. This can also be used to get the strandedness.
 - FPKM values for genes and transcripts are computed with cufflinks using correction for multi-mapped reads (this step is optionnal).
-- FPKM/TPM values for genes are computed with StringTie.
+- FPKM/TPM values for genes are computed with StringTie (this step is optional).
 - The BAM is filtered to keep only uniquely mapped reads (tag NH:i:1).
 - Unstranded coverage is computed with bedtools and normalized to the number of million uniquely mapped reads.
 - The three coverage files are converted to bigwig.
diff --git a/workflows/transcriptomics/rnaseq-pe/rnaseq-pe.ga b/workflows/transcriptomics/rnaseq-pe/rnaseq-pe.ga
index 888ebedd3..625d54098 100644
--- a/workflows/transcriptomics/rnaseq-pe/rnaseq-pe.ga
+++ b/workflows/transcriptomics/rnaseq-pe/rnaseq-pe.ga
@@ -79,7 +79,7 @@
     ],
     "format-version": "0.1",
     "license": "MIT",
-    "release": "0.10",
+    "release": "1.0",
     "name": "RNA-seq for Paired-end fastqs",
     "report": {
         "markdown": "\n# Workflow Execution Report\n\n## Workflow Inputs\n```galaxy\ninvocation_inputs()\n```\n\n## Workflow Outputs\n```galaxy\ninvocation_outputs()\n```\n\n## Workflow\n```galaxy\nworkflow_display()\n```\n"