Update both rnaseq #211

Merged 27 commits on Sep 15, 2023
3f234ae
Updating workflows/transcriptomics/rnaseq-pe from 0.4 to 0.5
Mar 17, 2023
20a4e0a
Updating workflows/transcriptomics/rnaseq-sr from 0.4 to 0.5
Mar 17, 2023
7840e21
Updating workflows/transcriptomics/rnaseq-pe from 0.5 to 0.6
Apr 17, 2023
272dae8
Updating workflows/transcriptomics/rnaseq-sr from 0.5 to 0.6
Apr 17, 2023
18798b3
Updating workflows/transcriptomics/rnaseq-pe from 0.6 to 0.7
Jun 17, 2023
10a4df4
Updating workflows/transcriptomics/rnaseq-sr from 0.6 to 0.7
Jun 17, 2023
9ab9fa4
update dockstore
lldelisle Aug 30, 2023
1a8ee07
synchronize version and changelog
lldelisle Aug 30, 2023
3a68d54
try to update workflow
lldelisle Aug 30, 2023
a88b3ee
Merge remote-tracking branch 'upstream/main' into workflows/transcrip…
lldelisle Sep 1, 2023
1cf96a6
update test for stringtie
lldelisle Sep 1, 2023
3062747
Merge branch 'workflows/transcriptomics/rnaseq-sr' of github.com:plan…
lldelisle Sep 1, 2023
35e0856
update workflow and tests
lldelisle Sep 4, 2023
bc0d9fc
update STAR, add subworkflow to compute both strands coverage
lldelisle Sep 11, 2023
85c0caf
update changelog and tests
lldelisle Sep 11, 2023
81ef815
Update README
lldelisle Sep 11, 2023
a114979
update workflow
lldelisle Sep 11, 2023
1ce395a
update README tests etc...
lldelisle Sep 11, 2023
f91afa6
update test results
lldelisle Sep 11, 2023
5141c08
use regex on tests
lldelisle Sep 12, 2023
21c2645
fix spelling
lldelisle Sep 12, 2023
3457a62
check spelling
lldelisle Sep 12, 2023
d12c5ed
Merge branch 'main' into workflows/transcriptomics/rnaseq-pe
lldelisle Sep 14, 2023
0e6c447
fix tests
lldelisle Sep 14, 2023
34eb854
Merge branch 'main' into workflows/transcriptomics/rnaseq-sr
lldelisle Sep 15, 2023
4b8a94c
Merge remote-tracking branch 'autoupdate/workflows/transcriptomics/rn…
lldelisle Sep 15, 2023
c159c3d
Language editing
lldelisle Sep 15, 2023
9 changes: 9 additions & 0 deletions workflows/transcriptomics/rnaseq-pe/CHANGELOG.md
@@ -1,5 +1,14 @@
# Changelog

## [0.5] 2023-09-15

### Automatic update
- `toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.8a+galaxy1` was updated to `toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.10b+galaxy4`

### Manual update
- Use STAR to compute normalized strand-specific coverage
- Add an option to use StringTie to compute FPKM
- Make cufflinks step optional

## [0.4.1] 2023-09-14
- add author in dockstore file
19 changes: 11 additions & 8 deletions workflows/transcriptomics/rnaseq-pe/README.md
@@ -15,23 +15,26 @@ chrM chrM_gene exon 0 16299 . - . gene_id "chrM_gene_minus"; transcript_id "chrM

## Inputs values

- adapter sequences: this depends on the library preparation. Usually classical RNA libraries are Truseq and ISML (relatively new Illumina library) is Nextera. If you don't know, use FastQC to determine if it is Truseq or Nextera. If the read length is relatively short (50bp), there is probably no adapter.
- adapter sequences: this depends on the library preparation. Usually classical Illumina RNA libraries are Truseq and ISML (relatively new Illumina library) is Nextera. If you don't know, use FastQC to determine if it is Truseq or Nextera. If the read length is relatively short (50bp), there is probably no adapter so it will not impact your results.
- reference_genome: this field will be adapted to the genomes available for STAR
- strandness: For stranded RNA, reverse means that the read is complementary to the coding sequence, forward means that the read is in the same orientation as the coding sequence. This will help you to get from STAR only the counts corresponding to your library preparation. This is also used for the stranded coverage and for FPKM computation with cufflinks.
- strandedness: For stranded RNA, reverse means that the first read in a pair is complementary to the coding sequence, forward means that the first read in a pair is in the same orientation as the coding sequence. This will only count alignments that are compatible with your library preparation strategy. This is also used for the stranded coverage and for FPKM computation with cufflinks/StringTie.
- cufflinks_FPKM: Whether you want to get FPKM with Cufflinks (pretty long)
- stringtie_FPKM: Whether you want to get FPKM/TPM etc... with StringTie.

## Processing

- The workflow will remove adapters and low quality bases and filter out any read smaller than 15bp
- The filtered reads are mapped with STAR with ENCODE parameters (for long RNA-seq but I use it for short also). STAR is also used to count reads per gene.
- A multiQC is run to have an overview of the QC. This can also be used to get the strandness.
- FPKM values for reads and transcripts are computed with cufflinks using correction for multi-mapped reads.
- The workflow will remove adapters and low quality bases and filter out any read smaller than 15bp.
- The filtered reads are mapped with STAR with ENCODE parameters (for long RNA-seq but I use it for short also). STAR is also used to count reads per gene and generate strand-specific normalized coverage (on uniquely mapped reads).
- A multiQC is run to have an overview of the QC. This can also be used to get the strandedness.
- FPKM values for genes and transcripts are computed with cufflinks using correction for multi-mapped reads (this step is optional).
- FPKM/TPM values for genes are computed with StringTie.
- The BAM is filtered to keep only uniquely mapped reads (tag NH:i:1).
- Coverage unstranded, and each strand independently is computed with bedtools and normalized to the number of million uniquely mapped reads (in order to compute stranded coverage the BAM is modified so second mate in pairs matches orientation of the first mate in pairs).
- Unstranded coverage is computed with bedtools and normalized to the number of million uniquely mapped reads.
- The three coverage files are converted to bigwig.

### Warning

- The coverage stranded output depends on the strandness of the library:
- The coverage stranded output depends on the strandedness of the library:
- If you have an unstranded library, stranded coverages are useless
- If you have a forward stranded library, the label matches the orientation of the first read in pairs.
- If you have a reverse stranded library, the label matches the orientation of the second read in pairs.
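
To make the filtering and normalization described in the Processing list more concrete, here is a minimal Python sketch. It is not the workflow's implementation (the workflow relies on STAR and bedtools inside Galaxy); it only illustrates the idea of keeping alignments tagged `NH:i:1` and scaling depth to one million uniquely mapped reads. The helper names `is_uniquely_mapped` and `scale_factor`, the toy SAM records, and the raw depth value are all hypothetical. Whether the workflow counts reads or fragments for paired-end data is not stated in the README, so the sketch simply counts records.

```python
def is_uniquely_mapped(sam_line: str) -> bool:
    """Return True if a SAM alignment line carries the tag NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    # The 11 mandatory SAM columns come first; optional tags follow.
    return "NH:i:1" in fields[11:]


def scale_factor(n_unique: int) -> float:
    """Factor converting raw depth to depth per million uniquely mapped reads."""
    if n_unique <= 0:
        raise ValueError("need at least one uniquely mapped read")
    return 1_000_000 / n_unique


# Toy SAM records (hypothetical data); only the NH tag and MAPQ differ.
records = [
    "r1\t0\tchr2L\t100\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr2L\t120\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r3\t16\tchr2L\t130\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1",
]

unique = [r for r in records if is_uniquely_mapped(r)]
factor = scale_factor(len(unique))

raw_depth = 2  # hypothetical number of unique reads covering one position
normalized_depth = raw_depth * factor
```

With a real library the factor is far smaller than in this toy case, since the number of uniquely mapped reads is typically in the millions.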
36 changes: 15 additions & 21 deletions workflows/transcriptomics/rnaseq-pe/rnaseq-pe-tests.yml
@@ -23,7 +23,9 @@
forward_adapter: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
reverse_adapter: GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
reference_genome: dm6
strandness: unstranded
strandedness: unstranded
cufflinks_FPKM: false
stringtie_FPKM: true
outputs:
output_log:
element_tests:
@@ -47,7 +49,7 @@
cutadapt:
asserts:
has_text:
text: "GSM461177_2 4.0 1057657 25033 25779 25250 1032407 78266618 3650637 73538382 6.041191149974054"
text: "GSM461177_2 4.4 1057657 25033 25779 25250 1032407 78266618 3650637 73538382 6.041191149974054"
general_stats:
asserts:
has_text:
@@ -58,8 +60,8 @@
n: 4
star:
asserts:
has_text:
text: "GSM461177 1032407.0 71.0 854812.0 82.8 70.65 102763.0 102412.0 102040.0 679.0 20.0 24.0 0.54 0.0 1.56 0.0 1.43 82072.0 7.95 32881.0 3.18 0.0 5.9 0.17 0 60888 1754"
has_text_matching:
expression: "GSM461177 1032407.0 71.0 854812.0 82.8 70.65 10276[23].0 102412.0 1020[34][0-9].0 679.0 20.0 24.0 0.54 0.0 1.56 0.0 1.43 82072.0 7.95 32881.0 3.18 0.0 5.9 0.17 0 60888 1754"
MultiQC webpage:
asserts:
- that: "has_text"
@@ -82,33 +84,25 @@
asserts:
has_text:
text: "FBgn0010247\t13"
transcripts_expression:
element_tests:
GSM461177:
asserts:
has_text:
text: "FBtr0078104\t-\t-\tFBgn0031217\tCG11377\t-\tchr2L:102379-104142\t1583\t1.95689 28.9556 19.9177 37.9936\tOK"
genes_expression:
element_tests:
GSM461177:
asserts:
has_text:
text: "FBgn0031217\t-\t-\tFBgn0031217\tCG11377\t-\tchr2L:102379-104142\t-\t-\t28.9556 19.7218 38.1895\tOK"
both strands coverage:
element_tests:
GSM461177:
has_size:
value: 9885639
delta: 900000
negative strand coverage:
stranded coverage:
element_tests:
GSM461177:
GSM461177_reverse:
has_size:
value: 7756965
delta: 700000
positive strand coverage:
element_tests:
GSM461177:
GSM461177_forward:
has_size:
value: 7756965
delta: 700000
genes_expression_stringtie:
element_tests:
GSM461177:
asserts:
has_text:
text: "FBgn0031217\tCG11377\tchr2L\t+\t102380\t104142\t1.955939\t32.891647\t57.313370"
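
The STAR assertion in this test file moved from `has_text` to `has_text_matching` so that a few counts may vary slightly between runs. As a rough illustration of what such an expression accepts, the sketch below applies a shortened fragment of it with Python's `re` module. It assumes the assertion behaves like a regular-expression search comparable to `re.search`; the second and third candidate strings are made up for the example.

```python
import re

# Shortened fragment of the expression used in the STAR assertion.
pattern = re.compile(r"10276[23].0 102412.0 1020[34][0-9].0 679.0")

candidates = [
    "102763.0 102412.0 102040.0 679.0",  # values from the former has_text test
    "102762.0 102412.0 102039.0 679.0",  # small variation, still accepted
    "102764.0 102412.0 102040.0 679.0",  # first count falls outside [23]
]

# True where the fragment is found somewhere in the candidate string.
results = [bool(pattern.search(c)) for c in candidates]
```

Note that the dots in the expression are unescaped, so each one matches any single character rather than only a literal `.`; the pattern is therefore slightly more permissive than it looks, which is harmless for this kind of tolerance check.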