DE caller comparison

ARTbio · Feb 15, 2024 · 1b030f2
1 parent 89e9e94
commit 1b030f2
Showing 5 changed files with 129 additions and 22 deletions.
diff --git a/docs/bulk_RNAseq-IOC/28_2_manipulation_limma_data.md b/docs/bulk_RNAseq-IOC/28_2_manipulation_limma_data.md
@@ -391,3 +391,5 @@ heatmap with high number of rows` tool
 
         --> `24`
     - `Run Tool`
+
+---
diff --git a/docs/bulk_RNAseq-IOC/28_manipulation_DEseq2_data.md b/docs/bulk_RNAseq-IOC/28_manipulation_DEseq2_data.md
@@ -293,13 +293,13 @@ We do this with the tools `Add Header`
 !!! info "![](images/tool_small.png){width="25" align="absbottom"} `Add Header` settings"
     - List of Column headers (comma delimited, e.g. C1,C2,...)
 
-        --> `All_DE_genes`
+        --> `DESeq_All_DE_genes`
     - Data File (tab-delimted)
 
         --> `Unique on data 1xx...`
     - Click the `Run Tool` button
 
-:warning: Rename the generated dataset `All_DE_genes`
+:warning: Rename the generated dataset `DESeq_All_DE_genes`
 
 ### Intersection (join operation) between the list of unique gene name associated with DE and the rLog-Normalized counts file.
 
@@ -318,7 +318,7 @@ To do this, we are going to use the tool
         --> `1`
     - 2nd File
 
-        --> `All_DE_genes`
+        --> `DESeq_All_DE_genes`
     - Column to use from 2nd file
 
         --> `1`

diff --git a/docs/bulk_RNAseq-IOC/29_0_comparisons.md b/docs/bulk_RNAseq-IOC/29_0_comparisons.md
@@ -1,20 +1,125 @@
-To compare DESeq2, edgeR and limma, we are now going to collect the lists of genes
-generated in the three corresponding histories in a new history which we name `DESeq2, edgeR,
-limma comparisons`.
-
-Thus, create this history `DESeq2, edgeR, limma comparisons`, and from this history, copy
-the required datasets from all three histories.
-
-- [x] in the history `PRJNA630433 DESeq2 analysis`, get the datasets
-  - top gene lists - oriented
-  - top up-regulated gene lists
-  - top down-regulated gene lists
-- [x] in the history `PRJNA630433 edgeR analysis`, get the datasets
-  - edgeR top gene lists - oriented
-  - edgeR top up-regulated gene lists
-  - edgeR top down-regulated gene lists
-- [x] in the history `PRJNA630433 limma analysis`, get the datasets
-  - limma top gene lists - oriented
-  - limma top up-regulated gene lists
-  - limma top down-regulated gene lists
+We are now going to make a quick comparison of the three Differential Expression analysis
+packages, DESeq2, edgeR, and limma, using some results generated in the histories
+"PRJNA630433 DESeq2 analysis", "PRJNA630433 edgeR analysis" and "PRJNA630433 limma
+analysis".
 
+## Collect datasets
+
+Create a new history `DESeq2, edgeR, limma comparisons`, and from this history,
+import (menu `copy datasets`) the required datasets.
+
+- [x] in the history `PRJNA630433 DESeq2 analysis`, copy the dataset
+  - `DESeq_All_DE_genes` (this should be the second-to-last file of the history; it is a
+    simple list of gene names with a single header)
+- [x] in the history `PRJNA630433 edgeR analysis`, get the dataset
+  - `edgeR_All_DE_genes` (probably also the second-to-last file of the history)
+- [x] in the history `PRJNA630433 limma analysis`, get the dataset
+  - `limma_All_DE_genes`
+
+:warning: You can perform all 3 copies from whichever history you are currently in (for
+instance "PRJNA630433 edgeR analysis" or "PRJNA630433 limma analysis"). Just change the
+`Source History` in the copy dashboard, and make sure that, each time, the
+`Destination History` is `DESeq2, edgeR, limma comparisons`.
+
+Your starting history should look like this:
+
+![](images/history_comparison.png){width="300"}
+
+## Venn diagram
+
+Each of the three datasets contains the list of genes significantly deregulated (up or
+down) in at least one of the three comparisons (Mo vs Dc, Oc vs Dc, Oc vs Mo), as returned
+by DESeq2, edgeR or limma-voom, respectively.
+
+Note that, although these lists no longer carry any quantitative information, we can
+still look at their overlaps using a Venn diagram approach.
+
+We are going to perform this analysis using the Galaxy tool `Venn diagram [JVenn]`.
+
+!!! info "![](images/tool_small.png){width="25" align="absbottom"} `Venn diagram [JVenn]` settings"
+    - **1: List to compare**
+    - Enter your list
+
+        --> `Input file containing your list`
+    - Select your file
+
+        --> `DESeq_All_DE_genes`
+
+    - Does file contain header?
+
+        --> `Yes`
+    - Column number on which apply the comparison
+
+        --> `c1`
+
+    - Enter the name of this list
+
+        --> `DESeq2`
+
+    - **2: List to compare**
+    - Enter your list
+
+        --> `Input file containing your list`
+    - Select your file
+
+        --> `edgeR_All_DE_genes`
+
+    - Does file contain header?
+
+        --> `Yes`
+    - Column number on which apply the comparison
+
+        --> `c1`
+
+    - Enter the name of this list
+
+        --> `edgeR`
+
+    - **3: List to compare** (after clicking :heavy_plus_sign:**`Insert List to compare`**)
+    - Enter your list
+
+        --> `Input file containing your list`
+    - Select your file
+
+        --> `limma_All_DE_genes`
+
+    - Does file contain header?
+
+        --> `Yes`
+    - Column number on which apply the comparison
+
+        --> `c1`
+
+    - Enter the name of this list
+
+        --> `limma-voom`
+
+    - Run tool
+
+The Venn diagram returned by the tool should look like this:
+
+<center>![](images/jVenn_chart.png){width="500"}</center>
+
+## Discussion
+
+DESeq2 appears to be a less stringent "caller" than edgeR: overall, only a few genes (10 + 23)
+are called by edgeR but not by DESeq2. This large overlap is expected because normalisation and
+the statistical model (negative binomial counts) are similar between DESeq2 and edgeR. However,
+DESeq2 performs its own step of variance shrinkage, and the authors of DESeq2 claim that this
+data transformation makes it possible to call more DE genes confidently without increasing the
+rate of false positives (type I error). Keeping only the DE genes common to both DESeq2 and
+edgeR would further reduce false positives, at the expense of an increased false negative rate
+(type II error).
+
+Limma appears a bit "transversal" to DESeq2 and edgeR: besides a core of 2713 genes common
+to all three callers, limma shares 83 genes with DESeq2 only and 23 genes with edgeR only.
+
+Strikingly, however, limma calls a substantial number of genes which are called by neither
+DESeq2 nor edgeR. This is likely due to the radically different approach of limma-voom,
+in part inherited from methods for analysing continuous microarray variables (which model
+gene expression variables as Gaussians).
+
+Note that although limma is likely less suited to the analysis of discrete read counts, it
+performs relatively well, given that only ~10% of the DE genes called by limma
+are not called by DESeq2 or edgeR.
+---
diff --git a/docs/bulk_RNAseq-IOC/images/history_comparison.png b/docs/bulk_RNAseq-IOC/images/history_comparison.png
diff --git a/docs/bulk_RNAseq-IOC/images/jVenn_chart.png b/docs/bulk_RNAseq-IOC/images/jVenn_chart.png
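
The Venn diagram step added in `29_0_comparisons.md` boils down to set arithmetic on three gene lists. The following is a minimal sketch of that arithmetic in plain Python, not the JVenn tool's implementation. The helper names `read_gene_list` and `venn_regions` are hypothetical, the gene names are toy values, and the file layout (one gene name per line in the first tab-delimited column, with a single header line) is an assumption based on the description of `DESeq_All_DE_genes` above.

```python
def read_gene_list(path, has_header=True):
    """Return the first tab-delimited column of a file as a set of gene names.

    Assumes one gene per line and, by default, a single header line to skip.
    """
    with open(path) as handle:
        names = [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]
    return set(names[1:] if has_header else names)


def venn_regions(lists):
    """Map each exclusive Venn region to its genes.

    `lists` maps a caller name to a set of gene names. Each key of the result
    is the sorted tuple of callers that report a gene, so a gene lands in
    exactly one region (for example "edgeR only" or "all three callers").
    """
    regions = {}
    all_genes = set().union(*lists.values())
    for gene in all_genes:
        callers = tuple(sorted(name for name, genes in lists.items() if gene in genes))
        regions.setdefault(callers, set()).add(gene)
    return regions


# Toy data standing in for the three "*_All_DE_genes" datasets.
toy_lists = {
    "DESeq2": {"geneA", "geneB", "geneC", "geneD"},
    "edgeR": {"geneA", "geneB", "geneE"},
    "limma-voom": {"geneA", "geneC", "geneF", "geneG"},
}

regions = venn_regions(toy_lists)
for callers, genes in sorted(regions.items()):
    print(" & ".join(callers), len(genes), sorted(genes))
```

With real data, each entry of `toy_lists` could be replaced by a call to `read_gene_list` on the corresponding downloaded dataset. Because the regions are exclusive, their sizes add up to the total number of distinct genes, which is how counts such as "10 + 23" in the discussion can be read off the diagram.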