Skip to content

Commit

Permalink
Merge pull request #109 from ARTbio/go
Browse files Browse the repository at this point in the history
End of the GOseq part
  • Loading branch information
drosofff authored Feb 26, 2024
2 parents 97d35df + 3e4fc85 commit c616a8e
Show file tree
Hide file tree
Showing 2 changed files with 315 additions and 71 deletions.
103 changes: 91 additions & 12 deletions docs/bulk_RNAseq-IOC/31_GO_enrichment_intro.md
Original file line number Diff line number Diff line change
@@ -1,17 +1,96 @@
![](images/lamp.png)
## Analyzing GO Enrichment from DEGs

# Analysis of functional enrichment among the differentially expressed genes
Gene Ontology (GO) enrichment analysis is used to
identify `biological processes`, `cellular components`, and `molecular functions` that are
significantly over-represented (or under-represented) in a set of genes compared to a
background list. This is particularly valuable when analyzing differentially expressed
genes (DEGs) identified from RNA-seq or microarray experiments.

We have extracted genes that are differentially expressed in treated (Pasilla gene-depleted)
samples compared to untreated samples. We would like to know if there are categories of
genes that are enriched among the differentially expressed genes.
### Individual Gene Analysis (IGA)

Gene Ontology (GO) analysis is widely used to reduce complexity and highlight biological
processes in genome-wide expression studies.
- [x] **Concept**

This approach tests each GO term individually for enrichment within the DEG list.

However, standard methods give biased results on RNA-seq data due to over-detection
of differential expression for long and highly-expressed transcripts.
- [x] **Methods:**
* **Hypergeometric test:** Calculates the probability of observing
the number of DEGs in a specific GO term by chance.
* **Fisher's exact test:** Similar to
the hypergeometric test but suitable for smaller datasets.

- [x] **Limitations:**

* Ignores the hierarchical structure of GO, potentially missing related terms.
* Susceptible to multiple testing issues, requiring correction methods like Bonferroni
adjustment.

The goseq tool provides methods for performing GO analysis of RNA-seq data,
taking length bias into account. The methods and software used by goseq are equally
applicable to other category based tests of RNA-seq data, such as KEGG pathway analysis.
- [x] **Advanced Considerations:**

* **Multiple Testing Correction:** As mentioned, IGA is susceptible to multiple testing
issues. Here are some commonly used correction methods:
* *Bonferroni adjustment:* A conservative approach that controls the family-wise
error rate (FWER) but can be overly stringent.
* *Benjamini-Hochberg (BH) procedure:* Controls the false discovery rate (FDR) and
is less conservative than Bonferroni.
* *False discovery rate (q-value):* Provides a measure of significance adjusted
for multiple testing.
* **Gene Ontology Consortium (GOC) recommendations:** The GOC recommends using a
combination of statistical significance (p-value) and fold change thresholds to
identify relevant enriched terms, acknowledging the limitations of p-values alone.

### Gene Set Analysis (GSA)

- [x] **Concept:**

Considers the entire set of DEGs and their relationships within the GO hierarchy.

- [x] **Methods:**

* **Pathway analysis tools:** Tools like Enrichr, clusterProfiler, and GSEA analyze
pre-defined gene sets like KEGG pathways and analyze enrichment within DEGs.
* **GO-based GSA methods:**
* **Rank-based approaches:** Assign a rank to each gene based on its differential
expression and analyze enrichment within ranked gene sets. (e.g., GSEA)
* **Permutation-based approaches:** Randomly shuffle gene labels and recalculate
enrichment scores to assess statistical significance. (e.g., fgsea)
- Tools like GOseq, fgsea, and piano utilize various
statistical models to account for the hierarchical structure of GO and identify
enriched functional categories.

- [x] **Advantages of using GSA:**
* Incorporates information about gene relationships within the GO hierarchy, leading
to more biologically relevant insights.
* Reduces the burden of multiple testing compared to individual GO term analysis.

### Advanced Methods for Deeper Exploration

- [x] **Cluster enrichment analysis:** Tools like CeaGO group related GO terms based on
semantic similarity and analyze enrichment within these clusters. This approach can reveal
broader functional themes beyond individual terms.
- [x] **Network analysis:** Integrating protein-protein interaction data with GO
annotations allows identifying functionally connected subnetworks enriched in DEGs. This
provides a network-based understanding of the underlying biological processes.

### Choosing the right method

The choice of method depends on factors like:

- [x] **Size of the DEG list:** For smaller lists, IGA might be sufficient, while larger lists
benefit from GSA approaches.
- [x] **Research question:** If interested in specific GO terms,
IGA might be suitable. For broader functional insights, GSA is preferred.

### Additional considerations

- [x] **Over-detection bias** standard methods give biased results on RNA-seq data due to
over-detection of differential expression for long and highly-expressed transcripts. The
==**goseq**== tool provides methods for performing GO analysis of RNA-seq data, taking length
bias into account. The methods and software used by goseq are equally applicable to other
category based tests of RNA-seq data, such as KEGG pathway analysis.
- [x] **Background gene list:** Choosing a relevant background list representing the genes not
differentially expressed is crucial for accurate enrichment analysis.
- [x] **Multiple testing correction:** Apply appropriate correction methods to account for
testing multiple GO terms simultaneously.
- [x] **Visualization:** Utilize graphical representations like bar
charts or heatmaps to visualize enriched GO terms and their significance levels.
---
Loading

0 comments on commit c616a8e

Please sign in to comment.