UserGuide: GO Enrichment Analysis
We use the GO annotation file to perform enrichment analysis on differentially expressed (DE) genes. For this, you define several parameters:
1) For enrichment:
- parameters$GO_threshold: the significance threshold used to filter p-values
- parameters$GO_min_num_genes: the minimum number of genes in the genome annotated with a GO term
- parameters$GO: the gene set chosen for analysis: "up", "down", or "both" (up + down)
- parameters$GO_algo: the algorithm used by the runTest function ("classic", "elim", "weight", "weight01", "lea", "parentchild")
- parameters$GO_stats: the statistical test used by the runTest function ("fisher", "ks", "t", "globaltest", "sum", "ks.ties")

2) For visualization:
- parameters$Ratio_threshold: the minimum enrichment ratio for a GO term to be displayed in the graphs
- parameters$GO_max_top_terms: the maximum number of GO terms plotted for each GO category
- parameters$GO_min_sig_genes: the minimum number of significant genes behind an enriched GO term for the term to be displayed in the graphs
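The allowed values listed above can be checked before launching the analysis. The sketch below is a hypothetical helper, `check_go_parameters()`, which is not part of the package: it only encodes the choices documented in this list, plus two assumptions of ours, namely that the numeric parameters are single non-negative numbers and that GO_threshold lies between 0 and 1.

```r
# Hypothetical helper (not part of the package): checks that the GO
# parameters hold one of the documented values before GOenrichment() runs.
check_go_parameters <- function(parameters) {
  # Documented choices for the character parameters
  allowed <- list(
    GO       = c("up", "down", "both"),
    GO_algo  = c("classic", "elim", "weight", "weight01", "lea", "parentchild"),
    GO_stats = c("fisher", "ks", "t", "globaltest", "sum", "ks.ties")
  )
  for (name in names(allowed)) {
    value <- parameters[[name]]
    if (!is.character(value) || length(value) != 1 ||
        !(value %in% allowed[[name]])) {
      stop(sprintf("parameters$%s must be one of: %s",
                   name, paste(allowed[[name]], collapse = ", ")))
    }
  }
  # Assumption: each numeric parameter is a single non-negative number
  numeric_params <- c("GO_threshold", "GO_min_num_genes", "Ratio_threshold",
                      "GO_max_top_terms", "GO_min_sig_genes")
  for (name in numeric_params) {
    value <- parameters[[name]]
    if (!is.numeric(value) || length(value) != 1 || is.na(value) || value < 0) {
      stop(sprintf("parameters$%s must be a single non-negative number", name))
    }
  }
  # A p-value threshold above 1 would not filter anything
  if (parameters$GO_threshold > 1) {
    stop("parameters$GO_threshold must be between 0 and 1")
  }
  invisible(TRUE)
}

# The values used in the example commands of this page
parameters <- list(GO_threshold = 0.05, GO_min_num_genes = 10, GO = "both",
                   GO_algo = "weight01", GO_stats = "fisher",
                   Ratio_threshold = 1, GO_max_top_terms = 10,
                   GO_min_sig_genes = 5)
check_go_parameters(parameters)
```

A misspelled choice such as `"weight02"` then fails immediately with an explicit message rather than later inside the analysis.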
If provided by the user, the GO annotation file (see the Input files description section) links genes to Gene Ontology (GO) terms, making GO enrichment analysis possible. Each gene can be annotated with several terms, which define the biological pathways in which the gene is involved. GO terms are classified into 3 categories: MF (Molecular Function), describing the molecular activity of a gene; BP (Biological Process), describing a broader biological process in which the gene is involved in coordination with other genes; and CC (Cellular Component), describing the cellular location in which the gene performs its function.
GO enrichment is automatically performed with the topGO package on the DE genes (up, down, or both, depending on parameters$GO) of each contrast of the experiment. The GOenrichment function generates numerous tables and plots.
Be careful: as enrichment tests are based on proportions, the fewer genes in the list, the less reliable the test, and the interpretation of the enrichment becomes hazardous. Do not draw hasty conclusions from lists of fewer than 100 genes.
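A small calculation makes this warning concrete. The sketch assumes a hypothetical universe of 5000 genes in which 500 carry a given GO term, and compares two lists that both show a 2-fold enrichment (20% of their genes carry the term). It uses a one-sided hypergeometric test, which is essentially what the "classic" algorithm with the "fisher" statistic computes; the `enrichment_p()` helper and all numbers are illustrative only.

```r
# Hypothetical universe: 5000 annotated genes, 500 of them (10%) carry the term
universe_size <- 5000
term_size     <- 500

# One-sided hypergeometric p-value P(X >= k): probability of seeing at least
# k term genes in a random list of n genes (same as a one-sided Fisher test)
enrichment_p <- function(k, n, K = term_size, N = universe_size) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Both lists contain 20% term genes (2-fold enrichment); only the size differs
p_small <- enrichment_p(k = 4,  n = 20)    # 20-gene list
p_large <- enrichment_p(k = 40, n = 200)   # 200-gene list
print(signif(c(small_list = p_small, large_list = p_large), 3))
```

With the same enrichment ratio, the 20-gene list does not reach the 0.05 threshold, whereas the 200-gene list falls far below it: the ratio alone says little when the list is small.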
The commands for running GO enrichment analysis are:
```r
# Parameters for GO enrichment
parameters$GO_threshold = 0.05
parameters$GO_min_num_genes = 10
parameters$GO = "both"
parameters$GO_algo = "weight01"
parameters$GO_stats = "fisher"
# Parameters for GO enrichment graphs
parameters$Ratio_threshold = 1
parameters$GO_max_top_terms = 10
parameters$GO_min_sig_genes = 5
# Run the analysis
GOenrichment(resDEG, data, parameters)
```
A "DEG_test/GOenrichment/TOTAL_DEgenes/" directory is created, containing all GO images and statistics tables for the up- and down-regulated DE genes combined (parameters$GO = "both").
A "DEG_test/GOenrichment/UP_DEgenes/" directory is created, containing all GO images and statistics tables for the up-regulated DE genes (parameters$GO = "up").
A "DEG_test/GOenrichment/DOWN_DEgenes/" directory is created, containing all GO images and statistics tables for the down-regulated DE genes (parameters$GO = "down").
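The mapping between parameters$GO and the output directory can be written down as a small lookup. `go_output_dir()` below is a hypothetical helper (not a package function) that simply restates the three paths above; the "DEG_test" root is taken from these example paths and will likely differ for your own project.

```r
# Hypothetical helper: directory where GOenrichment results are expected for
# a given value of parameters$GO, following the example paths of this page
go_output_dir <- function(GO, root = "DEG_test") {
  subdir <- switch(GO,
    both = "TOTAL_DEgenes",
    up   = "UP_DEgenes",
    down = "DOWN_DEgenes",
    stop("parameters$GO must be \"up\", \"down\" or \"both\"")
  )
  file.path(root, "GOenrichment", subdir)
}

go_output_dir("both")
```

Since each value of parameters$GO corresponds to one directory, obtaining all three sets of results presumably requires one GOenrichment call per value.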
Global graphs are created, for each contrast, highlighting either the p-value or the enrichment ratio in the 3 GO categories.
Example of global graphs:
To suit as many users as possible, individual graphs for each GO category are also created, combining the p-value and enrichment ratio information.
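Which terms appear in these graphs is governed by the visualization parameters. The exact selection rule inside GOenrichment is not described on this page, so the following is only a sketch under stated assumptions: a term is kept if its p-value is at most GO_threshold, its Ratio is at least Ratio_threshold and it has at least GO_min_sig_genes significant genes; then the GO_max_top_terms terms with the smallest p-values are kept in each GO category. The data are three rows of the example statistics table below; `select_terms_to_plot()` and the `display_params` values are hypothetical.

```r
# Three rows copied from the example statistics table on this page
go_table <- data.frame(
  GO.ID         = c("GO:0003735", "GO:0004812", "GO:0019843"),
  Term          = c("structural constituent of ribosome",
                    "aminoacyl-tRNA ligase activity", "rRNA binding"),
  Annotated     = c(135, 51, 23),
  Significant   = c(51, 24, 13),
  Expected      = c(31.31, 11.83, 5.33),
  statisticTest = c(0.000086, 0.000150, 0.000570),
  Ratio         = c(1.628873, 2.028741, 2.439024),
  GO_cat        = c("MF", "MF", "MF"),
  stringsAsFactors = FALSE
)

# Hypothetical selection rule (an assumption, not GOenrichment's actual code)
select_terms_to_plot <- function(tab, params) {
  keep <- tab$statisticTest <= params$GO_threshold &
    tab$Ratio >= params$Ratio_threshold &
    tab$Significant >= params$GO_min_sig_genes
  tab <- tab[keep, , drop = FALSE]
  # Sort by category, then by p-value, and rank terms within each category
  tab <- tab[order(tab$GO_cat, tab$statisticTest), , drop = FALSE]
  rank_in_cat <- ave(tab$statisticTest, tab$GO_cat, FUN = seq_along)
  tab[rank_in_cat <= params$GO_max_top_terms, , drop = FALSE]
}

# Hypothetical display settings, stricter than the example commands
display_params <- list(GO_threshold = 0.05, Ratio_threshold = 2,
                       GO_min_sig_genes = 5, GO_max_top_terms = 1)
select_terms_to_plot(go_table, display_params)$GO.ID
```

With Ratio_threshold = 2, only GO:0004812 and GO:0019843 pass the filters, and GO_max_top_terms = 1 then keeps GO:0004812, the term with the smaller p-value.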
Example of detailed graphs:
Example of one statistical table:
GO.ID | Term | Annotated | Significant | Expected | statisticTest | Ratio | GO_cat |
---|---|---|---|---|---|---|---|
GO:0003735 | structural constituent of ribosome | 135 | 51 | 31.31 | 0.000086 | 1.628873 | MF |
GO:0004812 | aminoacyl-tRNA ligase activity | 51 | 24 | 11.83 | 0.000150 | 2.028741 | MF |
GO:0019843 | rRNA binding | 23 | 13 | 5.33 | 0.000570 | 2.439024 | MF |
... |
Explanation of some columns:
- Annotated: number of genes in your genome (the gene universe) annotated with the GO term.
- Significant: number of genes in the DE list annotated with the GO term.
- Expected: number of annotated genes expected in the list if the term's proportion in the list were equal to its proportion in the gene universe (i.e., no enrichment).
- statisticTest: p-value of the chosen statistical test.
- Ratio: enrichment ratio, Significant / Expected.
- GO_cat: GO category (MF, BP or CC).
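The Expected and Ratio columns follow from simple counts. The sketch below uses made-up counts for a single term (`N`, `n`, `K` and `k` are illustrative names, not package variables) to compute Expected as Annotated × (DE genes / universe size) and Ratio as Significant / Expected, then runs a one-sided Fisher exact test. This corresponds to the "classic" algorithm; with an algorithm such as "weight01", as in the example commands, topGO takes the GO graph structure into account and the reported p-values will generally differ.

```r
# Hypothetical counts for one GO term (illustrative numbers only)
N <- 4000  # genes in the gene universe
n <- 500   # genes in the DE list
K <- 120   # universe genes annotated with the term ("Annotated")
k <- 30    # DE genes annotated with the term ("Significant")

expected <- K * n / N      # "Expected": 120 * 500 / 4000 = 15
ratio    <- k / expected   # "Ratio": 30 / 15 = 2

# 2 x 2 table: rows = in / not in the DE list, columns = annotated / not
counts <- matrix(c(k, K - k, n - k, N - K - (n - k)), nrow = 2,
                 dimnames = list(c("in_list", "not_in_list"),
                                 c("annotated", "not_annotated")))

# One-sided Fisher exact test (tests over-representation only)
p_fisher <- fisher.test(counts, alternative = "greater")$p.value
# Same value from the upper tail of the hypergeometric distribution
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Significant / Expected on the example table rows agrees with its
# Ratio column (1.628873, 2.028741, 2.439024) to within 1e-6
table_ratios <- c(51, 24, 13) / c(31.31, 11.83, 5.33)
print(c(expected = expected, ratio = ratio, p_fisher = p_fisher, p_hyper = p_hyper))
```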
For each contrast, sub-directories are created in the TOTAL_DEgenes, UP_DEgenes or DOWN_DEgenes directory (under "DEG_test/GOenrichment/"). They contain tables summarizing information on the genes behind each enriched GO term: the gene names, their descriptions (if an annotation file is provided), their DE status in all contrasts, and their normalized expression (in CPM) in all experimental conditions.
Example:
Gene | Gene_description | AC1vsAC2 | AC1vsAC3 | AC2vsAC3 | BC1vsBC2 | BC1vsBC3 | BC2vsBC3 | AC1vsBC1 | AC2vsBC2 | AC3vsBC3 | AC1 | AC2 | AC3 | BC1 | BC2 | BC3 | GO_ID | GO_term | GO_cat |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Gene_002357 | phenylalanine-trna ligase beta subunit | 1 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | 266.349500 | 210.51620 | 221.547300 | 254.519300 | 266.1183000 | 300.3326000 | GO:0000162 | tryptophan biosynthetic process | BP |
Gene_002384 | tyrosine-trna ligase | 1 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | 0 | 64.172870 | 51.79736 | 51.085950 | 60.415040 | 71.3212400 | 61.5877900 | GO:0000162 | tryptophan biosynthetic process | BP |
Gene_003773 | tyrosyl-trna synthetase | 1 | 0 | 0 | 0 | 0 | 0 | 0 | -1 | -1 | 375.692400 | 300.71430 | 306.424000 | 333.957000 | 380.2730000 | 402.9832000 | GO:0000162 | tryptophan biosynthetic process | BP |
... |
Finally, for each analysis, a "NameOfTheContrast_SignificantGO" directory is created, in which you can find, for each enriched GO term, the genes responsible for the enrichment.
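Recovering the genes behind an enriched term amounts to intersecting the DE list with the genes annotated with that term. The sketch below uses a hypothetical gene-to-GO mapping and gene IDs (gene_A, gene_B, ...) and a helper, `genes_behind_terms()`, that is not part of the package. It only looks at direct annotations; topGO also counts genes annotated with descendant terms, so the actual gene lists may be larger.

```r
# Hypothetical gene -> GO mapping (as could be read from a GO annotation file)
gene2GO <- list(
  gene_A = c("GO:0000162", "GO:0004812"),
  gene_B = "GO:0000162",
  gene_C = c("GO:0000162", "GO:0019843"),
  gene_D = "GO:0003735"
)
de_genes <- c("gene_A", "gene_C", "gene_D", "gene_E")  # gene_E has no GO annotation
enriched_terms <- c("GO:0000162", "GO:0003735")

# For each enriched term, return the DE genes directly annotated with it
genes_behind_terms <- function(terms, de_genes, gene2GO) {
  de_annot <- gene2GO[intersect(de_genes, names(gene2GO))]
  hits <- lapply(terms, function(term) {
    has_term <- vapply(de_annot, function(gos) term %in% gos, logical(1))
    names(de_annot)[has_term]
  })
  setNames(hits, terms)
}

genes_behind_terms(enriched_terms, de_genes, gene2GO)
```

gene_B carries GO:0000162 but is not DE, so it is not reported; gene_E is DE but has no annotation, so it never appears.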