-
Notifications
You must be signed in to change notification settings - Fork 6
UserGuide: DE analysis
Differential expression analysis is made with edgeR. AskoR uses the calcNormFactors function of the edgeR library, for scaling the data among all libraries and removing the effects of outliers. The default method of this normalization procedure uses a trimmed mean of M-values (TMM) between each pair of samples, but can be changed by the users to Relative Log expression (RLE) or upperquartile. For general experiments, edgeR uses the Cox-Reid profile-adjusted likelihood method in estimating dispersion. This method is based on the idea of approximate conditional likelihood. It takes care of multiple factors by fitting generalized linear models (GLM) with a design matrix. After that we can proceed with testing procedures for determining differential expression using the "likelihood ratio test" (LRT) or "quasi-likelihood ratio f-test" (QLF). The testing can be done by using the glmLRT() or functions glmFit().
For each contrast, results of the test are displayed and a file is created to store information about all the tested genes and also about the differentially expressed genes (DEGs). These files are in a format readable by Askomics to facilitate the integration of these data in this tool. The function also generates graphical outputs (Volcano, Mean-Difference plots and Heatmap plot of top list of the DEGs).
Several parameters can be used :
- FDR threshold value
- logFC threshold value
- normalization method (TMM/RLE/upperquartile/none)
- p-value adjust method (holm/hochberg/hommel/bonferroni/BH/BY/fdr/none)
- GLM method (lrt/qlf)
# FDR threshold
parameters$threshold_FDR = 0.05
# logFC threshold
parameters$threshold_logFC = 0
# normalization method
parameters$normal_method = "TMM"
# p-value adjust method
parameters$p_adj_method = "BH"
# GLM method
parameters$glm = "lrt"
You can decide to get the Volcano or Mean-Difference Plots for each contrast:
# Mean-Difference Plot of Expression Data (aka MA plot)
parameters$plotMD = TRUE
# Volcano plot for a specified coefficient/contrast of a linear model
parameters$plotVO = TRUE
Once our parameters are defined, we can start the analysis.
# run differential expression analysis
resDEG<-DEanalysis(asko_norm, data, asko_data, parameters)
##
## AC1<AC2 AC1=AC2 AC1>AC2
## 1747 8358 886
##
## AC1<AC3 AC1=AC3 AC1>AC3
## 1323 9074 594
##
## AC2=AC3
## 10991
##
## BC1<BC2 BC1=BC2 BC1>BC2
## 45 10203 743
##
## BC1=BC3
## 10991
##
## BC2<BC3 BC2=BC3 BC2>BC3
## 1 10989 1
##
## AC1<BC1 AC1=BC1 AC1>BC1
## 635 10055 301
##
## AC2<BC2 AC2=BC2 AC2>BC2
## 2175 6599 2217
##
## AC3<BC3 AC3=BC3 AC3>BC3
## 1230 8126 1635
##
## Create Summary file
##
For each contrast, you will find the number of over- or under-expressed genes. Genotype B does not show any major effects of the treatment, unlike genotype A. We also observe a certain number of differentially expressed genes between the genotypes. Files with complete DEG analysis results are in DEG_test/DEanalysis/DETables/ folder.
This is summarized in a barplot :
Each file contain statistics tests results :
gene | contrast | Project | Expression | Significance | PValue | logFC | FC | FDR | logCPM | LR | AC1_NormMeanCount | AC2_NormMeanCount | AC1_CPMnormMeanCount | AC2_CPMnormMeanCount | Description |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Gene_005121 | AC1vsAC2 | TestProject | AC1<AC2 | -1 | 2.12394632406578e-09 | -1.4665319003909 | 2.76356759511363 | 1.16721470239035e-05 | 2.94023616289828 | 35.8565331926586 | 53.6484144997702 | 149.24903963193 | 4.39899 | 12.237919 | phosphatidylserine decarboxylase subunit beta |
Gene_008637 | AC1vsAC2 | TestProject | AC1<AC2 | -1 | 1.13910329040906e-09 | -1.26475188029785 | 2.40285881151432 | 1.16721470239035e-05 | 4.09890595971649 | 37.0709125212401 | 133.971909907072 | 322.829121681898 | 10.98525 | 26.4709 | intraflagellar ift172 |
Gene_000377 | AC1vsAC2 | TestProject | AC1<AC2 | -1 | 4.97853316811632e-09 | -0.689973743331213 | 1.6132541573774 | 1.2884762208252e-05 | 7.80809430259914 | 34.1977950305579 | 1971.74564202918 | 3182.16333162284 | 161.6765 | 260.9267 | long-chain-fatty-acid-ligase acsbg2-like isoform x2 |
... |
A file named "Summary_DEresults.txt" is located in the DEG_test/DEanalysis/DEtables/ folder, which contains for each gene whether it is over-expressed (1) or under-expressed (-1) or neutral (0) for a given contrast. If you had provided an annotation file, these will be found in the last columns.
First lines of the summary file :
AC1vsAC2 | AC1vsAC3 | AC2vsAC3 | ... | AC2vsBC2 | AC3vsBC3 | Description | |
---|---|---|---|---|---|---|---|
Gene_000002 | 0 | 0 | 0 | ... | 0 | 0 | hypothetical protein |
Gene_000003 | -1 | 0 | 0 | ... | 0 | 0 | histone-lysine n-methyltransferase nsd2 |
Gene_000004 | 1 | 1 | 0 | ... | -1 | 0 | hypothetical protein |
... |
You'll find in "DEG_test/DEanalysis/DEimages/" directory, les Volcano, MD plots and heatmap for each contrast.