Skip to content

UserGuide: DE analysis

kgazengel edited this page Aug 29, 2024 · 6 revisions

Differential gene expression analysis

Differential expression analysis is made with edgeR. AskoR uses the calcNormFactors function of the edgeR library, for scaling the data among all libraries and removing the effects of outliers. The default method of this normalization procedure uses a trimmed mean of M-values (TMM) between each pair of samples, but can be changed by the users to Relative Log expression (RLE) or upperquartile. For general experiments, edgeR uses the Cox-Reid profile-adjusted likelihood method in estimating dispersion. This method is based on the idea of approximate conditional likelihood. It takes care of multiple factors by fitting generalized linear models (GLM) with a design matrix. After that we can proceed with testing procedures for determining differential expression using the "likelihood ratio test" (LRT) or "quasi-likelihood ratio f-test" (QLF). The testing can be done by using the glmLRT() or functions glmFit().

For each contrast, results of the test are displayed and a file is created to store information about all the tested genes and also about the differentially expressed genes (DEGs). These files are in a format readable by Askomics to facilitate the integration of these data in this tool. The function also generates graphical outputs (Volcano, Mean-Difference plots and Heatmap plot of top list of the DEGs).

Several parameters can be used :

  • FDR threshold value
  • logFC threshold value
  • normalization method (TMM/RLE/upperquartile/none)
  • p-value adjust method (holm/hochberg/hommel/bonferroni/BH/BY/fdr/none)
  • GLM method (lrt/qlf)
# FDR threshold
parameters$threshold_FDR = 0.05
# logFC threshold
parameters$threshold_logFC = 0
# normalization method
parameters$normal_method = "TMM"
# p-value adjust method
parameters$p_adj_method = "BH"
# GLM method
parameters$glm = "lrt"

You can decide to get the Volcano or Mean-Difference Plots for each contrast:

# Mean-Difference Plot of Expression Data (aka MA plot)
parameters$plotMD = TRUE
# Volcano plot for a specified coefficient/contrast of a linear model
parameters$plotVO = TRUE

Once our parameters are defined, we can start the analysis.

# run differential expression analysis
resDEG<-DEanalysis(asko_norm, data, asko_data, parameters)

## 
## AC1<AC2 AC1=AC2 AC1>AC2 
##    1747    8358     886
## 
## AC1<AC3 AC1=AC3 AC1>AC3 
##    1323    9074     594
## 
## AC2=AC3 
##   10991
## 
## BC1<BC2 BC1=BC2 BC1>BC2 
##      45   10203     743
## 
## BC1=BC3 
##   10991
## 
## BC2<BC3 BC2=BC3 BC2>BC3 
##       1   10989       1
## 
## AC1<BC1 AC1=BC1 AC1>BC1 
##     635   10055     301
## 
## AC2<BC2 AC2=BC2 AC2>BC2 
##    2175    6599    2217
## 
## AC3<BC3 AC3=BC3 AC3>BC3 
##    1230    8126    1635
## 
## Create Summary file
##

For each contrast, you will find the number of over- or under-expressed genes. Genotype B does not show any major effects of the treatment, unlike genotype A. We also observe a certain number of differentially expressed genes between the genotypes. Files with complete DEG analysis results are in DEG_test/DEanalysis/DETables/ folder.

This is summarized in a barplot :

DEplot5

Each file contain statistics tests results :

gene contrast Project Expression Significance PValue logFC FC FDR logCPM LR AC1_NormMeanCount AC2_NormMeanCount AC1_CPMnormMeanCount AC2_CPMnormMeanCount Description
Gene_005121 AC1vsAC2 TestProject AC1<AC2 -1 2.12394632406578e-09 -1.4665319003909 2.76356759511363 1.16721470239035e-05 2.94023616289828 35.8565331926586 53.6484144997702 149.24903963193 4.39899 12.237919 phosphatidylserine decarboxylase subunit beta
Gene_008637 AC1vsAC2 TestProject AC1<AC2 -1 1.13910329040906e-09 -1.26475188029785 2.40285881151432 1.16721470239035e-05 4.09890595971649 37.0709125212401 133.971909907072 322.829121681898 10.98525 26.4709 intraflagellar ift172
Gene_000377 AC1vsAC2 TestProject AC1<AC2 -1 4.97853316811632e-09 -0.689973743331213 1.6132541573774 1.2884762208252e-05 7.80809430259914 34.1977950305579 1971.74564202918 3182.16333162284 161.6765 260.9267 long-chain-fatty-acid-ligase acsbg2-like isoform x2
...

A file named "Summary_DEresults.txt" is located in the DEG_test/DEanalysis/DEtables/ folder, which contains for each gene whether it is over-expressed (1) or under-expressed (-1) or neutral (0) for a given contrast. If you had provided an annotation file, these will be found in the last columns.

First lines of the summary file :

AC1vsAC2 AC1vsAC3 AC2vsAC3 ... AC2vsBC2 AC3vsBC3 Description
Gene_000002 0 0 0 ... 0 0 hypothetical protein
Gene_000003 -1 0 0 ... 0 0 histone-lysine n-methyltransferase nsd2
Gene_000004 1 1 0 ... -1 0 hypothetical protein
...

You'll find in "DEG_test/DEanalysis/DEimages/" directory, les Volcano, MD plots and heatmap for each contrast.

DEplot1 DEplot2 DEplot3