Skip to content

UserGuide: DE analysis

S.Alves edited this page Sep 2, 2021 · 6 revisions

Differential gene expression analysis

Differential expression analysis is made with edgeR. AskoR uses the calcNormFactors function of the edgeR library, for scaling the data among all libraries and removing the effects of outliers. The default method of this normalization procedure uses a trimmed mean of M-values (TMM) between each pair of samples, but can be changed by the users to Relative Log expression (RLE) or upperquartile. For general experiments, edgeR uses the Cox-Reid profile-adjusted likelihood method in estimating dispersion. This method is based on the idea of approximate conditional likelihood. It takes care of multiple factors by fitting generalized linear models (GLM) with a design matrix. After that we can proceed with testing procedures for determining differential expression using the "likelihood ratio test" (LRT) or "quasi-likelihood ratio f-test" (QLF). The testing can be done by using the glmLRT() or functions glmFit().

For each contrast, results of the test are displayed and a file is created to store information about all the tested genes and also about the differentially expressed genes (DEGs). These files are in a format readable by Askomics to facilitate the integration of these data in this tool. The function also generates graphical outputs (Volcano, Mean-Difference plots and Heatmap plot of top list of the DEGs).

Several parameters can be used :

  • FDR threshold value
  • logFC threshold value
  • normalization method (TMM/RLE/upperquartile/none)
  • p-value adjust method (holm/hochberg/hommel/bonferroni/BH/BY/fdr/none)
  • GLM method (lrt/qlf)
# FDR threshold
parameters$threshold_FDR = 0.05
# logFC threshold
parameters$threshold_logFC = 0
# normalization method
parameters$normal_method = "TMM"
# p-value adjust method
parameters$p_adj_method = "BH"
# GLM method
parameters$glm = "lrt"

You can decide to get the Volcano or Mean-Difference Plots for each contrast:

# Mean-Difference Plot of Expression Data (aka MA plot)
parameters$plotMD = TRUE
# Volcano plot for a specified coefficient/contrast of a linear model
parameters$plotVO = TRUE

Once our parameters are defined, we can start the analysis.

# run differential expression analysis
resDEG<-DEanalysis(asko_norm, data, asko_data, parameters)

## 
## AC1<AC2 AC1=AC2 AC1>AC2 
##    1747    8358     886
## 
## AC1<AC3 AC1=AC3 AC1>AC3 
##    1323    9074     594
## 
## AC2=AC3 
##   10991
## 
## BC1<BC2 BC1=BC2 BC1>BC2 
##      45   10203     743
## 
## BC1=BC3 
##   10991
## 
## BC2<BC3 BC2=BC3 BC2>BC3 
##       1   10989       1
## 
## AC1<BC1 AC1=BC1 AC1>BC1 
##     635   10055     301
## 
## AC2<BC2 AC2=BC2 AC2>BC2 
##    2175    6599    2217
## 
## AC3<BC3 AC3=BC3 AC3>BC3 
##    1230    8126    1635
## 
## Create Summary file
##

For each contrast, you will find the number of over- or under-expressed genes. Genotype B does not show any major effects of the treatment, unlike genotype A. We also observe a certain number of differentially expressed genes between the genotypes. Files with complete DEG analysis results are in DEG_test/DEanalysis/AskoTables/ folder.

Each file contain statistics tests results :

Test_id measured_in@Contrast is@gene logFC FC PValue Expression Significance logCPM LR FDR
AC1vsAC2_Gene_005121 AC1vsAC2 Gene_005121 -1.4665319003909 2.76356759511363 2.12394632406578e-09 AC1<AC2 -1 2.94023616289828 35.8565331926586
AC1vsAC2_Gene_008637 AC1vsAC2 Gene_008637 -1.26475188029785 2.40285881151432 1.13910329040906e-09 AC1<AC2 -1 4.09890595971649 37.0709125212401
AC1vsAC2_Gene_000377 AC1vsAC2 Gene_000377 -0.689973743331213 1.6132541573774 4.97853316811632e-09 AC1<AC2 -1 7.80809430259914 34.1977950305579
...

A file named "DEG_test_summary_DE" is located in the DEG_test/DEanalysis/ folder, which contains for each gene whether it is over-expressed (1) or under-expressed (-1) or neutral (0) for a given contrast. If you had provided an annotation file, these will be found in the last columns.

First lines of the summary file :

AC1vsAC2 AC1vsAC3 AC2vsAC3 ... AC2vsBC2 AC3vsBC3 Description
Gene_000002 0 0 0 ... 0 0 hypothetical protein
Gene_000003 -1 0 0 ... 0 0 histone-lysine n-methyltransferase nsd2
Gene_000004 1 1 0 ... -1 0 hypothetical protein
...

You'll find in "DEG_test/DEanalysis/images/" directory, les Volcano, MD plots and heatmap for each contrast.

DEplot1 DEplot2 DEplot3