Dereck’s lab tools package - installable via devtools.
RNAseq_GSEAheatmaps
: Make heatmaps from combined GSEA reports after EdgeR
Volcano-class
: S4 class; Volcano data.table manipulated and prepared for plotting
cite_RNAseqGSEA
: RNAseq GSEA methods citation
dtcolnames
: Get the data column names from data.table; exclude ID col
excel2List
: Read an Excel workbook to a named list of data.tables or other type
list2Excel
: Save a named list of data.frames | data.tables to an Excel workbook with multiple sheets
plot,Volcano,ANY-method
: S4 method for Volcano S4 class
read.gmt
: Read a .gmt pathway type file as a list.
sig2UpDownGmt
: Signature(s) files to GMT list; split up and down (inclusive) based on log2fc
summariseSignatures
: Summarise how many genes up and down from a combined signature file
table2tabs
: Parse Excel tables from one sheet to named tabs
tabs2table
: Combine Excel sheets to single table
to.data.frame
: S3 generic; convert to data.frame, move column to rownames
to.matrix
: S3 generic; convert to matrix, move column to rownames
valueCoordinates
: Coordinates of X values in a data.table|data.frame
write.gmt
: Write GMT list to a formatted file.
Use devtools to install this package:
devtools::install_github("CoarfaBCM/derecksLabTools", force = TRUE)
library("derecksLabTools")
Load the library with library("derecksLabTools"), or prefix every function call with derecksLabTools::.
Sometimes knowing that NAs or a given value are present in your data is not enough; you want to know exactly where.
valueCoordinates evaluates data == value | is.na(data) to create a truth table, then returns the column and row of each occurrence as a data.frame.
test <- head(iris, 10)
test[3:5, 1:2] <- NA
derecksLabTools::valueCoordinates(test, value = NA)
  column row
1      1   3
2      2   4
3      1   5
4      2   3
5      1   4
6      2   5
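The truth-table idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: value_coordinates_sketch is a hypothetical name, value is assumed to be a single value, and the rows come back in column-major order, which may differ from the ordering that valueCoordinates prints.

```r
# Hypothetical helper (not part of derecksLabTools): locate cells that are NA
# or equal to `value`, mirroring the expression data == value | is.na(data).
value_coordinates_sketch <- function(data, value = NA) {
  # One logical vector per column; comparisons against NA yield NA,
  # which which() below silently ignores.
  matches_value <- function(col) col == value | is.na(col)
  hits <- matrix(unlist(lapply(data, matches_value)), nrow = nrow(data))
  # arr.ind = TRUE returns one (row, col) pair per TRUE cell
  idx <- which(hits, arr.ind = TRUE)
  data.frame(column = unname(idx[, "col"]), row = unname(idx[, "row"]))
}

test <- head(iris, 10)
test[3:5, 1:2] <- NA
coords <- value_coordinates_sketch(test, value = NA)
print(coords)
```

Because the expression always includes is.na(data), passing a non-NA value in this sketch reports the NA cells as well as the matching cells.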
excel2List reads an Excel workbook and returns a named list of tables of the desired data.frame-like type; the default is data.table. You can pass the coercion function either as a string or as a function object, see usage:
derecksLabTools::excel2List(
  system.file("extdata", "comparisons.xlsx", package = "derecksLabTools"),
  FUN_type = as.data.frame
)
derecksLabTools::excel2List(
  system.file("extdata", "comparisons.xlsx", package = "derecksLabTools"),
  FUN_type = "data.table::as.data.table"
)
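For readers wondering how an argument like FUN_type can accept both a string and a function, here is one possible way to resolve it in base R. This is a sketch under assumptions: resolve_coercer is a hypothetical helper, and the package may resolve the argument differently.

```r
# Hypothetical helper (not part of derecksLabTools): turn a function object,
# a plain function name, or a "pkg::name" string into a callable function.
resolve_coercer <- function(FUN_type) {
  if (is.function(FUN_type)) {
    return(FUN_type)
  }
  stopifnot(is.character(FUN_type), length(FUN_type) == 1)
  parts <- strsplit(FUN_type, "::", fixed = TRUE)[[1]]
  if (length(parts) == 2) {
    # "pkg::name": look the function up among the package's exports
    return(getExportedValue(parts[1], parts[2]))
  }
  # Plain name such as "as.data.frame": search the calling environments
  match.fun(FUN_type)
}

raw <- list(ID = c("s1", "s2"), comparison_name = c("test", "control"))
from_function <- resolve_coercer(as.data.frame)(raw)
from_string <- resolve_coercer("base::as.data.frame")(raw)
```

Both calls give the same data.frame; the string form is convenient when the coercion function is read from a configuration file.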
RNAseq_GSEAheatmaps takes in a compiled GSEA report and creates heatmaps.
[Figure: input data format]
[Figure: example of the output heatmap]
Usage:
path <- system.file(
  "extdata",
  "GSEA-combined-enrichment-profiles.xlsx",
  package = "derecksLabTools"
)
heatmaps <- derecksLabTools::RNAseq_GSEAheatmaps(
  path,
  scale_bounds = NULL,
  reo_order_cols = NULL,
  clust_row = TRUE,
  clust_col = FALSE,
  show_rownames = TRUE,
  show_colnames = TRUE
)
pdf("./outputs/20211220_GSEA_results/gsea_results_bp/gobp-enrichment-heatmap.pdf", width = 7, height = 10)
print(heatmaps$gobp)
dev.off()
pdf("./outputs/20211220_GSEA_results/gsea_results_hallmark/hallmark-enrichment-heatmap.pdf", width = 7, height = 10)
print(heatmaps$hallmark)
dev.off()
pdf("./outputs/20211220_GSEA_results/gsea_results_kegg/kegg-enrichment-heatmap.pdf", width = 7, height = 10)
print(heatmaps$kegg)
dev.off()
pdf("./outputs/20211220_GSEA_results/gsea_results_reactome/reactome-enrichment-heatmap.pdf", width = 7, height = 10)
print(heatmaps$reactome)
dev.off()
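The four pdf() blocks above follow the same open, print, close pattern, so they could be folded into a loop. The sketch below is an assumption-laden illustration: save_plots_to_pdf is a hypothetical helper, it writes all files into one directory rather than the per-collection folders used above, and small stand-in objects with their own print method replace the real heatmaps so the example runs without the package.

```r
# Hypothetical helper: write every element of a named list of printable
# plot objects to its own PDF and return the file paths.
save_plots_to_pdf <- function(plots, out_dir,
                              suffix = "-enrichment-heatmap.pdf",
                              width = 7, height = 10) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  paths <- file.path(out_dir, paste0(names(plots), suffix))
  for (i in seq_along(plots)) {
    pdf(paths[i], width = width, height = height)
    # Close the device even if printing fails
    tryCatch(print(plots[[i]]), finally = dev.off())
  }
  invisible(paths)
}

# Stand-in objects for illustration only; real heatmap objects would be
# passed in their place.
print.demo_heatmap <- function(x, ...) {
  image(x$values, main = x$title)
  invisible(x)
}
demo_plots <- list(
  gobp = structure(list(values = matrix(1:12, 3), title = "gobp"),
                   class = "demo_heatmap"),
  kegg = structure(list(values = matrix(12:1, 3), title = "kegg"),
                   class = "demo_heatmap")
)
written <- save_plots_to_pdf(demo_plots, file.path(tempdir(), "gsea-heatmaps"))
```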
Parse Excel tables from one sheet to named tabs
table2tabs parses tables from one Excel sheet based on an identifier. Empty rows/columns must be left between tables, as these are used for edge detection by the is.na() function. Each table on the sheet should have column names: the first is used for identifying tables, the second for tab names.
This is a tool used at our lab for quickly writing comparisons for the RNAseq analysis and then converting them to multiple tabs.
The typical format is colnames "ID", "comparison_name", where ID designates the sample IDs and comparison_name designates test/control.
Note that you can have other content on your Excel sheet as long as it does not contain the table_id string used for parsing.
Arguments:
- file: String; path to a file of type xlsx.
- table_id: String, default "ID"; used for identifying the individual tables on a single sheet.
- out_file: String; the name of the output file, which must have the extension .xlsx.
- return: Boolean, default FALSE; if TRUE, returns the parsed data.
- Returns: if the return argument is set to TRUE, a list of data.frames, which might be useful for analysis. The primary output is the file output.
derecksLabTools::table2tabs(
  file = "./data/table2tabs/comparisons-setup.xlsx",
  table_id = "ID",
  out_file = "output-file.xlsx",
  return = FALSE
)
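The edge detection described for table2tabs (find each table_id cell, then use is.na() to find where the table ends) can be sketched in base R on an in-memory sheet. This is a simplified illustration, not the package's code: split_sheet_tables is a hypothetical helper, the sheet is assumed to be already read into a character matrix with NA for empty cells, and nothing is written to Excel.

```r
# Hypothetical helper: split a sheet (character matrix, NA = empty cell) into
# a named list of tables, one per occurrence of `table_id`.
split_sheet_tables <- function(sheet, table_id = "ID") {
  sheet <- as.matrix(sheet)
  anchors <- which(!is.na(sheet) & sheet == table_id, arr.ind = TRUE)
  tables <- list()
  for (k in seq_len(nrow(anchors))) {
    r0 <- anchors[[k, "row"]]
    c0 <- anchors[[k, "col"]]
    # Walk right along the header row until an empty cell or the sheet edge
    c1 <- c0
    while (c1 < ncol(sheet) && !is.na(sheet[r0, c1 + 1])) c1 <- c1 + 1
    # Walk down the ID column until an empty cell or the sheet edge
    r1 <- r0
    while (r1 < nrow(sheet) && !is.na(sheet[r1 + 1, c0])) r1 <- r1 + 1
    if (c1 == c0) next  # no second column, so no tab name: skip
    header <- unname(sheet[r0, c0:c1])
    body <- sheet[seq_len(r1 - r0) + r0, c0:c1, drop = FALSE]
    tab <- as.data.frame(body, stringsAsFactors = FALSE)
    names(tab) <- header
    tables[[header[2]]] <- tab  # second column name becomes the tab name
  }
  tables
}

# Two tables on one sheet, separated by an empty column
sheet <- matrix(NA_character_, nrow = 4, ncol = 5)
sheet[1:3, 1:2] <- c("ID", "s1", "s2", "treated_vs_ctrl", "test", "control")
sheet[1:3, 4:5] <- c("ID", "s3", "s4", "ko_vs_wt", "test", "control")
tabs <- split_sheet_tables(sheet)
```

The empty column and the empty last row are what stop the two while loops, which is why the padding between tables is required.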
tabs2table combines all sheets (tabs) from one or more Excel workbooks into a single table. An index is generated as the first tab, and padding (empty rows and columns) is added between the individual tables. This is useful for getting an overview of your data without having to click through n tabs.
Arguments:
- dir: String; path to a directory; this will read all .xlsx files at this location.
- columns: Integer, default 3; defines the number of columns to split the combined tables over. This splits the data and thus avoids having to scroll over a large number of tables.
- out_file: String; the name of the output file, which must have the extension .xlsx.
- return: Boolean, default FALSE; if TRUE, returns the parsed data.
- Returns: if the return argument is set to TRUE, a list of data.frames, which might be useful for analysis. The primary output is the file output.
derecksLabTools::tabs2table(
  dir = "./mycomparisons-are-here/",
  columns = 3,
  out_file = "output-file.xlsx",
  return = FALSE
)
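To illustrate the layout that tabs2table describes (tables spread over a fixed number of columns, with empty padding between them), here is a base R sketch that arranges a named list of data.frames on one character grid. All helper names are hypothetical, the index tab is not reproduced, and the package's actual layout may differ.

```r
# Hypothetical helpers: lay out several tables on one grid with NA padding.
table_to_block <- function(tab, name) {
  # Title row, header row, then the body as character cells
  body <- do.call(cbind, lapply(unname(as.list(tab)), as.character))
  title <- c(name, rep(NA_character_, ncol(tab) - 1))
  rbind(title, names(tab), body, deparse.level = 0)
}

pad_block <- function(block, n_rows, n_cols) {
  # Place the block in the top-left corner of an NA-filled matrix
  out <- matrix(NA_character_, nrow = n_rows, ncol = n_cols)
  out[seq_len(nrow(block)), seq_len(ncol(block))] <- block
  out
}

combine_tables_grid <- function(tables, columns = 3, pad = 1) {
  blocks <- Map(table_to_block, tables, names(tables))
  # Group the blocks into strips of `columns` tables each
  groups <- split(blocks, ceiling(seq_along(blocks) / columns))
  strips <- lapply(groups, function(group) {
    height <- max(vapply(group, nrow, integer(1))) + pad
    padded <- lapply(group, function(b) pad_block(b, height, ncol(b) + pad))
    do.call(cbind, unname(padded))
  })
  width <- max(vapply(strips, ncol, integer(1)))
  do.call(rbind, unname(lapply(strips, function(s) pad_block(s, nrow(s), width))))
}

make_tab <- function(n) {
  data.frame(ID = paste0("s", seq_len(n)), comparison_name = rep("test", n))
}
tables <- list(a = make_tab(2), b = make_tab(3), c = make_tab(1), d = make_tab(2))
grid <- combine_tables_grid(tables, columns = 2)
```

With columns = 2, tables a and b share the first strip and c and d the second; each strip is as tall as its tallest table plus the padding.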
cite_RNAseqGSEA prints the methods used for RNAseq and GSEA analysis; its arguments are interpolated into the text to print a custom message.
cite_RNAseqGSEA(fold_changes = c(1.5, 2.0), normalisation_type = "TMM")
Methods: RNA seq and GSEA processing
RNAseq data was trimmed using cutadapt[1] v1.18 and fastQC[2] v0.11.9. Mapping was done with Homo_sapiens.GRCh38.101.gtf[3] as a reference genome. Trim and mapping quality was assesed with the multiqc[4] utility version 1.8. Differential expression analysis was done with use of the edgeR[5] package version 3.32.1 and EDAseq[6] 2.24.0. An FDR cutoff of 0.05 was selected and fold change cutoff: c("1.5, ", "2, "); TMM normalisation was used. GSEA[7, 8] (gene set enrichment analysis) was run with GSEA version 3.0. We used msigdb[8, 8] 7.3 human gene set files including: c2.cp.kegg.v7.3.symbols.gmt, c2.cp.reactome.v7.3.symbols.gmt, c5.go.bp.v7.3.symbols.gmt, h.all.v7.3.symbols.gmt as reference pathways. Produced reports were filtered for an FDR cutoff of 0.25, these were then used to create heatmaps.
[1] Martin, Marcel. "Cutadapt Removes Adapter Sequences from High-Throughput Sequencing Reads." EMBnet.journal, vol. 17, no. 1, 2011, p. 10., doi:10.14806/ej.17.1.200.
[2] Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online] http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
[3] Schneider, Valerie A., et al. "Evaluation of GRCh38 and De Novo Haploid Genome Assemblies Demonstrates the Enduring Quality of the Reference Assembly." Genome Research, vol. 27, no. 5, 2017, pp. 849–864., doi:10.1101/gr.213611.116.
[4] Ewels, Philip, et al. "MultiQC: Summarize Analysis Results for Multiple Tools and Samples in a Single Report." Bioinformatics, vol. 32, no. 19, 2016, pp. 3047–3048., doi:10.1093/bioinformatics/btw354.
[5] Robinson, M. D., et al. "EdgeR: a Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data." Bioinformatics, vol. 26, no. 1, 2009, pp. 139–140., doi:10.1093/bioinformatics/btp616.
[6] Risso, Davide, et al. "GC-Content Normalization for RNA-Seq Data." BMC Bioinformatics, vol. 12, no. 1, 2011, p. 480., doi:10.1186/1471-2105-12-480.
[7] Subramanian, A., et al. "Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles." Proceedings of the National Academy of Sciences, vol. 102, no. 43, 2005, pp. 15545–15550., doi:10.1073/pnas.0506580102.
[8] Liberzon, A., et al. "Molecular Signatures Database (MSigDB) 3.0." Bioinformatics, vol. 27, no. 12, 2011, pp. 1739–1740., doi:10.1093/bioinformatics/btr260.
[9] Liberzon, Arthur, et al. "The Molecular Signatures Database Hallmark Gene Set Collection." Cell Systems, vol. 1, no. 6, 2015, pp. 417–425., doi:10.1016/j.cels.2015.12.004.
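In the sample output above, the fold-change cutoffs appear as c("1.5, ", "2, "), which looks like a vector that was deparsed rather than collapsed into the sentence. A minimal sketch of interpolating a numeric vector into one sentence follows; format_cutoffs and methods_sentence are hypothetical names, and this is not how cite_RNAseqGSEA is implemented.

```r
# Hypothetical helpers: interpolate a vector of cutoffs into a methods sentence.
format_cutoffs <- function(fold_changes) {
  # format() gives every value the same number of decimals;
  # collapse joins them into a single string.
  paste(format(fold_changes, nsmall = 1), collapse = ", ")
}

methods_sentence <- function(fold_changes, normalisation_type) {
  sprintf(
    "An FDR cutoff of 0.05 was selected and fold change cutoff: %s; %s normalisation was used.",
    format_cutoffs(fold_changes),
    normalisation_type
  )
}

msg <- methods_sentence(fold_changes = c(1.5, 2.0), normalisation_type = "TMM")
cat(msg, "\n")
# An FDR cutoff of 0.05 was selected and fold change cutoff: 1.5, 2.0; TMM normalisation was used.
```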