Logolas

Logolas is an R package for creating Enrichment Depletion Logo (EDLogo) plots. Unlike standard logo plots, such as those produced by the seqLogo package, which are biased towards highlighting enrichment, EDLogo plots highlight both enrichment and depletion of symbols. Logolas also generalizes logo plots so that symbols can be strings as well as single characters.

If you find a bug, please create an issue.

This code has been tested in ...


License

Copyright (c) 2018-2019, Kushal Dey.

All source code and software in this repository are made available under the terms of the GNU General Public License. See the LICENSE file for the full text of the license.

Citing this work

If you find this R package useful for your work, please cite our paper, published in BMC Bioinformatics:

Dey, K.K., Xie, D. and Stephens, M., 2018. A new sequence logo plot to highlight enrichment and depletion. BMC Bioinformatics. 19:473 https://doi.org/10.1186/s12859-018-2489-3.

Quick Start

The most recent version of Logolas is available from GitHub and can be installed with the devtools R package. First, install the following Bioconductor packages.

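# biocLite() has been retired by Bioconductor; BiocManager is its replacement
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("Biostrings","BiocStyle","Biobase","seqLogo","ggseqlogo"))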

Then install Logolas as follows:

library(devtools)
install_github("kkdey/Logolas",build_vignettes = TRUE)

Once you have installed the package, load the package in R by entering

library(Logolas)

To get an overview of the package, enter

help(package = "Logolas")

Next, try creating a few plots using the logomaker function:

Create a standard logo plot in Logolas, analogous to those from the seqLogo and ggseqlogo R packages.

sequence <- c("CTATTGT","CTCTTAT","CTATTAA","CTATTTA", "CTATTAT","CTTGAAT",
              "CTTAGAT","CTATTAA","CTATTTA","CTATTAT", "CTTTTAT","CTATAGT",
              "CTATTTT","CTTATAT","CTATATT","CTCATTT", "CTTATTT","CAATAGT",
              "CATTTGA","CTCTTAT","CTATTAT","CTTTTAT", "CTATAAT","CTTAGGT",
              "CTATTGT","CTCATGT","CTATAGT", "CTCGTTA","CTAGAAT","CAATGGT")
logomaker(sequence,type = "Logo")
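As an illustration of what a logo plot summarizes, the following base R sketch tabulates the per-position symbol frequencies of the same aligned sequences. It does not call Logolas and is not taken from the package's internals; the names `chars`, `bases` and `pfm` are hypothetical and used only for this example.

```r
# Aligned DNA sequences from the example above (all of length 7)
sequence <- c("CTATTGT","CTCTTAT","CTATTAA","CTATTTA", "CTATTAT","CTTGAAT",
              "CTTAGAT","CTATTAA","CTATTTA","CTATTAT", "CTTTTAT","CTATAGT",
              "CTATTTT","CTTATAT","CTATATT","CTCATTT", "CTTATTT","CAATAGT",
              "CATTTGA","CTCTTAT","CTATTAT","CTTTTAT", "CTATAAT","CTTAGGT",
              "CTATTGT","CTCATGT","CTATAGT", "CTCGTTA","CTAGAAT","CAATGGT")

# Split each sequence into characters: one row per sequence, one column per position
chars <- do.call(rbind, strsplit(sequence, ""))

# Count each base at each position and divide by the number of sequences
bases <- c("A", "C", "G", "T")
pfm <- sapply(seq_len(ncol(chars)), function(j) {
  tabulate(match(chars[, j], bases), nbins = length(bases)) / nrow(chars)
})
rownames(pfm) <- bases
colnames(pfm) <- seq_len(ncol(pfm))

# Rows are symbols, columns are positions; each column sums to 1
print(round(pfm, 2))
```

A matrix of this shape (symbols in rows, positions in columns) is the same layout as the matrix inputs passed to logomaker in the later examples.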


The corresponding EDLogo plot highlights the depletion of T in the middle, not visually clear in the standard logo plot.

logomaker(sequence, type = "EDLogo")


EDLogo can also be applied to amino acid motifs, which use a larger alphabet than the A, C, G and T of DNA motifs.

We create an EDLogo plot of the amino acid sequences at N-Glycosylation sites, with a user-specified background bg chosen to be the median positional weight of each amino acid in the context around the glycosylation site [data from UniProtKB].

data("N_Glycosyl_sequences")
bg <- apply(N_Glycosyl_sequences, 1, function(x) return(median(x)))
bg <- bg/sum(bg)
logomaker(N_Glycosyl_sequences, type = "EDLogo", bg=bg)
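The background computation above can be tried without the package data. The sketch below applies the same two steps (row-wise median, then normalization) to a small made-up matrix; `toy` and its values are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Hypothetical 3-symbol, 4-position weight matrix; each column sums to 1
toy <- matrix(c(0.7, 0.2, 0.1,
                0.1, 0.8, 0.1,
                0.6, 0.2, 0.2,
                0.5, 0.1, 0.4),
              nrow = 3,
              dimnames = list(c("A", "B", "C"), paste0("pos", 1:4)))

# Median weight of each symbol across positions (one value per row)
bg <- apply(toy, 1, median)

# Medians need not sum to 1, so rescale them into a probability vector
bg <- bg / sum(bg)
print(bg)
```

The result is a named vector with one background probability per symbol, which is the form passed as bg in the call above.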


EDLogo highlights the motif Asn (N) -X- Ser (S)/Thr (T) -X at the center, where X is depleted for the amino acid Pro (P).

Logolas allows the symbols in the logo plot to be a combination of strings and characters, or purely strings. Examples are shown below.

An example of a mutation signature (mismatch type at the center with flanking bases), using data from Shiraishi et al 2015:

data(mutation_sig)
logomaker(mutation_sig, type = "EDLogo", color_type = "per_symbol",  color_seed = 2000)


An EDLogo plot for the enrichment and depletion of histone marks in different parts of the genome (data from Koch et al 2007):

data(histone_marks)
logomaker(histone_marks$mat, bg = histone_marks$bgmat, type = "EDLogo")


Finally, please walk through some more detailed examples in the vignette:

vignette("Logolas")

Developer notes

This was the R command used to generate the vignette PDF file from the R Markdown source:

rmarkdown::render("Logolas.Rmd", output_format = "pdf_document")

Credits

This software was developed by Kushal Dey, Dongyue Xie and Matthew Stephens at the University of Chicago. For any questions or comments, please contact Kushal Dey at [email protected].

The authors would like to acknowledge Oliver Bembom, the author of the seqLogo package which acted as an inspiration and starting point for this software. The authors also thank Peter Carbonetto, Edward Wallace and John Blischak for helpful discussions and feedback.