Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for reading SEACR peak files #7

Merged
merged 4 commits into from
Jun 13, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ Imports:
BSgenome,
GenomicRanges,
ggplot2,
IRanges,
memes,
rGADEM,
rtracklayer,
Expand Down
3 changes: 3 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,15 @@ export(density_plot)
export(motif_enrichment)
export(read_motif_file)
export(read_peak_file)
export(read_peak_file_seacr)
export(summit_to_motif)
import(Biostrings)
import(GenomicRanges)
import(IRanges)
import(ggplot2)
import(universalmotif)
importFrom(BSgenome,getSeq)
importFrom(memes,runAme)
importFrom(memes,runFimo)
importFrom(rtracklayer,import)
importFrom(utils,read.table)
53 changes: 53 additions & 0 deletions R/read_peak_file_seacr.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
#' Import a SEACR BED file as a GRanges object
#'
#' \code{read_peak_file()} reads a SEACR BED file as a GRanges object and adds
#' an additional metadata column representing the position of the peak summit.
#'
#' @importFrom utils read.table
#' @import IRanges
#' @import GenomicRanges
#'
#' @param file_path Path to a \code{BED} file.
#'
#' @examples
#' \dontrun{
#' peak_file <- system.file("extdata",
#' "ctcf_seacr.stringent.bed",
#' package = "MotifStats"
#' )
#' peak_obj <- read_peak_file_seacr(file_path = peak_file)
#' }
#'
#' @export
read_peak_file_seacr <- function(file_path){
# Read BED file as table
obj <- utils::read.table(file_path, header = TRUE)
names(obj) <- c("chr",
"start",
"end",
"total_signal",
"max_signal",
"max_signal_region")

# Create summit column
separated_cols <- strsplit(as.character(obj$max_signal_region), "[:|-]")
obj$max_chr <- sapply(separated_cols, function(x) x[1])
obj$max_start <- sapply(separated_cols, function(x) x[2])
obj$max_end <- sapply(separated_cols, function(x) x[3])
obj$summit <- floor((as.numeric(obj$max_start) + as.numeric(obj$max_end)) / 2)

# Create GRanges object
gr_ranges <- IRanges::IRanges(start = obj$start, end = obj$end)

gr_obj <- GenomicRanges::GRanges(
seqnames = obj$chr,
ranges = gr_ranges,
strand = "*"
)

GenomicRanges::mcols(gr_obj)$summit <- obj$summit
GenomicRanges::mcols(gr_obj)$name <- paste0("peak_", 1:nrow(obj))
names(gr_obj) <- GenomicRanges::mcols(gr_obj)$name

return(gr_obj)
}
7 changes: 7 additions & 0 deletions R/summit_to_motif.R
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,13 @@ summit_to_motif <- function(peak_input,
bfile = bfile,
thresh = fimo_threshold,
...)

# Return NULL lists if no FIMO matches are detected
if (is.null(fimo_df)) return(
list(peak_set = NULL,
distance_to_summit = NULL)
)

index_to_repeat <- base::match(as.vector(GenomicRanges::seqnames(fimo_df)),
names(peaks))
expanded_peaks <- peaks[index_to_repeat]
Expand Down
1,582 changes: 1,582 additions & 0 deletions inst/extdata/ctcf_seacr.stringent.bed

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions man/read_peak_file_seacr.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 8 additions & 0 deletions tests/testthat/test-read_peak_file_seacr.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
test_that("The output of read_peak_file is a GRanges object", {
peak_file <- system.file("extdata",
"ctcf_seacr.stringent.bed",
package = "MotifStats")
res <- read_peak_file_seacr(file_path = peak_file)

expect_s4_class(res, class = "GRanges")
})
11 changes: 9 additions & 2 deletions vignettes/MotifStats.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,11 @@ Next, we load the narrowPeak files from MACS2/3[^f2].
name_peaks <- read_peak_file("/path/to/peak/file.narrowPeak")
```

Alternatively, we may load the BED files from SEACR[^f3].
```{r eval = FALSE}
name_peaks <- read_peak_file("/path/to/peak/file.bed")
```

For this example, we will load the built-in CTCF TIP-seq peaks (as `GRanges`
object) and CREB motif (as `PWMatrix` object) data.
```{r include = TRUE}
Expand All @@ -116,7 +121,7 @@ relative to a set of background sequences.
- A 0-order background model with the same nucleotide composition as the input
sequences is used to generate the background sequences.
- An additional `out_dir` argument can be used to specify the
directory to save the AME output files[^f3] and the background model.
directory to save the AME output files[^f4] and the background model.

```{r include = TRUE}
ctcf_read_prop <- motif_enrichment(
Expand Down Expand Up @@ -206,5 +211,7 @@ sessionInfo()
[^f1]: The peak file is a subset of chromosome 19 to reduce the file size.
[^f2]: narrowPeak files from both version 2 and 3 of [MACS: Model-based Analysis
for ChIP-Seq](https://github.com/macs3-project/MACS) is supported.
[^f3]: For more information on the output files, refer to the
[^f3]: BED files from
[SEACR: Sparse Enrichment Analysis for CUT&RUN](https://github.com/FredHutch/SEACR)
[^f4]: For more information on the output files, refer to the
[AME documentation](https://meme-suite.org/meme/doc/ame.html).
Loading