dplyr v. 1.0.0 compatibility #354

Merged on May 4, 2020 (32 commits)
b16c3f1
use dev dplyr
kriemo Mar 20, 2020
a1487c3
don't pass chrom to groups more than once
kriemo Mar 20, 2020
6daccec
reuse the POS col to work around bug in dplyr (see #4996)
kriemo Mar 20, 2020
932945d
using old < dplyr v0.8.0 grouped dataframes now throws an error not a…
kriemo Mar 20, 2020
45888d9
drop equality testing for classes
kriemo Mar 20, 2020
ffaed9a
bump dplyr version to at least 0.8.0
kriemo Mar 20, 2020
58e9746
deprecate dplyr < 0.8.0 handling, require dplyr 0.8.0
kriemo Mar 20, 2020
6fc77e3
frame_data -> tribble
kriemo Mar 20, 2020
c20cd0b
fxn no longer needed
kriemo Mar 20, 2020
e0245ae
update docs and news
kriemo Mar 20, 2020
6855917
clean up duplicated unique and leftover dplyr v < 0.8.0 code
kriemo Mar 21, 2020
602e7bd
sync with master
kriemo Mar 23, 2020
5eab1c1
use length instead of n()
kriemo Mar 23, 2020
42171ae
remove more old dplyr handling code
kriemo Mar 23, 2020
74b38d3
Merge remote-tracking branch 'upstream/master' into dev-dplyr-fix
kriemo Mar 23, 2020
099d435
sync w/ master
kriemo Mar 23, 2020
aaf8cc2
use remotes for rlang and vctrs to fix build, remove once dplyr is us…
kriemo Mar 24, 2020
45bcfd7
sync w/ master, retrigger build
kriemo Mar 25, 2020
78c223e
remove commented code, check build
kriemo Apr 1, 2020
4abfca8
remove custom classes
kriemo Apr 22, 2020
15f9fdb
remove custom attributes "sorted" and "merged" as not checked or used…
kriemo Apr 22, 2020
2dc56d4
remove commented code
kriemo Apr 22, 2020
c55505b
work on docs
kriemo Apr 23, 2020
b6d1b9e
Merge remote-tracking branch 'upstream/master' into dev-dplyr-fix
kriemo Apr 23, 2020
48710c0
update docs (tbl_interval -> ivl_df) (tbl_genome -> genome_df)
kriemo Apr 25, 2020
f0bcf53
update to newer tidyr syntax
kriemo Apr 25, 2020
de6533c
add news
kriemo Apr 25, 2020
57dbfa9
coerce to tibbles when checking
kriemo Apr 25, 2020
d62c437
add cols to unnest
kriemo Apr 30, 2020
338e481
mention removal of sorted and merge attributes
kriemo May 1, 2020
6b89a58
remove remote dplyr, rlang, and vctrs
kriemo May 4, 2020
958f426
implement changes suggested in #363
kriemo May 4, 2020
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -18,7 +18,7 @@ License: MIT + file LICENSE
Depends:
R (>= 3.1.2)
Imports:
-dplyr (>= 0.7.0),
+dplyr (>= 0.8.0),
rlang,
readr,
stringr,
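Requiring `dplyr (>= 0.8.0)` in `Imports` makes runtime checks of the form `utils::packageVersion("dplyr") < "0.7.99.9000"` unreachable on any supported installation, which is why the R files below drop those branches. A small base-R sketch of the comparison (the version strings are illustrative): `packageVersion()` returns a `numeric_version`, which compares dotted components as integers rather than as text.

```r
# numeric_version compares each dotted component numerically
minimum       <- numeric_version("0.8.0")  # new floor declared in DESCRIPTION
legacy_cutoff <- "0.7.99.9000"             # threshold used by the removed runtime checks

# Any version that satisfies the new floor is above the legacy cutoff,
# so an `if (version < "0.7.99.9000")` branch can never run.
meets_floor <- minimum >= "0.8.0"
hits_legacy <- minimum < legacy_cutoff

# Component-wise comparison is not the same as string comparison:
numeric_newer <- numeric_version("0.10.0") > "0.9.0"  # 10 > 9
string_newer  <- "0.10.0" > "0.9.0"                   # text: "1" sorts before "9"
```

This is why version thresholds are written against `packageVersion()` rather than compared as plain character strings.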
16 changes: 3 additions & 13 deletions NAMESPACE
@@ -1,12 +1,5 @@
# Generated by roxygen2: do not edit by hand

-S3method(as.tbl_genome,data.frame)
-S3method(as.tbl_genome,tbl_df)
-S3method(as.tbl_interval,GRanges)
-S3method(as.tbl_interval,data.frame)
-S3method(as.tbl_interval,tbl_df)
-export(as.tbl_genome)
-export(as.tbl_interval)
export(bed12_to_exons)
export(bed_absdist)
export(bed_closest)
@@ -32,6 +25,8 @@ export(bed_sort)
export(bed_subtract)
export(bed_window)
export(bound_intervals)
+export(check_genome)
+export(check_interval)
export(concat)
export(create_introns)
export(create_tss)
@@ -40,20 +35,15 @@ export(create_utrs5)
export(db_ensembl)
export(db_ucsc)
export(flip_strands)
+export(gr_to_bed)
export(interval_spacing)
-export(is.tbl_genome)
-export(is.tbl_interval)
export(read_bed)
export(read_bed12)
export(read_bedgraph)
export(read_broadpeak)
export(read_genome)
export(read_narrowpeak)
export(read_vcf)
-export(tbl_genome)
-export(tbl_interval)
-export(trbl_genome)
-export(trbl_interval)
export(valr_example)
export(values)
export(values_unique)
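The NAMESPACE changes drop the `tbl_interval`/`tbl_genome` classes and their `is.*`/`as.*` helpers in favor of plain validators (`check_interval()`, `check_genome()`) that accept ordinary data frames. The sketch below is a hypothetical base-R illustration of that pattern, not valr's implementation: the `*_sketch` functions only verify that the required columns exist (`chrom`, `start`, `end` for intervals; `chrom`, `size` for genomes, as in the `tribble()` examples) and return a plain data frame.

```r
# Hypothetical validator: NOT valr's check_interval()/check_genome(), just the
# general idea of validating columns instead of relying on a custom S3 class.
check_cols_sketch <- function(x, required) {
  if (!is.data.frame(x)) {
    stop("expected a data.frame", call. = FALSE)
  }
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0) {
    stop("missing required column(s): ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  # Return a plain data frame; per the "coerce to tibbles when checking"
  # commit, valr coerces to a tibble here instead.
  as.data.frame(x)
}

check_interval_sketch <- function(x) check_cols_sketch(x, c("chrom", "start", "end"))
check_genome_sketch   <- function(x) check_cols_sketch(x, c("chrom", "size"))

ivl <- data.frame(chrom = "chr1", start = 100, end = 125)
ok  <- check_interval_sketch(ivl)

# A genome table without a `size` column is rejected with an informative error
err <- tryCatch(check_genome_sketch(data.frame(chrom = "chr1")),
                error = conditionMessage)
```

Because any data frame with the right columns passes, callers no longer need `as.tbl_interval()` before calling a `bed_*()` function.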
11 changes: 11 additions & 0 deletions NEWS.md
@@ -1,13 +1,24 @@
# valr 0.5.0.9000

## Major changes

* `trbl_interval()` and `trbl_genome()` custom `tibble` subclasses have been deemed unnecessary and have been removed from the package.

* Coercing `GRanges` to a `valr`-compatible data.frame now uses the `gr_to_bed()` function rather than the `as.tbl_interval()` methods.


## Minor changes

* dplyr versions < 0.8.0 are no longer supported, because supporting them added unnecessary code bloat and made it difficult to handle multiple grouping structures (#359).

* The `sort_by` argument of `bed_random()` has been renamed to `sorted`. By default, the output is now
sorted with `bed_sort()` rather than by naming the sorting columns. Sorting can
be suppressed with `sorted = FALSE`.

* `bed_sort()` now uses base R sorting with the `radix` method for increased speed. (#353)

* `tbls` processed by `bed_merge()` or `bed_sort()` no longer store `merged` or `sorted` attributes, because these attributes were rarely checked in the codebase and were potential sources of unexpected behavior.

## Bug fixes

* Fixed `bed_closest()` to prevent erroneous intervals being reported when adjacent closest intervals are present in the `y` table. (#348)
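The NEWS entry notes that `bed_sort()` now uses base R sorting with the `radix` method. As a rough sketch (not valr's code; the data are made up), a multi-key ordering by chromosome, then start, then end looks like this. Radix ordering of character vectors follows the C locale, so `"chr10"` sorts before `"chr2"`.

```r
intervals <- data.frame(
  chrom = c("chr2", "chr1", "chr10", "chr1"),
  start = c(5, 20, 1, 10),
  end   = c(15, 30, 50, 12),
  stringsAsFactors = FALSE
)

# Order by chrom, then start, then end, using base R's radix sort
ord <- order(intervals$chrom, intervals$start, intervals$end, method = "radix")
sorted <- intervals[ord, , drop = FALSE]
rownames(sorted) <- NULL
```

The result lists the two `chr1` intervals (starting at 10, then 20), then `chr10`, then `chr2`.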
4 changes: 2 additions & 2 deletions R/bed12_to_exons.r
@@ -4,7 +4,7 @@
#' number, with respect to strand (i.e., the first exon for `-` strand
#' genes will have larger start and end coordinates).
#'
-#' @param x [tbl_interval()]
+#' @param x [ivl_df]
#'
#' @family utilities
#'
@@ -15,7 +15,7 @@
#'
#' @export
bed12_to_exons <- function(x) {
-if (!is.tbl_interval(x)) x <- as.tbl_interval(x)
+x <- check_interval(x)

if (!ncol(x) == 12) {
stop("expected 12 column input", call. = FALSE)
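`bed12_to_exons()` expands each BED12 record into one row per exon and numbers exons with respect to strand, so the first exon of a `-` strand gene has the largest coordinates. A minimal single-record sketch of that expansion follows, assuming the standard BED12 encoding (comma-separated block sizes and block starts, with block starts relative to the record's start). The function and argument names are hypothetical and do not reflect valr's column names or implementation.

```r
# Hypothetical helper: expand ONE BED12 record into exon rows.
bed12_row_to_exons <- function(chrom, start, name, strand,
                               block_sizes, block_starts) {
  # strsplit() drops the trailing empty field left by a trailing comma
  sizes  <- as.integer(strsplit(block_sizes, ",")[[1]])
  starts <- as.integer(strsplit(block_starts, ",")[[1]])

  exon_start <- start + starts       # block starts are offsets from `start`
  exon_end   <- exon_start + sizes

  # Number exons 5' -> 3': reversed on the minus strand
  n <- length(sizes)
  exon_number <- if (strand == "-") rev(seq_len(n)) else seq_len(n)

  data.frame(chrom, start = exon_start, end = exon_end, name,
             exon = exon_number, strand, stringsAsFactors = FALSE)
}

ex <- bed12_row_to_exons("chr1", 100, "tx1", "-", "10,20,", "0,50,")
```

Here the two exons span 100-110 and 150-170; on the `-` strand the downstream block (150-170) is exon 1.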
39 changes: 15 additions & 24 deletions R/bed_absdist.r
@@ -15,12 +15,12 @@
#' Both absolute and scaled distances are reported as `.absdist` and
#' `.absdist_scaled`.
#'
-#' @param x [tbl_interval()]
-#' @param y [tbl_interval()]
-#' @param genome [tbl_genome()]
+#' @param x [ivl_df]
+#' @param y [ivl_df]
+#' @param genome [genome_df]
#'
#' @return
-#' [tbl_interval()] with `.absdist` and `.absdist_scaled` columns.
+#' [ivl_df] with `.absdist` and `.absdist_scaled` columns.
#'
#' @template stats
#'
@@ -39,9 +39,9 @@
#'
#' @export
bed_absdist <- function(x, y, genome) {
-if (!is.tbl_interval(x)) x <- as.tbl_interval(x)
-if (!is.tbl_interval(y)) y <- as.tbl_interval(y)
-if (!is.tbl_genome(genome)) genome <- as.tbl_genome(genome)
+x <- check_interval(x)
+y <- check_interval(y)
+genome <- check_genome(genome)

# establish grouping with shared groups (and chrom)
groups_xy <- shared_groups(x, y)
@@ -55,20 +55,11 @@ bed_absdist <- function(x, y, genome) {
x <- group_by(x, !!! groups_vars)
y <- group_by(y, !!! groups_vars)

-if (utils::packageVersion("dplyr") < "0.7.99.9000"){
-x_cpp <- update_groups(x)
-y_cpp <- update_groups(y)
-grp_indexes <- shared_group_indexes(x_cpp, y_cpp)
-res <- dist_impl(x_cpp, y_cpp,
-grp_indexes$x,
-grp_indexes$y,
-distcalc = "absdist")
-} else {
-grp_indexes <- shared_group_indexes(x, y)
-res <- dist_impl(x, y,
-grp_indexes$x, grp_indexes$y,
-distcalc = "absdist")
-}

+grp_indexes <- shared_group_indexes(x, y)
+res <- dist_impl(x, y,
+grp_indexes$x, grp_indexes$y,
+distcalc = "absdist")

# convert groups_xy to character vector
if (!is.null(groups_xy)) {
@@ -80,18 +71,18 @@ bed_absdist <- function(x, y, genome) {
genome <- inner_join(genome, get_labels(y), by = c("chrom"))

ref_points <- summarize(y, .ref_points = n())
-genome <- inner_join(genome, ref_points, by = c("chrom", groups_xy))
+genome <- inner_join(genome, ref_points, by = c(groups_xy))

genome <- mutate(genome, .ref_gap = .ref_points / size)
genome <- select(genome, -size, -.ref_points)

# calculate scaled reference sizes
-res <- full_join(res, genome, by = c("chrom", groups_xy))
+res <- full_join(res, genome, by = c(groups_xy))
res <- mutate(res, .absdist_scaled = .absdist * .ref_gap)
res <- select(res, -.ref_gap)

# report back original x intervals not found
-x_missing <- anti_join(x, res, by = c("chrom", groups_xy))
+x_missing <- anti_join(x, res, by = c(groups_xy))
x_missing <- ungroup(x_missing)
x_missing <- mutate(x_missing, .absdist = NA, .absdist_scaled = NA)
res <- bind_rows(res, x_missing)
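The scaling in `bed_absdist()` follows directly from the code above: `.ref_gap = .ref_points / size` (reference intervals per base on each chromosome) and `.absdist_scaled = .absdist * .ref_gap`. Equivalently, the absolute distance is divided by the mean spacing between reference intervals (`size / .ref_points`). A worked base-R sketch with made-up numbers for a single chromosome:

```r
# Hypothetical inputs for one chromosome
chrom_size <- 1e6                # `size` column of the genome table
n_ref      <- 100                # number of y intervals on it (.ref_points)
absdist    <- c(2500, 10000, 0)  # absolute distances (.absdist) for three x intervals

# .ref_gap = .ref_points / size
ref_gap <- n_ref / chrom_size

# .absdist_scaled = .absdist * .ref_gap
absdist_scaled <- absdist * ref_gap
```

The mean spacing here is 10,000 bp, so the scaled distances are about 0.25, 1 and 0: a value of 1 means the x interval is as far from its nearest reference as the average gap between references.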
26 changes: 11 additions & 15 deletions R/bed_closest.r
@@ -1,14 +1,14 @@
#' Identify closest intervals.
#'
-#' @param x [tbl_interval()]
-#' @param y [tbl_interval()]
+#' @param x [ivl_df]
+#' @param y [ivl_df]
#' @param overlap report overlapping intervals
#' @param suffix colname suffixes in output
#'
#' @template groups
#'
#' @return
-#' [tbl_interval()] with additional columns:
+#' [ivl_df] with additional columns:
#' - `.dist` distance to closest interval. Negative distances
#' denote upstream intervals.
#' - `.overlap` overlap with closest interval
@@ -18,26 +18,26 @@
#' @seealso \url{http://bedtools.readthedocs.io/en/latest/content/tools/closest.html}
#'
#' @examples
-#' x <- trbl_interval(
+#' x <- tibble::tribble(
#' ~chrom, ~start, ~end,
#' 'chr1', 100, 125
#' )
#'
-#' y <- trbl_interval(
+#' y <- tibble::tribble(
#' ~chrom, ~start, ~end,
#' 'chr1', 25, 50,
#' 'chr1', 140, 175
#' )
#'
#' bed_glyph(bed_closest(x, y))
#'
-#' x <- trbl_interval(
+#' x <- tibble::tribble(
#' ~chrom, ~start, ~end,
#' "chr1", 500, 600,
#' "chr2", 5000, 6000
#' )
#'
-#' y <- trbl_interval(
+#' y <- tibble::tribble(
#' ~chrom, ~start, ~end,
#' "chr1", 100, 200,
#' "chr1", 150, 200,
@@ -50,12 +50,12 @@
#' bed_closest(x, y, overlap = FALSE)
#'
#' # Report distance based on strand
-#' x <- trbl_interval(
+#' x <- tibble::tribble(
#' ~chrom, ~start, ~end, ~name, ~score, ~strand,
#' "chr1", 10, 20, "a", 1, "-"
#' )
#'
-#' y <- trbl_interval(
+#' y <- tibble::tribble(
#' ~chrom, ~start, ~end, ~name, ~score, ~strand,
#' "chr1", 8, 9, "b", 1, "+",
#' "chr1", 21, 22, "b", 1, "-"
@@ -74,8 +74,8 @@
#' @export
bed_closest <- function(x, y, overlap = TRUE,
suffix = c(".x", ".y")) {
-if (!is.tbl_interval(x)) x <- as.tbl_interval(x)
-if (!is.tbl_interval(y)) y <- as.tbl_interval(y)
+x <- check_interval(x)
+y <- check_interval(y)

check_suffix(suffix)

@@ -96,10 +96,6 @@ bed_closest <- function(x, y, overlap = TRUE,

suffix <- list(x = suffix[1], y = suffix[2])

-if (utils::packageVersion("dplyr") < "0.7.99.9000"){
-x <- update_groups(x)
-y <- update_groups(y)
-}
grp_indexes <- shared_group_indexes(x, y)

res <- closest_impl(x, y,
18 changes: 8 additions & 10 deletions R/bed_cluster.r
@@ -4,20 +4,20 @@
#' `max_dist = 0` means that both overlapping and book-ended intervals will be
#' clustered.
#'
-#' @param x [tbl_interval()]
+#' @param x [ivl_df]
#' @param max_dist maximum distance between clustered intervals.
#'
#' @template groups
#'
-#' @return [tbl_interval()] with `.id` column specifying sets of clustered intervals.
+#' @return [ivl_df] with `.id` column specifying sets of clustered intervals.
#'
#' @family single set operations
#'
#' @seealso
#' \url{http://bedtools.readthedocs.org/en/latest/content/tools/cluster.html}
#'
#' @examples
-#' x <- trbl_interval(
+#' x <- tibble::tribble(
#' ~chrom, ~start, ~end,
#' 'chr1', 100, 200,
#' 'chr1', 180, 250,
@@ -30,7 +30,7 @@
#' bed_cluster(x)
#'
#' # glyph illustrating clustering of overlapping and book-ended intervals
-#' x <- trbl_interval(
+#' x <- tibble::tribble(
#' ~chrom, ~start, ~end,
#' 'chr1', 1, 10,
#' 'chr1', 5, 20,
@@ -43,14 +43,12 @@
#'
#' @export
bed_cluster <- function(x, max_dist = 0) {
-if (!is.tbl_interval(x)) x <- as.tbl_interval(x)
+x <- check_interval(x)

-res <- group_by(x, chrom, add = TRUE)
-res <- bed_sort(res)
+groups <- rlang::syms(unique(c("chrom", group_vars(x))))
+res <- group_by(x, !!! groups)

-if (utils::packageVersion("dplyr") < "0.7.99.9000"){
-res <- update_groups(res)
-}
+res <- bed_sort(res)

res <- merge_impl(res, max_dist, collapse = FALSE)

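`bed_cluster()` gives a shared `.id` to intervals that overlap or lie within `max_dist` of each other; with `max_dist = 0`, book-ended intervals are clustered too. The work is done by valr's compiled `merge_impl()`. The loop below is only a base-R sketch of the same sweep idea, under these assumptions: half-open BED coordinates, clustering within `chrom` only (valr also respects any additional groups, which is why the new code builds `unique(c("chrom", group_vars(x)))`), and a hypothetical function name.

```r
# Hypothetical sketch of distance-based interval clustering (not valr's code).
cluster_sketch <- function(df, max_dist = 0) {
  df$chrom <- as.character(df$chrom)  # avoid factor vs. character comparisons
  df <- df[order(df$chrom, df$start, method = "radix"), , drop = FALSE]
  ids <- integer(nrow(df))
  current_id  <- 0L
  prev_chrom  <- NA_character_
  cluster_end <- -Inf  # rightmost end seen in the current cluster
  for (i in seq_len(nrow(df))) {
    new_chrom <- !identical(df$chrom[i], prev_chrom)
    # gap to the current cluster; <= 0 means overlapping or book-ended
    if (new_chrom || df$start[i] - cluster_end > max_dist) {
      current_id  <- current_id + 1L
      cluster_end <- df$end[i]
    } else {
      cluster_end <- max(cluster_end, df$end[i])
    }
    prev_chrom <- df$chrom[i]
    ids[i] <- current_id
  }
  df$.id <- ids
  rownames(df) <- NULL
  df
}

x <- data.frame(
  chrom = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  start = c(100, 180, 250, 501, 1),
  end   = c(200, 250, 500, 1000, 100),
  stringsAsFactors = FALSE
)

ids_default <- cluster_sketch(x)$.id
ids_dist1   <- cluster_sketch(x, max_dist = 1)$.id
```

With `max_dist = 0`, the overlapping (100-200, 180-250) and book-ended (250-500) intervals share cluster 1, while 501-1000 starts a new cluster because it is 1 bp away; raising `max_dist` to 1 absorbs it.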
24 changes: 10 additions & 14 deletions R/bed_complement.r
@@ -1,34 +1,34 @@
#' Identify intervals in a genome not covered by a query.
#'
-#' @param x [tbl_interval()]
-#' @param genome [tbl_genome()]
+#' @param x [ivl_df]
+#' @param genome [ivl_df]
#'
#' @family single set operations
#'
-#' @return [tbl_interval()]
+#' @return [ivl_df]
#'
#' @examples
-#' x <- trbl_interval(
+#' x <- tibble::tribble(
#' ~chrom, ~start, ~end,
#' 'chr1', 0, 10,
#' 'chr1', 75, 100
#' )
#'
-#' genome <- trbl_genome(
+#' genome <- tibble::tribble(
#' ~chrom, ~size,
#' 'chr1', 200
#' )
#'
#' bed_glyph(bed_complement(x, genome))
#'
-#' genome <- trbl_genome(
+#' genome <- tibble::tribble(
#' ~chrom, ~size,
#' 'chr1', 500,
#' 'chr2', 600,
#' 'chr3', 800
#' )
#'
-#' x <- trbl_interval(
+#' x <- tibble::tribble(
#' ~chrom, ~start, ~end,
#' 'chr1', 100, 300,
#' 'chr1', 200, 400,
@@ -42,8 +42,8 @@
#'
#' @export
bed_complement <- function(x, genome) {
-if (!is.tbl_interval(x)) x <- as.tbl_interval(x)
-if (!is.tbl_genome(genome)) genome <- as.tbl_genome(genome)
+x <- check_interval(x)
+genome <- check_genome(genome)

res <- bed_merge(x)

@@ -57,13 +57,9 @@ bed_complement <- function(x, genome) {

res <- group_by(res, chrom)

-if (utils::packageVersion("dplyr") < "0.7.99.9000"){
-res <- update_groups(res)
-}

res <- complement_impl(res, genome)

-res <- bind_rows(res, chroms_no_overlaps)
+res <- bind_rows(res, as_tibble(chroms_no_overlaps))
res <- bed_sort(res)

res
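`bed_complement()` merges `x` first and then, per chromosome, reports the regions of the genome that the merged intervals do not cover (the gap-finding step is done by valr's compiled `complement_impl()`). A per-chromosome base-R sketch of that gap step follows, assuming the input is already merged and sorted and uses half-open coordinates. The function name is hypothetical.

```r
# Hypothetical sketch: gaps left by merged, sorted intervals on one chromosome.
complement_sketch <- function(starts, ends, size) {
  # Candidate gaps: before the first interval, between neighbours, after the last
  gap_starts <- c(0, ends)
  gap_ends   <- c(starts, size)
  keep <- gap_ends > gap_starts  # drop zero-length gaps (e.g. an interval at 0)
  data.frame(start = gap_starts[keep], end = gap_ends[keep])
}

# First documented example: chr1 intervals 0-10 and 75-100, chromosome size 200
gaps <- complement_sketch(starts = c(0, 75), ends = c(10, 100), size = 200)

# A chromosome with no intervals is returned whole
empty <- complement_sketch(numeric(0), numeric(0), size = 500)
```

For the first example, the sketch yields 10-75 and 100-200. The zero-length gap before the interval that starts at 0 is dropped.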