-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dplyr v. 1.0.0 compatibility #354
Conversation
Merge remote-tracking branch 'upstream/master' into dev-dplyr-fix # Conflicts: # NEWS.md # man/bed_random.Rd
The benchmarks look normal on my end with these changes. I will wait until github actions are set up and tested on this PR prior to merging. dev dplyr with changeslibrary(valr)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
library(tibble)
library(scales)
library(microbenchmark)
genome <- read_genome(valr_example('hg19.chrom.sizes.gz'))
# number of intervals
n <- 1e6
# number of timing reps
nrep <- 3
seed_x <- 1010486
x <- bed_random(genome, n = n, seed = seed_x)
seed_y <- 9283019
y <- bed_random(genome, n = n, seed = seed_y)
res <- microbenchmark(
# randomizing functions
bed_random(genome, n = n, seed = seed_x),
bed_shuffle(x, genome, seed = seed_x),
# # single tbl functions
bed_slop(x, genome, both = 1000),
bed_flank(x, genome, both = 1000),
bed_shift(x, genome),
bed_merge(x),
bed_partition(x),
bed_cluster(x),
bed_complement(x, genome),
# multi tbl functions
bed_closest(x, y),
bed_intersect(x, y),
bed_map(x, y, .n = length(end)),
bed_subtract(x, y),
bed_window(x, y, genome),
# stats
bed_absdist(x, y, genome),
bed_reldist(x, y),
bed_jaccard(x, y),
bed_fisher(x, y, genome),
bed_projection(x, y, genome),
# utilities
bed_makewindows(x, win_size = 100),
times = nrep,
unit = 's')
# covert nanoseconds to seconds
res <- res %>%
as_tibble() %>%
mutate(time = time / 1e9) %>%
arrange(time)
# futz with the x-axis
maxs <- res %>%
group_by(expr) %>%
summarize(max.time = max(boxplot.stats(time)$stats))
# filter out outliers
res <- res %>%
left_join(maxs) %>%
filter(time <= max.time * 1.05)
#> Joining, by = "expr"
ggplot(res, aes(x=reorder(expr, time), y=time)) +
geom_boxplot(fill = 'red', outlier.shape = NA, alpha = 0.5) +
coord_flip() +
theme_bw() +
labs(
y='execution time (seconds)',
x='',
title="valr benchmarks",
subtitle=paste(comma(n), "random x/y intervals,", comma(nrep), "repetitions")) cran dplyr with changescurrent CRAN valr with cran dplyrCreated on 2020-03-23 by the reprex package (v0.3.0) |
I believe that a recent commit in don't run this code as it may segfault. #devtools::install_gitub("tidyverse/dplyr") @35d3ace
#devtools::install_gitub("r-lib/vctrs") @1350e43
library(dplyr, warn.conflicts = FALSE)
library(valr)
packageVersion("dplyr")
#> [1] '0.8.99.9002'
packageVersion("vctrs")
#> [1] '0.2.99.9010'
bedfile <- valr_example('genes.hg19.chr22.bed.gz')
genes <- read_bed(bedfile, n_fields = 6)
tss <- genes %>%
filter(strand == '+')
tss
#> # A tibble: 330 x 6
#> chrom start end name score strand
#> <chr> <int> <int> <chr> <chr> <chr>
#> 1 chr22 16162065 16172265 LINC00516 3 +
#> 2 chr22 16239287 16239327 DQ590589 1 +
#> 3 chr22 16241085 16241125 DQ590589 1 +
#> 4 chr22 16244205 16244245 DQ590589 1 +
#> 5 chr22 16245997 16246037 DQ590589 1 +
#> 6 chr22 16274557 16278600 P712P 2 +
#> 7 chr22 17029615 17029643 DQ571479 1 +
#> 8 chr22 17082800 17129720 TPTEP1 9 +
#> 9 chr22 17308363 17310225 HSFY1P1 2 +
#> 10 chr22 17517459 17539682 CECR7 4 +
#> # … with 320 more rows
#segfaults
tss %>%
mutate(end = start + 1) |
Github Actions are set up on master. |
The checks didn't run on the last commit. I don't see a way to restart on my end. |
Merge remote-tracking branch 'upstream/master' into dev-dplyr-fix # Conflicts: # .gitignore
The build is failing again due to upstream changes in dplyr or vctrs packages. I'll look into it.. |
This all looks good. We'll have to remove Does this cover changes proposed in #363? |
We will be able to release this prior to There are few changes proposed in #364 that we don't have, not sure if they are needed as I don't get any warnings or test failures. I'll port the relevant commits over to this PR just to be safe. |
Mostly small changes:
tests use expect_equivalent() instead of expect_equal() when comparing
trbl_interval()
objects totibbles()
, as a tibble is not equal to a trbl_interval due to differing class attributes.The
add
argument ofgroup_by()
is deprecated in favor of.add
. I've replaced uses ofadd
with explicit naming of the groups needed for grouping to avoid using this argument.The dplyr dependency was bumped to 0.8.0 and all code used to maintain older compatibility was removed, as it is likely unnecessary at this point.
For now we should continue to test against dev dply until they have a stable release candidate, then we can update valr on CRAN.