dplyr v. 1.0.0 compatibility #354

Merged
merged 32 commits into from
May 4, 2020

kriemo
Member

@kriemo kriemo commented Mar 20, 2020

Mostly small changes:

  1. tests use expect_equivalent() instead of expect_equal() when comparing trbl_interval() objects to tibbles, as a tibble and a trbl_interval() object have different class attributes and therefore do not compare as equal.

  2. The add argument of group_by() is deprecated in favor of .add. I've replaced each use of add with a group_by() call that names all of the grouping columns explicitly, which avoids the argument altogether.

  3. The dplyr dependency was bumped to 0.8.0 and all code used to maintain older compatibility was removed, as it is likely unnecessary at this point.

  4. For now we should continue to test against dev dplyr until they have a stable release candidate, then we can update valr on CRAN.
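
A minimal sketch of points 1 and 2, not the code in this PR. The toy table and the hand-attached "tbl_interval" class are assumptions for illustration (valr sets up its own classes), and the `.add` call assumes dplyr >= 1.0.0. Base `all.equal()` stands in for the testthat expectations; `check.attributes = FALSE` is roughly the relaxation that expect_equivalent() applies.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy intervals (hypothetical data, BED-style column names)
x <- tibble(
  chrom  = c("chr1", "chr1", "chr2"),
  start  = c(1L, 50L, 10L),
  end    = c(20L, 80L, 30L),
  strand = c("+", "-", "+")
)

# Point 2: rather than group_by(strand, add = TRUE) on a grouped table,
# name every grouping column in a single call.
explicit <- group_by(x, chrom, strand)

# The dplyr >= 1.0.0 spelling of the old behaviour, for comparison.
added <- x %>%
  group_by(chrom) %>%
  group_by(strand, .add = TRUE)

same_groups <- identical(group_vars(explicit), group_vars(added))

# Point 1: an extra class attribute is enough to break a strict comparison.
# Plain data frames are used so that only base R's all.equal() is involved.
plain <- as.data.frame(x)
classed <- plain
class(classed) <- c("tbl_interval", class(classed))  # attached by hand, illustration only

strict_equal <- isTRUE(all.equal(classed, plain))
loose_equal  <- isTRUE(all.equal(classed, plain, check.attributes = FALSE))
```

Here `same_groups` and `loose_equal` come out TRUE while `strict_equal` is FALSE, which is the situation the test changes work around.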

kriemo added 4 commits March 21, 2020 08:43
Merge remote-tracking branch 'upstream/master' into dev-dplyr-fix

# Conflicts:
#	NEWS.md
#	man/bed_random.Rd
@kriemo
Member Author

kriemo commented Mar 23, 2020

The benchmarks look normal on my end with these changes. I will wait until GitHub Actions are set up and tested on this PR before merging.

dev dplyr with changes

library(valr)
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
library(tibble)
library(scales)
library(microbenchmark)

genome <- read_genome(valr_example('hg19.chrom.sizes.gz'))

# number of intervals
n <- 1e6
# number of timing reps
nrep <- 3

seed_x <- 1010486
x <- bed_random(genome, n = n, seed = seed_x)
seed_y <- 9283019
y <- bed_random(genome, n = n, seed = seed_y)

res <- microbenchmark(
  # randomizing functions
  bed_random(genome, n = n, seed = seed_x),
  bed_shuffle(x, genome, seed = seed_x),
  # single tbl functions
  bed_slop(x, genome, both = 1000),
  bed_flank(x, genome, both = 1000),
  bed_shift(x, genome),
  bed_merge(x),
  bed_partition(x),
  bed_cluster(x),
  bed_complement(x, genome),
  # multi tbl functions
  bed_closest(x, y),
  bed_intersect(x, y),
  bed_map(x, y, .n = length(end)),
  bed_subtract(x, y),
  bed_window(x, y, genome),
  # stats
  bed_absdist(x, y, genome),
  bed_reldist(x, y),
  bed_jaccard(x, y),
  bed_fisher(x, y, genome),
  bed_projection(x, y, genome),
  # utilities
  bed_makewindows(x, win_size = 100),
  times = nrep,
  unit = 's')

# convert nanoseconds to seconds
res <- res %>%
  as_tibble() %>%
  mutate(time = time / 1e9) %>%
  arrange(time)

# futz with the x-axis
maxs <- res %>%
  group_by(expr) %>%
  summarize(max.time = max(boxplot.stats(time)$stats))

# filter out outliers
res <- res %>%
  left_join(maxs) %>%
  filter(time <= max.time * 1.05)
#> Joining, by = "expr"

ggplot(res, aes(x=reorder(expr, time), y=time)) +
  geom_boxplot(fill = 'red', outlier.shape = NA, alpha = 0.5) +
  coord_flip() +
  theme_bw() +
  labs(
    y='execution time (seconds)',
    x='',
    title="valr benchmarks",
    subtitle=paste(comma(n), "random x/y intervals,", comma(nrep), "repetitions"))

cran dplyr with changes

current CRAN valr with cran dplyr

Created on 2020-03-23 by the reprex package (v0.3.0)
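
For anyone reading the outlier filter in the script above: `max(boxplot.stats(time)$stats)` is the upper whisker end, i.e. the largest observation within 1.5 times the hinge spread above the upper hinge (the `boxplot.stats()` default), and timings more than 5% above it are dropped. A base R sketch on made-up timings (the numbers are invented for illustration, not benchmark results):

```r
# Hypothetical timings in seconds; 100 plays the role of a stray slow run
time <- c(1, 2, 3, 4, 100)

# stats = lower whisker, lower hinge, median, upper hinge, upper whisker
stats <- boxplot.stats(time)$stats
max_time <- max(stats)

# Same rule as the benchmark script: keep values within 5% of the upper whisker
kept <- time[time <= max_time * 1.05]
kept
#> [1] 1 2 3 4
```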

@kriemo
Member Author

kriemo commented Mar 23, 2020

I believe that a recent commit in vctrs is now causing a segfault in the vignette build. I'm going to keep this PR open and hold off on merging until the dplyr release becomes more stable.

Don't run this code, as it may segfault.

#devtools::install_github("tidyverse/dplyr@35d3ace")
#devtools::install_github("r-lib/vctrs@1350e43")

library(dplyr, warn.conflicts = FALSE)
library(valr)

packageVersion("dplyr")
#> [1] '0.8.99.9002'
packageVersion("vctrs")
#> [1] '0.2.99.9010'

bedfile <- valr_example('genes.hg19.chr22.bed.gz')

genes <- read_bed(bedfile, n_fields = 6)

tss <- genes %>%
  filter(strand == '+') 

tss
#> # A tibble: 330 x 6
#>    chrom    start      end name      score strand
#>    <chr>    <int>    <int> <chr>     <chr> <chr> 
#>  1 chr22 16162065 16172265 LINC00516 3     +     
#>  2 chr22 16239287 16239327 DQ590589  1     +     
#>  3 chr22 16241085 16241125 DQ590589  1     +     
#>  4 chr22 16244205 16244245 DQ590589  1     +     
#>  5 chr22 16245997 16246037 DQ590589  1     +     
#>  6 chr22 16274557 16278600 P712P     2     +     
#>  7 chr22 17029615 17029643 DQ571479  1     +     
#>  8 chr22 17082800 17129720 TPTEP1    9     +     
#>  9 chr22 17308363 17310225 HSFY1P1   2     +     
#> 10 chr22 17517459 17539682 CECR7     4     +     
#> # … with 320 more rows


#segfaults
tss %>%
  mutate(end = start + 1)

@jayhesselberth
Member

GitHub Actions are set up on master.

@jayhesselberth
Member

The checks didn't run on the last commit. I don't see a way to restart on my end.

kriemo added 2 commits March 25, 2020 07:36
Merge remote-tracking branch 'upstream/master' into dev-dplyr-fix

# Conflicts:
#	.gitignore
@kriemo
Member Author

kriemo commented Apr 1, 2020

The build is failing again due to upstream changes in the dplyr or vctrs packages. I'll look into it.

@kriemo kriemo marked this pull request as draft April 25, 2020 20:38
@kriemo
Member Author

kriemo commented May 4, 2020

closes #353 and #274

@kriemo kriemo marked this pull request as ready for review May 4, 2020 12:58
@kriemo kriemo requested a review from jayhesselberth May 4, 2020 12:59
@kriemo kriemo mentioned this pull request May 4, 2020
@jayhesselberth
Member

This all looks good.

We'll have to remove Remotes from the DESCRIPTION before submitting to CRAN. What happens if the Remotes section is removed now? Do we need to wait for the current remote packages to be updated on CRAN before submitting?

Does this cover changes proposed in #363?

@kriemo
Member Author

kriemo commented May 4, 2020

We will be able to release this prior to vctrs and dplyr being released as valr should build against the current cran versions. I'll remove the remotes so that we test the build against cran versions.
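
As a rough illustration of the check discussed above (a sketch, not part of this PR): a DESCRIPTION can be scanned for a Remotes field before a CRAN submission. The DESCRIPTION text below is a made-up fragment, and the two remotes listed are an assumption based on the development packages mentioned earlier in this thread.

```r
# Made-up DESCRIPTION fragment; the Remotes entries are assumed for illustration
desc_lines <- c(
  "Package: valr",
  "Imports: dplyr (>= 0.8.0)",
  "Remotes: tidyverse/dplyr, r-lib/vctrs"
)

# read.dcf() parses the Debian control format used by DESCRIPTION files
con <- textConnection(desc_lines)
desc <- read.dcf(con)
close(con)

has_remotes <- "Remotes" %in% colnames(desc)
remotes <- if (has_remotes) {
  trimws(strsplit(desc[1, "Remotes"], ",")[[1]])
} else {
  character(0)
}

# Any entries listed here would need to go before a CRAN submission
remotes
#> [1] "tidyverse/dplyr" "r-lib/vctrs"
```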

There are a few changes proposed in #364 that we don't have. I'm not sure they are needed, as I don't get any warnings or test failures, but I'll port the relevant commits over to this PR just to be safe.

@jayhesselberth jayhesselberth merged commit 978f3fb into master May 4, 2020
@jayhesselberth jayhesselberth deleted the dev-dplyr-fix branch May 4, 2020 23:08