speedup <DataMask>$set(name, chunks) #5474

Merged: 3 commits, Sep 2, 2020

Conversation

romainfrancois
Member

Related to #5017.

With the code:

library(dplyr, warn.conflicts = FALSE)

m0 <- matrix(0, 50, 2000)
groups <- sample(1:10, 50, replace = TRUE)

m1 <- apply(m0, c(1,2), function(x) sample(c(0,1),1)) %>%
  as.data.frame() %>%
  mutate(groups = groups)

m2 <- m1 %>%
  group_by(groups)

profvis::profvis({
  summarise_all(m2, list(sum))
})

We're down from:

[image]

to:

[image]

@romainfrancois
Member Author

[image]

@romainfrancois
Member Author

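It seems there is still some room for improvement: `summarise_all()` takes about 2s here, while the equivalent manual chop / summarise / combine steps take roughly 340ms in total: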

library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(vctrs)
#> Warning: package 'vctrs' was built under R version 4.0.2

m0 <- matrix(0, 50, 2000)
groups <- sample(1:10, 50, replace = TRUE)

m1 <- apply(m0, c(1,2), function(x) sample(c(0,1),1)) %>%
  as.data.frame() %>%
  mutate(groups = groups)

m2 <- m1 %>%
  group_by(groups)

bench::workout({
  summarise_all(m2, sum)
})
#> # A tibble: 1 x 3
#>   exprs                   process     real
#>   <bch:expr>             <bch:tm> <bch:tm>
#> 1 summarise_all(m2, sum)    2.05s    2.05s

lump <- function(x) vec_c(!!!x)
bench::workout({
  indices <- group_rows(m2)
  chops <- map(m2[1:2000], vec_chop, indices)
  chunks <- map(chops, function(chunks) {
    lump(map(chunks, sum))
  })
})
#> # A tibble: 3 x 3
#>   exprs                                                             process
#>   <bch:expr>                                                        <bch:t>
#> 1 indices <- group_rows(m2)                                            35µs
#> 2 chops <- map(m2[1:2000], vec_chop, indices)                        31.4ms
#> 3 chunks <- map(chops, function(chunks) { lump(map(chunks, sum)) }) 304.4ms
#> # … with 1 more variable: real <bch:tm>

Created on 2020-08-11 by the reprex package (v0.3.0.9001)
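
To make the comparison above easier to follow, here is a minimal sketch of the same chop / summarise / combine steps on a tiny data frame, checked against `summarise_all()`. The data frame `df` and the names `gdf`, `cols`, `manual` and `reference` are made up for illustration and are not part of the PR. `indices` is passed by name to `vec_chop()` so the call does not depend on argument position.

```r
library(dplyr, warn.conflicts = FALSE)
library(vctrs)

# Hypothetical toy data: one grouping column and two value columns.
df <- data.frame(
  g = c(1, 2, 1, 2, 3),
  a = c(1, 0, 1, 1, 0),
  b = c(0, 0, 1, 1, 1)
)
gdf <- group_by(df, g)

# Row positions of each group, one integer vector per group.
indices <- group_rows(gdf)

# For each column: chop it into per-group chunks, summarise each chunk,
# then combine the per-group results back into a single vector.
cols <- c("a", "b")
manual <- lapply(cols, function(col) {
  chunks <- vec_chop(gdf[[col]], indices = indices)
  vec_c(!!!lapply(chunks, sum))
})
names(manual) <- cols

# Reference result computed by dplyr itself.
reference <- summarise_all(gdf, sum)

# Both approaches should agree column by column.
all.equal(manual$a, reference$a)
all.equal(manual$b, reference$b)
```

This only shows that the manual steps compute the same sums. It says nothing about where the remaining time in `summarise_all()` is spent.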

@romainfrancois
Member Author

I think these are harmless changes, just moving some code down to compiled code.

@romainfrancois merged commit 745c9a6 into master on Sep 2, 2020
@romainfrancois deleted the mask_perf branch on September 2, 2020 07:15