performance of pmin/pmax using .EACHI vs making an intermediate large allocation without .EACHI #4598
I should mention that defining
Thank you for the report - I can reproduce in both 1.12.8 and dev. It is always helpful to look at the verbose = TRUE output; it suggests that the repeated per-group eval(j) calls are where the time goes. Taking a step back, it makes sense that extra calls take more time - here's an arbitrary example that adds some very lightweight functions but still takes twice as long:

library(data.table)
dt = data.table(grp = seq_len(664000L),
val = rnorm(664000L))
bench::mark(dt[, .(val = val), by = grp],
dt[, .(val = (val) + (2 - 2)), by = grp])
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc
#> <bch:expr> <bch> <bch:> <dbl> <bch:byt>
#> 1 dt[, .(val = val), by = grp] 254ms 258ms 3.87 16.8MB
#> 2 dt[, .(val = (val) + (2 - 2)), by = grp] 503ms 503ms 1.99 15.2MB
#> # ... with 1 more variable: `gc/sec` <dbl>

You may be able to improve timings if you make a simplified function to call so that things are more lightweight. This is in the middle of the road - faster than the original:

weighted.mean2 = function (val, xstart, xend, ystart, yend) {
wts = ifelse(xend > yend, yend, xend) - ifelse(xstart > ystart, xstart, ystart)+1L
return(sum(val * wts) / sum(wts))
}
time7 = system.time({
out7 <- x[y,
list(avg_value = weighted.mean2(value, xstart2, xend2, ystart2, yend2)),
by=.EACHI, on=c("id","xend>=ystart","xstart<=yend"),nomatch=NULL, verbose = TRUE]
})
##Making each group and running j (GForce FALSE) ...
## memcpy contiguous groups took 0.057s for 664000 groups
## eval(j) took 7.332s for 664000 calls
## 7.660s elapsed (7.610s cpu)
setkey(out7, NULL)
all.equal(out1, out7)
## [1] TRUE

Overall, it may be difficult to balance memory performance with actual function speed. The next step would be to look into
Thanks for the response. It seems like this really comes down to a j statement containing a mix of group-by operations and rowwise (vectorized) operations that could just as well be done without a by statement. In most contexts this doesn't matter, since you could just do the non-grouping operation once over all rows in one step first. If I had constructed my base-R comparison correctly this would have been clear. The following demonstrates the issue independently of data.table:
i.e., pmin and pmax have non-negligible overhead. That's not to say that there isn't opportunity for data.table to do better than base R. It seems like it shouldn't be too hard to create a function that calculates interval intersection lengths (or the weighted average directly) with less overhead. I may also play around with this, but I probably won't get far very quickly since I have limited C/C++ experience.
@myoung3 could you include the timings in your example? But one thing to note is that I do not believe your example is equivalent - for
@ColeMiller1 Oops, yes, I forgot to index over the intervals in y in that example. If I were to actually do the task in base R it would take forever because of indexing and looping slowness. But I think the following demonstrates the point pretty concisely:
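(Tiny stand-in vectors; n matches the 664,000 groups from the join, and the single vectorized call at the end is just for contrast.)

n <- 664000
vals <- c(1.1, 2.2)
v1 <- c(5L, 8L)
v2 <- c(6L, 7L)
v3 <- c(2L, 1L)
v4 <- c(3L, 2L)

## one small weighted.mean + pmin/pmax call per group, which is what .EACHI ends up doing
system.time(for (i in seq_len(n)) weighted.mean(vals, pmin(v1, v2) - pmax(v3, v4) + 1L))

## versus computing all of the weights in a single vectorized pass over long vectors
V1 <- rep(v1, n); V2 <- rep(v2, n); V3 <- rep(v3, n); V4 <- rep(v4, n)
system.time(pmin(V1, V2) - pmax(V3, V4) + 1L)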
So there's nothing wrong with data.table here specifically. But this issue demonstrates that when it's necessary to perform an intermediate vectorized operation on columns resulting from a join within .EACHI groups, there are time-performance costs relative to not using .EACHI, since that vectorized operation is called repeatedly. In a sense, this is the defining "feature" of .EACHI: it avoids a large but unnecessary intermediate allocation by reusing the same allocation for each group, and as a result it's not possible to perform a vectorized operation over the entire intermediate table, because that intermediate table never exists in memory (at least not all at the same time). But it really highlights the cases where base R or other vectorized functions have per-call overhead.
I thought there was another post from you about using
Regardless, if you think that this is resolved, feel free to close the issue. I am not sure if there is much more to be gained here.

n <- 664000
vals = c(1.1, 2.2)
v1 <- c(5L,8L)
v2 <- c(6L,7L)
v3 <- c(2L,1L)
v4 <- c(3L,2L)
system.time(for(i in seq_len(n)) {weighted.mean(vals, pmin(v1,v2)-pmax(v3,v4)+1L) })
## user system elapsed
## 18.66 0.05 18.80
time1 ## the original by = .EACHI example
## user system elapsed
## 17.32 0.08 17.61
@ColeMiller1 Yes, I had a post on that. I could close this, but first I'll ask whether there's any interest in including a discussion of this topic in any documentation. Are there other situations where people might use .EACHI but need to calculate intermediate variables which could be done via a single vectorized operation if the entire dataset existed at once? If so, it might be worth discussing the performance tradeoff between .EACHI and the intermediate-allocation approach (unless work is done to find optimized functions). But maybe this use case is so unique it's not worth discussing. @mattdowle?

Yes, actually doing the indexing operations in base R on the real data takes ages. data.table does all the indexing basically instantly, which is just mind-blowing.
Actually, I just searched through the vignettes and there is zero discussion of the memory benefits of using .EACHI. At the very least, it's probably worth mentioning the memory-allocation trick of .EACHI. I'm now not even sure where I read about how .EACHI works - maybe Stack Overflow or NEWS.
I did not find any direct documentation about memory considerations for .EACHI. There are some related issues: #2181 (and closing PR #4398), and there's a link to this SO post. Finally, below is an Rcpp approach. It's the fastest of the approaches, although I will note that NA or Inf values may cause issues without modification to the function. In summary, using it with data.table allows 664,000 groups to be ordered and processed in less than 2 seconds.

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double custom_fx(NumericVector vals,
IntegerVector xstart,
IntegerVector xend,
IntegerVector ystart,
IntegerVector yend) {
int n = vals.size();
int wt_sum = 0;
double val_sum = 0;
int y_end = yend[0];
int y_start = ystart[0];
for (int i = 0; i < n; i++) {
int wt = 1;
if (xend[i] > y_end) {
wt += y_end;
} else {
wt += xend[i];
}
if (xstart[i] > y_start) {
wt -= xstart[i];
} else {
wt -= y_start;
}
val_sum += vals[i] * wt;
wt_sum += wt;
}
return(val_sum / wt_sum);
}

x[y,
list(avg_value = custom_fx(value, xstart2, xend2, ystart2, yend2)),
by=.EACHI, on=c("id","xend>=ystart","xstart<=yend"),nomatch=NULL, verbose = TRUE]
## Original by = .EACHI
## user system elapsed
## 11.31 0.06 11.40
## Using C++
## user system elapsed
## 1.94 0.02 1.96
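(To run this, the C++ function needs to be compiled into the R session first, e.g. with Rcpp::sourceCpp(); the file name below is just an example.)

library(Rcpp)
sourceCpp("custom_fx.cpp")  # compiles the C++ file above and exports custom_fx() to R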
I'm in the process of developing a function that takes values measured over intervals and averages those values to new, non-aligned intervals, and I've noticed that pmin and pmax are causing performance issues in some settings. The title isn't exactly accurate because I can reproduce the pmin/pmax performance issues even without using .EACHI, but I'll present this in the context of the interval-averaging function I'm writing to demonstrate why this matters.
First I'll create some data. x contains values measured over intervals (i.e., "value" is a time-integrated average and start/end denote the closed interval over which the average was taken); y contains new intervals over which I'd like to calculate averages from the data in x:
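(The real tables are much larger; a scaled-down sketch with the same column layout - the names match the code quoted elsewhere in this thread, but the sizes and interval lengths here are invented:)

library(data.table)
set.seed(1)

## x: values measured over consecutive closed 7-unit intervals, per id
x <- data.table(id = rep(1:1000, each = 52))
x[, xstart := rep(seq(1L, by = 7L, length.out = 52L), times = 1000)]
x[, xend := xstart + 6L]
x[, value := rnorm(.N)]
## copies of the interval columns, so the x interval values are still available
## inside j after the non-equi join
x[, `:=`(xstart2 = xstart, xend2 = xend)]

## y: new, non-aligned closed 30-unit intervals to average onto
y <- data.table(id = rep(1:1000, each = 12))
y[, ystart := rep(seq(1L, by = 30L, length.out = 12L), times = 1000)]
y[, yend := ystart + 29L]
y[, `:=`(ystart2 = ystart, yend2 = yend)]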
Just to demonstrate what I'm trying to accomplish, here is the naive averaging approach that starts by expanding the interval dataset into a time series. Obviously this is sub-optimal since it creates an unnecessarily large intermediate table.
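(A sketch of that approach using the toy x/y above - not the original code; out_naive and the intermediate names are placeholders:)

## expand x into one row per time point within each interval
ts_x <- x[, .(time = seq(xstart, xend), value = value), by = .(id, xstart)]

## expand y the same way, keeping the target interval bounds for grouping later
ts_y <- y[, .(time = seq(ystart, yend)), by = .(id, ystart, yend)]

## equi-join on (id, time), then average within each y interval
out_naive <- ts_x[ts_y, on = .(id, time), nomatch = NULL][
  , .(avg_value = mean(value)), by = .(id, ystart, yend)]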
Instead, the better approach would be to use a non-equi join combined with a weighted average over .EACHI from the join:
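(Reconstructed from the snippets quoted earlier in the thread - the join spec and the *2 columns are the ones used there; time1/out1 are the objects the later timings refer to:)

time1 <- system.time({
  out1 <- x[y,
            .(avg_value = weighted.mean(value, pmin(xend2, yend2) - pmax(xstart2, ystart2) + 1L)),
            by = .EACHI, on = c("id", "xend>=ystart", "xstart<=yend"), nomatch = NULL]
})
time1
## out1$avg_value should match out_naive$avg_value from the sketch above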
The timings show that using .EACHI is slower, and this seems to be specifically because pmin/pmax are called in the argument definition of list(). This is not ideal, because I'd like to use the .EACHI approach to reduce the memory usage of the averaging function I'm writing.
So basically, pmin/pmax seem to be causing slowness when passed directly as arguments to weighted.mean.
In base R the following return basically the same timings for me: