`on` performing slower than double setkey #1232
Comments
I would advise benchmarking only the isolated join operation, not the creation of the data.table objects. Including the creation step adds noise to the timings.
@jangorecki then how to avoid the freebie given to the second operation? Namely, after the first run of my thinking is that there's noise, but it should be more or less equal across operations. The real trouble is that it distorts the magnitude of the difference between the operations.
Then I would benchmark manually instead of using the microbenchmark function.
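A minimal sketch of what such a manual benchmark might look like. It assumes the "freebie" concern is that a table keyed in one run stays keyed for the next, so each run works on a fresh `copy()`. The table size, the column names (`id`, `x`, `y`) and the helper names `time_on` and `time_setkey` are all hypothetical, not taken from the script in this issue.

```r
library(data.table)

set.seed(1)
n <- 1e5L  # hypothetical size, chosen only to keep the sketch quick

# Build the tables once, outside of any timed region.
DT1 <- data.table(id = sample(n), x = runif(n))
DT2 <- data.table(id = sample(n), y = runif(n))

# Time only the join. copy() hands each run fresh, unkeyed tables,
# so no run inherits a key set by an earlier one.
time_on <- function() {
  a <- copy(DT1); b <- copy(DT2)
  system.time(a[b, on = "id"])[["elapsed"]]
}

# The double-setkey variant pays for both setkey() calls inside the timer.
time_setkey <- function() {
  a <- copy(DT1); b <- copy(DT2)
  system.time({
    setkey(a, id)
    setkey(b, id)
    a[b]
  })[["elapsed"]]
}

timings <- data.table(
  on_join     = replicate(5, time_on()),
  setkey_join = replicate(5, time_setkey())
)
print(timings[, lapply(.SD, median)])
```

The `copy()` calls sit outside `system.time()`, so their cost is not counted. Whether the `setkey()` calls belong inside the timed region depends on what is being compared; they are included here because the issue is about the total cost of the double-keyed approach.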
@jangorecki thanks for the pointer, I've updated my timings.
Thanks for the nice timings. I ran your script and got
Advantages of
In spite of all this, for most cases All in all, both Hope this helps. PS: In spite of these points, it might be possible that
Thanks for the detailed description. Can we turn this issue into an FR for documentation? Besides the fact that the question will probably come back in the future, the performance impact (where it may speed things up and where it may slow them down) is worth mentioning. Definitely not a high priority.
@jangorecki noted down under #944
I recently integrated the new `on` functionality into some code of mine that was being dragged down by repetitive key switching (here for some context), so I was excited for the new `on` feature to (potentially) speed things up. I was quite surprised to find that the code actually ran about 30% slower (45 instead of 35 minutes) using `on`.

I was able to reproduce this using large `data.table`s beefed up from @jangorecki's `join_on` tests:

So about 27% slower here. Maybe I'm not understanding the purpose of `on`, but I thought that the double-keyed approach should basically be an upper bound for how long `on` takes. And indeed `on` is faster when the tables are smaller:

So, roughly 5% faster when `DT1` is smaller.

Roughly 7% faster when `DT2` is smaller.
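For readers less familiar with the two approaches compared in this issue, here is a small sketch of each. It is not the benchmark script from the post. The tables, values and column names are made up for illustration; only the two join forms (`X[Y, on = ...]` and `setkey()` on both tables followed by `X[Y]`) come from the discussion.

```r
library(data.table)

# Tiny illustrative tables (hypothetical data).
DT1 <- data.table(id = c(3L, 1L, 2L), x = c("c", "a", "b"))
DT2 <- data.table(id = c(2L, 3L, 4L), y = c(20, 30, 40))

# Ad hoc join: neither table needs a key.
res_on <- DT1[DT2, on = "id"]

# Double-keyed join: work on copies, since setkey() reorders by reference.
K1 <- copy(DT1); K2 <- copy(DT2)
setkey(K1, id)
setkey(K2, id)
res_key <- K1[K2]

# Both forms return the same values. DT2 is already sorted by id here,
# so the row order matches as well; only the key on res_key differs.
stopifnot(
  identical(res_on$id, res_key$id),
  identical(res_on$x,  res_key$x),
  identical(res_on$y,  res_key$y)
)
print(res_on)
```

The keyed version physically sorts both tables before joining, while `on` leaves them in their original order.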