Improve performance of rem_scalar/div_scalar for integer types (4x-10x) #259
In Fuse, processing `number % 3` is slow. The main reason is that a type conversion is performed on every call. I tried removing the type conversion, and performance can go from 2G/s to 5G/s.
#252 shows that the cast is the main cause. I did some profiling with perf in datafuse, and it seems the remainder (`rem`) is the hot path.
By using
This is a valid request: a valid use case, documented benefits, and a crate with an implementation available. 👍 Would you like to work on it, or would you like me to take it? Also, I went through the crate and is
The same optimization can be used for division as well, right?
Yes! Of course.
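A minimal sketch of why the same trick covers division, assuming the 32-bit unsigned case described in the paper by Lemire, Kaser and Kurz (arXiv:1902.01961): a single precomputed constant `M = ceil(2^64 / d)` gives both the quotient (the high 64 bits of `M * a`) and the remainder (keep the low 64 bits of `M * a`, multiply by `d`, take the high 64 bits). `FastDivisorU32` and its methods are hypothetical names for illustration, not arrow2's API.

```rust
/// Hypothetical helper (not part of arrow2): precomputed state for dividing
/// `u32` values by a fixed divisor `d`, after Lemire, Kaser & Kurz (2019).
#[derive(Clone, Copy, Debug)]
pub struct FastDivisorU32 {
    d: u32,
    // M = ceil(2^64 / d); it wraps to 0 when d == 1.
    m: u64,
}

impl FastDivisorU32 {
    pub fn new(d: u32) -> Self {
        assert!(d != 0, "divisor must be non-zero");
        // floor((2^64 - 1) / d) + 1 equals ceil(2^64 / d) for every d > 1.
        let m = (u64::MAX / d as u64).wrapping_add(1);
        Self { d, m }
    }

    /// a / d: the high 64 bits of the 128-bit product M * a.
    #[inline]
    pub fn div(&self, a: u32) -> u32 {
        if self.d == 1 {
            // M wrapped to 0 for d == 1, so handle this case directly.
            return a;
        }
        ((self.m as u128 * a as u128) >> 64) as u32
    }

    /// a % d: the low 64 bits of M * a encode the fractional part of a / d;
    /// multiplying them by d and keeping the high 64 bits recovers the remainder.
    #[inline]
    pub fn rem(&self, a: u32) -> u32 {
        let lowbits = self.m.wrapping_mul(a as u64);
        ((lowbits as u128 * self.d as u128) >> 64) as u32
    }
}
```

Both methods replace a hardware division with one or two multiplications, and `M` is computed once per divisor. That is why this suits `div_scalar`/`rem_scalar`, where the divisor is the same for every element. Signed and 64-bit integers need adjusted or wider arithmetic and are not covered by this sketch.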
Sorry, I may not have time to work on it right now, because there are some urgent issues I have to handle in datafuse. For now, I'm just proposing this idea to make arrow2 better.
Done in #275.
Refer to:
Blog: https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/
Paper: https://arxiv.org/abs/1902.01961
Go: https://github.com/bmkessler/fastdiv
Rust: https://docs.rs/strength_reduce/0.2.3/strength_reduce/
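Applied to a kernel, the point of these references is that the divisor in `rem_scalar` is a single scalar, so its reciprocal can be computed once per call and reused for every element. A minimal sketch for `u32` follows, assuming a plain slice without a validity bitmap; `rem_scalar_u32` is a hypothetical name, not the signature of arrow2's `rem_scalar`.

```rust
/// Hypothetical kernel (not arrow2's API): computes `values[i] % divisor`
/// for every element using the fastmod method from Lemire's blog/paper.
pub fn rem_scalar_u32(values: &[u32], divisor: u32) -> Vec<u32> {
    assert!(divisor != 0, "divisor must be non-zero");
    // Computed once per call: M = ceil(2^64 / d).
    // For d == 1 it wraps to 0, which makes every result 0 (correct for x % 1).
    let m = (u64::MAX / divisor as u64).wrapping_add(1);
    let d = divisor as u128;
    values
        .iter()
        .map(|&a| {
            // Low 64 bits of M * a hold the fractional part of a / d ...
            let lowbits = m.wrapping_mul(a as u64);
            // ... and scaling them by d yields the remainder in the high bits.
            ((lowbits as u128 * d) >> 64) as u32
        })
        .collect()
}
```

The strength_reduce crate linked above packages this kind of precomputation in types such as `StrengthReducedU32`, which implement `/` and `%` against primitive integers, so a production kernel could rely on it instead of hand-written arithmetic.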