This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Improve performance of rem_scalar/div_scalar for integer types (4x-10x) #259

Closed
sundy-li opened this issue Aug 7, 2021 · 7 comments

Comments

@sundy-li
Collaborator

sundy-li commented Aug 7, 2021

Refer to:

Blog: https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/
Paper: https://arxiv.org/abs/1902.01961
Go: https://github.com/bmkessler/fastdiv
Rust: https://docs.rs/strength_reduce/0.2.3/strength_reduce/
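
For context, the core trick described in the blog post and paper above can be sketched in a few lines of Rust. This is a minimal illustration for `u32` only, assuming the 64-bit precomputed constant from those references; `FastRemU32` and `rem_scalar_u32` are hypothetical names used for this sketch, not part of arrow2 or strength_reduce.

```rust
/// Precomputed state for taking remainders by a fixed `u32` divisor.
/// Hypothetical type, named for illustration only.
#[derive(Clone, Copy, Debug)]
pub struct FastRemU32 {
    divisor: u32,
    /// ceil(2^64 / divisor), truncated to 64 bits.
    magic: u64,
}

impl FastRemU32 {
    pub fn new(divisor: u32) -> Self {
        assert!(divisor != 0, "division by zero");
        // For divisor == 1 this wraps to 0, which still gives n % 1 == 0 below.
        let magic = (u64::MAX / divisor as u64).wrapping_add(1);
        Self { divisor, magic }
    }

    /// Computes `n % divisor` with two multiplications and no division.
    #[inline]
    pub fn rem(&self, n: u32) -> u32 {
        // Fractional part of n / divisor, scaled by 2^64.
        let low_bits = self.magic.wrapping_mul(n as u64);
        // Scaling the fraction back by the divisor yields the remainder.
        ((low_bits as u128 * self.divisor as u128) >> 64) as u32
    }
}

/// Applies `% divisor` to every value, paying the setup cost only once.
pub fn rem_scalar_u32(values: &[u32], divisor: u32) -> Vec<u32> {
    let reducer = FastRemU32::new(divisor);
    values.iter().map(|&v| reducer.rem(v)).collect()
}
```

The point of the design is that the one real division happens in `new`, so a scalar kernel that reuses the same divisor for a whole array amortizes it across all elements.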

@lideen999

In Fuse, evaluating `number % 3` is slow. The main reason is that a type conversion is performed on every call. I tried removing the type conversion, and throughput went from about 2G/s to 5G/s.

@sundy-li
Collaborator Author

sundy-li commented Aug 7, 2021

@lideen999

#252 covers the cast, which is the main cause.

I also did some profiling in datafuse, and rem itself seems to be on the hot path.

@sundy-li
Collaborator Author

@jorgecarleitao

Using strength_reduce gives a great improvement in datafuse.

@jorgecarleitao
Owner

This is a valid request: valid use-case, documented benefits, crate with implementation available. 👍

Would you like to work on it, or would you like me to take it?

Also, I went through the crate and it is unsafe-free, so it is an even easier sell.

@jorgecarleitao added the enhancement label (An improvement to an existing feature) on Aug 10, 2021
@ritchie46
Collaborator

The same optimization can be used for division as well, right?

@sundy-li
Collaborator Author

sundy-li commented Aug 10, 2021

> The same optimization can be used for division as well, right?

Yes! Of course.
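
As a rough illustration of the division case, here is a minimal sketch for `u32` using the same kind of precomputed 64-bit constant as in the references above. `FastDivU32` and `div_scalar_u32` are hypothetical names for this sketch, not the strength_reduce or arrow2 API, and a divisor of 1 is handled separately because the constant does not fit in 64 bits in that case.

```rust
/// Precomputed state for dividing by a fixed `u32` divisor.
/// Hypothetical type, named for illustration only.
#[derive(Clone, Copy, Debug)]
pub struct FastDivU32 {
    divisor: u32,
    /// ceil(2^64 / divisor), truncated to 64 bits (only used when divisor > 1).
    magic: u64,
}

impl FastDivU32 {
    pub fn new(divisor: u32) -> Self {
        assert!(divisor != 0, "division by zero");
        let magic = (u64::MAX / divisor as u64).wrapping_add(1);
        Self { divisor, magic }
    }

    /// Computes `n / divisor` with one widening multiplication.
    #[inline]
    pub fn div(&self, n: u32) -> u32 {
        if self.divisor == 1 {
            // 2^64 / 1 overflows the 64-bit constant, so take the trivial path.
            return n;
        }
        // The high 64 bits of magic * n are the integer part of n / divisor.
        ((self.magic as u128 * n as u128) >> 64) as u32
    }
}

/// Applies `/ divisor` to every value, paying the setup cost only once.
pub fn div_scalar_u32(values: &[u32], divisor: u32) -> Vec<u32> {
    let reducer = FastDivU32::new(divisor);
    values.iter().map(|&v| reducer.div(v)).collect()
}
```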

> Would you like to work on it, or would you like me to take it?

Sorry, I may not have time to work on it right now, because I have some urgent issues to handle in datafuse.

For now I am just offering this idea to make arrow2 work better.
Note that strength_reduce is not implemented generically, so it may need a match on the type to dispatch to the right code.
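
One possible shape for that dispatch, as a self-contained sketch: a macro generates one implementation per integer type, and a `match` on a dynamically typed column picks the right one. Everything here is hypothetical (`FastRem`, `IntColumn`, `rem_scalar`), it does not use strength_reduce itself, and it only covers unsigned types that have a wider native integer available.

```rust
use std::convert::TryFrom;

/// Hypothetical trait: one non-generic implementation per integer type.
pub trait FastRem: Copy {
    /// Wider integer that holds the precomputed constant.
    type Magic: Copy;
    fn magic(divisor: Self) -> Self::Magic;
    fn fast_rem(self, divisor: Self, magic: Self::Magic) -> Self;
}

// $t: element type, $wide: twice as wide, $wider: four times as wide.
macro_rules! impl_fast_rem {
    ($t:ty, $wide:ty, $wider:ty) => {
        impl FastRem for $t {
            type Magic = $wide;
            fn magic(divisor: Self) -> $wide {
                assert!(divisor != 0, "division by zero");
                (<$wide>::MAX / (divisor as $wide)).wrapping_add(1)
            }
            #[inline]
            fn fast_rem(self, divisor: Self, magic: $wide) -> Self {
                let low_bits = magic.wrapping_mul(self as $wide);
                let product = (low_bits as $wider) * (divisor as $wider);
                (product >> <$wide>::BITS) as $t
            }
        }
    };
}

impl_fast_rem!(u8, u16, u32);
impl_fast_rem!(u16, u32, u64);
impl_fast_rem!(u32, u64, u128);

/// Hypothetical stand-in for a column whose type is only known at runtime.
#[derive(Debug, PartialEq)]
pub enum IntColumn {
    U8(Vec<u8>),
    U16(Vec<u16>),
    U32(Vec<u32>),
}

/// Generic kernel: computes the constant once, then reuses it per element.
fn rem_typed<T: FastRem>(values: &[T], divisor: T) -> Vec<T> {
    let magic = T::magic(divisor);
    values.iter().map(|&v| v.fast_rem(divisor, magic)).collect()
}

/// Dispatches on the column type. Returns `None` when the divisor does not
/// fit the column's element type (a simplification made for this sketch).
pub fn rem_scalar(column: &IntColumn, divisor: u32) -> Option<IntColumn> {
    Some(match column {
        IntColumn::U8(v) => IntColumn::U8(rem_typed(v, u8::try_from(divisor).ok()?)),
        IntColumn::U16(v) => IntColumn::U16(rem_typed(v, u16::try_from(divisor).ok()?)),
        IntColumn::U32(v) => IntColumn::U32(rem_typed(v, divisor)),
    })
}
```

Signed types and `u64` would need separate handling, which is part of why a per-type match is hard to avoid.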

@sundy-li changed the title from "Improve rem_scalar performance" to "Improve rem_scalar/div_scalar performance" on Aug 10, 2021
@jorgecarleitao
Owner

Done in #275

@jorgecarleitao changed the title from "Improve rem_scalar/div_scalar performance" to "Improve performance of rem_scalar/div_scalar (4x-10x)" on Aug 24, 2021
@jorgecarleitao changed the title from "Improve performance of rem_scalar/div_scalar (4x-10x)" to "Improve performance of rem_scalar/div_scalar for integer types (4x-10x)" on Aug 24, 2021
@jorgecarleitao removed the enhancement label (An improvement to an existing feature) on Aug 24, 2021