-
Notifications
You must be signed in to change notification settings - Fork 919
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Implement "order-preserving minimal perfect hash function" for nested types #10020
Comments
Add @nvdbaranec since there were relevant discussions about this. I can't remember what were the results of discussions. |
And @bdice as well. |
Add @devavret for similar/related topic. |
This issue has been labeled |
This issue has been labeled |
@ttnghia why was this closed? Did this get finished, or did we decide not to move forward here? If the latter, please include a brief description so that we have a reference in case we look up this issue again in the future. |
I close this because now we already have a hashing solution for nested types, so I assume that this is no longer being considered. Please correct me if I'm wrong, and feel free to reopen it if needed. |
That is a good point. I think this now moves very low on our list of priorities because of what you said. I suspect that using an order-preserving hash, or alternative some sort of encoding scheme, might still be useful as a way to accelerate even the new comparators by using some simpler scheme as a prefilter before doing the full comparison. However, you're definitely right that it's no longer absolutely necessary since the new comparators at least make it possible to work with arbitrarily nested types. @jrhemstad do you think there's any reason to pursue this issue further? |
When dealing with lists using GPU processing, things become much complicated. For example, efficiently sorting lists on the GPU is very difficult to achieve if we compare the lists directly because lists may have very different lengths. Similarly, searching/checking for existence etc. of lists in a lists column cannot run efficiently on the GPU for the same reason.
"Order-preserving minimal perfect hash function" may be a cure. If we can implement such function on the GPU which also runs fast, we can quickly iterate through the input lists, computing their hash values then using these hash values for sorting/searching/etc. The performance of list operations therefore can be improved significantly.
Reference: https://dl.acm.org/doi/pdf/10.1145/125187.125200 (note: this variant may not exactly be the one that we want).
The text was updated successfully, but these errors were encountered: