[FEA] Implement "order-preserving minimal perfect hash function" for nested types #10020

Closed
ttnghia opened this issue Jan 11, 2022 · 8 comments
Labels
feature request New feature or request

Comments

@ttnghia
Contributor

ttnghia commented Jan 11, 2022

When processing lists on the GPU, things become much more complicated. For example, sorting lists efficiently on the GPU is very difficult if we compare the lists directly, because lists may have very different lengths. Similarly, searching for a list, or checking whether a list exists in a lists column, cannot run efficiently on the GPU for the same reason.

An "order-preserving minimal perfect hash function" may be a cure. If we can implement such a function on the GPU and it runs fast, we can quickly iterate through the input lists, compute their hash values, and then use those hash values for sorting, searching, etc. The performance of list operations could therefore improve significantly.

Reference: https://dl.acm.org/doi/pdf/10.1145/125187.125200 (note: this variant may not exactly be the one that we want).
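
To make the property concrete, here is a minimal CPU-side sketch in Python. It is not the construction from the referenced paper and not a cuDF API: the helper name `build_order_preserving_mph` is hypothetical, and it builds the mapping by sorting the distinct lists on the host, which is exactly the expensive step a real GPU implementation would need to avoid. It only illustrates what "order-preserving", "minimal", and "perfect" would mean for a fixed set of lists: each distinct list gets a unique integer in `0..n-1`, and integer order matches lexicographic list order.

```python
def build_order_preserving_mph(lists):
    """Map each distinct list to its rank in lexicographic order.

    Illustration only: the ranks are obtained by sorting on the host,
    so this shows the target property, not an efficient construction.
    """
    distinct = sorted({tuple(l) for l in lists})
    return {key: rank for rank, key in enumerate(distinct)}


lists = [[3, 1], [1, 2, 9], [1], [1, 2, 9], [], [2, 7, 7, 7]]
oph = build_order_preserving_mph(lists)

# One fixed-width integer per row, regardless of list length.
codes = [oph[tuple(l)] for l in lists]

# Sorting or searching can now work on the integer codes alone.
sorted_by_code = sorted(lists, key=lambda l: oph[tuple(l)])
```

With such codes, equal lists share a code and comparing two rows costs one integer comparison instead of a variable-length list comparison.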

@ttnghia ttnghia added feature request New feature or request Needs Triage Need team to review and classify labels Jan 11, 2022
@ttnghia
Contributor Author

ttnghia commented Jan 11, 2022

Adding @nvdbaranec since there were relevant discussions about this. I can't remember what the results of those discussions were.

@nvdbaranec
Contributor

And @bdice as well.

@ttnghia
Contributor Author

ttnghia commented Jan 18, 2022

Adding @devavret for a similar/related topic.

@ttnghia ttnghia changed the title from "[FEA] Implement "order-preserving minimal perfect hash function" for list operations" to "[FEA] Implement "order-preserving minimal perfect hash function" for nested types" Jan 25, 2022
@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@vyasr
Contributor

vyasr commented May 25, 2022

@ttnghia why was this closed? Did this get finished, or did we decide not to move forward here? If the latter, please include a brief description so that we have a reference in case we look up this issue again in the future.

@ttnghia
Contributor Author

ttnghia commented May 25, 2022

I closed this because we now have a hashing solution for nested types, so I assume this is no longer being considered. Please correct me if I'm wrong, and feel free to reopen it if needed.

@vyasr
Contributor

vyasr commented May 25, 2022

That is a good point. I think this now moves very low on our list of priorities because of what you said. I suspect that an order-preserving hash, or alternatively some sort of encoding scheme, might still be useful for accelerating even the new comparators, by using a simpler scheme as a prefilter before doing the full comparison. However, you're definitely right that it's no longer absolutely necessary, since the new comparators at least make it possible to work with arbitrarily nested types. @jrhemstad do you think there's any reason to pursue this issue further?
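
One way to picture the prefilter idea is the Python sketch below. It rests on assumptions that are not from this thread or from cuDF: the encoding is simply the first `k` elements of each list padded with a sentinel that sorts before any real value, and the names `prefix_key` and `make_comparator` are hypothetical. The fixed-width key is only weakly order-preserving, so a strict difference between keys decides the order cheaply, and the full comparison runs only when keys tie.

```python
from functools import cmp_to_key

PAD = float("-inf")  # assumed sentinel: sorts before any real element


def prefix_key(lst, k=2):
    """Fixed-width key: the first k elements, padded with PAD."""
    head = tuple(lst[:k])
    return head + (PAD,) * (k - len(head))


def make_comparator(stats, k=2):
    def compare(a, b):
        ka, kb = prefix_key(a, k), prefix_key(b, k)
        if ka != kb:
            # Cheap path: the fixed-width keys already decide the order.
            stats["prefilter"] += 1
            return -1 if ka < kb else 1
        # Keys tie, so fall back to the full lexicographic comparison.
        stats["full"] += 1
        return (a > b) - (a < b)
    return compare


lists = [[3, 1], [1, 2, 9], [1, 2, 3], [1], [], [2, 7, 7, 7]]
stats = {"prefilter": 0, "full": 0}
ordered = sorted(lists, key=cmp_to_key(make_comparator(stats)))
```

The `stats` counters show how many comparisons were settled by the key alone; how much this would help in practice depends on how often real data shares long prefixes.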

@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024