
[FEA] Support short-circuit evaluation for expensive expression like rlike #10613

Open
winningsix opened this issue Mar 20, 2024 · 1 comment
Labels
feature request New feature or request performance A performance related task/issue

Comments

@winningsix
Collaborator

winningsix commented Mar 20, 2024

Is your feature request related to a problem? Please describe.
Rlike is a very expensive cudf operation compared to other operators like comparisons. One option, mentioned in #10600, is to replace some cases with a cheaper expression. Another option is to introduce short-circuit evaluation that skips the regexp for some rows by prioritizing the cheaper conditions.

Describe the solution you'd like

record_name RLIKE '(.*[0-9]{2})' AND date BETWEEN '20201111' AND '20201112' AND length(store_names) > 15

For condition mentioned above, we could optimize it by:

  1. Evaluate the conditions other than the regexp first
  2. Support a null-masking filter based on the first 2 conditions, then evaluate only the null-masked data with the regexp
  3. (nice to have) reorder the date and length conditions on the fly based on their selectivity
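The steps above can be sketched in plain Python. This is only an illustration of the short-circuit idea, not the cudf/spark-rapids implementation; the row data, the `cheap_predicates` helper, and the exact pattern are all hypothetical.

```python
import re

# Hypothetical row data: (record_name, date, store_names)
rows = [
    ("store42ab", "20201111", "a very long store name!!"),
    ("nodigits",  "20201111", "a very long store name!!"),
    ("shop77xy",  "20200101", "short"),
]

# The expensive regexp from the example condition.
PATTERN = re.compile(r".*[0-9]{2}")

def cheap_predicates(date, store_names):
    # Steps 1-2: evaluate the inexpensive conditions first.
    return "20201111" <= date <= "20201112" and len(store_names) > 15

def evaluate(rows):
    out = []
    for record_name, date, store_names in rows:
        # Short-circuit: skip the regexp when the cheap predicates fail.
        if not cheap_predicates(date, store_names):
            out.append(False)
            continue
        out.append(PATTERN.match(record_name) is not None)
    return out

print(evaluate(rows))  # [True, False, False]
```

On the GPU the equivalent of the `continue` would be the proposed null mask: rows failing the cheap predicates are nulled out so the regexp kernel touches fewer strings.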
@winningsix winningsix added feature request New feature or request ? - Needs Triage Need team to review and classify performance A performance related task/issue labels Mar 20, 2024
@revans2
Collaborator

revans2 commented Mar 20, 2024

On the GPU the problems typically show up around thread divergence and non-coalesced memory access patterns.

I am not 100% sure about this, so we should run some experiments and see, but I don't think it is a clear win every time. If the string is long, it will always have a bad memory access pattern, and the kernel's runtime is likely bounded by the time it takes a warp to process the longest string. If the string is short, the memory access pattern is likely good, and even though we might still have thread divergence, it is decently fast.

Replacing string values with nulls requires a memory copy, and copy_if_else for strings is not known to be that great. So we might need some kind of heuristic to see what matters. I think in really bad cases today this could be a big win, especially if we can free up some warps early to process more data and there are enough input strings to make that a win.
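The trade-off described here can be framed as a tiny cost model: masking only pays off when the copy pass plus the regexp over the surviving rows is cheaper than running the regexp over everything. All the cost constants below are illustrative assumptions, not measured cudf numbers.

```python
def should_mask_first(num_rows, avg_str_len, cheap_selectivity,
                      regex_cost_per_char=8.0, mask_cost_per_char=1.0):
    """Hypothetical heuristic: mask first only if the copy pass plus the
    regexp over the surviving fraction beats the regexp over all rows.

    cheap_selectivity is the fraction of rows that pass the cheap
    predicates (and so still need the regexp)."""
    chars = num_rows * avg_str_len
    full_regex_cost = chars * regex_cost_per_char
    masked_cost = (chars * mask_cost_per_char                       # copy_if_else pass
                   + chars * cheap_selectivity * regex_cost_per_char)
    return masked_cost < full_regex_cost

# Highly selective cheap predicates: masking wins.
print(should_mask_first(1_000_000, 64, 0.05))  # True
# Cheap predicates pass almost everything: the extra copy is wasted.
print(should_mask_first(1_000_000, 64, 0.95))  # False
```

A real heuristic would also have to account for string length variance (the longest-string-per-warp effect above) and for freeing warps early, but even a crude selectivity estimate separates the clear wins from the clear losses.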

@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Apr 4, 2024