
[FEA] Support short-circuit evaluation for expensive expression like rlike #10613

Open
winningsix opened this issue Mar 20, 2024 · 1 comment
Labels
feature request New feature or request performance A performance related task/issue

Comments

@winningsix
Collaborator

winningsix commented Mar 20, 2024

Is your feature request related to a problem? Please describe.
Rlike is a very expensive cudf operation compared to other operators like comparisons. One option, mentioned in #10600, is to replace some cases with a cheaper expression. Another option is to introduce short-circuit evaluation that skips the regexp for some rows by prioritizing the cheaper conditions.

Describe the solution you'd like

record_name RLIKE '(.*[0-9]{2})' AND date BETWEEN '20201111' AND '20201112' AND length(store_names) > 15

For condition mentioned above, we could optimize it by:

  1. Evaluate the conditions other than the regexp first
  2. Support a null-masking filter based on the first 2 conditions, then evaluate only the null-masked data with the regexp
  3. (nice to have) reorder the date and length conditions on the fly based on their selectivity
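The steps above can be sketched in plain Python. This is only an illustration of the short-circuit idea, not the cudf/spark-rapids implementation; the row data, the `cheap_predicates` helper, and the exact pattern are all hypothetical.

```python
import re

# Hypothetical row data: (record_name, date, store_names)
rows = [
    ("store42ab", "20201111", "a very long store name!!"),
    ("nodigits",  "20201111", "a very long store name!!"),
    ("shop77xy",  "20200101", "short"),
]

# The expensive regexp from the example condition.
PATTERN = re.compile(r".*[0-9]{2}")

def cheap_predicates(date, store_names):
    # Steps 1-2: evaluate the inexpensive conditions first.
    return "20201111" <= date <= "20201112" and len(store_names) > 15

def evaluate(rows):
    out = []
    for record_name, date, store_names in rows:
        # Short-circuit: skip the regexp when the cheap predicates fail.
        if not cheap_predicates(date, store_names):
            out.append(False)
            continue
        out.append(PATTERN.match(record_name) is not None)
    return out

print(evaluate(rows))  # [True, False, False]
```

On the GPU the equivalent of the `continue` would be the proposed null mask: rows failing the cheap predicates are nulled out so the regexp kernel touches fewer strings.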
@winningsix winningsix added feature request New feature or request ? - Needs Triage Need team to review and classify performance A performance related task/issue labels Mar 20, 2024
@revans2
Collaborator

revans2 commented Mar 20, 2024

On the GPU the problems typically show up around thread divergence and non-coalesced memory access patterns.

I am not 100% sure about this, so we should run some experiments and see, but I don't think it is a clear win every time. If the string is long, it will always have a bad memory access pattern, and the kernel's runtime is likely bounded by the time it takes a warp to process the longest string. If the string is short, the memory access pattern is likely good, and even though we might still have thread divergence, it is decently fast.

Replacing string values with nulls requires a memory copy, and copy_if_else for strings is not known to be that great. So we might need some kind of heuristic to see what matters. I think in really bad cases today this could be a big win, especially if we can free up some warps early to process more data and there are enough input strings to make that a win.
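The trade-off described here can be framed as a tiny cost model: masking only pays off when the copy pass plus the regexp over the surviving rows is cheaper than running the regexp over everything. All the cost constants below are illustrative assumptions, not measured cudf numbers.

```python
def should_mask_first(num_rows, avg_str_len, cheap_selectivity,
                      regex_cost_per_char=8.0, mask_cost_per_char=1.0):
    """Hypothetical heuristic: mask first only if the copy pass plus the
    regexp over the surviving fraction beats the regexp over all rows.

    cheap_selectivity is the fraction of rows that pass the cheap
    predicates (and so still need the regexp)."""
    chars = num_rows * avg_str_len
    full_regex_cost = chars * regex_cost_per_char
    masked_cost = (chars * mask_cost_per_char                       # copy_if_else pass
                   + chars * cheap_selectivity * regex_cost_per_char)
    return masked_cost < full_regex_cost

# Highly selective cheap predicates: masking wins.
print(should_mask_first(1_000_000, 64, 0.05))  # True
# Cheap predicates pass almost everything: the extra copy is wasted.
print(should_mask_first(1_000_000, 64, 0.95))  # False
```

A real heuristic would also have to account for string length variance (the longest-string-per-warp effect above) and for freeing warps early, but even a crude selectivity estimate separates the clear wins from the clear losses.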

@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Apr 4, 2024