Modify reprog_device::extract to return groups in a single pass #8460
Codecov Report
@@             Coverage Diff              @@
##             branch-21.08    #8460   +/-   ##
===============================================
  Coverage                ?   82.95%
===============================================
  Files                   ?      109
  Lines                   ?    18226
  Branches                ?        0
===============================================
  Hits                    ?    15120
  Misses                  ?     3106
  Partials                ?        0
Looks good, got a few mostly stylistic suggestions.
Didn't dig too deep into the algorithmic aspect of the changes, let me know if it's needed.
CMake changes LGTM
@gpucibot merge
Closes #8569

This essentially undoes the performance improvement made in #8460, since the logic mishandles a greedy quantifier when it occurs inside an extract group. The internal regex logic can track only a single extract group when such a quantifier is specified. This PR does, however, improve the interface for the internal extract call and adds some gtests for this issue.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - MithunR (https://github.com/mythrocks)

URL: #8575
This is a less ambitious version of #8460, which had to be reverted in #8575 because it did not work with greedy quantifiers. The change here calls the underlying `reprog_device::extract` to retrieve every group result within a single kernel rather than launching a kernel for each group. The output is placed contiguously in a 2d span (a wrapped uvector), and a permutation iterator is used to build the output columns (one column per group).

Like its predecessor, this mostly improves performance when the regex pattern specifies more than one group. The benchmark results showed no change for a single group, but a 2x speedup for multiple groups over long (8K) strings and up to a 4x speedup for multiple groups over many (16M) strings. The benchmark test for extract was also updated to better report the number of groups used when measuring results.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Nghia Truong (https://github.com/ttnghia)

URL: #9358
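To make the contiguous layout concrete, here is a minimal host-side C++ sketch, not the libcudf implementation. It assumes each row's group ranges are stored as `(begin, end)` character offsets in one row-major buffer of `rows x groups` entries, with `{-1, -1}` marking a row that did not match. The names `group_range`, `flat_index`, and `build_group_column` are hypothetical. Each output column is produced by reading the buffer through the index map `row -> row * num_groups + group`; on the device, `thrust::make_permutation_iterator` can express the same mapping without copying.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: [begin, end) character offsets of one group; {-1, -1} = no match.
using group_range = std::pair<int, int>;

// Row-major position of (row, group) in the contiguous rows x num_groups buffer.
inline std::size_t flat_index(std::size_t row, std::size_t group, std::size_t num_groups)
{
  return row * num_groups + group;
}

// Gather one output column by reading the flat buffer through the index map
// row -> flat_index(row, group, num_groups), which is the role a permutation
// iterator plays on the device. std::nullopt models a null entry.
std::vector<std::optional<std::string>> build_group_column(
  std::vector<std::string> const& strings,
  std::vector<group_range> const& flat,
  std::size_t num_groups,
  std::size_t group)
{
  assert(group < num_groups && flat.size() == strings.size() * num_groups);
  std::vector<std::optional<std::string>> column;
  column.reserve(strings.size());
  for (std::size_t row = 0; row < strings.size(); ++row) {
    auto const [begin, end] = flat[flat_index(row, group, num_groups)];
    if (begin < 0) {
      column.push_back(std::nullopt);  // row did not match: null output
    } else {
      column.push_back(strings[row].substr(begin, end - begin));
    }
  }
  return column;
}
```

For example, with the rows `"a1"`, `"b22"`, `"zz"` and the pattern `([a-z])(\d+)`, the buffer holds `{0,1}, {1,2}, {0,1}, {1,3}, {-1,-1}, {-1,-1}`, and the two columns come out as `a, b, null` and `1, 22, null`. Keeping one buffer and permuting the reads means the per-group columns need no intermediate copies.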
This PR modifies the internal regex `reprog_device::extract` function to return all matching groups in a single call. Previously, retrieving each group range required a separate call to this `extract` function, and each call re-matched the entire given pattern. The code logic would identify every group but return the range for only the specified group.

The change here passes a pre-allocated global memory array to capture every group range in a single pass. Extraction is an all-or-nothing process: a `find` function must first be executed to retrieve the bounds of the given pattern, so if any of the groups is missing or does not match, no groups are returned for that row. Because retrieving the last group always requires processing the preceding groups, the code logic now records all of those positions in the global memory array. That array can then be used directly to build the output columns.

This simplifies the code around extract and also improves performance, especially for long strings or patterns with many groups. For short strings and few groups, the gbenchmark showed performance equivalent to the previous implementation. For longer strings and more groups, the gbenchmark showed a 2-3x improvement.
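The single-pass, all-or-nothing capture can be modeled on the host as follows. This is an assumption-laden stand-in, not the device code: `std::regex_search` plays the role of the internal `find`/`extract` pair by returning every submatch from one evaluation, and the hypothetical `extract_all_groups` writes each group's range into a caller-allocated buffer with one slot per (row, group). If a row does not match, or any group fails to participate, every slot for that row is marked `{-1, -1}`.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: [begin, end) character offsets of one group; {-1, -1} = missing.
using group_range = std::pair<int, int>;

// Record the range of every group of every row from a single regex evaluation
// per row. std::regex stands in for the device regex engine here.
// `out` must be pre-allocated by the caller with strings.size() * num_groups slots.
void extract_all_groups(std::vector<std::string> const& strings,
                        std::regex const& pattern,
                        std::vector<group_range>& out)
{
  std::size_t const num_groups = pattern.mark_count();
  assert(out.size() == strings.size() * num_groups);
  for (std::size_t row = 0; row < strings.size(); ++row) {
    group_range* slots = out.data() + row * num_groups;
    std::smatch m;
    // One search locates the pattern bounds and all submatches together.
    bool ok = std::regex_search(strings[row], m, pattern);
    // All-or-nothing: if any group did not participate, the row gets no groups.
    for (std::size_t g = 1; ok && g <= num_groups; ++g) {
      ok = m[g].matched;
    }
    for (std::size_t g = 0; g < num_groups; ++g) {
      slots[g] = ok ? group_range{static_cast<int>(m.position(g + 1)),
                                  static_cast<int>(m.position(g + 1) + m.length(g + 1))}
                    : group_range{-1, -1};
    }
  }
}
```

For example, with the pattern `([a-z])(\d)?` the input `"q"` matches, but because the optional second group does not participate, both of that row's slots are set to `{-1, -1}`. Writing into one pre-allocated buffer lets the output columns be built afterwards without re-running the match per group.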