Is your feature request related to a problem? Please describe.
When `regexp_replace` is given a pattern that is a simple alternation of literal strings, such as `ab|cd|ef`, performance is poor for large strings. We could optimize this case on the GPU by using the cudf native multi-replace API, which was recently updated with performance fixes for large strings in rapidsai/cudf#12858. We should also measure how much this optimization helps as a function of input string size and the number of target strings.
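A minimal sketch of the idea, assuming the plugin only takes the fast path when the pattern is a plain alternation of non-empty literals (no regex metacharacters). The class `LiteralAlternation` and its methods are hypothetical illustrations, not spark-rapids or cudf APIs. `multiReplace` is a CPU reference for the semantics a multi-replace path would have to reproduce: scan left to right and, at each position, replace the first target (in pattern order) that matches, which is how leftmost-first regex alternation behaves. Whether cudf's multi-target replace resolves overlapping targets in the same order is an assumption that would need to be verified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public final class LiteralAlternation {
    // Characters with special meaning in Java regex syntax. If any appears
    // (other than '|' as a separator), the pattern is conservatively rejected.
    private static final String META = "\\^$.[]{}()*+?|";

    /** Hypothetical helper: literal targets if pattern looks like "ab|cd|ef", else empty. */
    public static Optional<List<String>> literalTargets(String pattern) {
        List<String> targets = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '|') {
                if (current.length() == 0) return Optional.empty(); // empty alternative
                targets.add(current.toString());
                current.setLength(0);
            } else if (META.indexOf(c) >= 0) {
                return Optional.empty(); // real regex syntax: keep the regex path
            } else {
                current.append(c);
            }
        }
        if (current.length() == 0) return Optional.empty();
        targets.add(current.toString());
        return Optional.of(targets);
    }

    /**
     * CPU reference semantics: at each position try targets in pattern order,
     * replace the first match and skip past it; otherwise copy one char.
     * The replacement is treated literally (no $1-style back-references).
     */
    public static String multiReplace(String input, List<String> targets, String replacement) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        outer:
        while (i < input.length()) {
            for (String t : targets) {
                if (input.startsWith(t, i)) {
                    out.append(replacement);
                    i += t.length();
                    continue outer;
                }
            }
            out.append(input.charAt(i));
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pattern = "ab|cd|ef";
        String input = "xxabyycdzzefab";
        List<String> targets = literalTargets(pattern).orElseThrow();
        String viaMulti = multiReplace(input, targets, "_");
        String viaRegex = input.replaceAll(pattern, "_");
        System.out.println(targets);                  // [ab, cd, ef]
        System.out.println(viaMulti);                 // xx_yy_zz__
        System.out.println(viaMulti.equals(viaRegex)); // true
        System.out.println(literalTargets("a.c|d").isPresent()); // false
    }
}
```

Rejecting any metacharacter is deliberately conservative: escaped literals such as `a\.b` fall back to the regex path. A real rewrite would also need to rule out replacement strings containing back-references such as `$0`, since the literal path does not interpret them.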
Describe the solution you'd like
The RAPIDS Accelerator for Apache Spark should use this optimization whenever performance measurements show it is beneficial.
Describe alternatives you've considered
`regexp_replace` does not perform well on the GPU when the input strings are very large. In addition, each extra alternative (choice) makes the regular expression more complex. Using the cudf API described above, these replacements can instead be parallelized on the GPU.