Describe the bug
When passing an empty string ('') and a regular expression containing only ? or * repetitions, the output is not consistent between Spark and cuDF. Note that this is purely an inconsistency with Spark's regexp_replace; it does not apply to what Java does as a standard.
Steps/Code to reproduce bug
PySpark Example Code:
Spark (CPU) Output:
Plugin (GPU) Output:
Expected behavior
Empty string input should be short-circuited in the plugin, because this is the behavior expected in certain versions of Spark.
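To illustrate the difference, here is a minimal sketch in plain Python. It does not use Spark, cuDF, or the plugin; both function names are hypothetical, and Python's `re` module stands in for the Java regex engine only for these simple patterns. It assumes, based on the description above, that the affected Spark versions return an empty input unchanged.

```python
import re


def standard_replace(source: str, pattern: str, replacement: str) -> str:
    """Ordinary regex replacement: a pattern such as a* or a? matches
    the empty string, so an empty input still gets one replacement."""
    return re.sub(pattern, replacement, source)


def short_circuit_replace(source: str, pattern: str, replacement: str) -> str:
    """Hypothetical model of the short-circuit behavior (an assumption,
    not the plugin's actual code): empty input is returned as-is."""
    if source == "":
        return source
    return re.sub(pattern, replacement, source)


if __name__ == "__main__":
    for pattern in ("a*", "a?"):
        print(repr(standard_replace("", pattern, "X")),
              repr(short_circuit_replace("", pattern, "X")))
```

The two functions differ only on empty input, which is the case this issue is about.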
Additional context
This bug was originally reported to Spark in https://issues.apache.org/jira/browse/SPARK-39107, and the issue was fixed in apache/spark#36457 for newer patch versions of Spark 3.1, 3.2, 3.3, and master, so shims will need to be created to handle the original faulty behavior.