[BUG] GpuRegExExtract is not aligned with RegExExtract #5135

Closed
sperlingxx opened this issue Apr 2, 2022 · 0 comments · Fixed by #5136

Comments

@sperlingxx
Collaborator

Describe the bug

  1. In Spark, the pattern passed to regexp_extract is not required to match the entire input string. As the name "extract" suggests, regexp_extract finds the first substring that matches the pattern and extracts the requested group from it. However, GpuRegExExtract requires the pattern to match the whole string.
val df = Seq("1a", "2a", "3a", "4a", "5a", "6a", "7a", "8a", "9a", "10a").toDF("c")
df.coalesce(1).select(regexp_extract(col("c"), "(a)", 1)).collect()

GPU result: Array([], [], [], [], [], [], [], [], [], [])
CPU result: Array([a], [a], [a], [a], [a], [a], [a], [a], [a], [a])

  2. When the group index is 0, GpuRegExExtract does not behave correctly.

CPU run: regexp_extract('123abcEfg', '([0-9]+)[a-z]+([A-Z])', 0) => 123abcE
GPU run: regexp_extract('123abcEfg', '([0-9]+)[a-z]+([A-Z])', 0) => 123abcEfg
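
The CPU semantics described in both items can be reproduced without Spark using plain java.util.regex. The following is only a minimal sketch: the object name and the two helpers (extractFind, extractFullMatch) are hypothetical and are not part of Spark or the plugin. It contrasts "find" semantics, which match the CPU results above, with a full-match requirement, which returns an empty string for the first example.

```scala
import java.util.regex.Pattern

// Hypothetical helpers for illustration only; not the plugin's implementation.
object RegexExtractSketch {
  // "find" semantics: use the first substring that matches the pattern.
  // Returns "" when nothing matches or the group did not participate.
  def extractFind(input: String, regex: String, idx: Int): String = {
    val m = Pattern.compile(regex).matcher(input)
    if (m.find()) Option(m.group(idx)).getOrElse("") else ""
  }

  // Full-match semantics: the pattern must cover the entire input.
  def extractFullMatch(input: String, regex: String, idx: Int): String = {
    val m = Pattern.compile(regex).matcher(input)
    if (m.matches()) Option(m.group(idx)).getOrElse("") else ""
  }

  def main(args: Array[String]): Unit = {
    // Item 1: "(a)" matches a substring of "1a" but not the whole string.
    println(extractFind("1a", "(a)", 1))      // a
    println(extractFullMatch("1a", "(a)", 1)) // (empty string)

    // Item 2: group 0 is the whole matched substring, not the whole input.
    println(extractFind("123abcEfg", "([0-9]+)[a-z]+([A-Z])", 0)) // 123abcE
  }
}
```

With find semantics, group 0 is the text covered by the match ("123abcE"), so the trailing "fg" is not part of the result.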

BTW, this issue originates from #5088.

@sperlingxx sperlingxx added bug Something isn't working ? - Needs Triage Need team to review and classify labels Apr 2, 2022
@sperlingxx sperlingxx self-assigned this Apr 2, 2022
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Apr 5, 2022