[BUG] regexp_test is failing in nightly tests #6028

Closed
jlowe opened this issue Jul 19, 2022 · 7 comments · Fixed by #6041
Assignees
Labels
bug Something isn't working P0 Must have for release

Comments

@jlowe
Contributor

jlowe commented Jul 19, 2022

The latest nightly run had 120 test failures in regexp_test. The few I sampled were all of the "plan is not columnar" variety.

@jlowe added the labels bug (Something isn't working), ? - Needs Triage (Need team to review and classify), and P0 (Must have for release) on Jul 19, 2022
@jlowe
Contributor Author

jlowe commented Jul 19, 2022

Wondering if this is related to #5776? cc: @NVnavkumar

@NVnavkumar self-assigned this on Jul 19, 2022
@NVnavkumar
Collaborator

Weird, it's supposed to skip those tests actually. Let me take a look.

@NVnavkumar
Collaborator

So there are 2 sets of integration tests: regexp_test.py and regexp_no_unicode_test.py.

In regexp_test.py, there is this guard for skipping the test:

if locale.nl_langinfo(locale.CODESET) != 'UTF-8':
    pytestmark = [pytest.mark.regexp, pytest.mark.skip(reason=str("Current locale doesn't support UTF-8, regexp support is disabled"))]
else:
    pytestmark = pytest.mark.regexp

In regexp_no_unicode_test.py, the guard detects that the environment uses UTF-8, so those tests are skipped:

[2022-07-19T16:00:20.222Z] SKIPPED [1] ../../src/main/python/regexp_no_unicode_test.py:31: Current locale uses UTF-8, fallback will not occur
[2022-07-19T16:00:20.222Z] SKIPPED [1] ../../src/main/python/regexp_no_unicode_test.py:40: Current locale uses UTF-8, fallback will not occur
[2022-07-19T16:00:20.222Z] SKIPPED [1] ../../src/main/python/regexp_no_unicode_test.py:49: Current locale uses UTF-8, fallback will not occur

I would assume that regexp_test.py uses the same environment, so its tests are enabled by that same logic.

However, for some reason the JVM used by the underlying Spark isn't using UTF-8 as its charset, so regular expression support is disabled on the GPU, hence the "part of the plan is not columnar" exceptions:

[2022-07-19T16:00:10.434Z] !Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
[2022-07-19T16:00:10.434Z]   @Expression <Alias> split(a#16, [:], -1) AS split(a, [:], -1)#18 could run on GPU
[2022-07-19T16:00:10.434Z]     !Expression <StringSplit> split(a#16, [:], -1) cannot run on GPU because regular expression support is disabled because the GPU only supports the UTF-8 charset when using regular expressions
[2022-07-19T16:00:10.434Z]       @Expression <AttributeReference> a#16 could run on GPU
[2022-07-19T16:00:10.434Z]       @Expression <Literal> [:] could run on GPU
[2022-07-19T16:00:10.434Z]       @Expression <Literal> -1 could run on GPU

So in this case the Python interpreter's locale is inconsistent with that of the JVM on the driver.
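To make the mismatch concrete, here is a minimal sketch of comparing the codeset Python reports with the charset a JVM reports. The helper names (`canonical_charset`, `charsets_agree`, `python_codeset`) are hypothetical and not part of the plugin's test code, and the charset values in the usage note are illustrative only; the thread does not say which charset the JVM actually used.

```python
import codecs
import locale


def canonical_charset(name: str) -> str:
    """Map a charset alias (e.g. 'UTF8', 'US-ASCII') to Python's canonical codec name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Python does not know every JVM charset name; fall back to a
        # case-insensitive comparison of the raw name.
        return name.lower()


def charsets_agree(python_codeset: str, jvm_charset: str) -> bool:
    """Return True when both runtimes report the same underlying encoding."""
    return canonical_charset(python_codeset) == canonical_charset(jvm_charset)


def python_codeset() -> str:
    """Codeset seen by the Python process (the value the existing guard checks)."""
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    # nl_langinfo is not available on all systems; this fallback is an assumption.
    return locale.getpreferredencoding(False)
```

For example, `charsets_agree("UTF-8", "US-ASCII")` is false, which is the kind of situation described above: the Python-side guard enables the tests while the JVM side disables regular expression support.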

@pxLi
Member

pxLi commented Jul 20, 2022

Also filed #6032, which looks related to the recent regex change.
It seems this could fail in most scenarios except Spark local mode (pre-merge CI).

@gerashegalov
Collaborator

@NVnavkumar do we care about the Python process settings?

nl_langinfo's doc does not sound encouraging:

"This function is not available on all systems, and the set of possible options might also vary across platforms."

Should we just retrieve the default charset of the underlying JVM using something like:
sc._jvm.java.nio.charset.Charset.defaultCharset()
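As a rough sketch of how that suggestion could drive the skip decision, the snippet below reads the charset name through a Py4J JVM view such as `sc._jvm` and turns it into a skip reason. The helpers `jvm_default_charset`, `regexp_skip_reason`, and `fake_jvm` are hypothetical names for illustration, not the change that was actually merged; `fake_jvm` only exists so the sketch runs without a Spark session.

```python
from types import SimpleNamespace


def jvm_default_charset(jvm) -> str:
    """Return the JVM default charset name, e.g. 'UTF-8' or 'US-ASCII'.

    `jvm` is expected to be a Py4J JVM view such as `sc._jvm`.
    """
    return str(jvm.java.nio.charset.Charset.defaultCharset().name())


def regexp_skip_reason(jvm):
    """Return a skip reason, or None when the regexp tests should run."""
    charset = jvm_default_charset(jvm)
    if charset.upper() != "UTF-8":
        return f"JVM default charset is {charset}, not UTF-8; regexp support is disabled"
    return None


def fake_jvm(charset_name: str):
    """Stand-in for `sc._jvm` that mimics only the attribute chain used above."""
    charset = SimpleNamespace(name=lambda: charset_name)
    charset_cls = SimpleNamespace(defaultCharset=lambda: charset)
    return SimpleNamespace(
        java=SimpleNamespace(nio=SimpleNamespace(charset=SimpleNamespace(Charset=charset_cls)))
    )
```

In a real test module the result would feed `pytest.mark.skip(reason=...)` in place of the `nl_langinfo` check, with `jvm` taken from the active Spark context. How the test suite obtains that context is not shown in this thread.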

@NVnavkumar
Collaborator

I think that requires access to the Spark context, and I don't know whether it will have been created by the point where we decide whether to skip a test.

@revans2
Collaborator

revans2 commented Jul 20, 2022

Yes, we have access to it at that point.
