[BUG] regexp_test is failing in nightly tests #6028

jlowe · 2022-07-19T17:02:34Z

Latest nightly run had 120 test failures in regexp_test. Sampling a few, they were all of the "plan is not columnar" variety.

jlowe · 2022-07-19T17:04:47Z

Wondering if this is related to #5776? cc: @NVnavkumar

NVnavkumar · 2022-07-19T17:43:13Z

Weird, it's supposed to skip those tests actually. Let me take a look.

NVnavkumar · 2022-07-19T19:18:38Z

So there are 2 sets of integration tests: regexp_test.py and regexp_no_unicode_test.py

In regexp_test.py, there is this guard for skipping the test:

if locale.nl_langinfo(locale.CODESET) != 'UTF-8':
    pytestmark = [pytest.mark.regexp, pytest.mark.skip(reason=str("Current locale doesn't support UTF-8, regexp support is disabled"))]
else:
    pytestmark = pytest.mark.regexp

In regexp_no_unicode_test.py, it's detecting that UTF-8 is in the environment:

[2022-07-19T16:00:20.222Z] SKIPPED [1] ../../src/main/python/regexp_no_unicode_test.py:31: Current locale uses UTF-8, fallback will not occur
[2022-07-19T16:00:20.222Z] SKIPPED [1] ../../src/main/python/regexp_no_unicode_test.py:40: Current locale uses UTF-8, fallback will not occur
[2022-07-19T16:00:20.222Z] SKIPPED [1] ../../src/main/python/regexp_no_unicode_test.py:49: Current locale uses UTF-8, fallback will not occur

I would assume that regexp_test.py uses the same environment, so its tests are enabled by that same logic.

However, for some reason, in the JVM used by the underlying Spark, the locale isn't using UTF-8, so regular expression support is disabled, hence the "part of the plan is not columnar" exceptions:

[2022-07-19T16:00:10.434Z] !Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
[2022-07-19T16:00:10.434Z]   @Expression <Alias> split(a#16, [:], -1) AS split(a, [:], -1)#18 could run on GPU
[2022-07-19T16:00:10.434Z]     !Expression <StringSplit> split(a#16, [:], -1) cannot run on GPU because regular expression support is disabled because the GPU only supports the UTF-8 charset when using regular expressions
[2022-07-19T16:00:10.434Z]       @Expression <AttributeReference> a#16 could run on GPU
[2022-07-19T16:00:10.434Z]       @Expression <Literal> [:] could run on GPU
[2022-07-19T16:00:10.434Z]       @Expression <Literal> -1 could run on GPU

So the Python interpreter's locale is inconsistent with the JVM's on the driver in this case.
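A minimal sketch of how such a mismatch could be spotted, assuming the JVM's charset name has already been obtained as a string by some other means. The helper names `python_codeset` and `same_charset` are hypothetical and not part of the test suite:

```python
import codecs
import locale


def python_codeset():
    """Codeset reported by the Python process.

    nl_langinfo is not available on every platform, so fall back to
    getpreferredencoding where it is missing.
    """
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    return locale.getpreferredencoding(False)


def same_charset(a, b):
    """True when two charset names resolve to the same Python codec,
    so that spellings such as 'utf8' and 'UTF-8' compare equal."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        # Unknown to Python: fall back to a plain case-insensitive comparison.
        return a.strip().lower() == b.strip().lower()
```

Calling `same_charset(python_codeset(), jvm_charset_name)` with the JVM's value would return False in the situation described above, where Python sees UTF-8 and the JVM does not.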

pxLi · 2022-07-20T02:27:15Z

Also filed #6032, which looks related to the recent regex change.
It seems this could fail in most scenarios except Spark local mode (pre-merge CI).

gerashegalov · 2022-07-20T19:33:16Z

@NVnavkumar do we care about the Python process settings?

The doc for nl_langinfo does not sound encouraging:

"This function is not available on all systems, and the set of possible options might also vary across platforms."

Should we just retrieve the default charset of the underlying JVM using something like:
sc._jvm.java.nio.charset.Charset.defaultCharset()
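As a rough sketch of that idea, the check could be wrapped in a small helper that takes the py4j JVM view as an argument. The function names below are hypothetical, and this is not necessarily how the eventual fix was written:

```python
def jvm_default_charset(jvm):
    """Canonical name of the JVM's default charset.

    `jvm` is assumed to be a py4j JVM view, for example sc._jvm from an
    already-created SparkContext.
    """
    return str(jvm.java.nio.charset.Charset.defaultCharset().name())


def jvm_charset_is_utf8(jvm):
    """True when the JVM's default charset is UTF-8."""
    return jvm_default_charset(jvm).upper() == "UTF-8"


# Hypothetical use in a test module, in place of the nl_langinfo condition:
#   if not jvm_charset_is_utf8(sc._jvm): ...mark the module as skipped...
```

This ties the skip decision to the same process that decides whether regular expression support is enabled, rather than to the Python interpreter's locale.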

NVnavkumar · 2022-07-20T19:39:32Z

I think that requires access to the Spark context, and I don't know whether it will have been created by the point where we decide whether to skip a test.

revans2 · 2022-07-20T19:41:30Z

Yes, we have access to it at that point.

jlowe added the bug (Something isn't working), ? - Needs Triage (Need team to review and classify), and P0 (Must have for release) labels Jul 19, 2022

NVnavkumar self-assigned this Jul 19, 2022

sameerz removed the ? - Needs Triage (Need team to review and classify) label Jul 19, 2022

pxLi mentioned this issue Jul 20, 2022

[BUG] Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec failure #6032

Closed

NVnavkumar mentioned this issue Jul 20, 2022

Improve check for UTF-8 in integration tests by testing from the JVM #6041

Merged

NVnavkumar linked a pull request Jul 21, 2022 that will close this issue

Improve check for UTF-8 in integration tests by testing from the JVM #6041

Merged

NVnavkumar closed this as completed in #6041 Jul 21, 2022
