[BUG] test_cast_string_date_valid_format fails on DATAGEN_SEED=1700250017 #9781

Closed
gerashegalov opened this issue Nov 17, 2023 · 2 comments · Fixed by #9871
Labels
bug Something isn't working

Comments

gerashegalov (Collaborator) commented Nov 17, 2023

Describe the bug
test_cast_string_date_valid_format fails

    def fromInternal(self, v):
        if v is not None:
>           return datetime.date.fromordinal(v + self.EPOCH_ORDINAL)
          ValueError: ordinal must be >= 1

Local Repro

SPARK_HOME=~/dist/spark-3.3.0-bin-hadoop3  \
SPARK_RAPIDS_TEST_DATAGEN_SEED=1700250017 \
./integration_tests/run_pyspark_from_build.sh -k test_cast_string_date_valid_format

Environment details (please complete the following information)
AWS DBR 10.4

ShimLoader: Complete Spark build info: 3.2.1-databricks, https://git-wip-us.apache.org/repos/asf/spark.git, {branch}, 0000000000000000000000000000000000000000, <unknown>
INFO ShimLoader: Scala version: version 2.12.14

Additional context
N/A

gerashegalov added the "bug" and "? - Needs Triage" labels Nov 17, 2023
gerashegalov changed the title from "[BUG] test_cast_string_date_valid_format fails on DBR 10.4 (321db) CI" to "[BUG] test_cast_string_date_valid_format fails" Nov 18, 2023
gerashegalov changed the title from "[BUG] test_cast_string_date_valid_format fails" to "[BUG] test_cast_string_date_valid_format fails on DATAGEN_SEED=1700250017" Nov 18, 2023
pxLi pushed a commit that referenced this issue Nov 20, 2023
sameerz removed the "? - Needs Triage" label Nov 21, 2023
jlowe (Member) commented Nov 22, 2023

The problem is triggered when the string generator emits a value that has only zeros for the year but also has a valid month and day ordinal. In this case, the offending string was 0000-2-05. Spark handles this properly in the JVM, but when pyspark tries to marshal the date integral value into a Python datetime it throws an error since datetime doesn't support dates before year 1.
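
The failure described above can be reproduced without Spark. The following is a minimal sketch, not the pyspark source: `from_internal` is a hypothetical stand-in modeled on the `fromInternal` lines in the traceback, and the day count for 0000-02-05 is hand-computed under the assumption that Spark stores dates as days since 1970-01-01 in the proleptic Gregorian calendar.

```python
import datetime

# Ordinal of the Unix epoch; the traceback adds this offset to the
# internal day count before calling date.fromordinal.
EPOCH_ORDINAL = datetime.date(1970, 1, 1).toordinal()

# Assumption: 0000-02-05 as days since 1970-01-01 (proleptic Gregorian,
# year 0 treated as a leap year). Hand-computed, not taken from the issue.
YEAR_ZERO_FEB_5 = -719493


def from_internal(days_since_epoch):
    """Hypothetical stand-in for the fromInternal method in the traceback."""
    if days_since_epoch is not None:
        return datetime.date.fromordinal(days_since_epoch + EPOCH_ORDINAL)
    return None


# The earliest date Python can represent is 0001-01-01 (ordinal 1).
print(from_internal(1 - EPOCH_ORDINAL))

try:
    from_internal(YEAR_ZERO_FEB_5)
except ValueError as err:
    # Any day count that maps to an ordinal below 1 fails here.
    print("ValueError:", err)
```

Any generated string whose year is all zeros maps to an ordinal below 1, so the conversion fails on the Python side even though the JVM side produced a valid value.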
