Add test for the timestamp error case described in SPARK-10177 #8693

Closed
Tracked by #8666
mythrocks opened this issue Jul 11, 2023 · 3 comments · Fixed by #8700
Labels: task (work required that improves the product but is not user facing), test (only impacts tests)

Comments

@mythrocks
Collaborator

mythrocks commented Jul 11, 2023

Apache JIRA: https://issues.apache.org/jira/browse/SPARK-10177

Refer to ParquetHiveCompatibilitySuite.scala#L97.

We just need to add a PySpark test to check that the following data is written/read the same on spark-rapids and Apache Spark:

CREATE TABLE ts_test STORED AS PARQUET
AS SELECT CAST("2015-01-01 00:00:00" AS TIMESTAMP) problem_timestamp;
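A minimal sketch of what such a PySpark check could look like, not the test that was eventually merged. The helper names (`build_ctas`, `roundtrip`, `assert_cpu_gpu_match`) are hypothetical and do not come from the spark-rapids integration test framework. The sketch assumes a `SparkSession` with the RAPIDS plugin loaded and Hive support enabled (needed for `STORED AS PARQUET`), and that toggling `spark.rapids.sql.enabled` at runtime is enough to switch between CPU and GPU execution.

```python
# Hypothetical helpers for illustration only; names are not from spark-rapids.
TABLE_NAME = "ts_test"


def build_ctas(table_name: str) -> str:
    """Return the CTAS statement from the issue description."""
    return (
        f"CREATE TABLE {table_name} STORED AS PARQUET "
        'AS SELECT CAST("2015-01-01 00:00:00" AS TIMESTAMP) problem_timestamp'
    )


def roundtrip(spark, rapids_enabled: bool, table_name: str = TABLE_NAME):
    """Write the single-row table, then read it back as a list of rows.

    Assumption: `spark` is an existing SparkSession and the RAPIDS plugin
    honors `spark.rapids.sql.enabled` when it is changed at runtime.
    """
    spark.conf.set("spark.rapids.sql.enabled", str(rapids_enabled).lower())
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
    spark.sql(build_ctas(table_name))
    return spark.sql(f"SELECT problem_timestamp FROM {table_name}").collect()


def assert_cpu_gpu_match(spark):
    """Compare the CPU result against the GPU result rather than a literal,
    so the check does not depend on the session time zone."""
    cpu_rows = roundtrip(spark, rapids_enabled=False)
    gpu_rows = roundtrip(spark, rapids_enabled=True)
    assert cpu_rows == gpu_rows, f"CPU {cpu_rows} != GPU {gpu_rows}"
```

Comparing the two results against each other, instead of against a hard-coded `datetime`, keeps the sketch independent of how the collected timestamp is localized on the driver.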
@mythrocks changed the title from "SPARK-10177: Timestamps read incorrectly. Refer to the test in ParquetHiveCompatibility.scala#L97" to "Add test for the timestamp error case described in SPARK-10177" on Jul 11, 2023
@mythrocks added the test and task labels on Jul 11, 2023
@razajafri
Collaborator

razajafri commented Jul 12, 2023

It seems the bug was in DateTimeUtils, as is evident from the comment here. Given that, is this still a valid test case that we need in parquet_write_test.py?

@mythrocks
Collaborator Author

Hmm. In hindsight, probably not top priority. I'd still add it, just in case there's a similar bug in CUDF; it is cheap to test, since it needs only a single timestamp record.

@mythrocks
Collaborator Author

Sorry for the red herring in the description. I have corrected the suggested SQL to include a column alias.
