Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make JSON parsing common between JsonToStructs and ScanJson #10542

Merged
merged 8 commits into from
Mar 11, 2024

Conversation

revans2
Copy link
Collaborator

@revans2 revans2 commented Mar 4, 2024

This fixes #10456
This fixes #10467
This fixes #10469
This fixes #10494
This fixes #9588
This fixes #4779
This fixes #10470
This fixes #10460
This fixes #10468

The fixes are mostly from combining fixes that were in ScanJson with fixes that were in JsonToStructs, which removed a lot of corner cases where one had fixed some things and the other had fixed others.

Sadly it did not fix the date/timestamp parsing. The Date/Timestamp parsing is based off of ScanJson, which was more liberal in many areas compared to from_json, but does not support reading numbers as dates/timestamps like from_json partially did. It caused a number of tests in from_json to start failing partly because it is not falling back in cases that ScanJson has problems.

As a part of this it also added in some nested type support for ScanJson with tests added from #10325

With this we also have adopted the configs spark.rapids.sql.json.read.float.enabled, spark.rapids.sql.json.read.double.enabled, and spark.rapids.sql.json.read.decimal.enabled for from_json.

@revans2
Copy link
Collaborator Author

revans2 commented Mar 4, 2024

build

@revans2 revans2 requested a review from andygrove March 4, 2024 22:31
@sameerz sameerz added the task Work required that improves the product but is not user facing label Mar 6, 2024
andygrove
andygrove previously approved these changes Mar 7, 2024
Copy link
Contributor

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @revans2

@revans2
Copy link
Collaborator Author

revans2 commented Mar 7, 2024

build

1 similar comment
@revans2
Copy link
Collaborator Author

revans2 commented Mar 7, 2024

build

@revans2
Copy link
Collaborator Author

revans2 commented Mar 7, 2024

build

@revans2
Copy link
Collaborator Author

revans2 commented Mar 7, 2024

@andygrove could you please take another look? I had gotten some schema mixed up that fixed things for JSON, but ended up breaking CSV. That should be fixed now.

@revans2 revans2 merged commit 3d3ade2 into NVIDIA:branch-24.04 Mar 11, 2024
42 of 43 checks passed
@revans2 revans2 deleted the make_json_common branch March 11, 2024 14:48
@revans2
Copy link
Collaborator Author

revans2 commented Mar 13, 2024

This also fixed #4647

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment