Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable JSON Scan and from_json by default #11753

Merged
merged 4 commits into from
Nov 25, 2024

Conversation

revans2
Copy link
Collaborator

@revans2 revans2 commented Nov 22, 2024

this fixes #11630

Note that @ttnghia did a lot of work on MAP support so we are enabling that by default too.

Signed-off-by: Robert (Bobby) Evans <[email protected]>
@revans2
Copy link
Collaborator Author

revans2 commented Nov 22, 2024

build

@ttnghia
Copy link
Collaborator

ttnghia commented Nov 22, 2024

docs/compatibility.md Outdated Show resolved Hide resolved
Spark supports parsing decimal types either formatted as floating point number or integral numbers, even if it is
in a quoted string. If it is in a quoted string the local of the JVM is used to determine the number format.
If the local is not for the `US`, which is the default we will fall back to the CPU because we do not currently
parse those numbers correctly. The `US` format removes all commas ',' from the string.
Copy link
Collaborator

@ttnghia ttnghia Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to clarify this, otherwise this can be misunderstood as removing commas from any string either quoted or unquoted.

Suggested change
parse those numbers correctly. The `US` format removes all commas ',' from the string.
parse those numbers correctly. The `US` format removes all commas ',' from the quoted string.

docs/compatibility.md Outdated Show resolved Hide resolved
docs/compatibility.md Outdated Show resolved Hide resolved
@sameerz sameerz added the feature request New feature or request label Nov 25, 2024
@revans2
Copy link
Collaborator Author

revans2 commented Nov 25, 2024

build

@revans2
Copy link
Collaborator Author

revans2 commented Nov 25, 2024

@ttnghia please take another look

docs/compatibility.md Outdated Show resolved Hide resolved
@ttnghia
Copy link
Collaborator

ttnghia commented Nov 25, 2024

build

@@ -323,7 +323,7 @@ try to call out which releases we offer different results for. JSON parsing is e
except for date and timestamp types where we still have work to complete. If you wish to disable
JSON Scan you can set `spark.rapids.sql.format.json.enabled` or
`spark.rapids.sql.format.json.read.enabled` to false. To disable `from_json` you can set
`spark.rapids.sql.expression.StructsToJson` to false.
`spark.rapids.sql.expression.JsonToStructs` to false.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. Generally I prefer to have someone post a change request instead of pushing to my PR directly. That is so that I can keep my internal development branch in sync with what is in the PR.

@revans2 revans2 merged commit 6539441 into NVIDIA:branch-24.12 Nov 25, 2024
49 checks passed
@revans2 revans2 deleted the from_json_on_by_default branch November 25, 2024 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] enable from_json and json scan by default
3 participants