**Is your feature request related to a problem? Please describe.**
I am filing this to capture what Spark does, but it feels very Spark-specific, which could be problematic. The solution here might be related to a solution for #15222, so that we don't cause too much performance impact for others.
NVIDIA/spark-rapids#10458 is the corresponding issue in the Spark Plugin and NVIDIA/spark-rapids#10218 is related to it.
I don't fully know what solution I would like; this is where it gets ugly/difficult.
When Spark processes JSON, it parses the JSON into tokens and then converts those tokens back to a String when it is done. As a result, numbers are parsed into integers, doubles, or Java BigDecimal values and then rendered back to a String. For integers and BigDecimal values (numbers that do not include a decimal point or scientific notation) the processing is mostly a no-op, with two exceptions: -0 becomes just 0, and any leading zeros on the number are removed (but only if validation didn't already flag that as a problem, see #15222). A rough sketch of this behavior is below.
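To make the integer/BigDecimal case concrete, here is a minimal sketch (not Spark's actual code) of how a parse-then-rewrite round trip produces these normalizations. It uses Jackson directly, which is what Spark uses under the hood; the `roundTripInt` helper and object name are hypothetical:

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonParser, JsonToken}

object IntNormalizationSketch {
  // Allow leading zeros so we can observe the normalization instead of a
  // parse error (assuming a permissive mode comparable to Spark's).
  private val factory = new JsonFactory()
  factory.enable(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS)

  // Hypothetical helper: parse a standalone JSON integer token and render
  // it back from the parsed value rather than from the raw input text.
  def roundTripInt(raw: String): String = {
    val parser = factory.createParser(raw)
    try {
      if (parser.nextToken() == JsonToken.VALUE_NUMBER_INT) {
        // BigInteger has no negative zero and never prints leading zeros,
        // so re-rendering from it performs exactly the normalizations above.
        parser.getBigIntegerValue.toString
      } else {
        raw
      }
    } finally {
      parser.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(roundTripInt("-0"))   // 0
    println(roundTripInt("007"))  // 7 (only reachable if validation let it through)
    println(roundTripInt("1234")) // 1234 -- effectively a no-op
  }
}
```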
For floating-point numbers it is more complicated, and I need to gather more specifics to put in here. The hard parts are: detecting overflow and converting the number to +/-Infinity; converting from scientific notation to regular floating-point notation and back; and making sure the number fits the actual floating-point representation. A rough sketch follows.
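Here is a similarly hedged sketch of those floating-point cases, assuming the output roughly follows plain java.lang.Double semantics (the `roundTripDouble` helper is again hypothetical, and Spark's exact formatting may differ):

```scala
object FloatNormalizationSketch {
  // Hypothetical helper: parse to a double, then render the parsed value.
  def roundTripDouble(raw: String): String = {
    // parseDouble saturates to +/-Infinity on overflow and rounds the
    // input to the nearest representable double.
    val d = java.lang.Double.parseDouble(raw)
    // Double.toString chooses plain vs. scientific notation by magnitude.
    d.toString
  }

  def main(args: Array[String]): Unit = {
    println(roundTripDouble("1e400"))               // Infinity  (overflow)
    println(roundTripDouble("-1e400"))              // -Infinity
    println(roundTripDouble("1.0E2"))               // 100.0     (scientific -> plain)
    println(roundTripDouble("0.0000001"))           // 1.0E-7    (plain -> scientific)
    println(roundTripDouble("3.14159265358979323")) // 3.141592653589793 (rounded to fit)
  }
}
```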
I almost want a way to provide my own code for this, but I'm not sure there is a good way to do that, because I am nervous that Spark will change some of these behaviors over time.