**Is your feature request related to a problem? Please describe.**
I am filing this to capture what Spark does, but it feels very Spark-specific, which could be problematic. The solution here might be related to a solution for #15222, so that we don't cause too much performance impact for others.
NVIDIA/spark-rapids#10458 is the corresponding issue in the Spark Plugin and NVIDIA/spark-rapids#10218 is related to it.
I don't fully know what solution I would like; this is where it gets ugly/difficult.
When Spark processes JSON, it parses the JSON into tokens and then converts those tokens back to a String when it is done. As a result, numbers are parsed into integers, doubles, or Java BigDecimal values and then rendered back to a String. For integers and BigDecimal values (numbers that do not include a decimal point or scientific notation) the processing is mostly a no-op, with two exceptions: -0 becomes just 0, and any leading zeros on the number are removed (but only if validation didn't already flag that as a problem, see #15222). A rough sketch of this behavior is below.
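To make the integer/BigDecimal case concrete, here is a minimal sketch (not Spark's actual code) of how a parse-then-rewrite round trip produces these normalizations. It uses Jackson directly, which is what Spark uses under the hood; the `roundTripInt` helper and object name are hypothetical:

```scala
import com.fasterxml.jackson.core.{JsonFactory, JsonParser, JsonToken}

object IntNormalizationSketch {
  // Allow leading zeros so we can observe the normalization instead of a
  // parse error (assuming a permissive mode comparable to Spark's).
  private val factory = new JsonFactory()
  factory.enable(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS)

  // Hypothetical helper: parse a standalone JSON integer token and render
  // it back from the parsed value rather than from the raw input text.
  def roundTripInt(raw: String): String = {
    val parser = factory.createParser(raw)
    try {
      if (parser.nextToken() == JsonToken.VALUE_NUMBER_INT) {
        // BigInteger has no negative zero and never prints leading zeros,
        // so re-rendering from it performs exactly the normalizations above.
        parser.getBigIntegerValue.toString
      } else {
        raw
      }
    } finally {
      parser.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(roundTripInt("-0"))   // 0
    println(roundTripInt("007"))  // 7 (only reachable if validation let it through)
    println(roundTripInt("1234")) // 1234 -- effectively a no-op
  }
}
```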
For floating-point numbers it is more complicated, and I need to gather more specifics to put in here. The hard parts are: detecting overflow and converting the number to +/-Infinity; converting from scientific notation to regular floating-point notation and back; and making sure the number fits the actual floating-point representation. A rough sketch follows.
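Here is a similarly hedged sketch of those floating-point cases, assuming the output roughly follows plain java.lang.Double semantics (the `roundTripDouble` helper is again hypothetical, and Spark's exact formatting may differ):

```scala
object FloatNormalizationSketch {
  // Hypothetical helper: parse to a double, then render the parsed value.
  def roundTripDouble(raw: String): String = {
    // parseDouble saturates to +/-Infinity on overflow and rounds the
    // input to the nearest representable double.
    val d = java.lang.Double.parseDouble(raw)
    // Double.toString chooses plain vs. scientific notation by magnitude.
    d.toString
  }

  def main(args: Array[String]): Unit = {
    println(roundTripDouble("1e400"))               // Infinity  (overflow)
    println(roundTripDouble("-1e400"))              // -Infinity
    println(roundTripDouble("1.0E2"))               // 100.0     (scientific -> plain)
    println(roundTripDouble("0.0000001"))           // 1.0E-7    (plain -> scientific)
    println(roundTripDouble("3.14159265358979323")) // 3.141592653589793 (rounded to fit)
  }
}
```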
I almost want a way to provide my own code for this, but I'm not sure there is a good way to do that, because I am nervous that Spark will change some of these behaviors over time.