Describe the bug
To be clear, I don't think this is a very important use case, especially because Spark is inconsistent here too.
A JSON object can technically have repeated keys in it.
{"data":100,"data":200}
The JSON spec goes out of its way to not specify how to interpret the data in JSON, just what is and is not technically valid JSON. So repeated keys are ambiguous.
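As a quick illustration of the ambiguity outside of Spark, Python's standard `json` module accepts the object above and silently keeps the last value, while `object_pairs_hook` exposes every pair in document order. This only shows how one parser happens to behave; it says nothing about what Spark does.

```python
import json

doc = '{"data":100,"data":200}'

# Default behavior: pairs are loaded into a dict, so the later value
# overwrites the earlier one.
as_dict = json.loads(doc)

# object_pairs_hook receives every key/value pair in document order,
# duplicates included.
as_pairs = json.loads(doc, object_pairs_hook=list)

print(as_dict)   # {'data': 200}
print(as_pairs)  # [('data', 100), ('data', 200)]
```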
So for `json_tuple` the last non-null value always wins. Data type does not really matter because everything is returned as a String.
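The "last non-null value wins" rule can be sketched in plain Python. This is a hypothetical model of the behavior described above, not Spark's code: the name `json_tuple_model` is made up, and it assumes a flat top-level object whose values are scalars.

```python
import json
from typing import Optional


def json_tuple_model(doc: str, key: str) -> Optional[str]:
    """Hypothetical model: the last non-null value for `key` wins,
    and the result is always a string (or None if nothing matched)."""
    # Keep duplicates by loading the object as a list of (key, value) pairs.
    pairs = json.loads(doc, object_pairs_hook=list)
    result = None
    for k, v in pairs:
        if k == key and v is not None:
            # Strings are returned as-is; other scalars are rendered
            # back to their JSON text.
            result = v if isinstance(v, str) else json.dumps(v)
    return result


print(json_tuple_model('{"data":100,"data":200}', "data"))   # 200
print(json_tuple_model('{"data":100,"data":null}', "data"))  # 100
```

In the second call the trailing null is skipped, so the earlier 100 is returned as the string "100".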
For `from_json` it is much more complicated because types are involved. Here it appears that the last value always wins if the input data can be manipulated to match the desired type. And null can always be manipulated???
In this case the last row is actually a `null`, not a string with "null" in it.
If I ask for it as a long, then the "200" for the first row is ignored because it cannot be made into a long.
Double is similar, but even though "100.012" can be made into a double, the last null value still wins, which is different from `json_tuple`.
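A rough model of the `from_json` behavior described above: the last value that can be converted to the requested type wins, and null always counts as convertible. Everything here is an assumption for illustration. The names `from_json_model`, `as_long`, and `as_double` are hypothetical, the conversion rules are deliberately narrow, and the input rows are guesses at the shape of the data rather than the rows from the original report.

```python
import json
from typing import Any, Callable, Optional


def as_long(v: Any) -> Optional[int]:
    # Assumption: only JSON integers convert to a long; a string such
    # as "200" does not.
    return v if isinstance(v, int) and not isinstance(v, bool) else None


def as_double(v: Any) -> Optional[float]:
    # Assumption: any JSON number converts to a double.
    if isinstance(v, (int, float)) and not isinstance(v, bool):
        return float(v)
    return None


def from_json_model(doc: str, key: str, convert: Callable[[Any], Any]) -> Any:
    """Hypothetical model: last convertible value wins, null included."""
    result = None
    for k, v in json.loads(doc, object_pairs_hook=list):
        if k != key:
            continue
        if v is None:
            result = None  # a later null overrides an earlier value
            continue
        converted = convert(v)
        if converted is not None:
            result = converted
        # Otherwise the value is ignored and the earlier result is kept.
    return result


print(from_json_model('{"data":100,"data":"200"}', "data", as_long))       # 100
print(from_json_model('{"data":100.012,"data":null}', "data", as_double))  # None
```

The second call is where this differs from `json_tuple`: the trailing null replaces a value that converted cleanly.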
Nesting appears to be very similar. It goes with last wins, so long as the data can be manipulated into the desired data type.
`get_json_object` appears to always be first non-null match wins.
Happily our implementation for `get_json_object` is already fairly close on this, but not 100% of the way there. It handles the first match part but is missing the not null part.
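The gap between "first match" and "first non-null match" only shows up when a null comes before a real value. A minimal sketch of the two rules, assuming a flat top-level object with scalar values; both function names are hypothetical and neither is the plugin's or Spark's actual code:

```python
import json
from typing import Any, Optional


def _render(v: Any) -> Optional[str]:
    # Strings are returned as-is, null stays None, and other scalars
    # are rendered back to their JSON text.
    if v is None or isinstance(v, str):
        return v
    return json.dumps(v)


def first_match(doc: str, key: str) -> Optional[str]:
    """First occurrence of `key` wins, even if its value is null."""
    for k, v in json.loads(doc, object_pairs_hook=list):
        if k == key:
            return _render(v)
    return None


def first_non_null_match(doc: str, key: str) -> Optional[str]:
    """First occurrence of `key` with a non-null value wins."""
    for k, v in json.loads(doc, object_pairs_hook=list):
        if k == key and v is not None:
            return _render(v)
    return None


doc = '{"data":null,"data":200}'
print(first_match(doc, "data"))           # None
print(first_non_null_match(doc, "data"))  # 200
```

For inputs without a leading null, such as `{"data":100,"data":200}`, both rules return the same result.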