Describe the bug
I don't know how critical this is, but as part of my investigation into JSON number parsing, specifically decimals, I found that Spark passes in a locale when parsing quoted string decimals in particular. This led me to discover that, by default, BigDecimal parses any character that can be transformed into a digit as a digit, as does java.lang.Long.
The following shows that there are at least 350 different characters that count as "digits" (a sketch for counting them follows the transcript):
scala> new java.math.BigDecimal("1٢७")
res20: java.math.BigDecimal = 127
scala> new java.lang.Long("1٢७")
res21: Long = 127
scala> new java.lang.Float("1٢७")
java.lang.NumberFormatException: For input string: "1٢७"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
at java.lang.Float.parseFloat(Float.java:451)
at java.lang.Float.<init>(Float.java:532)
... 47 elided
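For reference, here is a quick sketch of counting how many code points the JDK classifies as digits. The exact total depends on the Unicode tables bundled with the JDK, but it is well over 350, consistent with the claim above:

// Count the code points Character.isDigit accepts; this is what lets
// BigDecimal and Long.parseLong accept digits from non-Latin scripts.
val digitCodePoints = (0 to Character.MAX_CODE_POINT).count(cp => Character.isDigit(cp))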
Happily, this appears to work only for decimal values in Spark:
scala> Seq("1٢७").toDF("what").selectExpr("what", "CAST(what AS LONG) what_long", "CAST(what AS DECIMAL(10,0)) what_dec", "CAST(what AS double) what_double").show()
+----+---------+--------+-----------+
|what|what_long|what_dec|what_double|
+----+---------+--------+-----------+
| 1٢७| null| 127| null|
+----+---------+--------+-----------+
But we are still parsing these strings differently from Spark. This is probably not a big deal, but it is a little scary.
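If we ever wanted to force the decimal path to match the long/double behavior, one option is to reject non-ASCII input before handing the string to BigDecimal. This is only a sketch; strictBigDecimal is a hypothetical helper, not anything Spark or this plugin provides:

// Hypothetical helper: allow only the ASCII characters BigDecimal actually
// needs, so Unicode digits like '٢' or '७' fail the same way the
// long/double casts do.
def strictBigDecimal(s: String): Option[java.math.BigDecimal] = {
  val asciiOnly = s.forall(c => (c >= '0' && c <= '9') || "+-.eE".contains(c))
  if (asciiOnly) scala.util.Try(new java.math.BigDecimal(s)).toOption
  else None
}

strictBigDecimal("1٢७")  // None, matching CAST(... AS LONG) and CAST(... AS double)
strictBigDecimal("127")  // Some(127)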