[BUG] Decimal parses non-arabic numbers #10532

revans2 · 2024-03-01T19:24:25Z

Describe the bug
I don't know how critical this is, but as part of my investigation into JSON number parsing, specifically decimals, I found that Spark passes in a locale when parsing quoted string decimals. This led me to discover that, by default, BigDecimal parses any character that can be transformed into a digit as a digit, as does java.lang.Long.

(0 until 32767*2 + 1).map(i => i.toChar).filter(c => Character.isDigit(c)).map(c => (Character.digit(c, 10), c))

This shows that there are at least 350 different characters that count as "digits".
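To see which scripts those characters come from, the same scan can be grouped by Unicode block. This is only a sketch: the names `digitChars` and `byBlock` are made up for illustration, and the exact blocks and counts depend on the Unicode version shipped with the JVM, so no output is shown.

```scala
// Every BMP code unit that Character.isDigit accepts.
val digitChars: Seq[Char] =
  (0 to 0xFFFF).map(_.toChar).filter(c => Character.isDigit(c))

// Group by Unicode block; Option guards against a null block for unassigned characters.
val byBlock: Map[String, Seq[Char]] =
  digitChars.groupBy { c =>
    Option(Character.UnicodeBlock.of(c)).map(_.toString).getOrElse("unknown")
  }

// Print each block with the digit characters it contributes.
byBlock.toSeq.sortBy(_._1).foreach { case (block, cs) =>
  println(s"$block (${cs.size}): ${cs.mkString(" ")}")
}
```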

scala> new java.math.BigDecimal("1٢७")
res20: java.math.BigDecimal = 127

scala> new java.lang.Long("1٢७")
res21: Long = 127

scala> new java.lang.Float("1٢७")
java.lang.NumberFormatException: For input string: "1٢७"
  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
  at sun.misc.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
  at java.lang.Float.parseFloat(Float.java:451)
  at java.lang.Float.<init>(Float.java:532)
  ... 47 elided

Happily, in Spark this appears to work only for decimal values.

scala> Seq("1٢७").toDF("what").selectExpr("what", "CAST(what AS LONG) what_long", "CAST(what AS DECIMAL(10,0)) what_dec", "CAST(what AS double) what_double").show()
+----+---------+--------+-----------+
|what|what_long|what_dec|what_double|
+----+---------+--------+-----------+
| 1٢७|     null|     127|       null|
+----+---------+--------+-----------+

But we are still parsing them differently compared to Spark. This is probably not a big deal, but it is a little scary.
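For finding inputs where the two behaviours could diverge, a check along these lines might help. It is a minimal sketch, and `hasNonAsciiDigit` is a hypothetical helper, not something that exists in Spark or in this project: it flags any string containing a character that `Character.isDigit` accepts but that is not one of the ASCII digits 0-9.

```scala
// Hypothetical helper: true if the string contains a Unicode digit other than ASCII 0-9.
def hasNonAsciiDigit(s: String): Boolean =
  s.exists(c => Character.isDigit(c) && !(c >= '0' && c <= '9'))

// The string from the examples above mixes ASCII, Arabic-Indic and Devanagari digits.
println(hasNonAsciiDigit("1٢७"))
println(hasNonAsciiDigit("127"))
```

Note that `String.exists` walks UTF-16 code units, so digits outside the Basic Multilingual Plane would need a code-point based scan instead.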


revans2 added the "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) labels Mar 1, 2024

sameerz removed the "? - Needs Triage" (Need team to review and classify) label Mar 5, 2024

sameerz changed the title ~~[BUG] Decimal parses non-aribic numbers~~ [BUG] Decimal parses non-arabic numbers Mar 6, 2024

revans2 mentioned this issue Oct 18, 2024

[FEA] enable from_json and json scan by default #11630

Closed

34 tasks
