try to avoid precision loss #983

Merged: 6 commits into FasterXML:2.15 on Apr 6, 2023

Conversation

pjfanning (Member) opened this pull request. Review comments on the diff:

writeNumber((BigInteger) n);
} else if (n instanceof BigDecimal) {
final BigDecimal bd = (BigDecimal) n;
p.streamReadConstraints().validateBigIntegerScale(bd.scale());
Member:
.... huh?

pjfanning (Member Author):
In the 'failing' test case, we end up with getNumberExact returning exactly this case (a BigDecimal).
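
[Illustrative sketch, not code from this PR; it assumes the "getNumberExact" above refers to JsonParser.getNumberValueExact(). For a textual format like JSON, the exact accessor hands back a BigDecimal for floating-point tokens rather than a narrowed double:]

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

// Standalone illustration: JSON stores numbers as text, so the "exact"
// accessor can return the full-precision BigDecimal for floating-point tokens.
public class NumberExactDemo {
    public static void main(String[] args) throws Exception {
        try (JsonParser p = new JsonFactory().createParser("0.1")) {
            p.nextToken(); // VALUE_NUMBER_FLOAT
            Number n = p.getNumberValueExact();
            System.out.println(n.getClass().getSimpleName()); // expected: BigDecimal
            System.out.println(n);                            // expected: 0.1
        }
    }
}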

Member:
BigDecimal makes sense; it's the validation I don't understand (since we are not converting).

pjfanning (Member Author):
ok, I've removed the validation

writeNumber((Float) n);
} else if (n instanceof BigDecimal) {
final BigDecimal bd = (BigDecimal) n;
p.streamReadConstraints().validateBigIntegerScale(bd.scale());
Member:
Same as above, this validation does not make sense.

@cowtowncoder (Member) left a review comment:
LGTM, will merge

@cowtowncoder merged commit dbf0f7c into FasterXML:2.15 on Apr 6, 2023
@cowtowncoder (Member):
This works well for the issue (the test passes!), for jackson-databind, and for almost all format backends.

But frustratingly there is ONE new test failure for CBOR (in https://github.com/FasterXML/jackson-dataformats-binary/) for "parser.nextTextValue()". I don't think this is necessarily a problem with the change here, but it might expose a problem in CBORParser state keeping. Interestingly enough, it seems to be related to float type handling...

@cowtowncoder (Member) commented Apr 6, 2023:
Ah-ha. I think the StringRef changes from

FasterXML/jackson-dataformats-binary#347

are the root cause; and maybe the changes here simply caused a different encoding of the number in CBOR when copying (float instead of... double, maybe).

@pjfanning (Member Author):
The JsonGenerator change means the number in the broken test will be output differently. The CBOR code writes BigDecimals very differently from how it writes doubles/floats.

If need be, we could hack CBORGenerator to override the new behaviour and work more like it did before.

@cowtowncoder (Member) commented Apr 6, 2023:
@pjfanning While it is true that the handling changes, the bug I see is almost certainly not due to that: it's more a combination of test code and a bug (I think) in CBOR for the new (in 2.15) StringRef support -- and/or handling of BigDecimal.

So: basically the change here means that the test code in CBORTestBase creates a slightly different CBOR document -- instead of 0.5 as a double (64-bit fixed) it is written as the BigDecimal equivalent (since the copy method now plays it safe).
The test itself would pass, except that somehow the decoder gets confused OR exposes START_ARRAY instead of BigDecimal.
The latter is because under the hood CBOR encodes a BigDecimal using 2 BigInteger equivalents, as an array (not unlike Smile, except Smile does not use an Array construct). But that logical array should not be exposed (and is not intended to be).
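
[Illustrative sketch, not the CBORTestBase test code: writing the same value 0.5 through the public CBORMapper API once as a double and once as a BigDecimal shows the two very different encodings; the BigDecimal form is CBOR tag 4 wrapping an array of exponent and mantissa.]

import java.math.BigDecimal;
import com.fasterxml.jackson.dataformat.cbor.databind.CBORMapper;

// Standalone illustration of the encoding difference discussed above.
public class CborEncodingDemo {
    public static void main(String[] args) throws Exception {
        CBORMapper mapper = new CBORMapper();
        byte[] asDouble  = mapper.writeValueAsBytes(0.5d);                  // fixed-width CBOR float
        byte[] asDecimal = mapper.writeValueAsBytes(new BigDecimal("0.5")); // tag 4: decimal fraction
        System.out.println("as double:     " + hex(asDouble));
        System.out.println("as BigDecimal: " + hex(asDecimal));
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }
}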

@here-abarany:
Just so there's a record in this conversation: the cause was that the optimized nextTextValue() function wasn't set up to handle extension tags for multiple types, most notably arrays in this case. This was a bug in previous versions as well, but it wasn't triggered until now due to the specifics of how the test was set up.

A separate question is whether it's beneficial to write a BigDecimal representation, at least without a way to disable this behavior, in cases such as passing a JsonParser to a CBORGenerator. Since this is an extension to the file format, it makes the files less portable when used with other parsers. Other binary formats (such as Smile) may not have the portability issue, though performance and/or file size may be a concern if you are storing files that contain many, many doubles.

@cowtowncoder (Member):
I am leaning towards reverting this change, due to the problems @here-abarany is pointing out.
I just need to check that TokenBuffer from jackson-databind already deals with this (which I think it does); and I may want to add a new method (copyCurrentEventExact(...)) that would handle accuracy in an improved way, for users who want that.
(Whether to add copyCurrentStructureExact() is another question.)
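
[Illustrative sketch only: copyCurrentEventExact(...) is proposed above but does not exist in jackson-core; the helper below shows what its number handling could look like, using only accessors that already exist on JsonParser and JsonGenerator.]

import java.io.IOException;
import java.math.BigDecimal;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

// Hypothetical helper: copy the current number event while preserving accuracy.
public class ExactCopyHelper {
    public static void copyNumberExact(JsonParser p, JsonGenerator g) throws IOException {
        JsonToken t = p.currentToken();
        if (t == JsonToken.VALUE_NUMBER_INT) {
            // integral values round-trip exactly via BigInteger
            g.writeNumber(p.getBigIntegerValue());
        } else if (t == JsonToken.VALUE_NUMBER_FLOAT) {
            // avoid narrowing to double: ask for the exact value instead
            Number n = p.getNumberValueExact();
            if (n instanceof BigDecimal) {
                g.writeNumber((BigDecimal) n);
            } else {
                g.writeNumber(n.doubleValue());
            }
        }
    }
}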

@cowtowncoder (Member):
Ah ok: TokenBuffer overrides both methods, so it does not rely on the default implementations.
It also relies more on JsonParser.getNumberValueDeferred(), which we added to defer decoding of textual values (until the actual result type we want is needed).

So I think reverting is fine.

@pjfanning (Member Author) commented Apr 7, 2023:
This change fixes an issue. If we revert it, what is the plan for revisiting the issue in the future? I prefer correctness over performance. Do we have any test scenarios that can be looked at -- proof of serious performance issues caused by this change?
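
[Illustrative sketch of the precision loss being discussed, not code from this PR: copying a floating-point value through double drops digits that writing the BigDecimal directly preserves.]

import java.io.StringWriter;
import java.math.BigDecimal;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;

// Standalone illustration of the correctness concern: narrowing to double loses digits.
public class PrecisionLossDemo {
    public static void main(String[] args) throws Exception {
        BigDecimal value = new BigDecimal("100.0000000000000001");
        JsonFactory f = new JsonFactory();

        StringWriter lossy = new StringWriter();
        try (JsonGenerator g = f.createGenerator(lossy)) {
            g.writeNumber(value.doubleValue());   // narrows to double first
        }
        StringWriter exact = new StringWriter();
        try (JsonGenerator g = f.createGenerator(exact)) {
            g.writeNumber(value);                 // writes the BigDecimal as-is
        }
        System.out.println("via double:     " + lossy); // 100.0 (trailing digit lost)
        System.out.println("via BigDecimal: " + exact); // 100.0000000000000001
    }
}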

@cowtowncoder (Member):
@pjfanning There are two parts to this. The original report is quite far removed from the test(s) I added -- so my test is tied to the implementation, but that does not necessarily prove the original issue could not be resolved: we have added more functionality to allow accurate retention of content. We also have not released the fix yet, so it's not a regression in that sense, even if the test did recreate the problem to fix.

The other part is that, like I said, we can (and I think should) add a new method -- copyCurrentEventExact() -- to be used when maximum accuracy retention is needed. The existing method can keep its existing implementation.

So while I agree that accuracy is, in general, preferred, I think that at this point changing actual functionality is risky, and it is better to add new functionality than to make changes that risk breaking (in some sense) existing usage.
