try to avoid precision loss #983

Merged: 6 commits into FasterXML:2.15 on Apr 6, 2023

Conversation

pjfanning (Member) opened this pull request. Review comments on the diff:

writeNumber((BigInteger) n);
} else if (n instanceof BigDecimal) {
final BigDecimal bd = (BigDecimal) n;
p.streamReadConstraints().validateBigIntegerScale(bd.scale());
Member:
.... huh?

pjfanning (Member Author):
In the 'failing' test case, we end up with getNumberExact returning exactly this case (a BigDecimal).
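
[Illustrative sketch, not code from this PR; it assumes the "getNumberExact" above refers to JsonParser.getNumberValueExact(). For a textual format like JSON, the exact accessor hands back a BigDecimal for floating-point tokens rather than a narrowed double:]

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

// Standalone illustration: JSON stores numbers as text, so the "exact"
// accessor can return the full-precision BigDecimal for floating-point tokens.
public class NumberExactDemo {
    public static void main(String[] args) throws Exception {
        try (JsonParser p = new JsonFactory().createParser("0.1")) {
            p.nextToken(); // VALUE_NUMBER_FLOAT
            Number n = p.getNumberValueExact();
            System.out.println(n.getClass().getSimpleName()); // expected: BigDecimal
            System.out.println(n);                            // expected: 0.1
        }
    }
}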

Member:
BigDecimal makes sense; it's the validation I don't understand (since we are not converting).

pjfanning (Member Author):
ok, I've removed the validation

writeNumber((Float) n);
} else if (n instanceof BigDecimal) {
final BigDecimal bd = (BigDecimal) n;
p.streamReadConstraints().validateBigIntegerScale(bd.scale());
Member:
Same as above, this validation does not make sense.

@cowtowncoder (Member) left a review comment:
LGTM, will merge

@cowtowncoder merged commit dbf0f7c into FasterXML:2.15 on Apr 6, 2023
@cowtowncoder (Member):
This works well for the issue (the test passes!), for jackson-databind, and for almost all format backends.

But frustratingly there is ONE new test failure for CBOR (in https://github.com/FasterXML/jackson-dataformats-binary/) for "parser.nextTextValue()". I don't think this is necessarily a problem with the change here, but it might expose a problem in CBORParser state keeping. Interestingly enough, it seems to be related to float type handling...

@cowtowncoder (Member) commented Apr 6, 2023:
Ah-ha. I think the StringRef changes from

FasterXML/jackson-dataformats-binary#347

are the root cause; and maybe the changes here simply caused a different encoding of the number in CBOR when copying (float instead of... double, maybe).

@pjfanning (Member Author):
The JsonGenerator change means the number in the broken test will be output differently. The CBOR code writes BigDecimals very differently from how it writes doubles/floats.

If need be, we could hack CBORGenerator to override the new behaviour and work more like it did before.

@cowtowncoder (Member) commented Apr 6, 2023:
@pjfanning While it is true that the handling changes, the bug I see is almost certainly not due to that: it's more a combination of test code and a bug (I think) in CBOR for the new (in 2.15) StringRef support -- and/or handling of BigDecimal.

So: basically the change here means that the test code in CBORTestBase creates a slightly different CBOR document -- instead of 0.5 as a double (64-bit fixed) it is written as the BigDecimal equivalent (since the copy method now plays it safe).
The test itself would pass, except that somehow the decoder gets confused OR exposes START_ARRAY instead of BigDecimal.
The latter is because under the hood CBOR encodes a BigDecimal using 2 BigInteger equivalents, as an array (not unlike Smile, except Smile does not use an Array construct). But that logical array should not be exposed (and is not intended to be).
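
[Illustrative sketch, not the CBORTestBase test code: writing the same value 0.5 through the public CBORMapper API once as a double and once as a BigDecimal shows the two very different encodings; the BigDecimal form is CBOR tag 4 wrapping an array of exponent and mantissa.]

import java.math.BigDecimal;
import com.fasterxml.jackson.dataformat.cbor.databind.CBORMapper;

// Standalone illustration of the encoding difference discussed above.
public class CborEncodingDemo {
    public static void main(String[] args) throws Exception {
        CBORMapper mapper = new CBORMapper();
        byte[] asDouble  = mapper.writeValueAsBytes(0.5d);                  // fixed-width CBOR float
        byte[] asDecimal = mapper.writeValueAsBytes(new BigDecimal("0.5")); // tag 4: decimal fraction
        System.out.println("as double:     " + hex(asDouble));
        System.out.println("as BigDecimal: " + hex(asDecimal));
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }
}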

@here-abarany:
Just so there's a record in this conversation: the cause was that the optimized nextTextValue() function wasn't set up to handle extension tags for multiple types, most notably arrays in this case. This was a bug in previous versions as well, but it wasn't triggered until now due to the specifics of how the test was set up.

A separate question is whether it's beneficial to write a BigDecimal representation, at least without a way to disable this behavior, in cases such as passing a JsonParser to a CBORGenerator. Since this is an extension to the file format, it makes the files less portable when used with other parsers. Other binary formats (such as Smile) may not have the portability issue, though performance and/or file size may be a concern if you are storing files that contain many, many doubles.

@cowtowncoder (Member):
I am leaning towards reverting this change, due to the problems @here-abarany is pointing out.
I just need to check that TokenBuffer from jackson-databind already deals with this (which I think it does); and I may want to add a new method (copyCurrentEventExact(...)) that would handle accuracy in an improved way, for users who want that.
(Whether to add copyCurrentStructureExact() is another question.)
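
[Illustrative sketch only: copyCurrentEventExact(...) is proposed above but does not exist in jackson-core; the helper below shows what its number handling could look like, using only accessors that already exist on JsonParser and JsonGenerator.]

import java.io.IOException;
import java.math.BigDecimal;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

// Hypothetical helper: copy the current number event while preserving accuracy.
public class ExactCopyHelper {
    public static void copyNumberExact(JsonParser p, JsonGenerator g) throws IOException {
        JsonToken t = p.currentToken();
        if (t == JsonToken.VALUE_NUMBER_INT) {
            // integral values round-trip exactly via BigInteger
            g.writeNumber(p.getBigIntegerValue());
        } else if (t == JsonToken.VALUE_NUMBER_FLOAT) {
            // avoid narrowing to double: ask for the exact value instead
            Number n = p.getNumberValueExact();
            if (n instanceof BigDecimal) {
                g.writeNumber((BigDecimal) n);
            } else {
                g.writeNumber(n.doubleValue());
            }
        }
    }
}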

@cowtowncoder (Member):
Ah ok: TokenBuffer overrides both methods, so it does not rely on the default implementations.
It also relies more on JsonParser.getNumberValueDeferred(), which we added to defer decoding of textual values (until the actual result type we want is needed).

So I think reverting is fine.

@pjfanning (Member Author) commented Apr 7, 2023:
This change fixes an issue. If we revert it, what is the plan for revisiting the issue in the future? I prefer correctness over performance. Do we have any test scenarios that can be looked at -- proof of serious performance issues caused by this change?
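
[Illustrative sketch of the precision loss being discussed, not code from this PR: copying a floating-point value through double drops digits that writing the BigDecimal directly preserves.]

import java.io.StringWriter;
import java.math.BigDecimal;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;

// Standalone illustration of the correctness concern: narrowing to double loses digits.
public class PrecisionLossDemo {
    public static void main(String[] args) throws Exception {
        BigDecimal value = new BigDecimal("100.0000000000000001");
        JsonFactory f = new JsonFactory();

        StringWriter lossy = new StringWriter();
        try (JsonGenerator g = f.createGenerator(lossy)) {
            g.writeNumber(value.doubleValue());   // narrows to double first
        }
        StringWriter exact = new StringWriter();
        try (JsonGenerator g = f.createGenerator(exact)) {
            g.writeNumber(value);                 // writes the BigDecimal as-is
        }
        System.out.println("via double:     " + lossy); // 100.0 (trailing digit lost)
        System.out.println("via BigDecimal: " + exact); // 100.0000000000000001
    }
}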

@cowtowncoder (Member):
@pjfanning There are two parts to this. The original report is quite far removed from the test(s) I added -- so my test is tied to the implementation, but that does not necessarily prove the original issue could not be resolved: we have added more functionality to allow accurate retention of content. We also have not released the fix yet, so it's not a regression in that sense, even if the test did recreate the problem to fix.

The other part is that, like I said, we can (and I think should) add a new method -- copyCurrentEventExact() -- to be used when maximum accuracy retention is needed. The existing method can keep its existing implementation.

So while I agree that accuracy is, in general, preferred, I think that at this point changing actual functionality is risky, and it is better to add new functionality than to make changes that risk breaking (in some sense) existing usage.
