Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loss of precision when serializing <double> #360

Closed
matspetter opened this issue Nov 17, 2016 · 39 comments · Fixed by #915
Closed

Loss of precision when serializing <double> #360

matspetter opened this issue Nov 17, 2016 · 39 comments · Fixed by #915
Assignees
Labels
solution: proposed fix a fix for the issue has been proposed and waits for confirmation
Milestone

Comments

@matspetter
Copy link

It seems that precision is lost when serializing a double. I cannot say why since
std::numeric_limits::digits10 should provide enough digits !?
but if I change that to:
std::numeric_limits::digits10 then I'm not loosing anything.
I'm NOT using and "long double" types. Only "double".

@nlohmann
Copy link
Owner

Do you have an example to check?

@gregmarr
Copy link
Contributor

digits10
number of decimal digits that can be represented without change
that is, any number with this many decimal digits can be converted to a value of type T and back to decimal form, without change due to rounding or overflow.

max_digits10
number of decimal digits necessary to differentiate all values of this type
that is, the number of base-10 digits that are necessary to uniquely represent all distinct values of the type T, such as necessary for serialization/deserialization to text.

@matspetter
Copy link
Author

matspetter commented Nov 17, 2016

Ok lets see. This code was run on a Mac.
g++ --version

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.1.0
#include <string>
#include <sstream>
#include <iostream>
#include <limits>

int main(int argc,char** argv)
{
    double v = 100000000000.1236;

    int p1 = std::numeric_limits<double>::digits10;
    int p2 = std::numeric_limits<double>::max_digits10;

    // stream with precision == digits10
    std::stringstream ss;
    ss.precision(p1);
    ss << v;
    std::cout << "digits10     " << p1 << ": " << ss.str() << std::endl;

    // stream with precision == max_digits10
    std::stringstream ss2;
    ss2.precision(p2);
    ss2 << v;
    std::cout  << "max_digits10 " << p2 << ": " << ss2.str() << std::endl;

    // Read back and compare with original
    double v1,v2;
    ss >> v1;
    ss2 >> v2;

    std::cout << "v==v1 : " << ((v==v1)?"true":"false") << std::endl;
    std::cout << "v==v2 : " << ((v==v2)?"true":"false") << std::endl;
}

output:

digits10     15: 100000000000.124
max_digits10 17: 100000000000.1236
v==v1 : false
v==v2 : true

It is not easy with floating points and precision but this tells me that the streaming seems more correct when using "max_digits10"

@matspetter
Copy link
Author

It also gives the same result/output on a Ubuntu system:
g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

uname -a
Linux mbergg12-ubuntu15 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

@TurpentineDistillery
Copy link

This is not a bug, but a reality of dealing with floating-point representation.
If one uses more than digits10 digits of precision, then string->value->string is not guaranteed to round-trip.

@gregmarr
Copy link
Contributor

gregmarr commented Nov 19, 2016

@TurpentineDistillery However using max_digits10 on output guarantees that value->string->value will not change. The difference being that not all strings of length max_digits10 can be generated by value->string with max_digits10 precision. So as long as string was generated by outputting a value, it's safe. The issue comes when the string was just an arbitrary string of digits.

@nlohmann
Copy link
Owner

This issue can be closed, right?

@matspetter
Copy link
Author

matspetter commented Nov 24, 2016 via email

@nlohmann nlohmann added the solution: invalid the issue is not related to the library label Nov 24, 2016
@nlohmann
Copy link
Owner

Thanks for the quick response!

@timueller
Copy link

I don't see how this is not a bug. The OP's example exactly illustrates the problem of value -> string -> value serialization/deserialization. We need to store double precision data in a json string form and read it back without loss of precision and have run into exactly the same problem.

@nlohmann
Copy link
Owner

What would you propose?

@timueller
Copy link

Is there a problem if you just switch to using max_digits10 for both dump() and operator<< ?

@nlohmann
Copy link
Owner

Then numbers like 2312.42 would be round-tripped to 2312.4200000000001.

@abolz
Copy link
Contributor

abolz commented Oct 11, 2017

I just ran into the same problem. In my opinion, the string->value->string round-trip isn't really relevant, the value->string->value round-trip is. And the (minimum) number of decimal digits to distinguish all floating-point values is max_digits10, not digits10.

Note that currently "2312.4200000000001" does not "round-trip" to "2312.4200000000001". Actually a string->value->string round-trip cannot be guaranteed unless the first string is generated by the same implementation as the second (and then there is an initial value->string->value round-trip required...).

So I think that value->string->value should be guaranteed (when using the same library for serialization/deserialization).

@nlohmann
Copy link
Owner

See #360 (comment).

@gregmarr
Copy link
Contributor

@nlohmann My comment actually supports using max_digits10. It was a comment on this statement:

If one uses more than digits10 digits of precision, then string->value->string is not guaranteed to round-trip.

I was pointing out that this is true, but only for strings that were written by a value->string conversion not using the full precisions, or were written by hand. As such, I think it's fine for those values to not be preserved exactly.

@abolz
Copy link
Contributor

abolz commented Oct 11, 2017

I think @gregmarr is saying the same. My expectation is that if I have a json object, assign any double precision value to it, serialize it to disk and later deserialize it again, the values should be exactly equal. This requires using max_digits10.

@nlohmann
Copy link
Owner

I think one reason for the status quo were the roundtrip results of https://github.com/miloyip/nativejson-benchmark. I'm not sure whether there exists the one right solution, so we need to make a decision.

@abolz
Copy link
Contributor

abolz commented Oct 28, 2017

I have an implementation of the Grisu2 algorithm for printing floating-point numbers, based on the reference implementation by Florian Loitsch. It works for IEEE float/double (but not long double) and produces a short representation which is guaranteed to round-trip. I could submit a PR if you like, so we have something to start with.

@nlohmann nlohmann reopened this Oct 28, 2017
@nlohmann nlohmann added state: please discuss please discuss the issue or vote for your favorite option and removed solution: invalid the issue is not related to the library labels Oct 28, 2017
@ojwoodford
Copy link

ojwoodford commented Oct 30, 2017

I just hit this issue. I store unit test data in JSON and a new unit test is failing because of this loss of precision. Is there any reason why std::setprecision shouldn't work on an ostream I'm passing a json object into?

@gregmarr
Copy link
Contributor

It doesn't use the ostream formatting for floating point numbers.
https://github.com/nlohmann/json/blob/master/src/json.hpp#L8324
https://github.com/nlohmann/json/blob/develop/src/json.hpp#L6701

If you change these to max_digits10 instead of digits10 it should fix your failures.

@pvleuven
Copy link

pvleuven commented Nov 17, 2017

I just ran into the same issue as @gregmarr described and switching to max_digits10 seems to work

json j1 = 1.0 / 3.0;
json j2 = json::parse( j1.dump() );
bool is_eq = *( j1.get_ptr<json::number_float_t*>() ) == *( j2.get_ptr<json::number_float_t*>() );
// is_eq is false with digits10, true with max_digits10

@nlohmann
Copy link
Owner

Hi all. I shall change digits10 to max_digits10.

I lot of test cases fail. It seems that they focus on the string->number->string case:

      Start 36: test-inspection_all
36/70 Test #36: test-inspection_all .................***Failed    5.77 sec

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test-inspection is a Catch v1.9.7 host application.
Run with -? for options

-------------------------------------------------------------------------------
object inspection
  serialization
  dump and floating-point numbers
-------------------------------------------------------------------------------
../test/src/unit-inspection.cpp:235
...............................................................................

../test/src/unit-inspection.cpp:238: FAILED:
  CHECK( s.find("42.23") != std::string::npos )
with expansion:
  18446744073709551615 (0xffffffffffffffff)
  !=
  18446744073709551615 (0xffffffffffffffff)

-------------------------------------------------------------------------------
object inspection
  serialization
  dump and small floating-point numbers
-------------------------------------------------------------------------------
../test/src/unit-inspection.cpp:241
...............................................................................

../test/src/unit-inspection.cpp:244: FAILED:
  CHECK( s.find("1.23456e-78") != std::string::npos )
with expansion:
  18446744073709551615 (0xffffffffffffffff)
  !=
  18446744073709551615 (0xffffffffffffffff)

===============================================================================
test cases:   1 |   0 passed | 1 failed
assertions: 150 | 148 passed | 2 failed
      Start 62: test-regression_all
62/70 Test #62: test-regression_all .................***Failed    5.59 sec

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test-regression is a Catch v1.9.7 host application.
Run with -? for options

-------------------------------------------------------------------------------
regression tests
  issue #228 - double values are serialized with commas as decimal points
-------------------------------------------------------------------------------
../test/src/unit-regression.cpp:417
...............................................................................

../test/src/unit-regression.cpp:453: FAILED:
  CHECK( j1a.dump() == "2312.42" )
with expansion:
  "2312.4200000000001" == "2312.42"

../test/src/unit-regression.cpp:454: FAILED:
  CHECK( j1b.dump() == "2312.42" )
with expansion:
  "2312.4200000000001" == "2312.42"

../test/src/unit-regression.cpp:462: FAILED:
  CHECK( ss.str() == "4.712,112312.42" )
with expansion:
  "4.712,112312.4200000000001"
  ==
  "4.712,112312.42"

../test/src/unit-regression.cpp:464: FAILED:
  CHECK( ss.str() == "4.712,112312.4247,11" )
with expansion:
  "4.712,112312.420000000000147,11"
  ==
  "4.712,112312.4247,11"

../test/src/unit-regression.cpp:466: FAILED:
  CHECK( j2a.dump() == "23.42" )
with expansion:
  "23.420000000000002" == "23.42"

-------------------------------------------------------------------------------
regression tests
  issue #380 - bug in overflow detection when parsing integers
-------------------------------------------------------------------------------
../test/src/unit-regression.cpp:784
...............................................................................

../test/src/unit-regression.cpp:788: FAILED:
  CHECK( j.dump() == "1.66020696663386e+20" )
with expansion:
  "1.6602069666338596e+20"
  ==
  "1.66020696663386e+20"

===============================================================================
test cases:   1 |   0 passed | 1 failed
assertions: 408 | 402 passed | 6 failed
      Start 66: test-testsuites_all
66/70 Test #66: test-testsuites_all .................***Failed    0.07 sec

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test-testsuites is a Catch v1.9.7 host application.
Run with -? for options

-------------------------------------------------------------------------------
compliance tests from nativejson-benchmark
  roundtrip
-------------------------------------------------------------------------------
../test/src/unit-testsuites.cpp:281
...............................................................................

../test/src/unit-testsuites.cpp:328: FAILED:
  CHECK( j.dump() == json_string )
with expansion:
  "[1.2344999999999999]" == "[1.2345]"
with messages:
  filename := "test/data/json_roundtrip/roundtrip22.json"
  json_string := "[1.2345]"

../test/src/unit-testsuites.cpp:328: FAILED:
  CHECK( j.dump() == json_string )
with expansion:
  "[-1.2344999999999999]" == "[-1.2345]"
with messages:
  filename := "test/data/json_roundtrip/roundtrip23.json"
  json_string := "[-1.2345]"

../test/src/unit-testsuites.cpp:328: FAILED:
  CHECK( j.dump() == json_string )
with expansion:
  "[2.2250738585071999e-308]"
  ==
  "[2.2250738585072e-308]"
with messages:
  filename := "test/data/json_roundtrip/roundtrip29.json"
  json_string := "[2.2250738585072e-308]"

===============================================================================
test cases:   7 |   6 passed | 1 failed
assertions: 974 | 971 passed | 3 failed

It would be great if you could have a look at these tests and tell me why it's OK to change or ignore them.

nlohmann added a commit that referenced this issue Nov 28, 2017
@ojwoodford
Copy link

A string is higher precision than a double (e.g. the former can represent 1.2345 exactly; the latter cannot), so converting from string -> double -> string can lead to a change in value, whereas double -> string -> double should not. For this reason, it's not clear to me why you would have exact tests on the former; they should have a tolerance.

@gregmarr
Copy link
Contributor

I think several of these came from an external benchmark that valued the "load and resave a JSON file with exact values" benchmark. I agree that those are not necessarily something that we should care about.

@nlohmann
Copy link
Owner

The roundtrip tests (string -> JSON -> string) come from here: https://github.com/miloyip/nativejson-benchmark/tree/master/data/roundtrip

@lwinkler
Copy link

lwinkler commented Dec 7, 2017

Another strange behavior happening with serialization. Here I serialized a double with the DBL_MAX value (1.79769e+308). The resulting string value becomes larger than DBL_MAX and cannot be parsed back. (I post this here as it seems to be related.)

#include<iostream>
#include<float.h>
#include<cstdio>
#include"json.hpp"

using nlohmann::json;
using namespace std;


int main() {
	double d = DBL_MAX;
	json js;
	js["max"] = d;

	stringstream ss;
	ss << js["max"].dump();
	json js2 = json::parse(ss.str());
	cout << js2.dump() << endl;
}

This results in:

terminate called after throwing an instance of 'nlohmann::detail::out_of_range'
  what():  [json.exception.out_of_range.406] number overflow parsing '1.79769313486232e+308'

@stale
Copy link

stale bot commented Jan 6, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jan 6, 2018
@stale stale bot closed this as completed Jan 13, 2018
@nlohmann nlohmann removed the state: stale the issue has not been updated in a while and will be closed automatically soon unless it is updated label Jan 21, 2018
@nlohmann
Copy link
Owner

Reopened to check whether #915 fixed this issue.

@nlohmann nlohmann reopened this Jan 21, 2018
@nlohmann
Copy link
Owner

The example from #360 (comment) works now and outputs 1.7976931348623157e+308.

@nlohmann
Copy link
Owner

The roundtrips from #360 (comment) work.

@nlohmann
Copy link
Owner

Roundtripping 2312.42 (#360 (comment)) works now.

@nlohmann
Copy link
Owner

Roundtripping 100000000000.1236 (#360 (comment)) works now.

@nlohmann
Copy link
Owner

Thanks to #915 from @abolz, this issue is now fixed. Thanks everybody for the patience!

@nlohmann nlohmann added solution: proposed fix a fix for the issue has been proposed and waits for confirmation and removed state: please discuss please discuss the issue or vote for your favorite option labels Jan 27, 2018
@nlohmann nlohmann self-assigned this Jan 27, 2018
@nlohmann nlohmann added this to the Release 3.1.0 milestone Jan 27, 2018
@ghost
Copy link

ghost commented Nov 14, 2018

This issue still seems to be there?

Here is some code that reproduces the issue:

#include <nlohmann/json.hpp>
int main()
{	
	using Json = ::nlohmann::json;
	std::string json_text{ "{\"spot\": 21898.99}" };
	Json json = Json::parse(json_text);
	auto j = *json.find("spot");
	double val = j.get<double>();
	// val is 21898.990000000002
        // expecting 21898.99
}

Git SHA da81e7b
Windows 10 version 10.0.14939.0
Visual Studio 2017 15.9.0, C++ 19.15.26732.1
Windows SDK 10.0.17134.0

Is seems as if this calls to std::strtod in lexer.hpp is the problem

 static void strtof(double& f, const char* str, char** endptr) noexcept
    {
        f = std::strtod(str, endptr);
    }

@nlohmann
Copy link
Owner

The library stores floating point numbers as double by default (you can change this in a template parameter).

The number 21898.99 will be stored as the double 21898.990000000002. This is the name number the parser comes up to after reading the string 21898.99. You can check in your debugger that both numbers are equal down to the bit.

image

https://www.exploringbinary.com/floating-point-converter/

@ghost
Copy link

ghost commented Nov 15, 2018

@nlohmann Danke Schoen!
I found out this is not a problem in your library but how decimal to floating point conversion takes place, and the limit of how floating point numbers are stored in memory. I learnt something today!

@charpty
Copy link

charpty commented Aug 9, 2021

I learnt something today too, thanks for sharing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
solution: proposed fix a fix for the issue has been proposed and waits for confirmation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants