Fixed Issue #187 - Change parse to record floating point representation #201

twelsby · 2016-01-30T04:28:52Z

This pull request incorporates the following changes:

New function get_integer() that is used instead of strtoull()/strtoll(). This function implements custom integer parsing that saves the number to the correct basic_json type and returns the type. This also eliminates the need for attempt_cast() as overflow is detected internally. Additionally details about the floating point representation are collected during the parse (in the event it is not an integer) such as the number of significant figures, the capitalization of 'e' and the use of the optional positive exponential sign. For floating point numbers this function is faster then the method currently used in 2.0.0 and only slightly slower than the method used in 1.1.0. For integers this method is significantly faster than both 2.0.0 and 1.1.0.
New type type_data_t that incorporates the value_t enum into a bitfield with the floating point representation information. This ensures that no additional memory is used in any packing scenario except byte level packing (unlikely and even then only a small amount extra is used). The type includes operator overloads so that it can be used as an almost drop in replacement for the straight value_t (switch statements need to be changed to use the get() method). Using bitfields results in a small performance overhead but as these variables are accessed only infrequently it is not significant.
Modification to the dumping of floating point numbers, which now uses snprintf() for the conversion to enable application of the recorded floating point representation. This results in massive performance improvements simply from the change to snprintf().
Addition of extra roundtrip unit tests that target the new behaviour.

twelsby · 2016-01-30T07:58:44Z

@gregmarr, in relation to "0.0" my rationale was that since a floating point 0.0 was itself a special case it would be better not to complicate the code with yet another special case for -0.0, although it may be slightly more efficient to do so. I've had another look at it and the complication is not as bad as I thought so I have made the change.

In relation to the comment - the mention of '15' digits has been there for some time even though std::numeric_limits<number_float_t>::digits10() was used instead of a literal 15. This was not introduced or changed in this pull. That said, you have a good point so I have also amended the comment.

twelsby · 2016-01-30T08:06:26Z

I also merged get_integer() into get_number(). The code coverage decreased with the original commit and it appeared that the test of endptr after str_to_float_t() was never testing true. I tried to introduce a new unit test but it proved impossible to provide any input that the lexer identified as a number that would also cause str_to_float_t() to set endptr to anything other than m_cursor. This is therefore a redundant test which I have removed.

With this test removed, get_number() was just calling get_integer() and then str_to_float_t() so there didn't seem to be much point in having two separate functions.

nlohmann · 2016-01-30T13:08:38Z

Could you please put the additional test cases to a different place (best would be directly into unit.cpp to the regression tests), because we would have clashes if miloyip/nativejson-benchmark adds new roundtrip tests?

gregmarr · 2016-01-30T15:03:51Z

I didn't realize that it also covered -0.0, since the comment didn't mention it.
The numeric limits still has the mismatch with the comment using the typedef and the code using double.

nlohmann · 2016-01-30T16:45:54Z

I am currently not sure whether it is a good idea to preserve optional + and the case of the exponent. In the end, it is still the same number, and JSON allows for these variations. After all, we also don't care about whitespace or the order in which object keys are parsed.

twelsby · 2016-01-31T01:25:44Z

@gregmarr, you are correct, the numeric limits should be number_float_t, not double.

@nlohmann, it would reduce the code complexity to drop e/E and + preservation but this would not meaningfully affect performance as these are not computationally intensive operations - saving the representation is two comparisons that only need to be done once per number, while applying them involves two comparisons and a very fast loop if the '+' needs to be removed. But I agree it shouldn't be a priority to preserve such things... but in my view the precision needs to be preserved.

Really what we need though is a custom function that doesn't depend on snprintf(). One thing that snprintf() doesn't do, that would be useful is to round non-parsed floating point numbers into the shortest decimal representation that ensures accurate roundtrips (i.e. double->string->double). This is what rapidJSON does, although its implementation causes string->double->string roundtrips to fail in certain circumstances. The best of both worlds is precision preservation for string->double->string and shortest decimal for double->string->double, but that could only be accomplished with a custom double dumping function.

The problem I see it with custom double parsing/dumping functions is that both are very challenging to do efficiently. Integer parsing/dumping can easily and efficiently be done with minimal code (no more than 10 lines each) but I would not be surprised if adding custom floating point parsing/dumping added thousands of lines - but maybe I am just being pessimistic.

nlohmann · 2016-03-30T18:56:55Z

I shall finally merge this this weekend.

nlohmann · 2016-04-03T12:10:07Z

Thanks a lot! I merged the code and made some minor adjustments. I hope the tests all succeed.

nlohmann · 2016-04-03T12:24:32Z

All tests are fine. Thanks so much @twelsby!

nlohmann · 2016-05-08T13:47:16Z

src/json.hpp.re2c

+                        // Remove '+' sign from the exponent if necessary
+                        if (!m_type.bits.exp_plus)
+                        {
+                            if (len > static_cast<int>(sizeof(buf))) len = sizeof(buf);


Hi @twelsby!

I am currently trying to improve the test coverage of the code, but I fail to create a test case where len is larger than the size of buf. The previous snprintf calls never seem to return anything greater than 263 - and if I understand snprintf correctly, it would only return a value larger than sizeof(buf) if writing to the buffer failed. In that case, capping it to sizeof(buf) and continue to write to it would make no sense.

Before I jump to wrong conclusions, I would like to ask you what situation you had in mind that would require the if.

All the best
Niels

Trevor Welsby added 4 commits January 27, 2016 13:04

Change parse to record float precision

f35f60c

Implement bitfields for better readability over masks

7d08aa7

Shift integer parsing to guess_type() and rename to get_integer()

21a00fc

Merge get_number()/get_integer() + changes suggested by @gregmarr

e951795

nlohmann added the improvement label Jan 30, 2016

nlohmann mentioned this pull request Jan 30, 2016

Bugs in miloyip/nativejson-benchmark: roundtrips #187

Closed

nlohmann mentioned this pull request Feb 11, 2016

Fixed stream precision issue. #206

Closed

nlohmann self-assigned this Mar 30, 2016

nlohmann added a commit that referenced this pull request Apr 3, 2016

merged #201

9c233be

nlohmann added this to the Release 2.0.0 milestone Apr 3, 2016

nlohmann closed this Apr 3, 2016

nlohmann mentioned this pull request Apr 5, 2016

Incorrect dumping of parsed numbers with exponents, but without decimal places #230

Closed

nlohmann reviewed May 8, 2016
View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed Issue #187 - Change parse to record floating point representation #201

Fixed Issue #187 - Change parse to record floating point representation #201

twelsby commented Jan 30, 2016

twelsby commented Jan 30, 2016

twelsby commented Jan 30, 2016

nlohmann commented Jan 30, 2016

gregmarr commented Jan 30, 2016

nlohmann commented Jan 30, 2016

twelsby commented Jan 31, 2016

nlohmann commented Mar 30, 2016

nlohmann commented Apr 3, 2016

nlohmann commented Apr 3, 2016

nlohmann May 8, 2016

Fixed Issue #187 - Change parse to record floating point representation #201

Fixed Issue #187 - Change parse to record floating point representation #201

Conversation

twelsby commented Jan 30, 2016

twelsby commented Jan 30, 2016

twelsby commented Jan 30, 2016

nlohmann commented Jan 30, 2016

gregmarr commented Jan 30, 2016

nlohmann commented Jan 30, 2016

twelsby commented Jan 31, 2016

nlohmann commented Mar 30, 2016

nlohmann commented Apr 3, 2016

nlohmann commented Apr 3, 2016

nlohmann May 8, 2016

Choose a reason for hiding this comment