-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Floating point precision and missing integer fields #90
Comments
Hi Mark, I'm glad you find it useful. To your first question (Q1): Ignoring JSON, FlatBuffers have the concept of default values which are not stored in the buffer. By default, the default integer value is 0. Regardless of whether the value is stored or not, it is read back by the reader because it knows the default. The JSON printer uses runtime flags to achieve similar behavior: flatcc/src/runtime/json_printer.c Lines 482 to 492 in ed32d61
The flags are not set by the init function, but immediately after using the set flags call: flatcc/include/flatcc/flatcc_json_printer.h Lines 255 to 271 in ed32d61
There are two cases where the 0 could be omitted: either the field is not present at all in the buffer (either value was never written, or a default value was written to the buffer without being forced, as is the normal mode), or the field is present but the value matches the default value. The latter case is controlled by skip_default and the former case is controlled by force_default - print the default even if absent. Now, is there a bug? possibly - but, from memory, I believe the intention is to print a value in the buffer if it is stored in the buffer, default or not but also not print default values that are not stored - otherwise a lot of fields could be printed that are of no interest. So if the buffer actually contains the value, it would probably be a bug, but more likely the buffer is omitting the default value and so is the printer. In this case you need to either force the value into the buffer, or use force_default on the runtime flag, but then lots of other values might be printed as well. Q2 follow in separate post. |
Q2: This is also something that bothers me. I have actually spent considerable time trying to fix this, several times, but it is very difficult if is to be done with grisu3 and remain fast, I also looked at googles grisu3 implementation in C++ but did not find it helpful. I tried to print the full resolution to a buffer (this happens already), but then truncated the buffer using a rounding algorithm that is fairly trivial. This did work, but the time overhead was such that I did not want to release it. A proper solution need to stop the algorithm that produces new digits at a lower resolution. There is a computed error measure that controls this process, but it is a black art. Any modification could easily lead to incorrect results. I would be really happy if someone would provide such a patch. As I recall the Google solution also works by rounding after the fact, which is slow. For your case I'd recommend not using grisu3 and redefine the print methods so they use a different resolution. I can't provide a generic interface for configurable resolution because that would force the grisu3 implemtentation to support it if/when the fallback is replaced by an internal implementation. While a bit hackish you could probably do it by pretending to be grisu3 implementation in the following: Other suggestions are welcome, but it is not an easy problem. |
Q2 v2 Another option is to use hex floating points if that is acceptable to the recepient. I'm not sure I have actually tested this, but it definitely is the intention that this should be supported. This would both be very fast and presumably also compact with the added benefit of perfect numeric precision. Here is a config flag for hex floats: flatcc/include/flatcc/flatcc_rtconfig.h Lines 81 to 90 in ed32d61
Several JSON parsers will accept hex float unintentionally because they use strtod in the core implementation without verifying that the numeric format is valid JSON. The C++ FlatBuffer JSON parser should be supporting hex floats officially, as I understand, at least we had a discussion about it. Please let me know if you go down that path. |
Hi Mike, Q1:
This logic sounds correct to me too. |
Hi Mark, You could try to parse a manually created buffer with flatcc and set the force flag on the parser, then see how it prints. flatcc/test/json_test/test_json.c Lines 162 to 181 in ed32d61
See also and Please do not hestitate to raise further concerns - it might be a bug although I suspect that it is not. |
Q2: I tried using the standard library
Using hex floats won't work in our situation as we are using the JSON data in various different systems and languages. Transmitting the data in binary format is an option though. We could then post-process the binary data to JSON elsewhere. I appreciate this is a difficult problem to solve. While looking for another solution I came across this 2018 ACM DL paper which I though may be of interest. It claims to improve on the Grisu3 method and is possibly simpler to implement (according to the authors). Q1: Thanks for clarifying and pointing me to the location in the documentation. If I call the *_force_add( ) methods, the default values get added to the buffer as you have described. Thanks, |
Thanks for the numbers. sprintf performance is not too bad in truncated form. I don't recall the performance numbers of my rouding experiment, but considering it needs to pass over 17 digits several times, I don't think it would be much of an improvement. Interesting paper - if you (or anyone else reading this) come across a quality C implementation with test cases and performance tests against grisu3, I'd seriously consider adopting it, especially if it can efficiently control resolution. |
There seems to be an Apache 2 licensed C implementation |
The implementation changes a lot atm and has not solved the resolution problem fully: But it is worth keeping an eye on. |
BTW: on performance: if you are measuring the overall performance of JSON processing and the difference is a factor 2, then of course sprintf is significantly more than twice slower than grisu3 and worth optimizing further. Floating point printing is truly a signficant factor in overall JSON processing. |
sprintf is actually even slower because the factor 3 is on the overall JSON printing that also does other work. So it is likely closer to x4-x6. The new Ryu algorithm should be a factor 3 faster than grisu3 on average, and with fewer outliers. So that would make it roughly 10x faster than sprintf, is my guess. The Ryu author is very responsive so it is likely that we will see solution in the not so distant future. The core also appears stable with most changes to the Ryu code base being architectural. This means that a given snapshot of the code will likely be correct since Ryu has been verified with exhaustive search in 32-bit floats. I suggest you follow progress on Ryu and make a test implementation with flatcc once formatting is supported. We can then use those findings to evaluate if a port to Ryu makes sense. For reference, grisu3 would still be used for parsing and it has a dependency on strtod for a few percentage of cases, as I recall. |
The Ryu project does not have immediate plans on supporting custom printing precision. Keeping this issue open as a reminder to check status in the future. |
Thank-you for creating a great library. I am seeing a performance improvement of approximately 10:1 over JsonCpp. File sizes tested up to 3 MB.
Q1:
I have noticed that tables with integer fields with 0 values are not printed. Is this a bug or have I missed something in the documentation?
Q2:
I would like to specify the floating point precision (significant figures) to use when writing floats or doubles? Ideally through a configuration option. In my application I'd like to reduce the number of significant figures to 5. This will reduce the size of my JSON files by a factor of 2.
This is the relevant piece of code. The precision is hard coded. However, it is easily parameterized for the standard library use case. For the grisu3 library function
grisu3_print_double
, its unclear as to how to modify the function to limit the precision. Could you perhaps provide a code snippet that includes the changes by adding a precision parameter to thegrisu3_print_double
call?Thank-you!
The text was updated successfully, but these errors were encountered: