How "overflow safe" are the algorithms? (in particular when working with integer types) Provide "overflow hardened for integer type" versions of some of the most common the algorithms? #729
Replies: 3 comments
-
For some of the algorithms there is an optional parameter
-
Thanks for the pointer to TCalc :) . I had read about it in the doc. I think this is a nice thing to have, but it may be possible, by tuning the algorithms a bit, to go even further / to higher precision :) .
-
I've used overflow detection in to_arithmetic.h, so maybe I can include this technique in 'checked' versions of the algorithms.
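For reference, per-step overflow detection could look something like the sketch below. This is only an illustration of the general technique, assuming the GCC/Clang `__builtin_add_overflow` intrinsic; it is not the actual to_arithmetic.h code, and `checked_sum` is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical checked accumulator (not ETL API): returns nullopt instead
// of silently wrapping when the running sum overflows int32_t.
std::optional<std::int32_t> checked_sum(const std::int32_t* data, std::size_t n)
{
  std::int32_t sum = 0;

  for (std::size_t i = 0; i < n; ++i)
  {
    // __builtin_add_overflow returns true if the addition overflowed.
    if (__builtin_add_overflow(sum, data[i], &sum))
    {
      return std::nullopt; // Report failure instead of returning garbage.
    }
  }

  return sum;
}
```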
-
How "overflow safe" are the algorithms used, for example (but not only) https://www.etlcpp.com/pseudo_moving_average.html , https://www.etlcpp.com/mean.html , https://www.etlcpp.com/standard_deviation.html , in particular when working with integer types? (which is actually quite common; for example, using lats and lons as int32_t so that lat_float = lat_integer / 1.e7, like is done in many MCU GPS libraries, one can in theory easily get overflows when taking some **2 in variance estimation).
Both when computing the average of many large integers, and even more so when computing the std or var of many large integers (since a `**2` is involved in addition), it is relatively easy to hit overflow conditions, shoot oneself in the foot, and get completely wrong results.
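To make the failure mode concrete, here is a small standalone illustration (not ETL code): just four GPS-scale samples already exceed what a 32-bit accumulator can hold. The accumulation is done in uint32_t here because signed overflow is undefined behaviour in C++:

```cpp
#include <cstdint>
#include <cstdio>

int main()
{
  // A latitude of ~59.9 degrees stored as degrees * 1e7, as in many MCU GPS libraries.
  const std::int32_t lat = 599000000;

  // 4 * 599000000 = 2396000000 > INT32_MAX (~2.147e9), so an int32_t
  // running sum would overflow on the 4th addition. We accumulate in
  // uint32_t to show the wrapped value deterministically.
  std::uint32_t sum = 0;
  for (int i = 0; i < 4; ++i)
  {
    sum += static_cast<std::uint32_t>(lat);
  }

  // Reinterpreted as int32_t this is a large negative number on
  // two's-complement targets, not 2396000000.
  std::printf("wrapped sum: %ld\n", static_cast<long>(static_cast<std::int32_t>(sum)));
  return 0;
}
```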
My question is: is anything smart done to mitigate this risk? If not, can this be mentioned in the documentation?
An additional question is whether some "smart" algorithm specializations for integer types should be provided to avoid this kind of issue; for example, one option could be to compute a "coarse" mean / var in float first, round it to the nearest integer, and then compute the mean / var relative to that offset, so as to avoid huge intermediate values as much as possible.
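A minimal sketch of that offset idea, assuming int32_t samples; `offset_mean_var` is a hypothetical helper, not ETL API:

```cpp
#include <cstddef>
#include <cstdint>

struct mean_var { double mean; double variance; };

// Hypothetical helper (not ETL API): accumulate deviations from a coarse
// offset so the intermediate sums stay small even for large raw samples.
mean_var offset_mean_var(const std::int32_t* data, std::size_t n)
{
  // Coarse offset: the first sample. A float "coarse mean" rounded to the
  // nearest integer, as suggested above, would work the same way.
  const std::int32_t offset = data[0];

  // Deviations are small if the data are clustered, so 64-bit accumulators
  // have ample headroom (assumes |data[i] - offset| stays well below ~3e9).
  std::int64_t sum    = 0;
  std::int64_t sum_sq = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    const std::int64_t d = static_cast<std::int64_t>(data[i]) - offset;
    sum    += d;
    sum_sq += d * d;
  }

  const double mean_dev = static_cast<double>(sum) / static_cast<double>(n);

  mean_var result;
  result.mean = static_cast<double>(offset) + mean_dev;
  // Shift invariance: var(x) = E[(x - offset)^2] - (E[x - offset])^2
  // (population variance).
  result.variance = static_cast<double>(sum_sq) / static_cast<double>(n)
                    - mean_dev * mean_dev;
  return result;
}
```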
I am by no means an expert on this, and the way I was doing it was likely flawed and should probably be improved on, but I needed this, for example, for an n-sigma filter; see the following as a possible source of inspiration:
https://github.com/jerabaul29/OpenMetBuoy-v2021a/blob/8a1256e43c19c2be496525f64c2e8e073344b8c8/legacy_firmware/firmware/standard_gps_waves_thermistors_drifter/statistical_processing.h#L19-L143
Note: there are also issues when using float types due to relative precision, but that could be for another issue, and there is probably no real workaround; this is just how floats are.
For example, calculating the average of the four float32 or float64 values
1.0e36, 10, 10, -1.0e36
which should be exactly 5, may not be accurate. Due to relative accuracy, one will likely get (unless using a very high precision float type like float128 or float256, I guess, though I am not sure even then) that

1.0e36 + 10 = 1.0e36

so if one just sums and divides, the two values of 10 will be "precision erased" compared with the other terms. That could be a small warning at the start of the documentation of the associated functions, or one could consider that users should know about this, since "there is no real way around it, this is just how floats work; this is expected background knowledge". (In this case, it could be possible to sort by absolute value before doing the operations, hoping that the very large absolute value terms cancel exactly first so that the small terms are not "relative precision erased", but that may be a bit of a corner case.) But it may be useful to just state that nothing smart is done to mitigate this kind of problem (unless something smart is actually being done).
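As a concrete illustration (again not ETL code), the snippet below shows the naive result and the sort-by-absolute-value idea mentioned above; note that the sort only helps in cases like this one, where the large terms cancel exactly:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main()
{
  std::vector<double> v = { 1.0e36, 10.0, 10.0, -1.0e36 };

  // Naive left-to-right sum: 1.0e36 + 10 == 1.0e36 in double, so both 10s
  // are "precision erased" and the mean comes out as 0 instead of 5.
  double naive = 0.0;
  for (double x : v) { naive += x; }
  std::printf("naive mean:  %g\n", naive / v.size());

  // Sort by decreasing absolute value: the two huge terms then cancel
  // exactly before the small terms are added, so the 10s survive.
  std::sort(v.begin(), v.end(),
            [](double a, double b) { return std::fabs(a) > std::fabs(b); });

  double sorted = 0.0;
  for (double x : v) { sorted += x; }
  std::printf("sorted mean: %g\n", sorted / v.size()); // 5
  return 0;
}
```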