How "overflow safe" are the algorithms? (in particular when working with integer types) Provide "overflow hardened for integer type" versions of some of the most common the algorithms? #729
Replies: 3 comments
-
For some of the algorithms there is an optional parameter
-
Thanks for the pointer to TCalc :) . I had read about it in the doc. I think this is a nice thing to have, but it may be possible, by tuning the algorithms a bit, to go even further / to higher precision :) .
-
I've used overflow detection in to_arithmetic.h, so maybe I can include this technique in 'checked' versions of the algorithms.
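For reference, per-step overflow detection could look something like the sketch below. This is only an illustration of the general technique, assuming the GCC/Clang `__builtin_add_overflow` intrinsic; it is not the actual to_arithmetic.h code, and `checked_sum` is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical checked accumulator (not ETL API): returns nullopt instead
// of silently wrapping when the running sum overflows int32_t.
std::optional<std::int32_t> checked_sum(const std::int32_t* data, std::size_t n)
{
  std::int32_t sum = 0;

  for (std::size_t i = 0; i < n; ++i)
  {
    // __builtin_add_overflow returns true if the addition overflowed.
    if (__builtin_add_overflow(sum, data[i], &sum))
    {
      return std::nullopt; // Report failure instead of returning garbage.
    }
  }

  return sum;
}
```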
-
How "overflow safe" are the algorithms used, for example (but not only) https://www.etlcpp.com/pseudo_moving_average.html , https://www.etlcpp.com/mean.html , https://www.etlcpp.com/standard_deviation.html , in particular when working with integer types? (which is actually quite common; for example, using lats and lons as int32_t so that lat_float = lat_integer / 1.e7, like is done in many MCU GPS libraries, one can in theory easily get overflows when taking some **2 in variance estimation).
Both when computing the average of many large integers, and even more so when computing the std or var of many large integers (since a `**2` is involved in addition), it is relatively easy to hit overflow conditions, shoot oneself in the foot, and get completely wrong results.
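To make the failure mode concrete, here is a small standalone illustration (not ETL code): just four GPS-scale samples already exceed what a 32-bit accumulator can hold. The accumulation is done in uint32_t here because signed overflow is undefined behaviour in C++:

```cpp
#include <cstdint>
#include <cstdio>

int main()
{
  // A latitude of ~59.9 degrees stored as degrees * 1e7, as in many MCU GPS libraries.
  const std::int32_t lat = 599000000;

  // 4 * 599000000 = 2396000000 > INT32_MAX (~2.147e9), so an int32_t
  // running sum would overflow on the 4th addition. We accumulate in
  // uint32_t to show the wrapped value deterministically.
  std::uint32_t sum = 0;
  for (int i = 0; i < 4; ++i)
  {
    sum += static_cast<std::uint32_t>(lat);
  }

  // Reinterpreted as int32_t this is a large negative number on
  // two's-complement targets, not 2396000000.
  std::printf("wrapped sum: %ld\n", static_cast<long>(static_cast<std::int32_t>(sum)));
  return 0;
}
```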
My question is: is anything smart done to mitigate this risk? If not, can this be mentioned in the documentation?
An additional question is whether some "smart" algorithm specializations for integer types should be provided to avoid this kind of issue; for example, one option could be to compute a "coarse" mean / var in float first, round it to the nearest integer, and then compute the mean / var relative to that offset, so as to avoid huge intermediate values as much as possible.
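A minimal sketch of that offset idea, assuming int32_t samples; `offset_mean_var` is a hypothetical helper, not ETL API:

```cpp
#include <cstddef>
#include <cstdint>

struct mean_var { double mean; double variance; };

// Hypothetical helper (not ETL API): accumulate deviations from a coarse
// offset so the intermediate sums stay small even for large raw samples.
mean_var offset_mean_var(const std::int32_t* data, std::size_t n)
{
  // Coarse offset: the first sample. A float "coarse mean" rounded to the
  // nearest integer, as suggested above, would work the same way.
  const std::int32_t offset = data[0];

  // Deviations are small if the data are clustered, so 64-bit accumulators
  // have ample headroom (assumes |data[i] - offset| stays well below ~3e9).
  std::int64_t sum    = 0;
  std::int64_t sum_sq = 0;
  for (std::size_t i = 0; i < n; ++i)
  {
    const std::int64_t d = static_cast<std::int64_t>(data[i]) - offset;
    sum    += d;
    sum_sq += d * d;
  }

  const double mean_dev = static_cast<double>(sum) / static_cast<double>(n);

  mean_var result;
  result.mean = static_cast<double>(offset) + mean_dev;
  // Shift invariance: var(x) = E[(x - offset)^2] - (E[x - offset])^2
  // (population variance).
  result.variance = static_cast<double>(sum_sq) / static_cast<double>(n)
                    - mean_dev * mean_dev;
  return result;
}
```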
I am by no means an expert on this, and the way I was doing it was likely flawed and should probably be improved on, but I needed this, for example, for an n-sigma filter; see the following as a possible source of inspiration:
https://github.com/jerabaul29/OpenMetBuoy-v2021a/blob/8a1256e43c19c2be496525f64c2e8e073344b8c8/legacy_firmware/firmware/standard_gps_waves_thermistors_drifter/statistical_processing.h#L19-L143
Note: there are also issues when using float types due to relative precision, but that could be for another issue, and there is probably no real workaround; this is just how floats are.
For example, calculating the average of the four float32 or float64 values
1.0e36, 10, 10, -1.0e36
which should be exactly 5, may not be accurate. Due to relative accuracy, one will likely get (unless using a very high precision float type like float128 or float256, I guess, though I am not sure even then) that

1.0e36 + 10 = 1.0e36

so if one just sums and divides, the two values of 10 will be "precision erased" compared with the other terms. That could be a small warning at the start of the documentation of the associated functions, or one could consider that users should know about this, since "there is no real way around it, this is just how floats work; this is expected background knowledge". (In this case, it could be possible to sort by absolute value before doing the operations, hoping that the very large absolute value terms cancel exactly first so that the small terms are not "relative precision erased", but that may be a bit of a corner case.) But it may be useful to just state that nothing smart is done to mitigate this kind of problem (unless something smart is actually being done).
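As a concrete illustration (again not ETL code), the snippet below shows the naive result and the sort-by-absolute-value idea mentioned above; note that the sort only helps in cases like this one, where the large terms cancel exactly:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main()
{
  std::vector<double> v = { 1.0e36, 10.0, 10.0, -1.0e36 };

  // Naive left-to-right sum: 1.0e36 + 10 == 1.0e36 in double, so both 10s
  // are "precision erased" and the mean comes out as 0 instead of 5.
  double naive = 0.0;
  for (double x : v) { naive += x; }
  std::printf("naive mean:  %g\n", naive / v.size());

  // Sort by decreasing absolute value: the two huge terms then cancel
  // exactly before the small terms are added, so the 10s survive.
  std::sort(v.begin(), v.end(),
            [](double a, double b) { return std::fabs(a) > std::fabs(b); });

  double sorted = 0.0;
  for (double x : v) { sorted += x; }
  std::printf("sorted mean: %g\n", sorted / v.size()); // 5
  return 0;
}
```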