`sum`, `stats` and `avg` aggregations should use Kahan summation #27807
Comments
One thing to note is the tradeoff: in exchange for accuracy, the memory required to store the sum for each parent bucket doubles (since we need to store both the sum and the compensation value), and the number of operations to calculate the sum increases 4-fold. In practice this should not matter much for these aggregations: their memory footprint is negligible compared with other aggregations like percentiles (it is still only 2 doubles per parent bucket), and the time spent calculating the sum should still be low compared with the time to collect the doc value itself. I am mentioning it here in case it is relevant for other places where we may want to implement Kahan summation in the future.
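To make the cost concrete, here is a minimal sketch of per-bucket Kahan accumulation. The class name `BucketedKahanSums` and the plain `double[]` storage are assumptions for illustration only, not the code from the actual change; the point is simply that each bucket carries two doubles, and each collected value costs four floating-point additions/subtractions instead of one.

```java
// Hypothetical illustration: one running sum and one compensation term per bucket.
final class BucketedKahanSums {
    private final double[] sums;          // running sum per bucket
    private final double[] compensations; // accumulated rounding error per bucket

    BucketedKahanSums(int bucketCount) {
        this.sums = new double[bucketCount];
        this.compensations = new double[bucketCount];
    }

    // Adds one value to a bucket using four floating-point operations.
    void collect(int bucket, double value) {
        double corrected = value - compensations[bucket];             // op 1
        double updated = sums[bucket] + corrected;                    // op 2
        compensations[bucket] = (updated - sums[bucket]) - corrected; // ops 3 and 4
        sums[bucket] = updated;
    }

    double sum(int bucket) {
        return sums[bucket];
    }
}
```

A naive accumulator would need only the `sums` array and a single `+=` per value, which is where the 2x memory and 4x operation figures come from.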
* master:
  * Trim down usages of `ShardOperationFailedException` interface (#28312)
  * Do not return all indices if a specific alias is requested via get aliases api.
  * [Test] Lower bwc version for rank-eval rest tests
  * CountedBitSet doesn't need to extend BitSet. (#28239)
  * Calculate sum in Kahan summation algorithm in aggregations (#27807) (#27848)
  * Remove the `update_all_types` option. (#28288)
  * Add information when master node left to DiscoveryNodes' shortSummary() (#28197)
  * Provide explanation of dangling indices, fixes #26008 (#26999)
* 6.x:
  * Trim down usages of `ShardOperationFailedException` interface (#28312)
  * Clean up commits when global checkpoint advanced (#28140)
  * Do not return all indices if a specific alias is requested via get aliases api.
  * CountedBitSet doesn't need to extend BitSet. (#28239)
  * Calculate sum in Kahan summation algorithm in aggregations (#27807) (#27848)
* master: (94 commits)
  * Completely remove Painless Type from AnalyzerCaster in favor of Java Class. (elastic#28329)
  * Fix spelling error
  * Reindex: Wait for deletion in test
  * Reindex: log more on rare test failure
  * Ensure we protect Collections obtained from scripts from self-referencing (elastic#28335)
  * [Docs] Fix asciidoc style in composite agg docs
  * Adds the ability to specify a format on composite date_histogram source (elastic#28310)
  * Provide a better error message for the case when all shards failed (elastic#28333)
  * [Test] Re-Add integer_range and date_range field types for query builder tests (elastic#28171)
  * Added Put Mapping API to high-level Rest client (elastic#27869)
  * Revert change that does not return all indices if a specific alias is requested via get alias api. (elastic#28294)
  * Painless: Replace Painless Type with Java Class during Casts (elastic#27847)
  * Notify affixMap settings when any under the registered prefix matches (elastic#28317)
  * Trim down usages of `ShardOperationFailedException` interface (elastic#28312)
  * Do not return all indices if a specific alias is requested via get aliases api.
  * [Test] Lower bwc version for rank-eval rest tests
  * CountedBitSet doesn't need to extend BitSet. (elastic#28239)
  * Calculate sum in Kahan summation algorithm in aggregations (elastic#27807) (elastic#27848)
  * Remove the `update_all_types` option. (elastic#28288)
  * Add information when master node left to DiscoveryNodes' shortSummary() (elastic#28197)
  * ...
Kahan summation gives better accuracy than naive summation. For instance, when summing up N positive values, the relative error of Kahan summation is bounded by `2^-52`, while with naive summation the error grows linearly with the number of values that are summed up.
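The difference can be seen with a small standalone sketch. The class and method names below (`KahanSumSketch`, `naiveSum`, `kahanSum`) are hypothetical and exist only for this illustration; this is the textbook form of the algorithm, not necessarily how the aggregations would implement it.

```java
import java.util.Arrays;

// Hypothetical illustration comparing naive and Kahan (compensated) summation.
final class KahanSumSketch {

    // Plain left-to-right summation: rounding error can accumulate with every addition.
    static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        return sum;
    }

    // Kahan summation: a second variable tracks the low-order bits lost in each addition.
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double compensation = 0.0;
        for (double value : values) {
            double corrected = value - compensation; // re-apply the error lost so far
            double updated = sum + corrected;        // low-order bits of corrected may be lost here
            compensation = (updated - sum) - corrected; // recover what was lost
            sum = updated;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values = new double[1_000_000];
        Arrays.fill(values, 0.1);
        System.out.println("naive: " + naiveSum(values));
        System.out.println("kahan: " + kahanSum(values));
    }
}
```

Summing one million copies of `0.1` this way, the Kahan result should stay within a few units in the last place of `100000.0`, while the naive result drifts visibly further away.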