
sum, stats and avg aggregations should use Kahan summation #27807

Closed
jpountz opened this issue Dec 14, 2017 · 1 comment

Comments

@jpountz
Contributor

jpountz commented Dec 14, 2017

Kahan summation gives better accuracy than naive summation. For instance, when summing N positive values, the relative error of Kahan summation is bounded by about 2^-52 (to first order, independent of N), while with naive summation the error bound grows linearly with the number of values summed.
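A minimal sketch of the difference follows. It uses Java because that is Elasticsearch's language, but the class and method names are illustrative and are not Elasticsearch's actual code. Summing the double nearest to 0.1 ten million times makes the naive loop drift visibly away from 1,000,000, while the compensated loop stays very close to it.

```java
import java.util.Arrays;

/** Illustrative comparison of naive and Kahan (compensated) summation; not Elasticsearch code. */
public final class KahanDemo {

    /** Plain left-to-right summation; its error bound grows with values.length. */
    public static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    /** Kahan summation: carries the low-order bits lost by each addition in a compensation term. */
    public static double kahanSum(double[] values) {
        double sum = 0.0;
        double compensation = 0.0;          // running (negated) estimate of the rounding error so far
        for (double v : values) {
            double y = v - compensation;    // correct the next input by the error carried so far
            double t = sum + y;             // low-order bits of y may be lost in this addition
            compensation = (t - sum) - y;   // algebraically zero; numerically, minus the lost bits
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] values = new double[10_000_000];
        Arrays.fill(values, 0.1);
        System.out.println("naive: " + naiveSum(values));
        System.out.println("kahan: " + kahanSum(values));
    }
}
```

Java does not reassociate floating-point expressions, so the compiler cannot simplify `(t - sum) - y` to zero; in languages or compiler modes that allow fast-math reassociation, the compensation can be optimized away.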

@colings86
Contributor

colings86 commented Dec 14, 2017

One thing to note is that the extra accuracy comes at a cost: the memory required to store the sum for each parent bucket doubles (we need to store both the sum and the compensation value), and the number of floating-point operations needed to add each value increases four-fold (four additions/subtractions instead of one). In practice this should not matter much for these aggregations. Their memory footprint is still negligible compared with aggregations like percentiles (it's only 2 doubles per parent bucket), and the time spent calculating the sum should remain low compared with the time spent collecting the doc value itself. I'm mentioning it here in case it's relevant to other places where we may want to implement Kahan summation in the future.
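To make the tradeoff concrete, here is a hypothetical per-bucket accumulator. The class name `KahanBucketSums` and its methods are assumptions for illustration, not Elasticsearch's API. It keeps one sum and one compensation per bucket ordinal, which is where the doubled memory comes from, and each update performs four additions/subtractions instead of one. Plain Java arrays stand in for whatever paged array structure a real aggregator would allocate.

```java
import java.util.Arrays;

/** Hypothetical per-bucket Kahan accumulator: two doubles (sum + compensation) per bucket. */
public final class KahanBucketSums {
    private double[] sums = new double[0];
    private double[] compensations = new double[0];

    /** Grows both arrays so that {@code bucket} (assumed non-negative) is a valid index. */
    private void ensureCapacity(int bucket) {
        if (bucket >= sums.length) {
            int newSize = Math.max(bucket + 1, sums.length * 2);
            sums = Arrays.copyOf(sums, newSize);
            compensations = Arrays.copyOf(compensations, newSize);
        }
    }

    /** Adds {@code value} to the running sum of {@code bucket}. */
    public void add(int bucket, double value) {
        ensureCapacity(bucket);
        double sum = sums[bucket];
        if (!Double.isFinite(value) || !Double.isFinite(sum)) {
            // Infinity/NaN would turn the compensation into NaN; fall back to a plain addition.
            sums[bucket] = sum + value;
            return;
        }
        double corrected = value - compensations[bucket];    // op 1: apply the carried error
        double newSum = sum + corrected;                     // op 2: the lossy addition
        compensations[bucket] = (newSum - sum) - corrected;  // ops 3 and 4: recover the lost bits
        sums[bucket] = newSum;
    }

    /** Returns the compensated sum for {@code bucket}, or 0.0 if it was never written. */
    public double get(int bucket) {
        return bucket < sums.length ? sums[bucket] : 0.0;
    }
}
```

The guard for non-finite values is a design choice of this sketch: once an infinite value is involved, `(newSum - sum) - corrected` can evaluate to `Infinity - Infinity = NaN`, and a NaN compensation would poison every later addition to that bucket, so the sketch falls back to a plain addition in that case.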

liketic added a commit to liketic/elasticsearch that referenced this issue Dec 16, 2017
jasontedor added a commit that referenced this issue Jan 22, 2018
* master:
  Trim down usages of `ShardOperationFailedException` interface (#28312)
  Do not return all indices if a specific alias is requested via get aliases api.
  [Test] Lower bwc version for rank-eval rest tests
  CountedBitSet doesn't need to extend BitSet. (#28239)
  Calculate sum in Kahan summation algorithm in aggregations (#27807) (#27848)
  Remove the `update_all_types` option. (#28288)
  Add information when master node left to DiscoveryNodes' shortSummary() (#28197)
  Provide explanation of dangling indices, fixes #26008 (#26999)
jasontedor added a commit that referenced this issue Jan 22, 2018
* 6.x:
  Trim down usages of `ShardOperationFailedException` interface (#28312)
  Clean up commits when global checkpoint advanced (#28140)
  Do not return all indices if a specific alias is requested via get aliases api.
  CountedBitSet doesn't need to extend BitSet. (#28239)
  Calculate sum in Kahan summation algorithm in aggregations (#27807) (#27848)
jasontedor added a commit to matarrese/elasticsearch that referenced this issue Jan 24, 2018
* master: (94 commits)
  Completely remove Painless Type from AnalyzerCaster in favor of Java Class. (elastic#28329)
  Fix spelling error
  Reindex: Wait for deletion in test
  Reindex: log more on rare test failure
  Ensure we protect Collections obtained from scripts from self-referencing (elastic#28335)
  [Docs] Fix asciidoc style in composite agg docs
  Adds the ability to specify a format on composite date_histogram source (elastic#28310)
  Provide a better error message for the case when all shards failed (elastic#28333)
  [Test] Re-Add integer_range and date_range field types for query builder tests (elastic#28171)
  Added Put Mapping API to high-level Rest client (elastic#27869)
  Revert change that does not return all indices if a specific alias is requested via get alias api. (elastic#28294)
  Painless: Replace Painless Type with Java Class during Casts (elastic#27847)
  Notify affixMap settings when any under the registered prefix matches (elastic#28317)
  Trim down usages of `ShardOperationFailedException` interface (elastic#28312)
  Do not return all indices if a specific alias is requested via get aliases api.
  [Test] Lower bwc version for rank-eval rest tests
  CountedBitSet doesn't need to extend BitSet. (elastic#28239)
  Calculate sum in Kahan summation algorithm in aggregations (elastic#27807) (elastic#27848)
  Remove the `update_all_types` option. (elastic#28288)
  Add information when master node left to DiscoveryNodes' shortSummary() (elastic#28197)
  ...

2 participants