Histogram: Smarter bounds for bucket creation #1370

Closed
PSeitz opened this issue May 12, 2022 · 0 comments · Fixed by #1898

PSeitz commented May 12, 2022

Currently, the histogram pre-creates buckets in the segment collector for the whole value range of the fast field, adjusted by hard_bounds or extended_bounds if set.

In a use case where the histogram is a sub_aggregation of another aggregation on the same field that applies some bounds, these bounds should be forwarded to the histogram.

E.g., a range aggregation on a timestamp with one bucket per month, and inside each bucket a histogram on the same timestamp.
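To make the effect of forwarding concrete, here is a minimal sketch. It is not tantivy's implementation: the `Bounds` type and its `intersect` and `num_buckets` methods are hypothetical names chosen for illustration, and the timestamps are assumed to be plain seconds. It compares how many buckets would be pre-created for the whole field range against the number needed once the parent range bucket's bounds are applied.

```rust
/// Closed interval [min, max] of values. Hypothetical helper type for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bounds {
    min: f64,
    max: f64,
}

impl Bounds {
    /// Intersection of two bounds, or None if they do not overlap.
    fn intersect(self, other: Bounds) -> Option<Bounds> {
        let min = self.min.max(other.min);
        let max = self.max.min(other.max);
        if min <= max {
            Some(Bounds { min, max })
        } else {
            None
        }
    }

    /// Number of histogram buckets of width `interval` needed to cover the bounds.
    fn num_buckets(self, interval: f64) -> u64 {
        let first = (self.min / interval).floor();
        let last = (self.max / interval).floor();
        (last - first) as u64 + 1
    }
}

fn main() {
    // Assumed data: timestamps in seconds spanning 365 days, daily histogram buckets.
    let day = 86_400.0;
    let field_range = Bounds { min: 0.0, max: 365.0 * day - 1.0 };
    // A parent range bucket covering the first 30 days.
    let parent_bucket = Bounds { min: 0.0, max: 30.0 * day - 1.0 };

    let without_forwarding = field_range.num_buckets(day);
    let with_forwarding = field_range
        .intersect(parent_bucket)
        .map_or(0, |b| b.num_buckets(day));

    println!("buckets without forwarded bounds: {without_forwarding}");
    println!("buckets with forwarded bounds: {with_forwarding}");
}
```

Under these assumptions, each of the twelve parent buckets would otherwise pre-create buckets for the full year, even though only about a month of them can ever receive values.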

PSeitz added a commit that referenced this issue May 12, 2022
Validation happens in different phases depending on the aggregation:
Term: During segment collection
Histogram: At the end, when converting to intermediate buckets (we preallocate empty buckets for the range). Revisit after #1370
Range: When validating the request

update CHANGELOG
PSeitz added a commit that referenced this issue Feb 22, 2023
Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval).
It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future.
closes #1704
closes #1370
PSeitz added a commit that referenced this issue Feb 23, 2023
* switch to sparse collection for histogram

Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval).
It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future.
closes #1704
closes #1370

* refactor, clippy

* fix bucket_pos overflow issue
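
The trade-off described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `bucket_pos`, `collect_dense`, and `collect_sparse` are hypothetical functions, the offset is assumed to be 0, and the standard library `HashMap` stands in for whatever map the commit uses.

```rust
use std::collections::HashMap;

/// Bucket position of a value for a given interval (offset assumed to be 0).
fn bucket_pos(value: f64, interval: f64) -> i64 {
    (value / interval).floor() as i64
}

/// Dense collection: one counter per bucket over the whole [min, max] range.
/// All values are assumed to lie within [min, max].
fn collect_dense(values: &[f64], min: f64, max: f64, interval: f64) -> Vec<u64> {
    let first = bucket_pos(min, interval);
    let len = (bucket_pos(max, interval) - first + 1) as usize;
    let mut counts = vec![0u64; len];
    for &v in values {
        counts[(bucket_pos(v, interval) - first) as usize] += 1;
    }
    counts
}

/// Sparse collection: only buckets that actually receive a value are created.
fn collect_sparse(values: &[f64], interval: f64) -> HashMap<i64, u64> {
    let mut counts = HashMap::new();
    for &v in values {
        *counts.entry(bucket_pos(v, interval)).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Assumed data: three values spread over a wide range with a small interval.
    let values = [1.0, 2.5, 1_000_000.0];
    let dense = collect_dense(&values, 1.0, 1_000_000.0, 1.0);
    let sparse = collect_sparse(&values, 1.0);
    println!("dense buckets allocated: {}", dense.len());
    println!("sparse buckets allocated: {}", sparse.len());
}
```

The dense variant allocates a counter for every bucket in the range regardless of how many are hit, while the sparse variant pays a hashing cost per value but allocates only the buckets that are used, which matches the sparse versus dense behavior the commit message reports.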