Currently, histogram buckets are densely pre-created based on the fast field's min and max values and the passed bounds. This allows fast computation of the bucket position for incoming values.
In some scenarios this can cause issues with the max bucket limit (defaults to 65000) and with server memory consumption.
Data example
{"value":0}
{"value":1_000_000_000}
A histogram query on value with interval 1 and min_doc_count > 0 would create 1 billion buckets, and potentially overload the server, even though only 2 buckets will be returned.
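To make the arithmetic concrete, here is a minimal sketch of how many buckets a dense layout would pre-create for a given value range. The function name `dense_bucket_count` is hypothetical and is not taken from the tantivy code base; it only assumes that buckets are aligned to multiples of the interval.

```rust
/// Number of buckets a dense histogram would pre-create to cover
/// every interval between `min` and `max` (both inclusive).
/// Hypothetical helper for illustration only.
fn dense_bucket_count(min: f64, max: f64, interval: f64) -> u64 {
    // Index of the bucket containing the smallest and largest value.
    let first = (min / interval).floor();
    let last = (max / interval).floor();
    (last - first) as u64 + 1
}
```

For the data example above (min 0, max 1_000_000_000, interval 1), this yields roughly one billion buckets, far beyond the default limit of 65000, although only two of them contain a document.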
An alternative would be to have sparse histogram collection. It could also be a hybrid of lazy dense collection with automatic switching to sparse.
Sparse histogram collection could potentially be reused for a future date histogram.
Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval).
It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future.
closes #1704
closes #1370
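The idea behind hashmap-based collection can be sketched as follows. This is an illustration under simplifying assumptions, not the actual implementation: `bucket_pos` and `collect_sparse` are hypothetical names, values are plain `f64`, and the standard library `HashMap` stands in for whatever map the real collector uses.

```rust
use std::collections::HashMap;

/// Position of the bucket that `value` falls into (hypothetical helper).
fn bucket_pos(value: f64, interval: f64) -> i64 {
    (value / interval).floor() as i64
}

/// Counts documents per bucket, allocating an entry only for
/// buckets that actually receive a value.
fn collect_sparse(values: &[f64], interval: f64) -> HashMap<i64, u64> {
    let mut buckets = HashMap::new();
    for &value in values {
        // Hash lookup instead of direct Vec indexing: slower per value,
        // but memory grows with the number of non-empty buckets only.
        *buckets.entry(bucket_pos(value, interval)).or_insert(0) += 1;
    }
    buckets
}
```

With the two documents from the issue and interval 1, the map holds two entries instead of a billion pre-created buckets. The per-value hash lookup is the kind of overhead that would explain the slowdown in dense cases.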
* switch to sparse collection for histogram
* refactor, clippy
* fix bucket_pos overflow issue
Related Issues #1703, quickwit-oss/quickwit#2503