Evaluate sparse histogram collection #1704

Closed
PSeitz opened this issue Nov 30, 2022 · 0 comments · Fixed by #1898
Comments

Contributor

PSeitz commented Nov 30, 2022

Currently, histogram buckets are densely pre-created based on the fast field's min/max values and the passed bounds. This allows fast computation of the bucket position for incoming values.
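The dense scheme can be sketched roughly as follows (an illustrative sketch, not tantivy's actual implementation; the min/max/interval values are made up):

```rust
// Illustrative sketch of dense histogram collection: one counter is
// pre-allocated per bucket between min and max, so the bucket position
// for an incoming value is a cheap offset computation.
fn main() {
    let (min, max, interval) = (0.0f64, 10.0f64, 2.0f64);
    // Pre-create every bucket in the [min, max] range up front.
    let num_buckets = ((max - min) / interval) as usize + 1;
    let mut counts = vec![0u64; num_buckets];
    for value in [0.0f64, 1.0, 5.0, 9.9] {
        // Fast path: bucket position is a direct offset, no lookup needed.
        let pos = ((value - min) / interval) as usize;
        counts[pos] += 1;
    }
    assert_eq!(counts, vec![2, 0, 1, 0, 1, 0]);
}
```

The pre-allocation is exactly what makes the sparse-data case below pathological: the vector's size depends on the value range, not on the number of non-empty buckets.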

In some scenarios this may cause issues with the max bucket limit (defaults to 65000) and server memory consumption.

Data example

{"value": 0}
{"value": 1_000_000_000}

A histogram query on value with interval 1 and min_doc_count > 0 would create 1 billion buckets and could overload the server, even though only 2 buckets would be returned.

An alternative would be to have sparse histogram collection. It could also be a hybrid of lazy dense collection with automatic switching to sparse.
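A sparse collector could key buckets by position in a hashmap instead of pre-allocating a dense vector, so memory scales with the number of non-empty buckets. A minimal sketch, assuming the data example above (this is illustrative, not the actual tantivy code):

```rust
use std::collections::HashMap;

// Illustrative sketch of sparse histogram collection: only buckets that
// actually receive a value are materialized, so the two documents from
// the data example produce two buckets instead of one billion.
fn main() {
    let interval = 1.0f64;
    let mut buckets: HashMap<i64, u64> = HashMap::new();
    for value in [0.0f64, 1_000_000_000.0] {
        // Bucket key follows the usual floor(value / interval) convention.
        let key = (value / interval).floor() as i64;
        *buckets.entry(key).or_insert(0) += 1;
    }
    assert_eq!(buckets.len(), 2);
    assert_eq!(buckets[&0], 1);
    assert_eq!(buckets[&1_000_000_000], 1);
}
```

The trade-off, noted in the commits below, is that hashing is slower than a direct vector offset in the dense case, which is what motivates a hybrid or a specialized hashmap.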

Sparse histogram collection may also be reusable for a future date histogram.

Related issues: #1703, quickwit-oss/quickwit#2503

PSeitz added a commit that referenced this issue Feb 22, 2023
Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval).
It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future.
closes #1704
closes #1370
PSeitz added a commit that referenced this issue Feb 23, 2023
* switch to sparse collection for histogram

Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval).
It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future.
closes #1704
closes #1370

* refactor, clippy

* fix bucket_pos overflow issue