[Rollup] Loosen validations when only raw data is queried #35744
Labels
>enhancement
:StorageEngine/Rollup
Team:Analytics
Ask coming from kibana-land: elastic/kibana#24059
If a user hits the RollupSearch endpoint, we enforce a variety of constraints based on the job that matched the query. The most notable restriction is the interval. Once a chart is rendered, a user may wish to zoom in on a region. If that region contains only "raw" data, the interval validation is technically no longer required, and UIs may wish to display finer-grained buckets in that region.
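To make the interval restriction concrete, here is a minimal Python sketch of the kind of check involved and of the proposed loosening. It assumes fixed intervals expressed in milliseconds and ignores calendar intervals and time zones, which the real Rollup validation also has to handle. `interval_is_allowed` and its `range_is_raw_only` parameter are hypothetical names, not Elasticsearch APIs.

```python
HOUR_MS = 60 * 60 * 1000
MINUTE_MS = 60 * 1000


def interval_is_allowed(job_interval_ms: int, requested_interval_ms: int,
                        range_is_raw_only: bool = False) -> bool:
    """Hypothetical check: may this date_histogram interval be used?"""
    if requested_interval_ms <= 0:
        return False
    if range_is_raw_only:
        # Proposed loosening: raw documents can be bucketed at any granularity.
        return True
    # Rolled-up buckets cannot be split or re-aligned, so (in this sketch)
    # the requested interval must be a whole multiple of the job's interval.
    return requested_interval_ms % job_interval_ms == 0


print(interval_is_allowed(HOUR_MS, 2 * HOUR_MS))     # True
print(interval_is_allowed(HOUR_MS, 15 * MINUTE_MS))  # False
print(interval_is_allowed(HOUR_MS, 90 * MINUTE_MS))  # False
print(interval_is_allowed(HOUR_MS, 15 * MINUTE_MS, range_is_raw_only=True))  # True
```

The hard part, as the next paragraph explains, is computing `range_is_raw_only` before the search has run.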
This is tricky to support in Rollup today. We don't know the extent of the data bounds until the search is executed: only after the results come back do we know where the live and rolled data exist (and where they potentially overlap). So bypassing the validation inside the RollupSearch endpoint would be relatively complicated. But telling the user to switch to the regular search isn't an option either, since the user doesn't know where the bounds are.
If we want to support this kind of behavior, there are a few routes we could take:
Pre-search request to find bounds
The simple approach is to internally fire off a pre-search to find the bounds of the live (or rolled) data, so that the RollupSearch endpoint knows where the boundaries are. This sounds expensive for a behavior that I expect to be the minority case.
Technically this could be done client-side too: clients (UI/Kibana) would find the bounds themselves and only use the RollupSearch endpoint when the requested range spans both raw and rolled data. Not super user-friendly, though.
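A rough Python sketch of the pre-search route, under stated assumptions. `bounds_query` builds the body of a size-0 search using the standard `min`/`max` aggregations on a timestamp field (`@timestamp` is only an example; inside a rollup index the field name would be the rollup's internal one, which is not shown). Rolled coverage is assumed to run from the first rollup bucket key to the end of the last bucket. `Coverage`, `coverage_from_bounds`, and `is_raw_only` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional

HOUR_MS = 60 * 60 * 1000


def bounds_query(timestamp_field: str) -> dict:
    """Body for a size-0 pre-search that returns only min/max timestamps."""
    return {
        "size": 0,
        "aggs": {
            "min_ts": {"min": {"field": timestamp_field}},
            "max_ts": {"max": {"field": timestamp_field}},
        },
    }


@dataclass(frozen=True)
class Coverage:
    """Half-open time range [start_ms, end_ms) covered by rolled-up data."""
    start_ms: int
    end_ms: int


def coverage_from_bounds(min_key_ms: int, max_key_ms: int,
                         job_interval_ms: int) -> Coverage:
    # Assumption: bucket keys mark the *start* of each interval, so the
    # last bucket extends one job interval past its key.
    return Coverage(min_key_ms, max_key_ms + job_interval_ms)


def is_raw_only(start_ms: int, end_ms: int, rolled: Optional[Coverage]) -> bool:
    """True if the requested range [start_ms, end_ms) contains no rolled-up data."""
    if rolled is None:  # nothing has been rolled up yet
        return True
    overlaps = start_ms < rolled.end_ms and rolled.start_ms < end_ms
    return not overlaps


# Example: rolled data covers the first 24 hours.
rolled = coverage_from_bounds(0, 23 * HOUR_MS, HOUR_MS)
print(is_raw_only(24 * HOUR_MS, 26 * HOUR_MS, rolled))  # True
print(is_raw_only(23 * HOUR_MS, 26 * HOUR_MS, rolled))  # False
```

The extra round trip before every RollupSearch request is exactly the cost that makes this route unattractive for a minority use case.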
Obtain bounds from running task
We could get the data bounds from the currently running task. This also requires a pre-flight request, and it doesn't work if the task is gone: there's no guarantee that a running task corresponds to the index being searched, or that one exists at all.
Enrich responses with metadata
We could enrich the aggregation response with metadata indicating which buckets were generated from "raw" data. This would be trivially easy to implement since we already know this information when merging shard responses.
This would give the client/UI enough information to know which region was entirely "raw" data, so if they wanted to zoom into this region exclusively they could switch to the regular search endpoint.
Or they could hit the RollupSearch endpoint with some kind of parameter that says to skip validation? Not sure. In any case, this feels like the most workable and flexible solution, albeit the least "magic" one.
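On the client side, the metadata route could look roughly like the Python sketch below. It assumes each merged date_histogram bucket carries a hypothetical boolean `raw_only` flag (the metadata proposed here, which does not exist today) and picks between the regular `_search` and `_rollup_search` endpoints for a zoom range. `Bucket`, `zoom_is_raw_only`, and `endpoint_for_zoom` are illustrative names. Treating a range with no overlapping buckets as not raw-only is a conservative choice made for the sketch.

```python
from dataclasses import dataclass
from typing import Sequence

HOUR_MS = 60 * 60 * 1000


@dataclass(frozen=True)
class Bucket:
    key_ms: int      # start of the date_histogram bucket
    doc_count: int
    raw_only: bool   # hypothetical flag: bucket built purely from live (raw) data


def zoom_is_raw_only(buckets: Sequence[Bucket], start_ms: int, end_ms: int,
                     interval_ms: int) -> bool:
    """True if every bucket overlapping [start_ms, end_ms) came from raw data."""
    overlapping = [b for b in buckets
                   if b.key_ms < end_ms and start_ms < b.key_ms + interval_ms]
    # No overlapping buckets means we learned nothing; stay conservative.
    return bool(overlapping) and all(b.raw_only for b in overlapping)


def endpoint_for_zoom(buckets: Sequence[Bucket], start_ms: int, end_ms: int,
                      interval_ms: int) -> str:
    """Pick the regular search for raw-only zooms, RollupSearch otherwise."""
    if zoom_is_raw_only(buckets, start_ms, end_ms, interval_ms):
        return "_search"
    return "_rollup_search"


buckets = [Bucket(0, 5, False), Bucket(HOUR_MS, 3, False),
           Bucket(2 * HOUR_MS, 7, True), Bucket(3 * HOUR_MS, 2, True)]
print(endpoint_for_zoom(buckets, 2 * HOUR_MS, 4 * HOUR_MS, HOUR_MS))  # _search
print(endpoint_for_zoom(buckets, HOUR_MS, 3 * HOUR_MS, HOUR_MS))      # _rollup_search
```

In practice the regular search would target only the live index, while `_rollup_search` targets the rollup and live indices together.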