Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scripting support for ranges #50190

Open
not-napoleon opened this issue Dec 13, 2019 · 6 comments
Open

Add scripting support for ranges #50190

not-napoleon opened this issue Dec 13, 2019 · 6 comments
Labels
:Analytics/Aggregations Aggregations :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Core/Infra Meta label for core/infra team

Comments

@not-napoleon
Copy link
Member

This is intended to be a starting discussion about the options and possibilities for range scripting. While working on aggregation support for range fields, we've encountered a couple of situations that seem better served by scripting than by new aggregations or aggregation parameters. A couple of examples:

  • We would like to provide numerical metric aggregations, such as max and min, on ranges, but there is no natural ordering so we would need a way for users to specify how to order (e.g. by start point, end point, range width, etc). To support this strictly in aggs, we'd either need range specific metric aggregations, or to add a new parameter to select what mode to use. Neither is terribly appealing. Instead, we'd like to let users send a script that would extract whatever single number the user wants for the metric to operate on.

  • Open ended ranges (or any very large range, really) currently generate too many buckets and trip a circuit breaker when aggregating (see date_histogram of date_range with null end points triggers a circuit breaker #50109). Scripting could offer a recourse for this situation, allowing users to either rewrite large ranges (in the linked issue, they want to replace their unbounded endpoint with now(), for example) or just omitting ranges that will produce too many buckets.

To support this, several things need to happen:

  • Painless needs a way to access Range fields and provide an interface to their start and end fields. Those fields are Objects in Java, and can resolve to a couple of different types depending on the type of ranges. See https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/mapper/RangeType.java for the list of supported types.
  • We need to be able to return ranges from a script, which will likely require some ValueType or ValuesSourceType support on the aggregation side. I'm not sure what's required on the scripting side.
  • Value Scripts seem like a natural choice here, but the aggregations ValuesSource logic currently assumes that a Value Script will yield the same type as its input field, which means we can't use that to extract a single numeric value from a range. We probably need to make some changes aggregation side for this.
@not-napoleon not-napoleon added >enhancement :Analytics/Aggregations Aggregations :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache labels Dec 13, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Scripting)

@nik9000
Copy link
Member

nik9000 commented Jan 6, 2020

I think #47469 will complicate this somewhat.

@not-napoleon
Copy link
Member Author

I think #47469 will complicate this somewhat.

Can you elaborate? I don't think the doc value formatter gets applied to script outputs or inputs (other than missing values, but missing already doesn't work with ranges in general, in part because of that issue).

@nik9000
Copy link
Member

nik9000 commented Jan 7, 2020

I thought there was some issue where we assumed that the value of a field in the response matched the value of the field on disk. I thought this would change the value of a range to a date or number. But maybe it won't do that anyway.

@not-napoleon
Copy link
Member Author

That's...possible, I'm not sure. I do remember I had to do some jank to get DateHistogram over ranges to format the bucket keys correctly, and this might hit something similar. My instinct is we should be able to work around that without fully solving the formatter issue though. Or, at least, that's my hope. I'm worried that #47469 is going to require a query syntax change and thus will need some deprecation plan, and I really don't want to block scripted ranges on that. But if we have to, we have to. 🤞 it doesn't come to that...

@rjernst rjernst added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Core/Infra Meta label for core/infra team labels May 4, 2020
@rjernst rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020
@jimczi jimczi removed the needs:triage Requires assignment of a team area label label Jan 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Core/Infra Meta label for core/infra team
Projects
None yet
Development

No branches or pull requests

5 participants