-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate scripted metric aggregation scripts to ScriptContext design #30111
Migrate scripted metric aggregation scripts to ScriptContext design #30111
Conversation
import java.util.List; | ||
import java.util.Map; | ||
|
||
public class MetricAggScriptsTests extends ScriptTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added these tests just for convenient sanity checking that the new contexts were working. But perhaps they're redundant with more general Painless tests of script parameter access functionality.
return params; | ||
} | ||
|
||
public Object getAgg() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason that agg should be provided via execute() parameter instead of via an accessor like this?
@@ -100,7 +100,7 @@ public static void initMockScripts() { | |||
}); | |||
SCRIPTS.put("mapScriptScore", params -> { | |||
Map<String, Object> agg = (Map<String, Object>) params.get("_agg"); | |||
((List<Double>) agg.get("collector")).add(((ScoreAccessor) params.get("_score")).doubleValue()); | |||
((List<Double>) agg.get("collector")).add(((Number) params.get("_score")).doubleValue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ScoreAccessor here doesn't work with the new contexts. I don't see why this use should be coupled specifically to ScoreAccessor.
@@ -1016,4 +1058,37 @@ public void testConflictingAggAndScriptParams() { | |||
SearchPhaseExecutionException ex = expectThrows(SearchPhaseExecutionException.class, builder::get); | |||
assertThat(ex.getCause().getMessage(), containsString("Parameter name \"param1\" used in both aggregation and script parameters")); | |||
} | |||
|
|||
public void testAggFromContext() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new test here for now to get some specific coverage easily. I think what actually makes sense here is to run all the test cases here against both old param and new context variable scripts, while both are implemented.
Pinging @elastic/es-search-aggs |
@elastic/es-search-aggs thoughts on this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rationull Sorry for the delay in reviewing this. I left some comments but I think it looks good in general. I would like @rjernst to also look at it as well before we merge
firstAggregation.reduceScript, ExecutableScript.AGGS_CONTEXT); | ||
ExecutableScript script = factory.newInstance(vars); | ||
aggregation = Collections.singletonList(script.run()); | ||
params.put("_aggs", aggregationObjects); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? I thought that the reason to use script contexts was to that we then wouldn't need to put the _aggs
variable in the params because it is passed directly to the script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (along with analogous code for _agg
in ScriptedMetricAggregatorFactory.java and some of the integration tests I did not modify) is needed if we want to retain backwards compatibility with the old way for one ES version, letting people migrate to the context variables while both still work.
If a clean break is acceptable then this would be removed and some of the tests could be cleaned up. But I would assume we want some compatibility leniency and possibly some warning messages or other notification (which I have not implemented yet -- item 2 on my list in the first comment on this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, understood. I'm +1 on keeping this for backwards compatibility, thanks for explaining. Maybe we could add a comment just noting that its there for backwards compatibility to avoid someone removing it thinking its just left over?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I will add a comment here and in other relevant spots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad you brought this up -- when looking for spots that needed a comment I found a leftover usage of params.get("_agg")
in ScriptedMetricAggregator.java that needed to be replaced with a direct usage of the agg object. No functional issue but it would've had to get cleaned up later.
import java.util.List; | ||
import java.util.Map; | ||
|
||
public class MetricAggScripts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this ScriptedMetricAggContexts
instead since "metric agg" is a term that includes all leaf aggregation types not just the scripted_metric
type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely -- that is much clearer. I will make that change ASAP.
@elasticmachine test this please |
@colings86 the build failure looks to be an error I saw running them on my own as well, against a revision on master prior to my changes. I spoke with @rjernst about this and he suggested it could be due to my branch being out of date wrt 6.x when it comes to Lucene versions. I've been trying to base my changes on master revisions that have passed CI but I will update to latest master in my next push along with the suggested changes. |
… to remaining references to params._agg(s)
Master is presently failing in CI but I merged it into my branch anyway so at least any further builds should fail consistently with it until it’s fixed and this branch can be updated again. |
@rjernst note I made some comments when creating this PR which are still relevant but are hidden as outdated after my last minor cleanup commit. Specifically comments about naming and usage of context variables vs |
@rjernst any feedback on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay in responding. In general, I wonder if the "agg" name should continue to be used, or if something better like "state" could make it more clear what the purpose of the object is for?
Also, in order to have a clear deprecation path for using params._agg, I would control adding that with a system property, which emits a deprecation warning when read the first time if it is enabled, but that could be done in a followup.
import java.util.List; | ||
import java.util.Map; | ||
|
||
public class ScriptedMetricAggContexts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would name this MetricAggScripts so that the inner classes read more naturally, eg MetricAggScripts.Init
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I had originally, actually. I abandoned e.g. Init
in favor of InitScript
because naming one of the classes Map
ends up requiring fully qualifying java.util.Map in a bunch of spots which is pretty annoying (although really isn't a dealbreaker since it could be limited to this file). And past that, @colings86 suggestion that "metric agg" as a term is more general than this feature made sense to me. Admittedly the current situation makes the outer class kind of unnecessary since references to the scripts end up using long names anyway. This could be replaced with a package (or not even) and names like ScriptedMetricAggInitScript
or just MetricAggInitScript
which may be more palatable since it avoids a more general name without "script" in it.
TBH I might lean toward the latter; using a class as a small namespace in this context is a little clunky anyway IMO (even if it's the best Java can do). I defer to what you two think makes the most sense in context. I will await comment from @rjernst and @colings86 to settle this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjernst I'm inclined to leave this as-is because personally I think the MapScript name qualification is more annoying than the ScriptedMetricAggContexts.*Script naming, and on further reflection I do think having them all in one file is better than splitting them apart. Let me know if you are firm in a different opinion on this though.
|
||
double _score = 0.0; | ||
if (scorer != null) { | ||
_score = scorer.score(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cause the score to always be calculated, even when not being used. Instead, in SearchScript, we have the get_score()
method, which painless converts to a _score
variable and is only invoked when it is actually used. So, the MapScript should have setScorer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. I guess lazy evaluation is an advantage of accessors vs execute()
args and I hadn't picked up on this disadvantage.
SearchScript
has getScore()
not get_score()
and I'm unclear how Painless translates that into a _score
variable in the script so I will have to dig into this one a bit more. AFAICT that logic is SearchScript specific so I think that will require modifications on PainlessScriptEngine or somewhere else deeper in. (Before implementing this I did a quick test to confirm that get_score()
won't automatically be converted to a _score
variable).
Having to modify the scripting engine for a particular context seems counter to the intent of the context design. So perhaps there is a more general solution here for causing Painless to recognize accessors for names with a leading underscore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_score()
should work; if it doesn't, it's a bug. @jdconrad Can you check this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was mistaken here.. another quick hack test just now shows variables that start with underscores do work. Not sure what test lead me to the conclusion before that it didn't. That should make this an easy fix. No need to mobilize @jdconrad :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My last commit resolved this.
import java.util.Map; | ||
|
||
public class ScriptedMetricAggContexts { | ||
private abstract static class ParamsAndAggBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this base class could be removed by passing the arguments directly to execute()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did have a general question (hidden/outdated now in this PR) about execute()
args vs accessors. Is there a general intent in the script context design for which mechanism any given script context variable should use? My read was that "ambient" or earlier-bound context should be via accessors and more execution specific or later-bound context should be via execute()
args.
In this case specifically both agg
and params
are conveniently available at the time the script contexts are created, but less conveniently (in some cases) available when the script is executed, so I opted to use accessors instead. They could be passed down the chain of course if that design is preferable for some reason.
The base class could be removed regardless of that though, IMO just a matter of style taste. I have no love for extraneous classes or deep inheritance so happy to remove it if you feel it just complicates things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought the params were setup next to the execution, but I see now the extra level of indirection in the aggs classes. I'm fine with the way you have it in that case.
More generally, the args and accessors are mostly interchangeable, for the reasons you stated. Args are necessary when the value will change per execution.
@rjernst I like the idea of a more general name like re the deprecation warning: to make sure I understand, are you proposing that the old |
Actually I'm proposing the opposite. The default behavior should be the same as existing (to not suddenly break users on a minor upgrade, since I assume this will be backported to 6.x). The flag will control whether aggs is added to params. So if this is set, a deprecation warning should be emitted. By using a system property, a single deprecation message can be logged. By making it lazily checked (on the first use of the scripted metric agg), only users using scripted metric aggs will get the warning. And a user can prepare for upgrade to 7.0 (where we will remove it, in a follow up PR to master) by updating their scripts to use the new variable, and restarting with the sysprop set to disabled. Alternatively, this could be an updatable cluster setting, but this then gets tricky with what happens to the setting on upgrade (it would be auto archived I believe), and how the current value is read and updated down inside scripted metric agg construction code. |
Got it. I think we were on the same page other than the default state of the system property. I didn't realize this would likely end up in 6.x; obviously we wouldn't want to make it break by default in that case. @rjernst I will work on updates per your suggestions and probably have an update for the PR either this weekend or in a couple weeks, time permitting. Thanks! |
… scripts This causes the score to be evaluated only when it's used.
@rjernst @colings86 I believe the following are the remaining issues, each noted with how I think it needs to proceed. Input is appreciated:
|
@colings86 @rjernst the rename is done and all the doc updates that I know about are made. |
@rationull thanks for updating the PR. I've dismissed my review since its out of date now. I see there are some merge conflicts now, could you update the branch with the latest master and push?Then I'll set off a CI build and do another review |
@colings86 yep my timing on my last comment was a bit unfortunate as another small tweak in a different PR was merged a few hours later :) The conflicts are expected. I will resolve this after work today. |
@colings86 conflicts resolved. |
@elasticmachine jenkins test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rationull this looks good but I would like @rjernst to take a final look to make sure he is happy with the script specific changes before we merge.
Awesome, thanks @colings86 for the review. |
I'm ok with the script specific changes. While I think long term we should investigate using accumulators (instead of an arbitrary map), this will depend a lot on some painless optimizations that need to be made in order to fully utilize the performance improvements accumulators can offer (note, that also may allow scripted metric aggs to be on par with java based metric aggs). |
Thanks @rjernst for taking another look. @rationull could I ask you to update the branch with the latest master on more time, then I'll get the build to run and we can merge this. |
@colings86 Updated! |
@elasticmachine test this please |
@rationull Thanks for submitting this PR, I think it will be a really positive change for the scripted_metric aggregation. |
@colings86 no problem, thanks for the reviews! I have a couple of backwards compatibility/deprecation followup changes that I'll submit in the next few days. I will update the issue with details as well. |
* elastic/master: (92 commits) Reduce number of raw types warnings (elastic#31523) Migrate scripted metric aggregation scripts to ScriptContext design (elastic#30111) turn GetFieldMappingsResponse to ToXContentObject (elastic#31544) Close xcontent parsers (partial) (elastic#31513) Ingest Attachment: Upgrade Tika to 1.18 (elastic#31252) TEST: Correct the assertion arguments order (elastic#31540) Add get field mappings to High Level REST API Client (elastic#31423) [DOCS] Updates Watcher examples for code testing (elastic#31152) TEST: Add bwc recovery tests with synced-flush index [DOCS] Move sql to docs (elastic#31474) [DOCS] Move monitoring to docs folder (elastic#31477) Core: Combine doExecute methods in TransportAction (elastic#31517) IndexShard should not return null stats (elastic#31528) fix repository update with the same settings but different type (elastic#31458) Fix Mockito trying to mock IOException that isn't thrown by method (elastic#31433) (elastic#31527) Node selector per client rather than per request (elastic#31471) Core: Combine messageRecieved methods in TransportRequestHandler (elastic#31519) Upgrade to Lucene 7.4.0. (elastic#31529) [ML] Add ML filter update API (elastic#31437) Allow multiple unicast host providers (elastic#31509) ...
Thanks rationull and colings86 , When can we use this wonderful improvement |
* master: ingest: Add ignore_missing property to foreach filter (#22147) (#31578) Fix a formatting issue in the docvalue_fields documentation. (#31563) reduce log level at gradle configuration time [TEST] Close additional clients created while running yaml tests (#31575) Docs: Clarify sensitive fields watcher encryption (#31551) Watcher: Remove never executed code (#31135) Add support for switching distribution for all integration tests (#30874) Improve robustness of geo shape parser for malformed shapes (#31449) QA: Create xpack yaml features (#31403) Improve test times for tests using `RandomObjects::addFields` (#31556) [Test] Add full cluster restart test for Rollup (#31533) Enhance thread context uniqueness assertion [DOCS] Fix heading format errors (#31483) fix writeIndex evaluation for aliases (#31562) Add x-opaque-id to search slow logs (#31539) Watcher: Fix put watch action (#31524) Add package pre-install check for java binary (#31343) Reduce number of raw types warnings (#31523) Migrate scripted metric aggregation scripts to ScriptContext design (#30111) turn GetFieldMappingsResponse to ToXContentObject (#31544) Close xcontent parsers (partial) (#31513) Ingest Attachment: Upgrade Tika to 1.18 (#31252) TEST: Correct the assertion arguments order (#31540)
…30111) * Migrate scripted metric aggregation scripts to ScriptContext design #29328 * Rename new script context container class and add clarifying comments to remaining references to params._agg(s) * Misc cleanup: make mock metric agg script inner classes static * Move _score to an accessor rather than an arg for scripted metric agg scripts This causes the score to be evaluated only when it's used. * Documentation changes for params._agg -> agg * Migration doc addition for scripted metric aggs _agg object change * Rename "agg" Scripted Metric Aggregation script context variable to "state" * Rename a private base class from ...Agg to ...State that I missed in my last commit * Clean up imports after merge
@duxy-ios This will be in 6.4.0. |
* 6.x: Fix not waiting for Netty ThreadDeathWatcher in IT (#31758) (#31789) [Docs] Correct default window_size (#31582) S3 fixture should report 404 on unknown bucket (#31782) [ML] Limit ML filter items to 10K (#31731) Fixture for Minio testing (#31688) [ML] Return statistics about forecasts as part of the jobsstats and usage API (#31647) [DOCS] Add missing get mappings docs to HLRC (#31765) [DOCS] Starting Elasticsearch (#31701) Fix coerce validation_method in GeoBoundingBoxQueryBuilder (#31747) Painless: Complete Removal of Painless Type (#31699) Consolidate watcher setting update registration (#31762) [DOCS] Adds empty 6.3.1 release notes page ingest: Introduction of a bytes processor (#31733) [test] don't run bats tests for suse boxes (#31749) Add analyze API to high-level rest client (#31577) Implemented XContent serialisation for GetIndexResponse (#31675) [DOCS] Typos DOC: Add examples to the SQL docs (#31633) Add support for AWS session tokens (#30414) Watcher: Reenable start/stop yaml tests (#31754) JDBC: Fix stackoverflow on getObject and timestamp conversion (#31735) Support multiple system store types (#31650) Add write*Blob option to replace existing blob (#31729) Split CircuitBreaker-related tests (#31659) Painless: Add Context Docs (#31190) Docs: Remove missing reference Migrate scripted metric aggregation scripts to ScriptContext design (#30111) Watcher: Fix chain input toXcontent serialization (#31721) Remove _all example (#31711) rest-high-level: added get cluster settings (#31706) Docs: Match the examples in the description (#31710) [Docs] Correct typos (#31720) Extend allowed characters for grok field names (#21745) (#31653) (#31722) [DOCS] Check for Windows and *nix file paths (#31648) [ML] Validate ML filter_id (#31535) Fix gradle4.8 deprecation warnings (#31654) Update numbers to reflect 4-byte UTF-8-encoded characters (#27083)
This is the start to an implementation of new script contexts for scripted metric agg scripts as outlined in #29328. I don't think this is ready to merge yet but wanted to submit it for feedback on some specific questions and also a sanity check on general direction.
General questions (I will comment in the diff for more specific points):