-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewire terms agg to use new VS registry #51182
Rewire terms agg to use new VS registry #51182
Conversation
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations) |
...src/main/java/org/elasticsearch/search/aggregations/bucket/terms/TermsAggregatorFactory.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, thanks for fielding this one. I'm a little concerned that the tests aren't covering our expected use case, with respect to userValueTypeHint
. Unless I'm misunderstanding exactly what those tests are trying to exercise (in which case, please add some javadoc). I also left a couple of discussion comments, but we don't need to fix those in this PR.
...src/main/java/org/elasticsearch/search/aggregations/bucket/terms/TermsAggregatorFactory.java
Outdated
Show resolved
Hide resolved
* This supplier is used for all the field types that should be aggregated as bytes/strings, | ||
* including those that need global ordinals | ||
*/ | ||
private static TermsAggregatorSupplier bytesSupplier() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this static method pattern, very readable and clear which VSTs get which aggregator.
...src/main/java/org/elasticsearch/search/aggregations/bucket/terms/TermsAggregatorFactory.java
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/search/aggregations/bucket/terms/TermsAggregatorFactory.java
Show resolved
Hide resolved
...src/main/java/org/elasticsearch/search/aggregations/bucket/terms/TermsAggregatorFactory.java
Outdated
Show resolved
Hide resolved
...test/java/org/elasticsearch/search/aggregations/bucket/terms/StringTermsAggregatorTests.java
Outdated
Show resolved
Hide resolved
...test/java/org/elasticsearch/search/aggregations/bucket/terms/StringTermsAggregatorTests.java
Outdated
Show resolved
Hide resolved
...test/java/org/elasticsearch/search/aggregations/bucket/terms/StringTermsAggregatorTests.java
Outdated
Show resolved
Hide resolved
- The only "bytes" that is even accepts a format is Binary, so split that into it's own test for include/exclude regex testing - Keyword test is now basically just match_all/match_none boilerplate, waiting for some industrious soul to fill it out later - Numeric has include/exclude tests for all formatting - with and without user supplied type hints
…/elasticsearch into vs_refactor_terms_agg
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.elasticsearch.search.aggregations.support; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something I was thinking about while wiring up Cardinality - these aggregator supplier interfaces should probably go in the same package as the aggregators. Their signature is tightly bound to the aggregator signature, and it means we don't need to increase the visibility on package private aggregators (which some of them are).
…/elasticsearch into vs_refactor_terms_agg
- Removes the need to whitelist due to merging in feature branch - adds some TODOs - Moves the Supplier into the terms package, makes it package-private
@not-napoleon think this is ready for another look :) |
LGTM, Thanks for wiring this up! |
* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638) * ValuesSourceRegistry Prototype (#48758) * Remove generics from ValuesSource related classes (#49606) * fix percentile aggregation tests (#50712) * Basic thread safety for ValuesSourceRegistry (#50340) * Remove target value type from ValuesSourceAggregationBuilder (#49943) * Cleanup default values source type (#50992) * CoreValuesSourceType no longer implements Writable (#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131) * Put values source types on fields (#51503) * Remove VST Any (#51539) * Rewire terms agg to use new VS registry (#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337) * Wire Percentiles aggregator into new VS framework (#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. * Remove generics and target value type from MultiVSAB (#51647) * fix checkstyle after merge (#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (#51710) * Convert RareTerms to new VS registry (#52166) * Wire up Value Count (#52225) * Wire up Max & Min aggregations (#52219) * ValuesSource refactoring: Wire up Sum aggregation (#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (#52590) * Soft immutability for VSConfig (#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (#52891) * ValuesSource refactoring: Wire up string_stats aggregation (#52875) * VS refactoring: Wire up median (MAD) aggregation (#52945) * fix valuesourcetype issue with constant_keyword field (#53041) this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (#52752) * Wire PercentileRanks aggregator into new VS framework (#51693) * Add a VSConfig resolver for aggregations not using the registry (#53038) * Vs refactor wire up ranges and date ranges (#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates #42949. * VS refactoring: convert Boxplot to new registry (#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037) This commit updates the geo*_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue #42949. * Wire-up geo_centroid agg to ValuesSourceRegistry (#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue #42949. * Fix type tests for Missing aggregation (#53501) * ValuesSource Refactor: move histo VSType into XPack module (#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (#53484) * Vs refactor parser cleanup (#53198) Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Christos Soulios <[email protected]> Co-authored-by: Tal Levy <[email protected]>
* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (elastic#48638) * ValuesSourceRegistry Prototype (elastic#48758) * Remove generics from ValuesSource related classes (elastic#49606) * fix percentile aggregation tests (elastic#50712) * Basic thread safety for ValuesSourceRegistry (elastic#50340) * Remove target value type from ValuesSourceAggregationBuilder (elastic#49943) * Cleanup default values source type (elastic#50992) * CoreValuesSourceType no longer implements Writable (elastic#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (elastic#51131) * Put values source types on fields (elastic#51503) * Remove VST Any (elastic#51539) * Rewire terms agg to use new VS registry (elastic#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (elastic#51337) * Wire Percentiles aggregator into new VS framework (elastic#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. * Remove generics and target value type from MultiVSAB (elastic#51647) * fix checkstyle after merge (elastic#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (elastic#51710) * Convert RareTerms to new VS registry (elastic#52166) * Wire up Value Count (elastic#52225) * Wire up Max & Min aggregations (elastic#52219) * ValuesSource refactoring: Wire up Sum aggregation (elastic#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (elastic#52590) * Soft immutability for VSConfig (elastic#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (elastic#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (elastic#52891) * ValuesSource refactoring: Wire up string_stats aggregation (elastic#52875) * VS refactoring: Wire up median (MAD) aggregation (elastic#52945) * fix valuesourcetype issue with constant_keyword field (elastic#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (elastic#52752) * Wire PercentileRanks aggregator into new VS framework (elastic#51693) * Add a VSConfig resolver for aggregations not using the registry (elastic#53038) * Vs refactor wire up ranges and date ranges (elastic#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (elastic#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates elastic#42949. * VS refactoring: convert Boxplot to new registry (elastic#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (elastic#53037) This commit updates the geo*_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue elastic#42949. * Wire-up geo_centroid agg to ValuesSourceRegistry (elastic#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue elastic#42949. * Fix type tests for Missing aggregation (elastic#53501) * ValuesSource Refactor: move histo VSType into XPack module (elastic#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (elastic#53484) * Vs refactor parser cleanup (elastic#53198) Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Christos Soulios <[email protected]> Co-authored-by: Tal Levy <[email protected]>
* Add ValuesSource Registry and associated logic (#54281) * Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638) * ValuesSourceRegistry Prototype (#48758) * Remove generics from ValuesSource related classes (#49606) * fix percentile aggregation tests (#50712) * Basic thread safety for ValuesSourceRegistry (#50340) * Remove target value type from ValuesSourceAggregationBuilder (#49943) * Cleanup default values source type (#50992) * CoreValuesSourceType no longer implements Writable (#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131) * Put values source types on fields (#51503) * Remove VST Any (#51539) * Rewire terms agg to use new VS registry (#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337) * Wire Percentiles aggregator into new VS framework (#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. * Remove generics and target value type from MultiVSAB (#51647) * fix checkstyle after merge (#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (#51710) * Convert RareTerms to new VS registry (#52166) * Wire up Value Count (#52225) * Wire up Max & Min aggregations (#52219) * ValuesSource refactoring: Wire up Sum aggregation (#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (#52590) * Soft immutability for VSConfig (#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (#52891) * ValuesSource refactoring: Wire up string_stats aggregation (#52875) * VS refactoring: Wire up median (MAD) aggregation (#52945) * fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (#52752) * Wire PercentileRanks aggregator into new VS framework (#51693) * Add a VSConfig resolver for aggregations not using the registry (#53038) * Vs refactor wire up ranges and date ranges (#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates #42949. * VS refactoring: convert Boxplot to new registry (#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037) This commit updates the geo*_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue #42949. * Wire-up geo_centroid agg to ValuesSourceRegistry (#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue #42949. * Fix type tests for Missing aggregation (#53501) * ValuesSource Refactor: move histo VSType into XPack module (#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (#53484) * Vs refactor parser cleanup (#53198) Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Christos Soulios <[email protected]> Co-authored-by: Tal Levy <[email protected]> * First batch of easy fixes * Remove List.of from ValuesSourceRegistry Note that we intend to have a follow up PR dealing with the mutability of the registry, so I didn't even try to address that here. * More compiler fixes * More compiler fixes * More compiler fixes * Precommit is happy and so am I * Add new Core VSTs to tests * Disabled supported type test on SigTerms until we can backport it's fix * fix checkstyle * Fix test failure from semantic merge issue * Fix some metaData->metadata replacements that got lost * Fix list of supported types for MinAggregator * Fix list of supported types for Avg * remove unused import Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Zachary Tong <[email protected]> Co-authored-by: Christos Soulios <[email protected]> Co-authored-by: Tal Levy <[email protected]>
This is mostly moving the agg construction code into the registry model, without making many changes to the way the aggs are constructed. I think there is an opportunity to refactor how we're using the
ExecutionMode
enum to delegate some construction details, but didn't want to clutter the PR with unrelated changes (and wanted to keep it simple to minimize test failures).Also added a basic
AggregatorTestCase
for an untested code path. We'll need to convert the rests of the tests over eventually, ala #42893