Add script_filter tokenfilter #33431
Pinging @elastic/es-search-aggs
LGTM
  }

  public boolean isKeyword() {
-     return isKeyword;
+     return keywordAtt.isKeyword();
👍
Can we use consistent terminology in the class names and token filter name? I think we should use "predicate" terminology everywhere? This will leave the namespace open for other types of scripted token filters in the future.
--------------------------------------------------
// CONSOLE

<1> This will skip tokens that are 5 characters long or less
This description would make more sense with positive logic, since the predicate is positive based. So something like:
<1> This will emit tokens that are more than 5 characters long
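To illustrate the positive logic discussed here, the following is a minimal, plain-Java sketch of predicate-based filtering: a token is emitted when the predicate returns true. It is not the code from this PR; the class `PredicateFilterSketch` and the method `keepMatching` are hypothetical names, and tokens are modelled as plain strings rather than a Lucene TokenStream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; not part of Elasticsearch or Lucene.
public class PredicateFilterSketch {

    // Keeps only the tokens for which the predicate returns true.
    static List<String> keepMatching(List<String> tokens, Predicate<String> predicate) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (predicate.test(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("What", "Flapdoodle");
        // Emit tokens that are more than 5 characters long.
        System.out.println(keepMatching(tokens, t -> t.length() > 5)); // prints [Flapdoodle]
    }
}
```

Read this way, the callout describes what the predicate accepts rather than what it rejects, which matches how the predicate itself is written.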
@rjernst so call it |
How about |
Thanks @romseygeek. The new naming LGTM.
- [[analysis-scriptfilter-tokenfilter]]
- === Scripted Filtering Token Filter
+ [[analysis-predicatefilter-tokenfilter]]
+ === Predicate Token Script Filter
Token Script Filter -> Token Filter Script
This allows users to filter out tokens from a TokenStream using Painless scripts, instead of having to write specialised Java code and package it up into a plugin. The commit also refactors the AnalysisPredicateScript.Token class so that it wraps an AttributeSource and makes it read-only.
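The read-only wrapping described above can be sketched roughly as follows. This is an assumption-laden illustration, not the actual AnalysisPredicateScript.Token implementation: `MutableAttributes` is a hypothetical stand-in for an AttributeSource, and `TokenView` is a hypothetical stand-in for the wrapper, which delegates its getters to the underlying state and exposes no setters.

```java
// Hypothetical illustration only; names do not come from Elasticsearch or Lucene.
public class ReadOnlyTokenSketch {

    // Stand-in for the mutable per-token state that an analysis chain updates.
    static final class MutableAttributes {
        String term = "";
        boolean keyword = false;
    }

    // Read-only view: every getter delegates to the wrapped state, and no setters exist,
    // so code holding a TokenView can inspect the current token but cannot modify it.
    static final class TokenView {
        private final MutableAttributes attrs;

        TokenView(MutableAttributes attrs) {
            this.attrs = attrs;
        }

        String getTerm() {
            return attrs.term;
        }

        boolean isKeyword() {
            return attrs.keyword;
        }
    }

    public static void main(String[] args) {
        MutableAttributes attrs = new MutableAttributes();
        TokenView view = new TokenView(attrs);

        // The owner of the state advances it; the view reflects the change.
        attrs.term = "flapdoodle";
        attrs.keyword = true;
        System.out.println(view.getTerm() + " " + view.isKeyword());
    }
}
```

Delegating on each call, rather than copying values into fields, is what the `isKeyword()` change in the diff above appears to do: the getter reads from the keyword attribute instead of a cached boolean.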
* master:
  Preserve cluster settings on full restart tests (elastic#33590)
  Use IndexWriter.getFlushingBytes() rather than tracking it ourselves (elastic#33582)
  Fix upgrading of list settings (elastic#33589)
  Add read-only Engine (elastic#33563)
  HLRC: Add ML get categories API (elastic#33465)
  SQL: Adds MONTHNAME, DAYNAME and QUARTER functions (elastic#33411)
  Add predicate_token_filter (elastic#33431)
  Fix Replace function. Adds more tests to all string functions. (elastic#33478)
  [ML] Rename input_fields to column_names in file structure (elastic#33568)
* master: (91 commits)
  Preserve cluster settings on full restart tests (elastic#33590)
  Use IndexWriter.getFlushingBytes() rather than tracking it ourselves (elastic#33582)
  Fix upgrading of list settings (elastic#33589)
  Add read-only Engine (elastic#33563)
  HLRC: Add ML get categories API (elastic#33465)
  SQL: Adds MONTHNAME, DAYNAME and QUARTER functions (elastic#33411)
  Add predicate_token_filter (elastic#33431)
  Fix Replace function. Adds more tests to all string functions. (elastic#33478)
  [ML] Rename input_fields to column_names in file structure (elastic#33568)
  Add full cluster restart base class (elastic#33577)
  Validate list values for settings (elastic#33503)
  Copy and validatie soft-deletes setting on resize (elastic#33517)
  Test: Fix package name
  SQL: Fix result column names for arithmetic functions (elastic#33500)
  Upgrade to latest Lucene snapshot (elastic#33505)
  Enable not wiping cluster settings after REST test (elastic#33575)
  MINOR: Remove Dead Code in SearchScript (elastic#33569)
  [Test] Remove duplicate method in TestShardRouting (elastic#32815)
  mute test on windows
  Update beats template to include apm-server metrics (elastic#33286)
  ...
* support predicate_token_filter elastic/elasticsearch#33431
* add new file
* fix failing unit tests
* support predicate_token_filter elastic/elasticsearch#33431
* add new file
* fix failing unit tests

(cherry picked from commit 6d5340b)
This will allow users to filter out terms using scripted predicates, rather than having to write Java code and wire things up via analysis plugins.