Add flag to ignore synonym token filter exceptions due to analyzer processing #30968
Comments
Pinging @elastic/es-search-aggs
We discussed internally and agreed that's something we'd like to support.
@jimczi Thanks, that's great news.
Big +1 here
Just as an update, I am working on this and should have a patch by next week.
* Added lenient flag for synonym-tokenfilter. Relates to #30968
* added docs for synonym-graph-tokenfilter
-- Also made lenient final
-- changed from !lenient to lenient == false
* Changes after review (1)
-- Renamed to ElasticsearchSynonymParser
-- Added explanation for ElasticsearchSynonymParser::add method
-- Changed ElasticsearchSynonymParser::logger instance to static
* Added lenient option for WordnetSynonymParser
-- also added more documentation
* Added additional documentation
* Improved documentation
The same commit message is referenced a second time with the trailer "(cherry picked from commit 88c270d)".
@mayya-sharipova Can this be closed now?
Elasticsearch 6.0 introduced a breaking change in the way the synonym token filter behaves in custom analyzers. The analyzer's tokenizer and the token filters preceding the synonym token filter are now applied to each synonym. If a synonym is removed as a result of applying them, parsing throws, causing index creation to fail.
I propose adding a flag to the synonym token filter settings (in the index settings) that simply ignores the thrown error, or performs some other check to avoid throwing. This would allow those of us with complex synonym strategies to stay flexible in how we use synonyms, with the full understanding that some synonyms will be ignored in some analyzers.
The main use case here is that we have thousands of synonyms managed by a content team. We emit a single synonyms config file that is used in many custom analyzers for many variations on text fields. This strategy has worked very well for us. Coupling the validity of a synonym to a specific analyzer requires a separate synonym configuration per analyzer, and requires a non-technical team to fully understand Elasticsearch analyzers. Both are obtrusive.
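To make the failure mode and the proposed flag concrete, here is a rough Python sketch. It is not Elasticsearch code: the two tokenizers are crude stand-ins, the function and rule names are hypothetical, and the lenient behaviour is simplified to dropping the whole rule (the real parser may handle partially valid rules differently).

```python
import re
from typing import Callable, List

Tokenizer = Callable[[str], List[str]]


def word_tokenizer(text: str) -> List[str]:
    # Crude stand-in for a tokenizer that discards punctuation such as "&".
    return re.findall(r"\w+", text)


def whitespace_tokenizer(text: str) -> List[str]:
    # Splits on whitespace only, so "&" survives as a token.
    return text.split()


def parse_synonyms(rules: List[str], tokenize: Tokenizer, lenient: bool = False) -> List[str]:
    """Return the rules whose terms all survive analysis.

    Strict mode raises on the first rule that loses a term; lenient mode
    skips that rule and keeps going (a simplification of the proposal).
    """
    accepted = []
    for rule in rules:
        terms = [term.strip() for term in rule.split(",")]
        if all(tokenize(term) for term in terms):
            accepted.append(rule)
        elif not lenient:
            raise ValueError(f"synonym rule {rule!r} contains a term that analyzes to nothing")
    return accepted


# "&,and" comes from the issue; "tv,television" is a made-up second rule.
RULES = ["&,and", "tv,television"]

if __name__ == "__main__":
    print(parse_synonyms(RULES, whitespace_tokenizer))
    print(parse_synonyms(RULES, word_tokenizer, lenient=True))
```

With the punctuation-dropping tokenizer, strict parsing fails on the first rule, while lenient parsing keeps the remaining rules and lets the shared synonym list be reused across analyzers.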
Example:
Take the synonym
&,and
The ampersand will be eliminated by the standard tokenizer used in custom analyzer A, which will throw, causing index creation to fail. However, that synonym is useful in a different custom analyzer, B, that uses the whitespace tokenizer, where the & is preserved.
Reference:
https://discuss.elastic.co/t/why-the-synonym-filter-change-in-6-0/133740
#27481
https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_60_analysis_changes.html
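For illustration, the A/B example above might be expressed as index settings along these lines. This is a sketch under assumptions: the flag name lenient is taken from the commit message in this thread, the filter name shared_synonyms and the helper function are hypothetical, and the synonym is given inline rather than through the shared config file the issue describes.

```python
import json
from typing import Dict, List


def build_index_settings(synonyms: List[str], lenient: bool) -> Dict:
    """Build an index-creation body with two custom analyzers sharing one synonym filter."""
    synonym_filter = {
        "type": "synonym",
        "synonyms": list(synonyms),
        # Flag discussed in this thread; when true, rules that fail to parse
        # for a given analyzer are meant to be ignored instead of throwing.
        "lenient": lenient,
    }
    return {
        "settings": {
            "analysis": {
                "filter": {"shared_synonyms": synonym_filter},  # hypothetical name
                "analyzer": {
                    # Analyzer A: the standard tokenizer drops "&".
                    "A": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["shared_synonyms"],
                    },
                    # Analyzer B: the whitespace tokenizer preserves "&".
                    "B": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["shared_synonyms"],
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings(["&,and"], lenient=True), indent=2))
```

The printed JSON is what would be sent as the body of an index creation request. Both analyzers reference the same filter definition, which is the single-source-of-synonyms setup the issue asks to keep.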