
Add flag to ignore synonym token filter exceptions due to analyzer processing #30968

Closed
jtreher opened this issue May 30, 2018 · 6 comments
Labels
>enhancement, :Search Relevance/Analysis (How text is split into tokens), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)

Comments


jtreher commented May 30, 2018

Elasticsearch 6.0 introduced a breaking change in how the synonym token filter behaves in custom analyzers. The analyzer's tokenizer and the token filters that precede the synonym token filter are now applied to each synonym rule. If a synonym is removed as a result, the filter throws an exception and index creation fails.

I propose adding a flag to the synonym token filter settings that ignores the resulting error (or performs some other check to avoid throwing). This would let those of us with complex synonym strategies keep sharing synonyms across analyzers, with the full understanding that some synonyms will be ignored in some analyzers.

The main use case: we have thousands of synonyms managed by a content team. We emit a single synonyms configuration file that is used by many custom analyzers for many variations on our text fields, and this strategy has worked very well for us. Coupling the validity of a synonym to a specific analyzer would require a separate synonym configuration per analyzer, plus a non-technical team with a full understanding of Elasticsearch analyzers, both of which are obtrusive.

Example:

Take the synonym rule `&,and`.

The ampersand is eliminated by the standard tokenizer used in custom analyzer A, so the synonym filter throws and index creation fails. However, that same synonym is useful in a different custom analyzer, B, which uses the whitespace tokenizer and preserves the &.
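To make this concrete, here is a minimal sketch of index settings along these lines (the index name, analyzer names, and inline synonym are illustrative; in practice the rules come from a shared synonyms file). On 6.x, creating this index fails because of analyzer_a, even though the same rule is perfectly usable in analyzer_b:

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "shared_synonyms": {
          "type": "synonym",
          "synonyms": ["&,and"]
        }
      },
      "analyzer": {
        "analyzer_a": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "shared_synonyms"]
        },
        "analyzer_b": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "shared_synonyms"]
        }
      }
    }
  }
}
```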

Reference:

https://discuss.elastic.co/t/why-the-synonym-filter-change-in-6-0/133740

#27481

https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking_60_analysis_changes.html

@jimczi added the :Search Relevance/Analysis (How text is split into tokens) label May 30, 2018
@elasticmachine (Collaborator)

Pinging @elastic/es-search-aggs


jimczi commented Jun 4, 2018

We discussed this internally and agreed it's something we'd like to support.
It will require a fork of the synonym parser in Lucene to add the lenient option. I've marked the issue as adoptme for now and will come back to it if it's not already taken.
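For illustration, a rough sketch of how such a lenient option might be exposed on the filter settings (the option name comes from this comment; the filter name is illustrative and the exact semantics are up to the implementation):

```
"filter": {
  "shared_synonyms": {
    "type": "synonym",
    "lenient": true,
    "synonyms": ["&,and"]
  }
}
```

With lenient enabled, a rule that is invalid for a given analyzer would be dropped (ideally with a warning in the logs) rather than failing index creation.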


jtreher commented Jun 4, 2018

@jimczi Thanks, that's great news.

@byronvoorbach (Contributor)

Big +1 here

@sohaibiftikhar (Contributor)

Just as an update: I am working on this and should have a patch by next week.

@mayya-sharipova removed the help wanted adoptme label Jun 22, 2018
sohaibiftikhar added a commit to sohaibiftikhar/elasticsearch that referenced this issue Jul 3, 2018
mayya-sharipova pushed a commit that referenced this issue Jul 10, 2018
* Added lenient flag for synonym-tokenfilter.

Relates to #30968

* added docs for synonym-graph-tokenfilter

-- Also made lenient final
-- changed from !lenient to lenient == false

* Changes after review (1)

-- Renamed to ElasticsearchSynonymParser
-- Added explanation for ElasticsearchSynonymParser::add method
-- Changed ElasticsearchSynonymParser::logger instance to static

* Added lenient option for WordnetSynonymParser

-- also added more documentation

* Added additional documentation

* Improved documentation
mayya-sharipova pushed a commit that referenced this issue Jul 13, 2018

(cherry picked from commit 88c270d)
@sohaibiftikhar (Contributor)

@mayya-sharipova This can be closed now?
