Deprecate word_delimiter in favour of word_delimiter_graph #29216

liketic · 2018-03-23T06:23:57Z

Deprecate token filter word_delimiter in favour of word_delimiter_graph in 6.3.0.

Relates to #29061

elasticmachine · 2018-03-23T06:23:58Z

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

elasticmachine · 2018-03-23T14:39:40Z

Pinging @elastic/es-search-aggs

nik9000 · 2018-03-23T14:40:06Z

@liketic, we usually don't make separate PRs for backports unless that change is super different.

javanna · 2018-05-07T13:06:05Z

@elastic/es-search-aggs thoughts on this one?

romseygeek · 2018-05-08T12:12:35Z

Sorry, I completely dropped the ball on this one. @liketic I got the order of steps for deprecation wrong: we need to deprecate in master first, then backport (adjusting the versions accordingly), and only later remove in master. So this PR needs to be opened against master instead, and committed before #29092.

Other than that I think this looks good, @cbuescher could you have a look as you've done some other analysis deprecations recently?

cbuescher · 2018-05-23T15:57:37Z

...-common/src/main/java/org/elasticsearch/analysis/common/WordDelimiterTokenFilterFactory.java

@@ -94,6 +99,9 @@ public WordDelimiterTokenFilterFactory(IndexSettings indexSettings, Environment
                settings, "protected_words");
        this.protoWords = protectedWords == null ? null : CharArraySet.copy(protectedWords);
        this.flags = flags;
+        if (indexSettings.getIndexVersionCreated().onOrAfter(Version.V_6_3_0)) {
+            DEPRECATION_LOGGER.deprecated("[word_delimiter] has been deprecated in favour of [word_delimiter_graph]");


There is also DeprecationLogger#deprecatedAndMaybeLog(), which could probably be used here to prevent this from being logged multiple times. Then again, since this is the constructor, it shouldn't be called too often. Just a suggestion.
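To illustrate what key-based deduplication buys here, the following is a minimal Java sketch. `OnceDeprecationLogger` and `WordDelimiterDemo` are hypothetical names invented for this example; the sketch only imitates the once-per-key behaviour that, as far as I understand, `deprecatedAndMaybeLog()` provides, and is not the Elasticsearch implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a key-deduplicating deprecation logger.
// Not the Elasticsearch DeprecationLogger; it only imitates "log once per key".
class OnceDeprecationLogger {
    private final Set<String> seenKeys = new HashSet<>();
    private final List<String> emitted = new ArrayList<>();

    // Records and prints the message only the first time the key is seen.
    synchronized void deprecatedAndMaybeLog(String key, String message) {
        if (seenKeys.add(key)) {
            emitted.add(message);
            System.err.println("DEPRECATION: " + message);
        }
    }

    // Returns a copy of every message that was actually emitted.
    synchronized List<String> emitted() {
        return new ArrayList<>(emitted);
    }
}

class WordDelimiterDemo {
    static final OnceDeprecationLogger LOGGER = new OnceDeprecationLogger();

    // Hypothetical constructor body: warns each time a factory is built.
    static void constructFactory() {
        LOGGER.deprecatedAndMaybeLog("word_delimiter_deprecation",
            "[word_delimiter] has been deprecated in favour of [word_delimiter_graph]");
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            constructFactory();
        }
        System.out.println(LOGGER.emitted().size()); // prints 1
    }
}
```

Even if the constructor runs once per index, a keyed logger keeps repeated index creations from flooding the deprecation log with identical messages.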

cbuescher · 2018-05-23T16:08:51Z

@romseygeek @liketic sorry also for the late response. I compared this to the things I did in #30209 where I wrapped the deprecation logging in the create-function that is passed to the PreConfiguredTokenFilter. It didn't occur to me at the time that the TokenFilterFactory might also be an appropriate place. I tried to understand the difference better today with regard to how those two places are called. I think I found a case where deprecating in the WordDelimiterTokenFilterFactory alone is not sufficient.
When you use the default "word_delimiter" in an analyzer like so, and then index a document:

    - do:
        indices.create:
          index: test_word_delimiter_deprecation
          body:
            settings:
              index:
                analysis:
                  analyzer:
                    my_analyzer:
                      tokenizer: standard
                      filter: ["word_delimiter"]
            mappings:
              type:
                properties:
                  name:
                    type: text
                    analyzer: my_analyzer

    - do:
        index:
          index:   test_word_delimiter_deprecation
          type:    type
          id:      1
          body:    { "name": "foo bar" }

It won't throw a deprecation warning (excuse the YAML syntax; it's what I used when extending the REST test).

Adding another deprecation warning to the PreConfiguredTokenFilter (via the lambda) would emit a warning, but only when documents using the field are indexed, not when the analyzer is defined. I'm not sure if there is a better way of achieving both or if this is the best approach; interested in your thoughts.
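The gap comes from there being two resolution paths for the name `word_delimiter`. The following simplified Java model is hypothetical (none of these classes exist in Elasticsearch) and assumes that a filter declared with `"type": "word_delimiter"` goes through the factory constructor, while a bare reference to the built-in name uses a pre-configured filter built from a lambda. It does not model the real pre-configured filter deferring its lambda until a token stream is created.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, heavily simplified model of how a filter name is resolved.
final class FilterResolutionSketch {
    static final List<String> warnings = new ArrayList<>();

    interface Filter {
        String name();
    }

    // Path 1: a custom filter whose "type" is word_delimiter is built via the
    // factory constructor, which is where the original patch put its warning.
    static final class WordDelimiterFactory implements Filter {
        WordDelimiterFactory() {
            warnings.add("[word_delimiter] is deprecated (factory constructor)");
        }

        public String name() {
            return "word_delimiter";
        }
    }

    // Path 2: the built-in name maps to a pre-configured filter created by a
    // lambda; the factory constructor above never runs on this path.
    static final Map<String, Supplier<Filter>> PRECONFIGURED = new HashMap<>();

    static {
        PRECONFIGURED.put("word_delimiter", () -> () -> "word_delimiter");
    }

    // customTypes maps user-defined filter names to their declared "type".
    static Filter resolve(String filterName, Map<String, String> customTypes) {
        if ("word_delimiter".equals(customTypes.get(filterName))) {
            return new WordDelimiterFactory();
        }
        Supplier<Filter> preconfigured = PRECONFIGURED.get(filterName);
        if (preconfigured == null) {
            throw new IllegalArgumentException("unknown filter: " + filterName);
        }
        return preconfigured.get();
    }
}
```

In this model, an analyzer that lists `word_delimiter` directly (as in the YAML above) resolves through `PRECONFIGURED` and records no warning, which is why a check is needed on both paths.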

colings86 · 2018-10-24T10:58:37Z

@romseygeek What do we need to do to progress this PR and get it and #29092 merged?

romseygeek · 2018-10-24T14:31:26Z

I think we need to re-open this against master, and add the second deprecation warning in the PreConfiguredTokenFilter constructor. @liketic would you be able to pick this up again?

liketic · 2018-10-25T01:24:32Z

@romseygeek No problem. I'll update this soon.

liketic · 2018-10-27T08:21:43Z

Thanks @romseygeek. I updated the PR as you suggested. It's been a long time since my last commit, so please let me know if I missed anything.

romseygeek

Hi @liketic, thanks for the updated PR! I left some comments on how best to implement this. Would you also be able to add unit tests in WordDelimiterTokenFilterFactoryTests to check that we emit deprecation warnings on indexes created before v7, and throw an exception on indexes created after?

romseygeek · 2018-10-29T09:12:30Z

...s/analysis-common/src/test/resources/rest-api-spec/test/analysis-common/40_token_filters.yml

@@ -99,7 +99,14 @@

 ---
 "word_delimiter":
+    - skip:
+        version: " - 6.2.99"


This will need to be set to " - 6.9.99" to pass backwards compatibility checks; we can change it later.
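Presumably the reason for " - 6.9.99" is that the test expects a warning that 6.x nodes do not emit yet, so every 6.x version has to be skipped in mixed-version runs. The sketch below assumes the usual convention for these ranges (an empty bound is open-ended and both bounds are inclusive); `SkipRange` is a hypothetical class, not the REST test framework's own parser.

```java
// Hypothetical parser for a skip range such as " - 6.9.99". Assumed semantics:
// an empty bound is open-ended and both bounds are inclusive.
final class SkipRange {
    private final int[] lower; // null means no lower bound
    private final int[] upper; // null means no upper bound

    SkipRange(String spec) {
        String[] parts = spec.split("-", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'lower - upper' but got: " + spec);
        }
        lower = parse(parts[0].trim());
        upper = parse(parts[1].trim());
    }

    // Parses "major.minor.patch"; an empty string yields null (open bound).
    private static int[] parse(String version) {
        if (version.isEmpty()) {
            return null;
        }
        String[] p = version.split("\\.");
        if (p.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch but got: " + version);
        }
        return new int[] {Integer.parseInt(p[0]), Integer.parseInt(p[1]), Integer.parseInt(p[2])};
    }

    // Lexicographic comparison of two [major, minor, patch] triples.
    private static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    // True if a node running nodeVersion falls inside the range and should skip the test.
    boolean shouldSkip(String nodeVersion) {
        int[] v = parse(nodeVersion);
        boolean atOrAboveLower = lower == null || compare(v, lower) >= 0;
        boolean atOrBelowUpper = upper == null || compare(v, upper) <= 0;
        return atOrAboveLower && atOrBelowUpper;
    }
}
```

Under these assumptions, " - 6.2.99" would not skip a 6.4.0 node, so the test would still run there and fail on the missing warning, whereas " - 6.9.99" covers every 6.x release.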

romseygeek · 2018-10-29T10:08:19Z

server/src/main/java/org/elasticsearch/index/analysis/PreConfiguredTokenFilter.java

@@ -75,6 +81,9 @@ private PreConfiguredTokenFilter(String name, boolean useFilterForMultitermQueri
        super(name, cache);
        this.useFilterForMultitermQueries = useFilterForMultitermQueries;
        this.create = create;


Rather than doing this here, I think we need to do it in CommonAnalysisPlugin so that we can check the index version. We should throw an IllegalArgumentException if the index was created on version 7.0 or later, and issue a deprecation warning otherwise; you can see the same logic used for the nGram filter at CommonAnalysisPlugin#439.

This also needs to be done in the WordDelimiterTokenFilterFactory constructor (see NGramTokenFilterFactory for an example of deprecated/illegal behaviour based on version).
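Because the same rule has to apply in both places, one option is to put it in a single helper that both the factory constructor and the plugin's pre-configured filter lambda call. The sketch below is hypothetical: `SimpleVersion` stands in for Elasticsearch's `Version` class (only `onOrAfter` is imitated), `WordDelimiterChecks.checkWordDelimiterAllowed` is an invented name, and warnings are collected in a list rather than sent to a real deprecation logger. The messages match the ones proposed later in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal version type; Elasticsearch's Version class is only imitated here.
final class SimpleVersion implements Comparable<SimpleVersion> {
    final int major;
    final int minor;
    final int patch;

    SimpleVersion(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    @Override
    public int compareTo(SimpleVersion o) {
        if (major != o.major) return Integer.compare(major, o.major);
        if (minor != o.minor) return Integer.compare(minor, o.minor);
        return Integer.compare(patch, o.patch);
    }

    boolean onOrAfter(SimpleVersion other) {
        return compareTo(other) >= 0;
    }
}

// Hypothetical helper meant to be called from both the factory constructor
// and the pre-configured filter lambda, so both paths apply the same rule.
final class WordDelimiterChecks {
    static final SimpleVersion V_7_0_0 = new SimpleVersion(7, 0, 0);
    // Collected instead of logged, so the behaviour is easy to inspect.
    static final List<String> deprecations = new ArrayList<>();

    static void checkWordDelimiterAllowed(SimpleVersion indexCreatedVersion) {
        if (indexCreatedVersion.onOrAfter(V_7_0_0)) {
            // Indices created on 7.0 or later: reject the old name outright.
            throw new IllegalArgumentException("The [word_delimiter] token filter has been removed. "
                + "Please change the filter name to [word_delimiter_graph] instead.");
        }
        // Older indices: keep working, but record a deprecation warning.
        deprecations.add("The [word_delimiter] token filter name is deprecated and will be removed in a future version. "
            + "Please change the filter name to [word_delimiter_graph] instead.");
    }
}
```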

Thanks @romseygeek. If I understand correctly, we should add the following code to WordDelimiterTokenFilterFactory's constructor and at CommonAnalysisPlugin#472:

    if (version.onOrAfter(Version.V_7_0_0_alpha1)) {
        throw new IllegalArgumentException(
            "The [word_delimiter] token filter has been removed. Please change the filter name to [word_delimiter_graph] instead.");
    } else {
        deprecationLogger.deprecatedAndMaybeLog("word_delimiter_deprecation",
            "The [word_delimiter] token filter name is deprecated and will be removed in a future version. "
                + "Please change the filter name to [word_delimiter_graph] instead.");
    }

However, I couldn't work out how to check for deprecation warnings in WordDelimiterTokenFilterFactoryTests, which is not a subclass of ESTestCase. Could you help me? Thanks in advance!

Deprecation checks should go in CommonAnalysisPluginTests instead - there are already some checks in there for the ngram filter, which should give you a base to work with.

romseygeek · 2018-12-05T13:55:38Z

Hi @liketic, are you still interested in iterating on this one?

liketic · 2018-12-09T14:01:54Z

@romseygeek I made some updates. Please review again.

romseygeek

Thanks @liketic! I have one more question, but apart from that this looks great.

romseygeek · 2018-12-09T16:51:32Z

...alysis-common/src/test/java/org/elasticsearch/analysis/common/CommonAnalysisPluginTests.java

+        IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("index", settings);
+        try (CommonAnalysisPlugin commonAnalysisPlugin = new CommonAnalysisPlugin()) {
+            Map<String, TokenFilterFactory> tokenFilters = createTestAnalysis(idxSettings, settings, commonAnalysisPlugin).tokenFilter;
+            TokenFilterFactory tokenFilterFactory = tokenFilters.get("word_delimiter");


I'd expect the exception to be thrown here, rather than when create() is called below? Can you double-check this test?

Thanks @romansanchez. The test passes in my local environment. tokenFilters is a map, and if we do nothing with the tokenFilterFactory, no exception is thrown.

Right, but there should be an Exception thrown in the WordDelimiterTokenFilterFactory constructor, so createTestAnalysis ought to fail, I think? I can have a look tomorrow if you don't get to it.
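Whether the map lookup alone triggers the failure depends on where the check lives. The hypothetical Java sketch below contrasts a check in the factory constructor, which fails while the name-to-factory map is being built, with a check in `create()`, which only fails once a filter is actually requested. `CheckTimingSketch` and its nested types are invented for illustration, and `buildFilters` merely stands in for `createTestAnalysis(...).tokenFilter`; one possible explanation for a passing test, under these assumptions, is that the check only runs in `create()`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of constructor-time vs create-time checks.
final class CheckTimingSketch {
    interface FilterFactory {
        String create(String input);
    }

    // Fails as soon as the factory is constructed.
    static final class ConstructorCheckedFactory implements FilterFactory {
        ConstructorCheckedFactory(boolean removed) {
            if (removed) {
                throw new IllegalArgumentException("[word_delimiter] has been removed");
            }
        }

        public String create(String input) {
            return input;
        }
    }

    // Construction succeeds; the failure is deferred until create() is called.
    static final class CreateCheckedFactory implements FilterFactory {
        private final boolean removed;

        CreateCheckedFactory(boolean removed) {
            this.removed = removed;
        }

        public String create(String input) {
            if (removed) {
                throw new IllegalArgumentException("[word_delimiter] has been removed");
            }
            return input;
        }
    }

    // Stands in for building the name -> factory map during test setup.
    static Map<String, FilterFactory> buildFilters(Function<Boolean, FilterFactory> ctor, boolean removed) {
        Map<String, FilterFactory> filters = new HashMap<>();
        filters.put("word_delimiter", ctor.apply(removed));
        return filters;
    }
}
```

With the constructor check, building the map already throws; with the `create()` check, the map builds fine and only the later `create()` call throws.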

mayya-sharipova · 2019-03-01T18:19:55Z

@romseygeek @liketic There has been no progress on this PR for some time, nor on the related PR #29092. What is their status: can they be merged, closed, or marked as stalled?
Context: on Fixit Thursday we went through some of these old PRs

romseygeek · 2019-03-04T10:30:50Z

I'm going to mark this one as stalled for now, as we have another issue (#37474) with a different approach. Thanks for all your work on it @liketic!

cbuescher · 2021-03-29T10:20:51Z

No activity here for a long time and it seems we have another issue (#37474) with a different approach, so I'm closing here.

liketic mentioned this pull request Mar 23, 2018

Disallow word_delimiter in favour of word_delimiter_graph #29092

Open

nik9000 added the :Search Relevance/Analysis How text is split into tokens label Mar 23, 2018

colings86 added the >deprecation label Apr 24, 2018

javanna added the review label May 7, 2018

cbuescher self-assigned this May 8, 2018

cbuescher reviewed May 23, 2018

rjernst removed the review label Oct 10, 2018

liketic added 2 commits October 27, 2018 16:01

Deprecate word_delimiter in 6.3 (elastic#29061)

40c147a

Update deprecated logger

aa751c0

liketic force-pushed the fix-29061-6.x branch from 10ab6a4 to aa751c0 October 27, 2018 08:04

liketic changed the base branch from 6.x to master October 27, 2018 08:05

Fix comment

92ba6e2

romseygeek suggested changes Oct 29, 2018

romseygeek mentioned this pull request Dec 5, 2018

[DOCS] Adds deprecation note to word delimiter token filter #34198

Closed

liketic added 3 commits December 9, 2018 21:21

Add tests

06889d7

remove empty line

7773a1c

Merge master

9a6e3d6

romseygeek reviewed Dec 9, 2018

cbuescher assigned romseygeek and unassigned cbuescher Dec 10, 2018

Fix removed version

d851ae4

romseygeek mentioned this pull request Jan 15, 2019

Consider merging word_delimiter and word_delimiter_graph #37474

Open

romseygeek added the stalled label Mar 4, 2019

rjernst added the Team:Search Meta label for search team label May 4, 2020

cbuescher closed this Mar 29, 2021