Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate word_delimiter in favour of word_delimiter_graph #29216

Closed
wants to merge 7 commits into from

Conversation

liketic
Copy link

@liketic liketic commented Mar 23, 2018

Deprecate token filter word_delimiter in favour of word_delimiter_graph in 6.3.0.

Relates to #29061

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@nik9000
Copy link
Member

nik9000 commented Mar 23, 2018

@liketic, we usually don't make separate PRs for backports unless that change is super different.

@javanna
Copy link
Member

javanna commented May 7, 2018

@elastic/es-search-aggs thoughts on this one?

@romseygeek
Copy link
Contributor

Sorry, I completely dropped the ball on this one. @liketic I got the order of doing things wrong for deprecation, what we need to do is to deprecate it in master, then we backport (changing versions appropriately), then we remove in master later. So this PR needs to be opened against master instead, and committed before #29092

Other than that I think this looks good, @cbuescher could you have a look as you've done some other analysis deprecations recently?

@cbuescher cbuescher self-assigned this May 8, 2018
@@ -94,6 +99,9 @@ public WordDelimiterTokenFilterFactory(IndexSettings indexSettings, Environment
settings, "protected_words");
this.protoWords = protectedWords == null ? null : CharArraySet.copy(protectedWords);
this.flags = flags;
if (indexSettings.getIndexVersionCreated().onOrAfter(Version.V_6_3_0)) {
DEPRECATION_LOGGER.deprecated("[word_delimiter] has been deprecated in favour of [word_delimiter_graph]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also DeprecationLogger#deprecatedAndMaybeLog() that could probably be used here to prevent this to be logged multiple times. Then again, since this is the constructor, it shouldn't be called too often. Just a suggestion.

@cbuescher
Copy link
Member

cbuescher commented May 23, 2018

@romseygeek @liketic sorry also for the late response. I compared this to the things I did in #30209 where I wrapped the deprecation logging in the create-function that is passed to the PreConfiguredTokenFilter. It didn't occur to me at the time that the TokenFilterFactory might also be an appropriate place. I tried to understand the difference better today with regards to how those two places are called. I think I found a case where deprecating in the WordDelimiterTokenFilterFactory alone is not sufficient.
When you use the default "word_delimiter" in an analyzer like so, and then index a document:

    - do:
        indices.create:
          index: test_word_delimiter_deprecation
          body:
            settings:
              index:
                analysis:
                  analyzer:
                    my_analyzer:
                      tokenizer: standard
                      token_filter: ["word_delimiter"]
            mappings:
              type:
                properties:
                  name:
                    type: text
                    analyzer: my_analyzer

    - do:
        index:
          index:   test_word_delimiter_deprecation
          type:    type
          id:      1
          body:    { "name": "foo bar" }

It won't throw a deprecation warning (excuse the yaml syntax, its what I tried extending the rest test).

Adding another deprecation warning to the PreConfiguredTokenFilter (via the lambda) would still not emit a warning when the analyzer is specified, but upon indexing documents using the field. I'm not sure if there is a better way of achieving both or if this the best approach, interested in your thoughts.

@rjernst rjernst removed the review label Oct 10, 2018
@colings86
Copy link
Contributor

@romseygeek What do we need to do to progress this PR and get it and #29092 merged?

@romseygeek
Copy link
Contributor

I think we need to re-open this against master, and add the second deprecation warning in the PreConfiguredTokenFilter constructor. @liketic would you be able to pick this up again?

@liketic
Copy link
Author

liketic commented Oct 25, 2018

@romseygeek No problem. I'll update this soon.

@liketic liketic changed the base branch from 6.x to master October 27, 2018 08:05
@liketic
Copy link
Author

liketic commented Oct 27, 2018

Thanks @romseygeek. I updated the PR as you commented. It's a long time sine my last commit. Please let me know if I missed anything.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @liketic, thanks for the updated PR! I left some comments on how best to implement this. Would you also be able to add unit tests in WordDelimiterTokenFilterFactoryTests to check that we emit deprecation warnings on indexes created before v7, and throw an exception on indexes created after?

@@ -99,7 +99,14 @@

---
"word_delimiter":
- skip:
version: " - 6.2.99"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be set to - 6.9.99 to pass backwards compatibility checks, and we can then change it later

@@ -75,6 +81,9 @@ private PreConfiguredTokenFilter(String name, boolean useFilterForMultitermQueri
super(name, cache);
this.useFilterForMultitermQueries = useFilterForMultitermQueries;
this.create = create;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than doing this here, I think we need to do it in CommonAnalysisPlugin so that we can check the index version. We should be throwing an IllegalArgumentException if the index created version is greater than 7.0, and issuing a deprecation warning otherwise - you can see the same logic used for the nGram filter at CommonAnalysisPlugin#439.

This also needs to be done in the WordDelimiterTokenFilterFactory constructor (see NGramTokenFilterFactory for an example of deprecated/illegal behaviour based on version)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @romseygeek , if I'm not wrong, we should add the following code to WordDelimiterTokenFilterFactory's constructor and CommonAnalysisPlugin#472:

        if (version.onOrAfter(Version.V_7_0_0_alpha1)) {
            throw new IllegalArgumentException(
                "The [word_delimiter] token filter has been removed. Please change the filter name to [word_delimiter_graph] instead.");
        } else {
            deprecationLogger.deprecatedAndMaybeLog("word_delimiter_deprecation",
                "The [word_delimiter] token filter name is deprecated and will be removed in a future version. "
                    + "Please change the filter name to [word_delimiter_graph] instead.");
        }

However, I didn't find out how to check deprecation warning in WordDelimiterTokenFilterFactoryTests, which is not subclass of ESTestCase. Could you help me? Thanks in advance!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecation checks should go in CommonAnalysisPluginTests instead - there are already some checks in there for the ngram filter, which should give you a base to work with.

@romseygeek
Copy link
Contributor

Hi @liketic, are you still interested in iterating on this one?

@liketic
Copy link
Author

liketic commented Dec 9, 2018

@romseygeek I made some updates. Please review again.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @liketic! I have one more question, but apart from that this looks great.

IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("index", settings);
try (CommonAnalysisPlugin commonAnalysisPlugin = new CommonAnalysisPlugin()) {
Map<String, TokenFilterFactory> tokenFilters = createTestAnalysis(idxSettings, settings, commonAnalysisPlugin).tokenFilter;
TokenFilterFactory tokenFilterFactory = tokenFilters.get("word_delimiter");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect the exception to be thrown here, rather than when create() is called below? Can you double-check this test?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @romansanchez . The test is passed in my local environment. The tokenFilters is a map and if we do nothing with the tokenFilterFactory, no exception will be thrown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but there should be an Exception thrown in the WordDelimiterTokenFilterFactory constructor, so createTestAnalysis ought to fail, I think? I can have a look tomorrow if you don't get to it.

@cbuescher cbuescher assigned romseygeek and unassigned cbuescher Dec 10, 2018
@mayya-sharipova
Copy link
Contributor

@romseygeek @liketic There was no progress on this PR for some time, and also on related PR: #29092. I wonder what is the status on them (can they be merged, closed, marked as stalled)?
Context: on Fixit Thursday we went through some of these old PRs

@romseygeek
Copy link
Contributor

I'm going to mark this one as stalled for now, as we have another issue (#37474) with a different approach - thanks for all you work on it @liketic!

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@cbuescher
Copy link
Member

No activity here for a long time and it seems we have another issue (#37474) with a different approach, so I'm closing here.

@cbuescher cbuescher closed this Mar 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>deprecation :Search Relevance/Analysis How text is split into tokens stalled Team:Search Meta label for search team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants