Deprecate word_delimiter in favour of word_delimiter_graph #29216
Since this is a community-submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
Pinging @elastic/es-search-aggs
@liketic, we usually don't make separate PRs for backports unless the change is substantially different.
@elastic/es-search-aggs, thoughts on this one?
Sorry, I completely dropped the ball on this one. @liketic, I got the order of steps for the deprecation wrong: what we need to do is deprecate it in master, then backport (changing versions appropriately), and then remove it in master later. So this PR needs to be opened against master instead, and committed before #29092. Other than that, I think this looks good. @cbuescher, could you have a look, as you've done some other analysis deprecations recently?
@@ -94,6 +99,9 @@ public WordDelimiterTokenFilterFactory(IndexSettings indexSettings, Environment
        settings, "protected_words");
    this.protoWords = protectedWords == null ? null : CharArraySet.copy(protectedWords);
    this.flags = flags;
    if (indexSettings.getIndexVersionCreated().onOrAfter(Version.V_6_3_0)) {
        DEPRECATION_LOGGER.deprecated("[word_delimiter] has been deprecated in favour of [word_delimiter_graph]");
There is also DeprecationLogger#deprecatedAndMaybeLog(), which could probably be used here to prevent this from being logged multiple times. Then again, since this is the constructor, it shouldn't be called too often. Just a suggestion.
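The point of a keyed "maybe log" call is that constructing the same factory repeatedly logs the message once rather than once per construction. The sketch below models that idea with a hypothetical stand-in, OnceDeprecationLogger, which is not Elasticsearch's DeprecationLogger; the method name deprecatedOnce and the assumption that de-duplication is purely per key are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in, NOT Elasticsearch's DeprecationLogger: it only models
// "log a deprecation message once per key" de-duplication.
class OnceDeprecationLogger {
    // Keys already logged; a concurrent set since factories may be built on several threads.
    private final Set<String> seenKeys = ConcurrentHashMap.newKeySet();
    // Messages that were actually emitted, standing in for a log appender.
    private final List<String> emitted = Collections.synchronizedList(new ArrayList<>());

    /** Records {@code message} only the first time {@code key} is seen. */
    void deprecatedOnce(String key, String message) {
        // Set.add returns false when the key is already present, so repeats are skipped.
        if (seenKeys.add(key)) {
            emitted.add(message);
        }
    }

    /** Snapshot of everything emitted so far. */
    List<String> emitted() {
        synchronized (emitted) {
            return new ArrayList<>(emitted);
        }
    }
}
```

Calling deprecatedOnce three times with the same key records a single message. The trade-off is that the key set grows with every distinct key, so a production logger would presumably need to bound it.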
@romseygeek @liketic, sorry for the late response as well. I compared this to what I did in #30209, where I wrapped the deprecation logging in the create function that is passed to the PreConfiguredTokenFilter. It didn't occur to me at the time that the TokenFilterFactory might also be an appropriate place. Today I tried to better understand how those two places are called, and I think I found a case where deprecating in the WordDelimiterTokenFilterFactory alone is not sufficient.
It won't emit a deprecation warning (excuse the YAML syntax; it's what I used while extending the REST test). Adding another deprecation warning to the PreConfiguredTokenFilter (via the lambda) would still not emit a warning when the analyzer is specified, only later, when documents are indexed using the field. I'm not sure whether there is a better way of achieving both or whether this is the best approach; interested in your thoughts.
@romseygeek, what do we need to do to progress this PR and get it and #29092 merged?
I think we need to re-open this against master and add the second deprecation warning in the PreConfiguredTokenFilter constructor. @liketic, would you be able to pick this up again?
@romseygeek No problem. I'll update this soon.
Branch updated from 10ab6a4 to aa751c0.
Thanks @romseygeek. I updated the PR as you suggested. It's been a long time since my last commit, so please let me know if I missed anything.
Hi @liketic, thanks for the updated PR! I left some comments on how best to implement this. Would you also be able to add unit tests in WordDelimiterTokenFilterFactoryTests to check that we emit deprecation warnings on indexes created before v7, and throw an exception on indexes created after?
@@ -99,7 +99,14 @@
---
"word_delimiter":
  - skip:
      version: " - 6.2.99"
This will need to be set to " - 6.9.99" to pass backwards compatibility checks, and we can change it later.
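To see why the upper bound matters, here is a hedged sketch of how a skip range like " - 6.9.99" could be interpreted, assuming an empty side means "unbounded" and both bounds are inclusive. SkipRange is a hypothetical helper, not the real REST test runner, and it only handles plain numeric versions (no "-alpha1" suffixes).

```java
// Hypothetical parser for a REST-test style skip range such as " - 6.9.99".
// Assumptions: an empty side is unbounded, bounds are inclusive, versions are numeric x.y.z.
final class SkipRange {
    private final int[] lower; // null means no lower bound
    private final int[] upper; // null means no upper bound

    SkipRange(String spec) {
        String[] parts = spec.split("-", -1); // limit -1 keeps an empty lower side
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'lower - upper': " + spec);
        }
        lower = parse(parts[0]);
        upper = parse(parts[1]);
    }

    private static int[] parse(String s) {
        String t = s.trim();
        if (t.isEmpty()) {
            return null;
        }
        String[] nums = t.split("\\.");
        int[] v = new int[3]; // missing components default to 0
        for (int i = 0; i < 3 && i < nums.length; i++) {
            v[i] = Integer.parseInt(nums[i]);
        }
        return v;
    }

    private static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    /** True if a node running {@code version} should skip the test. */
    boolean skips(String version) {
        int[] v = parse(version);
        boolean aboveLower = lower == null || compare(v, lower) >= 0;
        boolean belowUpper = upper == null || compare(v, upper) <= 0;
        return aboveLower && belowUpper;
    }
}
```

Under these assumptions, " - 6.2.99" would let a 6.3.0 node run the test, while " - 6.9.99" skips it on every 6.x node. That fits the reviewer's point if the warning has not yet been backported to 6.x.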
@@ -75,6 +81,9 @@ private PreConfiguredTokenFilter(String name, boolean useFilterForMultitermQueri
    super(name, cache);
    this.useFilterForMultitermQueries = useFilterForMultitermQueries;
    this.create = create;
Rather than doing this here, I think we need to do it in CommonAnalysisPlugin so that we can check the index version. We should throw an IllegalArgumentException if the index was created on or after 7.0, and issue a deprecation warning otherwise; you can see the same logic used for the nGram filter at CommonAnalysisPlugin#439.
This also needs to be done in the WordDelimiterTokenFilterFactory constructor (see NGramTokenFilterFactory for an example of deprecated/illegal behaviour based on version).
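The version gate described here (reject on 7.0+, warn on older indices) can be modelled in isolation. This is a minimal sketch, assuming versions are encoded as plain ints (major * 10000 + minor * 100 + revision) and warnings are collected in a list; WordDelimiterGate and its methods are hypothetical and stand in for org.elasticsearch.Version and the deprecation logger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the version gate; not Elasticsearch code. Versions are ints
// encoded as major * 10_000 + minor * 100 + revision; alpha/beta builds are not modelled.
final class WordDelimiterGate {
    static final int V_7_0_0 = 70_000; // cut-off: indices created on 7.0.0 or later

    private final List<String> warnings = new ArrayList<>();

    /** Applies the gate for an index created with {@code indexCreatedVersion}. */
    void check(int indexCreatedVersion) {
        if (indexCreatedVersion >= V_7_0_0) {
            // New indices reject the old filter name outright.
            throw new IllegalArgumentException("The [word_delimiter] token filter has been removed. "
                + "Please change the filter name to [word_delimiter_graph] instead.");
        }
        // Older indices keep working but record a deprecation warning.
        warnings.add("The [word_delimiter] token filter name is deprecated and will be removed in a future version. "
            + "Please change the filter name to [word_delimiter_graph] instead.");
    }

    List<String> warnings() {
        return warnings;
    }

    /** Builds the encoded version id used above. */
    static int version(int major, int minor, int revision) {
        return major * 10_000 + minor * 100 + revision;
    }
}
```

Throwing before the warning is recorded means a 7.0 index never sees a misleading "deprecated" message alongside the error.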
Thanks @romseygeek. If I'm not mistaken, we should add the following code to WordDelimiterTokenFilterFactory's constructor and CommonAnalysisPlugin#472:
    if (version.onOrAfter(Version.V_7_0_0_alpha1)) {
        throw new IllegalArgumentException(
            "The [word_delimiter] token filter has been removed. Please change the filter name to [word_delimiter_graph] instead.");
    } else {
        deprecationLogger.deprecatedAndMaybeLog("word_delimiter_deprecation",
            "The [word_delimiter] token filter name is deprecated and will be removed in a future version. "
                + "Please change the filter name to [word_delimiter_graph] instead.");
    }
However, I couldn't figure out how to check for deprecation warnings in WordDelimiterTokenFilterFactoryTests, which is not a subclass of ESTestCase. Could you help me? Thanks in advance!
Deprecation checks should go in CommonAnalysisPluginTests instead; there are already some checks in there for the ngram filter, which should give you a base to work with.
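As a hedged illustration of the two assertions requested earlier (a warning for an old index, an exception for a new one), the sketch below uses hypothetical helpers, WarningCapture and FakeWordDelimiterFactory, instead of the real CommonAnalysisPluginTests or ESTestCase machinery. It assumes warnings are collected per thread and drained after each check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test helpers, not Elasticsearch's ESTestCase API: warnings are
// recorded per thread and drained by the test after each assertion.
final class WarningCapture {
    private static final ThreadLocal<List<String>> WARNINGS = ThreadLocal.withInitial(ArrayList::new);

    /** Called by the code under test in place of a deprecation logger. */
    static void deprecated(String message) {
        WARNINGS.get().add(message);
    }

    /** Returns the warnings recorded on this thread and clears them for the next check. */
    static List<String> drain() {
        List<String> copy = new ArrayList<>(WARNINGS.get());
        WARNINGS.get().clear();
        return copy;
    }

    /** Runs {@code body} and returns the exception of type {@code expected}; fails otherwise. */
    static <T extends Throwable> T expectThrows(Class<T> expected, Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("unexpected exception type: " + t, t);
        }
        throw new AssertionError("expected " + expected.getSimpleName() + " but nothing was thrown");
    }
}

// Hypothetical factory applying the version gate in its constructor.
final class FakeWordDelimiterFactory {
    FakeWordDelimiterFactory(int indexCreatedMajor) {
        if (indexCreatedMajor >= 7) {
            throw new IllegalArgumentException("The [word_delimiter] token filter has been removed.");
        }
        WarningCapture.deprecated("The [word_delimiter] token filter name is deprecated.");
    }
}
```

A test would construct the factory for a 6.x index and assert on WarningCapture.drain(), then wrap the 7.x construction in expectThrows and check that no warning leaked through.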
Hi @liketic, are you still interested in iterating on this one?
@romseygeek I made some updates. Please review again.
Thanks @liketic! I have one more question, but apart from that this looks great.
IndexSettings idxSettings = IndexSettingsModule.newIndexSettings("index", settings);
try (CommonAnalysisPlugin commonAnalysisPlugin = new CommonAnalysisPlugin()) {
    Map<String, TokenFilterFactory> tokenFilters = createTestAnalysis(idxSettings, settings, commonAnalysisPlugin).tokenFilter;
    TokenFilterFactory tokenFilterFactory = tokenFilters.get("word_delimiter");
I'd expect the exception to be thrown here rather than when create() is called below. Can you double-check this test?
Thanks @romansanchez. The test passes in my local environment. tokenFilters is a map, and if we do nothing with the tokenFilterFactory, no exception is thrown.
Right, but an exception should be thrown in the WordDelimiterTokenFilterFactory constructor, so createTestAnalysis ought to fail, I think? I can have a look tomorrow if you don't get to it.
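The disagreement above comes down to when the factory constructor runs. The minimal sketch below contrasts eager construction, where a constructor exception surfaces while the map is being built, with lazy construction through suppliers, where it only surfaces once a caller asks for the factory. FactoryMaps is hypothetical and does not mirror Elasticsearch's analysis registry; which behaviour createTestAnalysis actually has is not settled in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical contrast between eager and lazy factory construction.
final class FactoryMaps {
    static final class Factory {
        Factory(String name) {
            // Stands in for a constructor that rejects the removed filter name.
            if (name.equals("word_delimiter")) {
                throw new IllegalArgumentException("[word_delimiter] has been removed");
            }
        }
    }

    /** Eager: every factory is built while the map is populated, so errors surface here. */
    static Map<String, Factory> eager(String... names) {
        Map<String, Factory> map = new HashMap<>();
        for (String n : names) {
            map.put(n, new Factory(n));
        }
        return map;
    }

    /** Lazy: only suppliers are stored, so errors surface when a caller calls get(). */
    static Map<String, Supplier<Factory>> lazy(String... names) {
        Map<String, Supplier<Factory>> map = new HashMap<>();
        for (String n : names) {
            map.put(n, () -> new Factory(n));
        }
        return map;
    }
}
```

If the map behaves like eager(), the test should fail inside createTestAnalysis, as the reviewer expects. If it behaves like lazy(), nothing is thrown until the factory is actually used, which matches what the author observed.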
@romseygeek @liketic There has been no progress on this PR for some time, nor on the related PR #29092. What is their status (can they be merged, closed, or marked as stalled)?
There has been no activity here for a long time, and it seems we have another issue (#37474) with a different approach, so I'm closing this.
Deprecate token filter word_delimiter in favour of word_delimiter_graph in 6.3.0.
Relates to #29061