-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for customizing the rule file in ICU tokenizer #13651
Conversation
super(index, indexSettings, name, settings); | ||
config = getIcuConfig(env, settings); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to call a public non-final method from a constructor? This can cause issues when somebody subclasses IcuTokenizerFactory
as initialization order gets important. I'd opt for making #getIcuConfig()
private which avoids the issue. Alternatives: Declare #getIcuConfig()
or the class as final
.
@xuzha: Sorry that it took so long that somebody reviews your PR. I left a few comments. Just let me know if you have further questions. I think it would make sense to resolve conflicts first given that this branch is now quite out of date. |
Thx @danielmitterdorfer for the review. |
ping @xuzha, I think this needs updating and then another review by @danielmitterdorfer ? |
Sorry @dakrone, I was sick last week. I will try to update this ASAP. |
Thanks @danielmitterdorfer for the review. I just pushed another commit to address the comments. Sorry that it took a while to update this, it looks so hacky to me :-) |
@xuzha I answered your questions. I hope that helps now. |
Thanks @danielmitterdorfer, I just pushed another commit. |
private final ICUTokenizerConfig config; | ||
private static final String RULE_FILES = "rule_files"; | ||
|
||
public static final Setting<List<String>> SETTING_RULE_FILES = | ||
Setting.listSetting(RULE_FILES, new ArrayList<>(), Function.identity(), Setting.Property.IndexScope); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you could use Collections.emptyList()
instead of new ArrayList<>()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha thanks @danielmitterdorfer ;-)
Left a minor comment, otherwise LGTM. |
@xuzha Can you please add an appropriate version label? |
Thanks @danielmitterdorfer so much for the review. Will add a version when merging the PR. I will leave this PR open for another few days. Comments are very welcomed. <3 <3 <3 |
Lucene allows to create a ICUTokenizer with a special config argument enabling the customization of the rule based iterator by providing custom rules files. This commit enable this feature. Users could provide a list of RBBI rule files to ICU tokenizer. closes elastic#13146
PR broke the integ tests. ;-(, changed using the new setting back the old version : 3e4b470 |
Lucene allows to create a ICUTokenizer with a special config argument
enabling the customization of the rule based iterator by providing
custom rules files.
This commit enable this feature. Users could provide a list of RBBI rule
files to ICU tokenizer.
closes #13146