Failure to decompose "Taschenhersteller" #23
Comments
Does it work for other terms like Ölpumpe or Motoröl? Also, you need to enable the standard token filters such as lowercase in your custom analyzer before the decompound token filter. Those were my test cases back in the day. I also noticed that some terms are not split by the plugin at all. Maybe we could improve the plugin together, @jprante? My analyzer looks like this:
    "svb_decompoundAnalyzer":{
        "filter":[
            "lowercase",
            "svb_decompound",
            "unique"
        ],
        "tokenizer":"standard"
    }
And the filter:
    "svb_decompound":{
        "type":"decompound"
    }
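For reference, the two fragments above can be assembled into a single index-settings body. The following Python sketch only builds and prints the JSON; it makes no HTTP call. The nesting under `settings.analysis` follows the usual Elasticsearch layout, the function names are hypothetical, and the `_analyze` request body with `analyzer` and `text` keys assumes an Elasticsearch version that accepts a JSON body for that endpoint.

```python
import json


def build_index_settings():
    """Combine the analyzer and filter fragments from this comment into one body."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "svb_decompoundAnalyzer": {
                        "tokenizer": "standard",
                        # Order matters: lowercase runs before the decompound filter.
                        "filter": ["lowercase", "svb_decompound", "unique"],
                    }
                },
                "filter": {
                    # "decompound" is the filter type used in this thread.
                    "svb_decompound": {"type": "decompound"}
                },
            }
        }
    }


def build_analyze_request(text):
    """Body for an _analyze call against an index created with the settings above."""
    return {"analyzer": "svb_decompoundAnalyzer", "text": text}


if __name__ == "__main__":
    print(json.dumps(build_index_settings(), indent=2, ensure_ascii=False))
    print(json.dumps(build_analyze_request("Taschenhersteller"), indent=2))
```

The settings body would be sent when creating the index, and the second body to that index's `_analyze` endpoint to inspect the resulting tokens.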
The current implementation can be extended with custom compound words. For example code, see https://github.com/jprante/elasticsearch-analysis-decompound/blob/master/src/test/java/org/xbib/decompound/TrainerTests.java. A possible input for German is the Morphy lexicon, morphy-mapping-20110717.latin1.gz.
Great! It works really well :) It was my mapping that was erroneous. Here is the corrected version, which works:
Would it be possible for you to backport this to Elasticsearch 2.0? It could be wunderbach :) Best regards,
@jprante Thanks for that pointer, I will look into it.
Hi,
First of all, thanks for your plugin, which could remove the need for the obscure compound word token filter with hyphenation_decompounder (https://www.elastic.co/guide/en/elasticsearch/reference/2.0/analysis-compound-word-tokenfilter.html).
Having said that, I cannot decompose "Taschenhersteller", a German word that should be split into two words: Taschen and Hersteller.
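To make the expected result concrete, here is a toy dictionary-based splitter in Python. It is not the plugin's algorithm (the plugin's internals are not described in this thread); it is a minimal longest-prefix-first sketch, and both the `split_compound` helper and the tiny `LEXICON` are hypothetical.

```python
# Hypothetical mini lexicon; a real decompounder would use a much larger word list.
LEXICON = {"tasche", "taschen", "hersteller"}


def split_compound(word, lexicon):
    """Split word into lexicon entries, or return [word] if no full split exists."""
    word = word.lower()  # mirrors running a lowercase filter before decompounding

    def helper(rest):
        if not rest:
            return []
        # Try the longest prefix first and backtrack if the remainder cannot be split.
        for end in range(len(rest), 0, -1):
            head = rest[:end]
            if head in lexicon:
                tail = helper(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None

    parts = helper(word)
    return parts if parts else [word]


if __name__ == "__main__":
    print(split_compound("Taschenhersteller", LEXICON))  # ['taschen', 'hersteller']
    print(split_compound("Motoröl", LEXICON))            # ['motoröl'] (not in the toy lexicon)
```

A word is left untouched when the lexicon cannot cover it completely, which is one way a term can end up not being split at all.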
Having installed your plugin, I created the following (possibly erroneous) mapping:
When trying to analyze the text "Taschenhersteller",
it gives me
I don't understand what I'm doing wrong...
Could you help me, please? :)