DYN-6061 search missing nodes #14258
Because we were using a StandardAnalyzer for Lucene search, some nodes such as "+", "*", and "And" never appeared in the results. I implemented a custom analyzer that uses a tokenizer which supports those special characters in the search term. I also added a unit test that validates whether a specific set of nodes is found.
     case "pt-BR":
         return new BrazilianAnalyzer(LuceneConfig.LuceneNetVersion);
     case "ru-RU":
         return new RussianAnalyzer(LuceneConfig.LuceneNetVersion);
     default:
-        return new StandardAnalyzer(LuceneConfig.LuceneNetVersion);
+        return new LuceneCustomAnalyzer(LuceneConfig.LuceneNetVersion);
Hi @RobertGlobant20, why do we apply LuceneCustomAnalyzer only in certain cases? What about other languages?
This happens because we are using the StandardAnalyzer, which defaults to English and removes English stop words such as "a", "an", "and", "are", "as", "at", "be", "but", "by", which also covers the node names "And", "Or", and "If". It also drops some symbols such as "+" and "*". When we execute a query, those words and symbols are removed (or escaped), so the node is not listed in the results.
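To illustrate the behavior described above, here is a minimal sketch that prints the tokens two analyzers emit for a few node names. It assumes Lucene.NET 4.8 with the Lucene.Net.Analysis.Common package and uses `LuceneVersion.LUCENE_48` in place of `LuceneConfig.LuceneNetVersion`; the `AnalyzerDemo` class, the `Tokens` helper, and the `"Name"` field are hypothetical and not part of this PR. `WhitespaceAnalyzer` is used only as a point of comparison, not as the PR's analyzer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

static class AnalyzerDemo
{
    // Hypothetical helper: collects the terms an analyzer emits for a text.
    static List<string> Tokens(Analyzer analyzer, string text)
    {
        var result = new List<string>();
        using (TokenStream ts = analyzer.GetTokenStream("Name", new StringReader(text)))
        {
            var term = ts.AddAttribute<ICharTermAttribute>();
            ts.Reset();                       // required before the first IncrementToken()
            while (ts.IncrementToken())
                result.Add(term.ToString());
            ts.End();
        }
        return result;
    }

    static void Main()
    {
        // Assumption: stands in for LuceneConfig.LuceneNetVersion.
        const LuceneVersion version = LuceneVersion.LUCENE_48;

        using (var standard = new StandardAnalyzer(version))
        using (var whitespace = new WhitespaceAnalyzer(version))
        {
            foreach (var name in new[] { "And", "+", "*", "List Create" })
            {
                var std = string.Join(",", Tokens(standard, name));
                var ws = string.Join(",", Tokens(whitespace, name));
                Console.WriteLine($"{name}: standard=[{std}] whitespace=[{ws}]");
            }
        }
    }
}
```

If the explanation above holds, the StandardAnalyzer column should be empty for "And", "+", and "*", while the whitespace-based column keeps each of them as a token.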
I'm not sure whether this change applies to other languages, but we can test it if needed. For example, if the language is Chinese Simplified, the analyzer will probably remove “并且”, “或者”, and “如果”. I guess the nodes "And", "Or", and "If" are not translated (are they?), so there should be no problem there, but we still need to check the "+", "-", and "*" nodes.
To run this test, we need to switch Dynamo to Chinese, create a package with several nodes named in Chinese, and then search for the "+", "And", and "*" nodes as well as for the Chinese-named nodes. I will test this case tomorrow to see the behavior.
Let me know your thoughts about it.
Thanks
Sure, makes sense.
@RobertGlobant20 Would you cherry-pick this?
Purpose
Fixing missing nodes in Lucene nodes search
Because we were using a StandardAnalyzer for Lucene search, some nodes such as "+", "*", and "And" never appeared in the results. I implemented a custom analyzer that uses a tokenizer which supports those special characters in the search term.
I also added a unit test that validates whether a specific set of nodes is found.
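As a rough illustration of the approach described above, the sketch below shows one way to build an analyzer that keeps symbols and stop words in Lucene.NET 4.8. It is not the PR's `LuceneCustomAnalyzer`: the class name `SymbolFriendlyAnalyzer` is hypothetical, and the choice of `WhitespaceTokenizer` plus `LowerCaseFilter` is an assumption standing in for the specific tokenizer the PR uses.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;

// Hypothetical stand-in for the PR's LuceneCustomAnalyzer (assumption, not the actual code).
public sealed class SymbolFriendlyAnalyzer : Analyzer
{
    private readonly LuceneVersion matchVersion;

    public SymbolFriendlyAnalyzer(LuceneVersion matchVersion)
    {
        this.matchVersion = matchVersion;
    }

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // Split on whitespace only, so "+", "*" and "-" survive as tokens.
        Tokenizer tokenizer = new WhitespaceTokenizer(matchVersion, reader);

        // Lowercase for case-insensitive matching; no StopFilter is added,
        // so words like "and", "or" and "if" are kept.
        TokenStream stream = new LowerCaseFilter(matchVersion, tokenizer);

        return new TokenStreamComponents(tokenizer, stream);
    }
}
```

The same analyzer would need to be used for both indexing and query parsing; otherwise the indexed terms and the query terms may not line up.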
Declarations
Check these if you believe they are true
*.resx files
Release Notes
Fixing missing nodes in Lucene nodes search
Reviewers
@QilongTang @reddyashish
FYIs