DYN-6061 search missing nodes #14258

RobertGlobant20 · 2023-08-14T18:32:02Z

Purpose

Fixes missing nodes in the Lucene node search.
Because we were using a StandardAnalyzer for Lucene searching, some nodes such as "+", "*", and "And" were not found in the results. I implemented a custom analyzer that uses a specific tokenizer which supports those special characters in the search term.
I also added a unit test that validates whether a set of specific nodes is found.
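
For readers unfamiliar with Lucene.NET analyzers, the approach described above could look roughly like the following. This is a minimal sketch and not the code in this PR: the class name `SymbolFriendlyAnalyzer` is hypothetical, it assumes Lucene.NET 4.8 with the Lucene.Net.Analysis.Common package, and it uses the stock `WhitespaceTokenizer` as a stand-in for the PR's own tokenizer.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Util;

// Hypothetical name for illustration; the PR's actual class is LuceneCustomAnalyzer.
public sealed class SymbolFriendlyAnalyzer : Analyzer
{
    private readonly LuceneVersion matchVersion;

    public SymbolFriendlyAnalyzer(LuceneVersion matchVersion)
    {
        this.matchVersion = matchVersion;
    }

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // Split on whitespace only, so symbols such as "+" and "*" survive as tokens.
        Tokenizer source = new WhitespaceTokenizer(matchVersion, reader);

        // Lowercase for case-insensitive matching. No stop-word filter is added,
        // so words such as "and", "or", and "if" are kept.
        TokenStream result = new LowerCaseFilter(matchVersion, source);

        return new TokenStreamComponents(source, result);
    }
}
```

The key design point is the absence of a stop-word filter and of punctuation-stripping tokenization; whether whitespace splitting alone is enough for Dynamo's node names is an assumption of this sketch.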

Declarations

Check these if you believe they are true

The codebase is in a better state after this PR
Is documented according to the standards
The level of testing this PR includes is appropriate
User facing strings, if any, are extracted into *.resx files
All tests pass using the self-service CI.
Snapshot of UI changes, if any.
Changes to the API follow Semantic Versioning and are documented in the API Changes document.
This PR modifies some build requirements and the readme is updated
This PR contains no files larger than 50 MB

Release Notes

Fix missing nodes in the Lucene node search

Reviewers

@QilongTang @reddyashish

FYIs

RobertGlobant20 · 2023-08-14T18:32:43Z

Screenshot showing the results before and after my fix:

QilongTang · 2023-08-14T20:18:53Z

src/DynamoCore/Utilities/LuceneSearchUtility.cs

                case "pt-BR":
                    return new BrazilianAnalyzer(LuceneConfig.LuceneNetVersion);
                case "ru-RU":
                    return new RussianAnalyzer(LuceneConfig.LuceneNetVersion);
                default:
-                    return new StandardAnalyzer(LuceneConfig.LuceneNetVersion);
+                    return new LuceneCustomAnalyzer(LuceneConfig.LuceneNetVersion);


Hi @RobertGlobant20, why do we apply LuceneCustomAnalyzer only in certain cases? What about other languages?

RobertGlobant20 · 2023-08-15

This problem happens because we are using the StandardAnalyzer, which uses English by default. It removes English stop words such as "a", "an", "and", "are", "as", "at", "be", "but", "by", "or", and "if" (which affects the nodes "And", "Or", and "If"), and it also discards certain symbols such as "+" and "*". When we execute a query, those words and symbols are removed (or escaped), so the node is not listed in the results.
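
To make the effect concrete, here is a small standalone sketch in plain C# with no Lucene dependency. It only approximates the two behaviors: `StandardLike` and `WhitespaceLike` are hypothetical helpers, the stop-word set is a partial copy of the English defaults, and the real StandardAnalyzer follows more detailed word-break rules than the regular expression used here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnalyzerSketch
{
    // A few of the default English stop words; the real list is longer.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "a", "an", "and", "are", "as", "at", "be", "but", "by", "if", "or" };

    // Approximates standard analysis: keep only runs of letters and digits,
    // lowercase them, then drop stop words.
    public static List<string> StandardLike(string text)
    {
        return Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{N}]+")
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(t => !StopWords.Contains(t))
            .ToList();
    }

    // Approximates a symbol-friendly analysis: split on whitespace only
    // and lowercase, so "+" and "and" survive as tokens.
    public static List<string> WhitespaceLike(string text)
    {
        return text.ToLowerInvariant()
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .ToList();
    }

    public static void Main()
    {
        foreach (var term in new[] { "+", "*", "And", "Curve" })
        {
            Console.WriteLine("{0,-6} standard-like: [{1}]  whitespace-like: [{2}]",
                term,
                string.Join(", ", StandardLike(term)),
                string.Join(", ", WhitespaceLike(term)));
        }
    }
}
```

In this sketch, "+", "*", and "And" produce no tokens on the standard-like path, so a query built from them has nothing to match, while "Curve" is unaffected on both paths.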

I'm not sure whether this change applies to other languages, but we can test whether it is needed. For example, if the language is Simplified Chinese, the analyzer will probably remove “并且” ("and"), “或者” ("or"), and “如果” ("if"). I assume the nodes "And", "Or", and "If" are not translated (are they?), so there should be no problem there, but we still need to check the "+", "-", and "*" nodes.

To run this test, we need to switch Dynamo to Chinese, create a package with several nodes named in Chinese, and then search for the nodes "+", "And", and "*" as well as the Chinese-named nodes. I will test this case tomorrow to see the behavior.

Let me know your thoughts about it.
Thanks

QilongTang · 2023-08-15

Sure, makes sense.

QilongTang · 2023-08-15T15:23:58Z

@RobertGlobant20 Would you cherry-pick this?

RobertGlobant20 added 2 commits August 14, 2023 11:58

Merge branch 'master' into DYN-6061-Search-MissingNodes

53a90b0

RobertGlobant20 requested review from QilongTang and reddyashish August 14, 2023 18:32

QilongTang added this to the 2.19.0 milestone Aug 14, 2023

QilongTang reviewed Aug 14, 2023

QilongTang approved these changes Aug 15, 2023

QilongTang merged commit 06beb16 into DynamoDS:master Aug 15, 2023

RobertGlobant20 mentioned this pull request Aug 15, 2023

Cherry-Pick DYN-6061 search missing nodes #14264

Merged

9 tasks
