Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DYN-5806 lucene node autocomplete #14169

Merged
merged 10 commits into from
Jul 21, 2023
Merged

DYN-5806 lucene node autocomplete #14169

merged 10 commits into from
Jul 21, 2023

Conversation

RobertGlobant20
Copy link
Contributor

Purpose

Implementing the Lucene Search in Node Autocomplete Search funcionality
Added functionality for enabling the use of Lucene Search in node autocomplete, I've modified the LuceneSearchUtility class for supporting in-memory indexing.
Also I did changes for indexing (in memory RAMDirectory) a subset of nodes every time that the node autocomplete search functionality is executed so the search will be executed using only the subset of nodes indexed (and not over all the 3000 nodes).

Declarations

Check these if you believe they are true

  • The codebase is in a better state after this PR
  • Is documented according to the standards
  • The level of testing this PR includes is appropriate
  • User facing strings, if any, are extracted into *.resx files
  • All tests pass using the self-service CI.
  • Snapshot of UI changes, if any.
  • Changes to the API follow Semantic Versioning and are documented in the API Changes document.
  • This PR modifies some build requirements and the readme is updated
  • This PR contains no files larger than 50 MB

Release Notes

Implementing the Lucene Search in Node Autocomplete Search funcionality

Reviewers

@QilongTang @reddyashish

FYIs

Added functionality for enabling the use of Lucene Search in node autocomplete, also I've modified the LuceneSearchUtility class for supporting in-memory indexing.
Changes for indexing (in memory RAMDirectory) a subset of nodes every time that the node autocomplete search functionality is executed so the search will be executed using only the subset of nodes indexed (and not over all the 3000 nodes).
@RobertGlobant20
Copy link
Contributor Author

This is a GIF showing Lucene Search in Node Autocomplete.
LuceneNodeAutocomplete

Adding brackets in else statement and removing the Distinct() due that all the nodes should be different then there is no need to use this function
Copy link
Contributor

@QilongTang QilongTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one more comment

@QilongTang QilongTang added this to the 2.19.0 milestone Jul 18, 2023
Rolling back the code for SearchViewModel.Search() due that tests are using this method (Lucene indexing is not done when tests are executed - IsTestMode = true).
Also I've modified the part in which Node Autocomplete is initialized so it will calling the Lucene search to initialize the results.
Removing unnecessary Distinct()
@dynamo-ci-user
Copy link

@RobertGlobant20 @zeusongit Any idea why the BinDiff job would fail?

@zeusongit
Copy link
Contributor

zeusongit commented Jul 18, 2023

@RobertGlobant20 @zeusongit Any idea why the BinDiff job would fail?

The job fails when only a part of it is re-run, because it depends on the attempt ID, re running all the the jobs in that workflow fixed it.

btw @QilongTang is that you? ;)

@QilongTang
Copy link
Contributor

hi @RobertGlobant20 I think there are a lot regressions from this PR so you need to take another look of the edge cases causing a crash I think

Trying to fix regressions generated when trying to initialize NodeAutocomplete with Lucene Search.
@QilongTang
Copy link
Contributor

I've added an extra validation for checking if the StorageType is RAMDirectory or FSDirectory otherwise it won't do things like create the indexWriter or get a new Lucene Document (due that previously were skipped if IsTestMode is True).
@RobertGlobant20
Copy link
Contributor Author

RobertGlobant20 commented Jul 19, 2023

Hi @RobertGlobant20 One regression: DynamoCoreWpfTests.NodeAutoCompleteSearchTests.SearchNodeAutocompletionSuggestions

it was failing due that when IsTestMode = True several things are not initialized but because for Node Autocomplete we are using RAMDirectory for indexing then doesn't matter the isTestMode flag.
Fix applied in the next commit: 3ad6884

@RobertGlobant20
Copy link
Contributor Author

hi @RobertGlobant20 I think there are a lot regressions from this PR so you need to take another look of the edge cases causing a crash I think

Please consider that all this errors were happening due that I was using the NodeAutocomplete Search implementation for pre-populating the candidates (we are only indexing a small subset of node in Memory) then the fix was using the normal Lucene Search which has all the nodes.

// Initialize Lucene index writer, unless in test mode.
if (!DynamoModel.IsTestMode)
// Initialize Lucene index writer, unless in test mode or we are using RAMDirectory for indexing info.
if (!DynamoModel.IsTestMode || currentStorageType == LuceneStorage.RAM)
Copy link
Contributor

@reddyashish reddyashish Jul 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this condition currentStorageType == LuceneStorage.RAM here, when we are not using any files to index for this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we need the condition otherwise a specific test is failing.
Basically I've added this validation because in the test SearchNodeAutocompletionSuggestions() is doing th next actions:

  • Populate Node Autocomplete with candidates
  • Search the text "ar" using Node Autocomplete Search (over a small set of node and using Lucene Search with RAMDirectory).
  • Check that we are getting 5 results.

So if you check this piece of code is creating the writer then if IsTestMode is True (due that is executing the text) then the writer won't be created and when searching it will return null or crash (meaning that no results were found), the same case for the validation added in the method InitializeIndexDocumentForNodes(), if we don't check the currentStorageType variable then is always returning null and is crashing.

Just to clarify things: remember that with NodeAutocomplete indexing process we are using RAMDirectory (just for the subset of nodes) so the IsTestMode flag doesn't matter any more (there is no problem with concurrency and will be executing the search over the subset of nodes indexed), that's why we have to check for the currentStorageType variable.

At the end we will be removing all those conditions related to IsTestMode and currentStorageType but we need to refactor the code so that when tests are being executed then the indexing is done using RAMDirectory otherwise we will be using FSDirectory.

Please let me know if this explanation is clear or we need to check the implementation in detail

Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I think that matches my expectation, we should refactor the code soon for unit testing

@@ -248,7 +269,7 @@ internal string CreateSearchQuery(string[] fields, string SearchTerm)
internal void DisposeWriter()
{
//We need to check if we are not running Dynamo tests because otherwise parallel test start to fail when trying to write in the same Lucene directory location
if (!DynamoModel.IsTestMode)
if (!DynamoModel.IsTestMode || currentStorageType == LuceneStorage.RAM)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer is similar than above, with AutoComplete we are indexing and searching for both cases:

  • Dynamo execution
  • Tests execution
    so for Node Autocomplete the writer needs to be disposed (doesn't matter if we are in test mode or not)

@avidit avidit changed the title Dyn 5806 lucene node autocomplete DYN-5806 lucene node autocomplete Jul 19, 2023
re-triggering pullRequestValidation job
@RobertGlobant20
Copy link
Contributor Author

@QilongTang QilongTang merged commit 380bd8e into DynamoDS:master Jul 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants