Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DYN-5975 Indexing Package Nodes. #14120

Merged
merged 5 commits into from
Jul 5, 2023
Merged

DYN-5975 Indexing Package Nodes. #14120

merged 5 commits into from
Jul 5, 2023

Conversation

RobertGlobant20
Copy link
Contributor

Purpose

Indexing installed package nodes using LuceneNET
This fix is adding the packages nodes already installed or if during Dynamo execution the user install more packages, the nodes will be also indexed so they can be shown as results in the InCanvasSearch functionality.

Declarations

Check these if you believe they are true

  • The codebase is in a better state after this PR
  • Is documented according to the standards
  • The level of testing this PR includes is appropriate
  • User facing strings, if any, are extracted into *.resx files
  • All tests pass using the self-service CI.
  • Snapshot of UI changes, if any.
  • Changes to the API follow Semantic Versioning and are documented in the API Changes document.
  • This PR modifies some build requirements and the readme is updated
  • This PR contains no files larger than 50 MB

Release Notes

Indexing installed package nodes using LuceneNET

Reviewers

@QilongTang @reddyashish

FYIs

This fix is adding the packages nodes already installed or if during Dynamo execution the user install more packages, the nodes will be also indexed so they can be shown as results in the InCanvasSearch functionality.
@RobertGlobant20
Copy link
Contributor Author

GIF showing that after installing a package the nodes are available in InCanvasSearch.
IndexingPackageNodes

@RobertGlobant20
Copy link
Contributor Author

GIF showing that when starting Dynamo all the node package are loaded and shown in the InCanvasSearch.
IndexingPackageNodes2

@QilongTang QilongTang added this to the 2.19.0 milestone Jun 29, 2023
Copy link
Contributor

@QilongTang QilongTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I request some re-arrange of code

Code Review changes related to create methods for commit the changes and dispose the Lucene writer, so we can reuse the same code in several places.
Copy link
Contributor

@QilongTang QilongTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one comment

@@ -1416,6 +1419,16 @@ private void InitializeCustomNodeManager()
customNodeSearchRegistry.Add(info.FunctionId);
var searchElement = new CustomNodeSearchElement(CustomNodeManager, info);
SearchModel.Add(searchElement);

//Indexing node packages installed using PackageManagerSearch
var iDoc = LuceneSearchUtility.InitializeIndexDocumentForNodes();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After committing these changes, will it overwrite the old index files or create new ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will create new ones.
The only way to delete/update indexed files (Documents) is by using the functions:

writer.updateDocument()
writer.deleteDocument()

But this is not the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why dont we use writer.updateDocument()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QilongTang
In this case we are adding new nodes added when installing a package, so right now we can say that we are representing each node as one document, then just supposing that if in total we have 800 out of the box nodes we will have 800 documents, then if a package is installed more nodes will be added so more documents needs to be added (using writer.AddDocument()).
I think writer.updateDocument() will be needed in case any node is renamed or change it's category (only in Dynamo execution time - I think this case is not possible right now), also for packages when a package info is updated, but again this only should be used during dynamo execution time (also I think this is not possible now).
Please let me know if I explain myself correclty othewise we can create a more detailed explanation of this.
Thanks

Copy link
Contributor

@QilongTang QilongTang Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @RobertGlobant20 good explanation. I am not sure if I confused myself, calling of the InitializeIndexDocumentForNodes() will recreate the docs in memory and then we append new nodes and then commit, so the index files are overwritten, correct? I guess previously I imagined something like reading the written index files on disk and append, then re-write. Do you think that is not possible, I worry about the initialization performance cost only

Copy link
Contributor Author

@RobertGlobant20 RobertGlobant20 Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QilongTang well, right now we are not using the files created in the index directory:
C:\Users\tellro\AppData\Roaming\Dynamo\Dynamo Core\2.19\Index
Lucene is just using them for searching purposes (use this files for fetch data quickly), if we want to use this files (as it should be) for reading the information indexed previously (and not indexing the info again) that's a whole different story, surely we will need a Jira task for implementing this changes.
But consider that we will face several challenges: if the index directory already exists, we will need to iterate all the documents (around 800) and check if something changed for each node (field: category, description, keywords, etc...) so depending of the analysis (if nodes were added, deleted or updated) we should execute the instructions:

  • writer.addDocument()
  • writer.updateDocument()
  • writer.deleteDocument()

The same for package and node packages, if new packages were installed we will continue to use writer.addDocument() but if the package was deleted manually then we should execute writer.deleteDocument().

I just tested the behavior in a external test app, I ran the app and indexed the info, then closed the app the re-open it so I was able to read all the info (previously indexed) using DirectoryReader.Open(index) and the results.

Also I think we should explore the functionality of using ByteBuffersDirectory(RAMDirectory) instead of using FSDirectory (File System Directory), we are using just 800 nodes with 4 or 5 fields that is very few information (the index directory weight is 688 KB ) so I don't think it will impact too much the RAM memory performance.

Sorry, I was not aware of this functionality, please let me know what you think about it. Thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can make another task for it, what you described would be the way I preferred

@QilongTang
Copy link
Contributor

@RobertGlobant20 Please check if the reported regressions are real then merge

Testing for Fixing Regressions
@RobertGlobant20
Copy link
Contributor Author

@RobertGlobant20 Please check if the reported regressions are real then merge

@QilongTang seems that is a problem with disposing the writer at DynamoModel level (in DynamoModel.Dispose() method) but if is disposed after indexing the info (as previously was done) then we cannot index more info unless we create the writer again, I will check in detail just to see if there is another solution.

Some tests are failing in the parallels job due that DynamoModel.IsTestMode = false when running some tests.
Then I'm catching the exception and setting the writer to null, as a consequence for this tests no info will be indexed and the search won't be returning anything.
@RobertGlobant20
Copy link
Contributor Author

@RobertGlobant20 Please check if the reported regressions are real then merge

@QilongTang seems that is a problem with disposing the writer at DynamoModel level (in DynamoModel.Dispose() method) but if is disposed after indexing the info (as previously was done) then we cannot index more info unless we create the writer again, I will check in detail just to see if there is another solution.

@QilongTang I've made a fix that will be preventing that parallel tests send exceptions when trying to create the IndexWriter (after this fix all the tests are passing), basically the fix was adding a try-catch statement when creating the IndexWriter and if the catch section is reached we dispose the writer and set the writer to null.
So with this fix all the parallel tests that set the DynamoModel.IsTestMode = false (when running tests) will not have the nodes indexed info and the search results will not retrieve the results.
Please let me know if this fix sounds good to you or if we should explore another solution. Thanks

@QilongTang
Copy link
Contributor

hi @RobertGlobant20 I think this is fine for now

@QilongTang QilongTang merged commit 46c628c into DynamoDS:master Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants