-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DYN-5975 Indexing Package Nodes. #14120
DYN-5975 Indexing Package Nodes. #14120
Conversation
This fix is adding the packages nodes already installed or if during Dynamo execution the user install more packages, the nodes will be also indexed so they can be shown as results in the InCanvasSearch functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I request some re-arrange of code
Code Review changes related to create methods for commit the changes and dispose the Lucene writer, so we can reuse the same code in several places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one comment
@@ -1416,6 +1419,16 @@ private void InitializeCustomNodeManager() | |||
customNodeSearchRegistry.Add(info.FunctionId); | |||
var searchElement = new CustomNodeSearchElement(CustomNodeManager, info); | |||
SearchModel.Add(searchElement); | |||
|
|||
//Indexing node packages installed using PackageManagerSearch | |||
var iDoc = LuceneSearchUtility.InitializeIndexDocumentForNodes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After committing these changes, will it overwrite the old index files or create new ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will create new ones.
The only way to delete/update indexed files (Documents) is by using the functions:
writer.updateDocument()
writer.deleteDocument()
But this is not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why dont we use writer.updateDocument()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@QilongTang
In this case we are adding new nodes added when installing a package, so right now we can say that we are representing each node as one document, then just supposing that if in total we have 800 out of the box nodes we will have 800 documents, then if a package is installed more nodes will be added so more documents needs to be added (using writer.AddDocument()).
I think writer.updateDocument() will be needed in case any node is renamed or change it's category (only in Dynamo execution time - I think this case is not possible right now), also for packages when a package info is updated, but again this only should be used during dynamo execution time (also I think this is not possible now).
Please let me know if I explain myself correclty othewise we can create a more detailed explanation of this.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @RobertGlobant20 good explanation. I am not sure if I confused myself, calling of the InitializeIndexDocumentForNodes()
will recreate the docs in memory and then we append new nodes and then commit, so the index files are overwritten, correct? I guess previously I imagined something like reading the written index files on disk and append, then re-write. Do you think that is not possible, I worry about the initialization performance cost only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@QilongTang well, right now we are not using the files created in the index directory:
C:\Users\tellro\AppData\Roaming\Dynamo\Dynamo Core\2.19\Index
Lucene is just using them for searching purposes (use this files for fetch data quickly), if we want to use this files (as it should be) for reading the information indexed previously (and not indexing the info again) that's a whole different story, surely we will need a Jira task for implementing this changes.
But consider that we will face several challenges: if the index directory already exists, we will need to iterate all the documents (around 800) and check if something changed for each node (field: category, description, keywords, etc...) so depending of the analysis (if nodes were added, deleted or updated) we should execute the instructions:
- writer.addDocument()
- writer.updateDocument()
- writer.deleteDocument()
The same for package and node packages, if new packages were installed we will continue to use writer.addDocument() but if the package was deleted manually then we should execute writer.deleteDocument().
I just tested the behavior in a external test app, I ran the app and indexed the info, then closed the app the re-open it so I was able to read all the info (previously indexed) using DirectoryReader.Open(index) and the results.
Also I think we should explore the functionality of using ByteBuffersDirectory(RAMDirectory) instead of using FSDirectory (File System Directory), we are using just 800 nodes with 4 or 5 fields that is very few information (the index directory weight is 688 KB ) so I don't think it will impact too much the RAM memory performance.
Sorry, I was not aware of this functionality, please let me know what you think about it. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can make another task for it, what you described would be the way I preferred
Fixing comment
@RobertGlobant20 Please check if the reported regressions are real then merge |
Testing for Fixing Regressions
@QilongTang seems that is a problem with disposing the writer at DynamoModel level (in DynamoModel.Dispose() method) but if is disposed after indexing the info (as previously was done) then we cannot index more info unless we create the writer again, I will check in detail just to see if there is another solution. |
Some tests are failing in the parallels job due that DynamoModel.IsTestMode = false when running some tests. Then I'm catching the exception and setting the writer to null, as a consequence for this tests no info will be indexed and the search won't be returning anything.
@QilongTang I've made a fix that will be preventing that parallel tests send exceptions when trying to create the IndexWriter (after this fix all the tests are passing), basically the fix was adding a try-catch statement when creating the IndexWriter and if the catch section is reached we dispose the writer and set the writer to null. |
hi @RobertGlobant20 I think this is fine for now |
Purpose
Indexing installed package nodes using LuceneNET
This fix is adding the packages nodes already installed or if during Dynamo execution the user install more packages, the nodes will be also indexed so they can be shown as results in the InCanvasSearch functionality.
Declarations
Check these if you believe they are true
*.resx
filesRelease Notes
Indexing installed package nodes using LuceneNET
Reviewers
@QilongTang @reddyashish
FYIs