DYN-5806 lucene node autocomplete #14169

Merged Jul 21, 2023 (10 commits)
43 changes: 32 additions & 11 deletions src/DynamoCore/Utilities/LuceneSearchUtility.cs
@@ -21,6 +21,16 @@ internal class LuceneSearchUtility
internal Lucene.Net.Store.Directory indexDir;
internal IndexWriter writer;
internal string directory;
internal LuceneStorage currentStorageType;

public enum LuceneStorage
{
//Lucene Storage will be located in RAM and all the info indexed will be lost when Dynamo app is closed
RAM,

//Lucene Storage will be located in the local File System and the files will remain in ...AppData\Roaming\Dynamo\Dynamo Core\2.19\Index folder
FILE_SYSTEM
}

// Used for creating the StandardAnalyzer
internal Analyzer Analyzer;
@@ -36,23 +46,34 @@ internal LuceneSearchUtility(DynamoModel model)
/// <summary>
/// Initialize Lucene config file writer.
/// </summary>
internal void InitializeLuceneConfig(string dirName)
internal void InitializeLuceneConfig(string dirName, LuceneStorage storageType = LuceneStorage.FILE_SYSTEM)
{
addedFields = new List<string>();

DirectoryInfo webBrowserUserDataFolder;
DirectoryInfo luceneUserDataFolder;
var userDataDir = new DirectoryInfo(dynamoModel.PathManager.UserDataDirectory);
webBrowserUserDataFolder = userDataDir.Exists ? userDataDir : null;
luceneUserDataFolder = userDataDir.Exists ? userDataDir : null;

directory = dirName;
string indexPath = Path.Combine(webBrowserUserDataFolder.FullName, LuceneConfig.Index, dirName);
indexDir = Lucene.Net.Store.FSDirectory.Open(indexPath);
string indexPath = Path.Combine(luceneUserDataFolder.FullName, LuceneConfig.Index, dirName);

currentStorageType = storageType;

if (storageType == LuceneStorage.RAM)
{
indexDir = new RAMDirectory();
}
else
{
indexDir = FSDirectory.Open(indexPath);
}


// Create an analyzer to process the text
Analyzer = new StandardAnalyzer(LuceneConfig.LuceneNetVersion);

// Initialize Lucene index writer, unless in test mode.
if (!DynamoModel.IsTestMode)
// Initialize Lucene index writer, unless in test mode or we are using RAMDirectory for indexing info.
if (!DynamoModel.IsTestMode || currentStorageType == LuceneStorage.RAM)
Contributor

@reddyashish reddyashish Jul 19, 2023

Why do we need the condition currentStorageType == LuceneStorage.RAM here, when no files are used for indexing in this case?

Contributor Author

Yes, we need the condition; otherwise a specific test fails.
I added this check because the test SearchNodeAutocompletionSuggestions() performs the following actions:

  • Populate Node Autocomplete with candidates.
  • Search for the text "ar" using Node Autocomplete Search (over a small set of nodes, using Lucene Search with RAMDirectory).
  • Check that we get 5 results.

This piece of code creates the writer. If IsTestMode is true (because a test is executing), the writer is not created, and the search then returns null or crashes (meaning that no results were found). The same applies to the check added in the method InitializeIndexDocumentForNodes(): if we don't check the currentStorageType variable, it always returns null and crashes.

To clarify: the Node Autocomplete indexing process uses RAMDirectory (just for the subset of nodes), so the IsTestMode flag no longer matters (there is no concurrency problem, and the search runs over the indexed subset of nodes). That is why we have to check the currentStorageType variable.

Eventually we will remove all the conditions related to IsTestMode and currentStorageType, but first we need to refactor the code so that indexing uses RAMDirectory when tests are executing and FSDirectory otherwise.

Please let me know if this explanation is clear or if we need to go through the implementation in detail.

Thanks
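
For illustration, the populate, commit, and search sequence described above can be sketched with Lucene.NET directly. This is a minimal sketch under stated assumptions, not the Dynamo implementation: the class RamIndexSketch, the method SearchInMemory, the single "Name" field, and LuceneVersion.LUCENE_48 are all hypothetical choices for the example (the PR reads its settings from LuceneConfig and uses a MultiFieldQueryParser).

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical helper, not part of Dynamo: index a few names in memory and query them.
public static class RamIndexSketch
{
    // Assumed version; the PR reads it from LuceneConfig.LuceneNetVersion.
    private const LuceneVersion Version = LuceneVersion.LUCENE_48;

    public static List<string> SearchInMemory(IEnumerable<string> names, string queryText)
    {
        var results = new List<string>();

        // RAMDirectory keeps the whole index in memory, so nothing is written to disk.
        using (var dir = new RAMDirectory())
        using (var analyzer = new StandardAnalyzer(Version))
        using (var writer = new IndexWriter(dir, new IndexWriterConfig(Version, analyzer)))
        {
            foreach (string name in names)
            {
                // One document per name, stored so it can be read back from a hit.
                var doc = new Document { new TextField("Name", name, Field.Store.YES) };
                writer.AddDocument(doc);
            }
            writer.Commit();

            // Open the reader only after committing, so the new documents are visible.
            using (DirectoryReader reader = writer.GetReader(applyAllDeletes: true))
            {
                var searcher = new IndexSearcher(reader);
                var parser = new QueryParser(Version, "Name", analyzer);
                TopDocs hits = searcher.Search(parser.Parse(queryText), 10);
                foreach (ScoreDoc hit in hits.ScoreDocs)
                {
                    results.Add(searcher.Doc(hit.Doc).Get("Name"));
                }
            }
        }

        return results;
    }
}
```

Building this assumes the Lucene.Net, Lucene.Net.Analysis.Common, and Lucene.Net.QueryParser packages are referenced. Because the index lives only in the RAMDirectory instance, separate callers do not share any on-disk location, which is the property the comment relies on when it says the IsTestMode flag no longer matters.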

Contributor

Thanks, I think that matches my expectation. We should refactor the code soon for unit testing.

{
// Create an index writer
IndexWriterConfig indexConfig = new IndexWriterConfig(LuceneConfig.LuceneNetVersion, Analyzer)
@@ -77,7 +98,7 @@ internal void InitializeLuceneConfig(string dirName)
/// <returns></returns>
internal Document InitializeIndexDocumentForNodes()
{
if (DynamoModel.IsTestMode) return null;
if (DynamoModel.IsTestMode && currentStorageType == LuceneStorage.FILE_SYSTEM) return null;

var name = new TextField(nameof(LuceneConfig.NodeFieldsEnum.Name), string.Empty, Field.Store.YES);
var fullCategory = new TextField(nameof(LuceneConfig.NodeFieldsEnum.FullCategoryName), string.Empty, Field.Store.YES);
@@ -153,7 +174,7 @@ internal void SetDocumentFieldValue(Document doc, string field, string value, bo
((StringField)doc.GetField(field)).SetStringValue(value);
}

if (isLast && indexedFields.Any())
if (isLast && indexedFields != null && indexedFields.Any())
{
List<string> diff = indexedFields.Except(addedFields).ToList();
foreach (var d in diff)
@@ -248,7 +269,7 @@ internal string CreateSearchQuery(string[] fields, string SearchTerm)
internal void DisposeWriter()
{
//We need to check if we are not running Dynamo tests because otherwise parallel test start to fail when trying to write in the same Lucene directory location
if (!DynamoModel.IsTestMode)
if (!DynamoModel.IsTestMode || currentStorageType == LuceneStorage.RAM)
Contributor

Same comment as above.

Contributor Author

The answer is similar to the one above. With Node Autocomplete we index and search in both cases:

  • Dynamo execution
  • Test execution

So for Node Autocomplete the writer needs to be disposed, regardless of whether we are in test mode.
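
The two guard expressions in this diff can be isolated to show which combinations of test mode and storage type reach the writer. The following is a sketch only: WriterGuard, ShouldUseWriter, and ShouldSkipDocument are hypothetical names introduced here, and the enum is redeclared so the snippet stands alone.

```csharp
// Redeclared for the sketch; in the PR this enum is nested in LuceneSearchUtility.
public enum LuceneStorage
{
    RAM,
    FILE_SYSTEM
}

// Hypothetical helper that isolates the guard expressions from the diff.
public static class WriterGuard
{
    // Mirrors: if (!DynamoModel.IsTestMode || currentStorageType == LuceneStorage.RAM)
    // Used around writer creation, DisposeWriter() and CommitWriterChanges().
    public static bool ShouldUseWriter(bool isTestMode, LuceneStorage storage)
    {
        return !isTestMode || storage == LuceneStorage.RAM;
    }

    // Mirrors: if (DynamoModel.IsTestMode && currentStorageType == LuceneStorage.FILE_SYSTEM) return null;
    // Used at the top of InitializeIndexDocumentForNodes().
    public static bool ShouldSkipDocument(bool isTestMode, LuceneStorage storage)
    {
        return isTestMode && storage == LuceneStorage.FILE_SYSTEM;
    }
}
```

Of the four combinations, only test mode together with FILE_SYSTEM bypasses the writer. With just two enum values, ShouldSkipDocument is the exact negation of ShouldUseWriter, so the two checks in the diff are consistent with each other.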

{
writer?.Dispose();
writer = null;
@@ -257,7 +278,7 @@ internal void DisposeWriter()

internal void CommitWriterChanges()
{
if (!DynamoModel.IsTestMode)
if (!DynamoModel.IsTestMode || currentStorageType == LuceneStorage.RAM)
{
//Commit the info indexed
writer?.Commit();
103 changes: 100 additions & 3 deletions src/DynamoCoreWpf/ViewModels/Search/NodeAutoCompleteSearchViewModel.cs
@@ -18,6 +18,9 @@
using Dynamo.Utilities;
using Dynamo.Wpf.ViewModels;
using Greg;
using Lucene.Net.Documents;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Newtonsoft.Json;
using ProtoCore.AST.AssociativeAST;
using ProtoCore.Mirror;
@@ -39,6 +42,9 @@ public class NodeAutoCompleteSearchViewModel : SearchViewModel
private bool displayLowConfidence;
private const string nodeAutocompleteMLEndpoint = "MLNodeAutocomplete";

// Lucene search utility to perform indexing operations just for NodeAutocomplete.
internal LuceneSearchUtility LuceneSearchUtilityNodeAutocomplete { get; set; }

/// <summary>
/// The Node AutoComplete ML service version, this could be empty if user has not used ML way
/// </summary>
@@ -602,6 +608,61 @@ private NodeSearchElementViewModel GetViewModelForNodeSearchElement(NodeSearchEl
return null;
}


/// <summary>
/// Performs a search using the given string as query and subset, if provided.
/// </summary>
/// <returns> Returns a list with a maximum MaxNumSearchResults elements.</returns>
/// <param name="search"> The search query </param>
/// <param name="useLucene"> Temporary flag that will be used for searching using Lucene.NET </param>
internal IEnumerable<NodeSearchElementViewModel> SearchNodeAutocomplete(string search, bool useLucene)
{
if (useLucene)
{
//The DirectoryReader and IndexSearcher have to be assigned after commiting indexing changes and before executing the Searcher.Search() method, otherwise new indexed info won't be reflected
LuceneSearchUtilityNodeAutocomplete.dirReader = LuceneSearchUtilityNodeAutocomplete.writer?.GetReader(applyAllDeletes: true);
if (LuceneSearchUtilityNodeAutocomplete.dirReader == null) return null;

LuceneSearchUtilityNodeAutocomplete.Searcher = new IndexSearcher(LuceneSearchUtilityNodeAutocomplete.dirReader);

string searchTerm = search.Trim();
var candidates = new List<NodeSearchElementViewModel>();
var parser = new MultiFieldQueryParser(LuceneConfig.LuceneNetVersion, LuceneConfig.NodeIndexFields, LuceneSearchUtilityNodeAutocomplete.Analyzer)
{
AllowLeadingWildcard = true,
DefaultOperator = LuceneConfig.DefaultOperator,
FuzzyMinSim = LuceneConfig.MinimumSimilarity
};

Query query = parser.Parse(LuceneSearchUtilityNodeAutocomplete.CreateSearchQuery(LuceneConfig.NodeIndexFields, searchTerm));
TopDocs topDocs = LuceneSearchUtilityNodeAutocomplete.Searcher.Search(query, n: LuceneConfig.DefaultResultsCount);

for (int i = 0; i < topDocs.ScoreDocs.Length; i++)
{
// read back a Lucene doc from results
Document resultDoc = LuceneSearchUtilityNodeAutocomplete.Searcher.Doc(topDocs.ScoreDocs[i].Doc);

string name = resultDoc.Get(nameof(LuceneConfig.NodeFieldsEnum.Name));
string docName = resultDoc.Get(nameof(LuceneConfig.NodeFieldsEnum.DocName));
string cat = resultDoc.Get(nameof(LuceneConfig.NodeFieldsEnum.FullCategoryName));
string parameters = resultDoc.Get(nameof(LuceneConfig.NodeFieldsEnum.Parameters));


var foundNode = FindViewModelForNodeNameAndCategory(name, cat, parameters);
if (foundNode != null)
{
candidates.Add(foundNode);
}
}

return candidates;
}
else
{
return Search(search);
}
}

/// <summary>
/// Filters the matching node search elements based on user input in the search field.
/// </summary>
@@ -617,9 +678,25 @@ internal void SearchAutoCompleteCandidates(string input)
}
else
{
// Providing the saved search results to limit the scope of the query search.
// Then add back the ML info on filterted nodes as the Search function accepts elements of type NodeSearchElement
var foundNodes = Search(input, searchElementsCache.Select(x => x.Model));
LuceneSearchUtilityNodeAutocomplete = new LuceneSearchUtility(dynamoViewModel.Model);

//The dirName parameter doesn't matter because we are using RAMDirectory indexing and no files are created
LuceneSearchUtilityNodeAutocomplete.InitializeLuceneConfig(string.Empty, LuceneSearchUtility.LuceneStorage.RAM);

//Memory indexing process for Node Autocomplete (indexing just the nodes returned by the NodeAutocomplete service so we limit the scope of the query search)
foreach (var node in searchElementsCache.Select(x => x.Model))
{
var doc = LuceneSearchUtilityNodeAutocomplete.InitializeIndexDocumentForNodes();
AddNodeTypeToSearchIndex(node, doc);
}

//Write the Lucene documents to memory
LuceneSearchUtilityNodeAutocomplete.CommitWriterChanges();

var luceneResults = SearchNodeAutocomplete(input, true);
var foundNodesModels = luceneResults.Select(x => x.Model);
var foundNodes = foundNodesModels.Select(MakeNodeSearchElementVM);

var filteredSearchElements = new List<NodeSearchElementViewModel>();

foreach (var node in foundNodes)
@@ -635,10 +712,30 @@ internal void SearchAutoCompleteCandidates(string input)
}
}
FilteredResults = new List<NodeSearchElementViewModel>(filteredSearchElements).OrderBy(x => x.Name).ThenBy(x => x.Description);

LuceneSearchUtilityNodeAutocomplete.DisposeWriter();
}
}
}

/// <summary>
/// Add node information to Lucene index
/// </summary>
/// <param name="node">node info that will be indexed</param>
/// <param name="doc">Lucene document in which the node info will be indexed</param>
private void AddNodeTypeToSearchIndex(NodeSearchElement node, Document doc)
{
if (LuceneSearchUtilityNodeAutocomplete.addedFields == null) return;

LuceneSearchUtilityNodeAutocomplete.SetDocumentFieldValue(doc, nameof(LuceneConfig.NodeFieldsEnum.FullCategoryName), node.FullCategoryName);
LuceneSearchUtilityNodeAutocomplete.SetDocumentFieldValue(doc, nameof(LuceneConfig.NodeFieldsEnum.Name), node.Name);
LuceneSearchUtilityNodeAutocomplete.SetDocumentFieldValue(doc, nameof(LuceneConfig.NodeFieldsEnum.Description), node.Description);
if (node.SearchKeywords.Count > 0) LuceneSearchUtilityNodeAutocomplete.SetDocumentFieldValue(doc, nameof(LuceneConfig.NodeFieldsEnum.SearchKeywords), node.SearchKeywords.Aggregate((x, y) => x + " " + y), true, true);
LuceneSearchUtilityNodeAutocomplete.SetDocumentFieldValue(doc, nameof(LuceneConfig.NodeFieldsEnum.Parameters), node.Parameters ?? string.Empty);

LuceneSearchUtilityNodeAutocomplete.writer?.AddDocument(doc);
}

/// <summary>
/// Returns a collection of node search elements for nodes
/// that output a type compatible with the port type if it's an input port.
4 changes: 2 additions & 2 deletions src/DynamoCoreWpf/ViewModels/Search/SearchViewModel.cs
@@ -995,7 +995,7 @@ internal IEnumerable<NodeSearchElementViewModel> Search(string search, bool useL
/// <param name="nodeCategory">Full Category of the node</param>
/// <param name="parameters">Node input parameters</param>
/// <returns></returns>
private NodeSearchElementViewModel FindViewModelForNodeNameAndCategory(string nodeName, string nodeCategory, string parameters)
internal NodeSearchElementViewModel FindViewModelForNodeNameAndCategory(string nodeName, string nodeCategory, string parameters)
{
var result = Model.SearchEntries.Where(e => {
if (e.Name.Equals(nodeName) && e.FullCategoryName.Equals(nodeCategory))
@@ -1034,7 +1034,7 @@ private static IEnumerable<NodeSearchElementViewModel> GetVisibleSearchResults(N
}
}

private NodeSearchElementViewModel MakeNodeSearchElementVM(NodeSearchElement entry)
internal NodeSearchElementViewModel MakeNodeSearchElementVM(NodeSearchElement entry)
{
var element = entry as CustomNodeSearchElement;
var elementVM = element != null