Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DYN-5787 LuceneNET in the InCanvasSearch functionality #14013

Merged
merged 18 commits into from
May 25, 2023
Merged

DYN-5787 LuceneNET in the InCanvasSearch functionality #14013

merged 18 commits into from
May 25, 2023

Conversation

RobertGlobant20
Copy link
Contributor

Purpose

feat: Implementation of the Search functionality in the InCanvasSearch popup using Lucene.NET.
Most of the code is coming from the next Dynamo branch: https://github.com/DynamoDS/Dynamo/tree/AlternativeSearch I added code for indexing information for the code block (I had to add a default constructor for CodeBlockNodeModel), input and output nodes and also I didn't include the code related to indexing/search info related to packages or node documentation.

Declarations

Check these if you believe they are true

  • The codebase is in a better state after this PR
  • Is documented according to the standards
  • The level of testing this PR includes is appropriate
  • User facing strings, if any, are extracted into *.resx files
  • All tests pass using the self-service CI.
  • Snapshot of UI changes, if any.
  • Changes to the API follow Semantic Versioning and are documented in the API Changes document.
  • This PR modifies some build requirements and the readme is updated
  • This PR contains no files larger than 50 MB

Release Notes

Implementation of the Search functionality in the InCanvasSearch popup using Lucene.NET.

Reviewers

@QilongTang

FYIs

@reddyashish

…earch

feat: Implementation of the Search functionality in the InCanvasSearch popup using Lucene.NE.
Most of the code is coming from the next Dynamo branch:
https://github.com/DynamoDS/Dynamo/tree/AlternativeSearch
I added code for indexing information for the code block (I had to add a default constructor for CodeBlockNodeModel), input and output nodes and also I didn't include the code related to indexing/search info related to packages or node documentation.
@RobertGlobant20
Copy link
Contributor Author

RobertGlobant20 commented May 20, 2023

Screenshot showing differences between Library and InCanvasSearch (the Input, Output and Code Block nodes are duplicated due that are indexed two times but after LuceneNet is also used in the Library this behavior won't exist)
image

@RobertGlobant20
Copy link
Contributor Author

Screenshot showing differences between Library and InCanvasSearch
image

@QilongTang QilongTang added this to the 2.19.0 milestone May 22, 2023
var fullDoc = new TextField("Documentation", "", Field.Store.YES);

var pkgName = new TextField("PackageName", "", Field.Store.YES);
var pkgVer = new TextField("PackageVersion", "", Field.Store.YES);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a TODO comment to move these hard coded string to a different class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comment, see commit: 9af2277

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can later maintain a key/value data structure, such as a Dictionary, to store field name with their value type, so that we won't have to worry about what type we used for each field.

@QilongTang QilongTang requested a review from zeusongit May 22, 2023 18:57
Added several comments and removed unused fields.
@QilongTang
Copy link
Contributor

QilongTang commented May 23, 2023

I see lots of regressions in the PR check

Error : Dynamo.Tests.AnalyticsTests.EventTrackingDisabled
03:44:41  SetUp : Lucene.Net.Store.LockObtainFailedException : Lock obtain timed out: NativeFSLock@C:\Users\****\AppData\Roaming\Dynamo\Dynamo Core\2.19\Index\write.lock: System.IO.IOException: The process cannot access the file because another process has locked a portion of the file.
03:44:41  
03:44:41     at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
03:44:41     at System.IO.FileStream.Lock(Int64 position, Int64 length)
03:44:41     at Lucene.Net.Store.NativeFSLock.Obtain()
03:44:42    ----> System.IO.IOException : The process cannot access the file because another process has locked a portion of the file.
 TearDown : System.NullReferenceException : Object reference not set to an instance of an object.
03:44:42     at Lucene.Net.Store.Lock.Obtain(Int64 lockWaitTimeout)
03:44:42     at Lucene.Net.Index.IndexWriter..ctor(Directory d, IndexWriterConfig conf)
03:44:42     at Dynamo.Models.DynamoModel.InitializeLuceneConfig() in c:\WorkspaceDynamo\src\DynamoCore\Models\DynamoModel.cs:line 640
03:44:42     at Dynamo.Models.DynamoModel..ctor(IStartConfiguration config) in c:\WorkspaceDynamo\src\DynamoCore\Models\DynamoModel.cs:line 956
03:44:42     at Dynamo.Models.DynamoModel.Start(IStartConfiguration configuration) in c:\WorkspaceDynamo\src\DynamoCore\Models\DynamoModel.cs:line 613
03:44:42     at Dynamo.DynamoModelTestBase.StartDynamo(IPreferences settings) in c:\WorkspaceDynamo\test\DynamoCoreTests\DynamoModelTestBase.cs:line 75
03:44:42     at Dynamo.Tests.AnalyticsTests.Setup() in c:\WorkspaceDynamo\test\DynamoCoreTests\AnalyticsTests.cs:line 55
03:44:42  --IOException
03:44:42     at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
03:44:42     at System.IO.FileStream.Lock(Int64 position, Int64 length)
03:44:42     at Lucene.Net.Store.NativeFSLock.Obtain()
03:44:42  --TearDown
03:44:42     at Dynamo.Tests.AnalyticsTests.Cleanup() in c:\WorkspaceDynamo\test\DynamoCoreTests\AnalyticsTests.cs:line 67

Looks like the tests are running in parallel to try to index nodes and write to the same location, we might want to disable this function when under test mode and write dedicated test for index if needed. FYI: @mjkkirschner


private void InitializeLuceneConfig()
{
addedFields = new List<string>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skip these during TestMode I guess

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we skip indexing during tests, how do we want to handle the search tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking creating dedicated tests for index and search in a single test or try to make it work when running in parallel? The current setup will not work for sure, unfortunately..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I checked that there are 7 calls to the SearchViewModel.Search in the LibraryTests.
An option could be adding another Search method in SearchViewModel that will be calling the SearchDictionary.Search (as previously) or try to call the SearchDictionary.Search directly in the tests.
image

@QilongTang
Copy link
Contributor

QilongTang commented May 23, 2023

@zeusongit There should be new dlls from this PR, how should dev be notified? If there are new files, should we mark a different state for the PR check?

image

@QilongTang
Copy link
Contributor

@RobertGlobant20 I also tested some edge cases in this fork but seems some regular nodes are not indexed correctly.

image

@zeusongit
Copy link
Contributor

zeusongit commented May 23, 2023

@zeusongit There should be new dlls from this PR, how should dev be notified? If there are new files, should we mark a different state for the PR check?

image

But failing the check won't make sense if those changes are expected, but we also don't have any other option.

@QilongTang
Copy link
Contributor

@zeusongit There are ways to introduce warning, we dont have to fail the check but the current way is not informative enough IMHO. See something like https://stackoverflow.com/questions/74907704/is-there-a-github-actions-warning-state

@zeusongit
Copy link
Contributor

@zeusongit There are ways to introduce warning, we dont have to fail the check but the current way is not informative enough IMHO. See something like https://stackoverflow.com/questions/74907704/is-there-a-github-actions-warning-state

This looks good, will try to implement this!

Fix for adding missing nodes
Added some validation for preventing null values
Adding comments in the SetDocumentFieldValue
@RobertGlobant20
Copy link
Contributor Author

@RobertGlobant20 I also tested some edge cases in this fork but seems some regular nodes are not indexed correctly.

image

Fixed, please double check
commit: b5994fd

@QilongTang
Copy link
Contributor

Not sure if the PR check would pass now, I think the lock issue would still happen with all of the test initializing DynamoModel and indexing at the same time

RobertGlobant20 and others added 2 commits May 23, 2023 16:53
Fix for checking If app is in TestMode then we don't initialize LuceneNET.
Added the previous Search method due that is used by Unit Tests ( previously when running the parallel job that execute the tests in Jenkins is failing due that several processes are trying to create files in AppData folder), so now we will have the previous Search method and the Search method for Lucene.NET.
@QilongTang
Copy link
Contributor

@RobertGlobant20 I gave some updates to the PR and also trying to re-trigger the PR checks because the earlier build failed

@QilongTang
Copy link
Contributor

@RobertGlobant20 There are 4068 regressions reported still

@QilongTang
Copy link
Contributor

@reddyashish @zeusongit @RobertGlobant20 I think I figured it out, we are trying to initialize Lucene index writer every time as part of Dynamo initialization, and this causes a lock issue when running tests in parallel. We can do some search to optimize that process specifically after this PR but for now I added code to skip the writer initialization in TestMode, which would also mean Lucene search is disabled in Unit tests for now. We can off course write tests including Lucene initialization and test only within the Unit test. Let's look at best approach after this PR. I would like to get this in so we can unblock follow up work.

// TODO: make Lucene writer a singleton
if (!IsTestMode)
{
writer = new IndexWriter(indexDir, indexConfig);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reddyashish @RobertGlobant20 @zeusongit This line cant be run in parallel as unit tests

QilongTang and others added 8 commits May 24, 2023 14:30
Fixing regressions due that some test are failing.
Adding the Input and Output nodes are breaking the tests (due that we have two input and two output nodes), then removing the indexing of this nodes.

Other tests were failing due that writer?.Dispose() disposed the object but later (when opening the dyn file) is trying to index again the nodes and the writer instance was already disposed.
@QilongTang
Copy link
Contributor

QilongTang commented May 25, 2023

@RobertGlobant20 This should be finalized, PTAL
image

From debugging, a majority of the dlls added are also loaded when Dynamo opened.

@QilongTang QilongTang merged commit bcad251 into DynamoDS:master May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants