Optimize item Remove operations #6716
Conversation
/// as long as practical. If an operation would result in too much of such work, the dictionary is simply dropped and recreated
/// later if/when needed.
/// </remarks>
public void RemoveMatchingItems(ItemSpec<P, I> itemSpec)
Wouldn't it be better design-wise to ask the ItemSpec to compute its intersection with a given OrderedItemDataCollection via a new IntersectWith(OrderedItemDataCollection)
instead of passing the ItemSpec to RemoveMatchingItems?
That way you:
- prevent leaking the implementation detail that some itemspec fragments do not support enumeration
- can neatly cache normalized values of the itemspec values without leaking that implementation detail. This would be useful if itemspec values get repeatedly normalized (for example by repeatedly calling any of the IsMatch, MatchCount, or IntersectWith methods on the same ItemSpec instance)
- can reuse the IntersectWith operation to remove the same O(n^2) issue with the Update operation.
To support this, OrderedItemDataCollection (or the Builder) could provide a string indexer which would be the normalized key. Ideally you could even hide OrderedItemDataCollection behind some collection interface but maybe that's too much for now.
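Something along these lines, perhaps. Every name and member below is a hypothetical sketch of the suggested shape, not the actual MSBuild internals.

// Hypothetical sketch of the suggested API shape only.
using System;
using System.Collections.Generic;
using System.Linq;

interface IItemLookup<I>
{
    // The "string indexer" idea: lookup by normalized item value.
    bool TryGetItem(string normalizedValue, out I? item);
}

class ItemSpecSketch<I>
{
    private readonly List<string> _values;
    private List<string>? _normalizedValues;   // cached once, reused across IsMatch/MatchCount/IntersectWith

    public ItemSpecSketch(IEnumerable<string> values) => _values = values.ToList();

    private IEnumerable<string> NormalizedValues =>
        _normalizedValues ??= _values.Select(Normalize).ToList();

    // The spec computes its own intersection, so callers never learn which
    // fragments support enumeration or how values get normalized.
    public IEnumerable<I> IntersectWith(IItemLookup<I> collection)
    {
        foreach (string normalized in NormalizedValues)
        {
            if (collection.TryGetItem(normalized, out I? item) && item is not null)
            {
                yield return item;
            }
        }
    }

    // Placeholder for the real path normalization.
    private static string Normalize(string value) => value.Trim();
}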
This is a great suggestion. I think I had the code structured differently but hit a technical issue which had me settle for this mess. I'll wait for CI to start working again to see where I stand with unit tests and will then tackle this. Thank you!
I have refactored it a bit as suggested and it is indeed nicer. For the Update operation, it is possible that we can now revert the previous optimization (#5853) and achieve the same result using the new data structure, but I'd like to try this separately later.
I don't see anything wrong with this, but it's late for me, so I don't want to approve.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Co-authored-by: Marius Ungureanu <[email protected]>
This reverts commit efb5e4e.
This reverts commit 813de90.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
I'm a little curious how few things you would have to remove for it not to be worth making this ordered collection. Probably not important, since it would be fast either way. Thanks!
public void RemoveAll(ICollection<I> itemsToRemove)
{
    _listBuilder.RemoveAll(item => itemsToRemove.Contains(item.Item));
}
Just for my own benefit: Except would look pretty here, but it would both remove all the items in itemsToRemove and also remove duplicates, possibly reordering the list, so it is not ideal.
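For illustration, a small self-contained comparison of the two (standard LINQ and List<T> behavior; the file names are just for the example): Except applies set semantics to the source sequence, so duplicates collapse, while RemoveAll only filters in place.

using System;
using System.Collections.Generic;
using System.Linq;

class ExceptVsRemoveAll
{
    static void Main()
    {
        var items = new List<string> { "a.cs", "b.cs", "a.cs", "c.cs" };
        var toRemove = new List<string> { "b.cs" };

        // Except applies set semantics: the duplicate "a.cs" collapses to one entry.
        var viaExcept = items.Except(toRemove).ToList();        // a.cs, c.cs

        // RemoveAll only filters; duplicates and relative order are preserved.
        var viaRemoveAll = new List<string>(items);
        viaRemoveAll.RemoveAll(toRemove.Contains);              // a.cs, a.cs, c.cs

        Console.WriteLine(string.Join(", ", viaExcept));
        Console.WriteLine(string.Join(", ", viaRemoveAll));
    }
}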
Co-authored-by: Forgind <[email protected]>
The remove operation is dominated by normalization of item values. This will run on the first remove either way. The extra overhead of building the dictionary should be relatively small, and I'd be surprised if two removes were not already faster. I will measure and confirm; it's a fair question.
There is a measurable regression in "small scale" removes, just as you suspected @Forgind. It slows down evaluation of a Hello World .NET Core project by a few percent. I will profile and make changes to address this.
I've made the following changes:
I have updated the description with fresh measurements. There is a positive impact on evaluation time even if the dictionary is not used, due to other smaller optimizations, primarily caching the normalized value on
// Historically we've used slightly different normalization logic depending on the type of matching
// performed in IsMatchNormalized. We have to keep doing it for compat.
Can you give a bit more color on the types of scenarios where the difference is observable?
Trying to unify the code to use the same path normalization function for pattern matching and for single path matching resulted in unit test failures for me. I can revisit if you want but generally I treat this as something that should not be changed just for code cleanliness, even if it affects only corner cases.
I agree.
Fixes #6696
Context
Removing items by referencing other item types or by multiple single-item Remove operations has poor performance characteristics because it leads to N*M comparisons, each involving path normalization of both sides. For projects with as few as a few thousand items, evaluation becomes a bottleneck with a direct impact on build time and VS responsiveness.
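A minimal sketch of the quadratic pattern being described (the types and helper names here are illustrative, not the actual evaluator code): every item is compared against every Remove value, and both sides get normalized on each comparison.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Illustrative types only; the real evaluator works with ItemData/ItemSpec.
record Item(string EvaluatedInclude);

static class NaiveRemove
{
    // Stand-in for MSBuild's path normalization; each comparison pays this cost twice.
    static string Normalize(string path) =>
        Path.GetFullPath(path).TrimEnd(Path.DirectorySeparatorChar);

    // N items x M remove values => N*M normalizing comparisons.
    public static List<Item> Remove(List<Item> items, List<string> removeValues) =>
        items.Where(item =>
            !removeValues.Any(value =>
                string.Equals(Normalize(item.EvaluatedInclude), Normalize(value),
                              StringComparison.OrdinalIgnoreCase)))
             .ToList();
}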
Changes Made
Replaced ImmutableList<ItemData> with a new collection type OrderedItemDataCollection to be used when representing the intermediate list of items. In addition to being a list, the new collection also lazily creates a dictionary of items keyed by normalized paths, which allows for fast matching.
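A minimal sketch of the idea, with simplified types and a placeholder policy for when the index is dropped (the real OrderedItemDataCollection stores ItemData entries, keys them on normalized item paths, and handles duplicates): the list stays the source of truth, and the dictionary is built only when a lookup is actually needed.

using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// Simplified sketch only; not the actual MSBuild implementation.
class OrderedItemCollectionSketch<TItem>
{
    private readonly ImmutableList<TItem>.Builder _list = ImmutableList.CreateBuilder<TItem>();
    private Dictionary<string, TItem>? _lookup;                  // built only on demand
    private readonly Func<TItem, string> _normalizedKey;

    public OrderedItemCollectionSketch(Func<TItem, string> normalizedKey) =>
        _normalizedKey = normalizedKey;

    public void Add(TItem item)
    {
        _list.Add(item);
        // Kept simple here: a structural change just drops the index.
        // The real collection keeps it up to date when that is cheap enough.
        _lookup = null;
    }

    // Fast matching by normalized key; the dictionary is created on first use.
    public bool TryGetByNormalizedKey(string key, out TItem? item)
    {
        if (_lookup is null)
        {
            _lookup = new Dictionary<string, TItem>(StringComparer.OrdinalIgnoreCase);
            foreach (TItem existing in _list)
            {
                _lookup[_normalizedKey(existing)] = existing;    // last one wins in this sketch
            }
        }
        return _lookup.TryGetValue(key, out item);
    }

    // Enumeration keeps insertion order, as the evaluator requires.
    public IEnumerable<TItem> Items => _list;
}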
Testing
Functional:
Performance:
Used a benchmark project with 2000 files in a subdirectory.
Evaluation time of:
went from 7300 ms down to 180 ms.
Evaluation time of:
went from 7900 ms down to 330 ms.
Evaluation time of a Hello World .NET Core project got slightly faster as well (520 -> 515 ms, note that this was without NGEN so most of it is JITting).
Notes
I opted for a rather simple implementation of OrderedItemDataCollection based on ImmutableList and Dictionary. It has limitations but seems to work well for the scenarios of interest. It should be possible to implement a completely custom data structure for even better results (requirements: immutable, maintains insertion order for enumeration).