-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of projects with large numbers of consecutive item updates without wildcards Fixes #5776 #5853
Conversation
…ards and perform them in mini-batches. Performing all item updates at the end breaks the ordering guarantee, both between update operations (if some included globs or wild cards) and between update operations and other operations. This retains that by pausing to perform the relevant updates between non-batchable updates and other operations.
@@ -16,7 +16,6 @@ | |||
using Shouldly; | |||
using InvalidProjectFileException = Microsoft.Build.Exceptions.InvalidProjectFileException; | |||
using Xunit; | |||
using System.Runtime.InteropServices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted to avoid merge conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general comment on PRs: it would be really helpful if you could describe how you did what you did in the PR description (as well as why). For this PR, I'd love to see explicit mention of
- That you changed the process to batch up consecutive Updates of the same item that don't have wildcards
- What changes you had to make to the lazy evaluator to identify and execute on those batches.
This does two things: it helps reviewers follow the changes, and it helps reviewers take a step back from the details of the change and think about whether there might be a better approach to solve the problem.
@@ -30,7 +30,7 @@ private abstract class LazyItemOperation : IItemOperation | |||
// This is used only when evaluating an expression, which instantiates | |||
// the items and then removes them | |||
protected readonly IItemFactory<I, I> _itemFactory; | |||
|
|||
internal ItemSpec<P, I> ISpec => _itemSpec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a very strong convention in .NET that names like IWhatever
are interfaces.
internal ItemSpec<P, I> ISpec => _itemSpec; | |
internal ItemSpec<P, I> ItemSpec => _itemSpec; |
Or even just Spec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried ItemSpec when I first introduced ISpec but quickly realized that's a type hence not reusable. I can change it to Spec.
@@ -151,15 +151,15 @@ public ItemData Clone(IItemFactory<I, I> itemFactory, ProjectItemElement initial | |||
|
|||
private class MemoizedOperation : IItemOperation | |||
{ | |||
public IItemOperation Operation { get; } | |||
public LazyItemOperation Operation { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this need to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially made the change when I wanted to batch different types of operations, since that would mean I could have a map from string to LazyItemOperation more easily. When I switched to only batching Update, I didn't see any reason to change it back. Is there one?
// Another update will already happen on this path. Make that happen before evaluating this one. | ||
addToBatch = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right approach, or should it be a dictionary of lists of operations? This basically falls back to the current behavior, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a dictionary of lists of operations would be asymptotically better, but with exactly 0 evidence, it feels unusual to have a large number of item updates to the same fragment in a row with nothing between them. Note that each time this happens, it resets, so we evaluate the current batch and move on; it doesn't prevent another mini-batch later. Having lists of operations would also be more complicated and noticeably increase allocations (a list for every fragment without a wildcard in an update operation), so I don't think it's worth it. I could do something in-between and immediately evaluate the mini-batch when I see a duplicate and add it to the next mini-batch, but I'm not sure it's worth it. (If I'm wrong about how common this sort of thing is, I'd be happy to make the change.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reasoning feels fine to me--leave as is.
@@ -342,15 +338,64 @@ private static ImmutableList<ItemData>.Builder ComputeItems(LazyItemList lazyIte | |||
|
|||
ImmutableHashSet<string> currentGlobsToIgnore = globsToIgnoreStack == null ? globsToIgnore : globsToIgnoreStack.Peek(); | |||
|
|||
Dictionary<string, UpdateOperation> itemsWithNoWildcards = new Dictionary<string, UpdateOperation>(FileUtilities.GetIsFileSystemCaseSensitive() ? StringComparer.Ordinal : StringComparer.OrdinalIgnoreCase); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct? Is a check like this elsewhere what causes #5749, which seems to be a regression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now you mention it, I'm actually surprised this works. I put this in after this test failed, but I initially misread it as saying it should be case-sensitive on linux and case-insensitive on windows, and then it apparently passed. I should add another test to make this fail, then correct this line.
{ | ||
bool addToBatch = true; | ||
int i; | ||
for (i = 0; i < op.ISpec.Fragments.Count; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these fragments coming from multiple different elements, or a single one? The way I read it this applies to like <I Update="a;b;c*;d" />
so I don't see how it applies to <I Update="a" /><I Update="b" /><I Update="c*" /><I Update="d" />
. Misunderstanding on my part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you're right. It applies to <I Update="a;b;c*;d" />
, but then if there aren't any globs, it doesn't do anything else, just goes to the next operation. If the next operation also happens to be an Update and not have any globs, then the two are batched together in the dictionary. For your second case, this loop would run once for a
, then run for b
, then try to run for c*
but realize it has a glob, quickly do the update for (a and b), do the update for c*, then proceed with d, possibly doing the update immediately or including it in a future batch depending on if there are other Updates without wildcards after it.
@@ -97,6 +97,7 @@ internal static class MSBuildConstants | |||
internal static readonly char[] ForwardSlash = { '/' }; | |||
internal static readonly char[] ForwardSlashBackslash = { '/', '\\' }; | |||
internal static readonly char[] WildcardChars = { '*', '?' }; | |||
internal static readonly string[] CharactersForExpansion = { "*", "?", "$(", "@(", "%" }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels off to me but I can't articulate why so it's probably fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be more robust to explicitly (try to) expand every item spec fragment and see if it changed from the original, but at that point, I'm doing a nontrivial amount of the work for the whole operation, which felt undesirable from a perf perspective.
This isn't the cleanest change; if it works, I should change it.
@@ -338,7 +338,7 @@ private static ImmutableList<ItemData>.Builder ComputeItems(LazyItemList lazyIte | |||
|
|||
ImmutableHashSet<string> currentGlobsToIgnore = globsToIgnoreStack == null ? globsToIgnore : globsToIgnoreStack.Peek(); | |||
|
|||
Dictionary<string, UpdateOperation> itemsWithNoWildcards = new Dictionary<string, UpdateOperation>(FileUtilities.GetIsFileSystemCaseSensitive() ? StringComparer.Ordinal : StringComparer.OrdinalIgnoreCase); | |||
Dictionary<string, UpdateOperation> itemsWithNoWildcards = new Dictionary<string, UpdateOperation>(StringComparer.OrdinalIgnoreCase); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can possibly break linux here. If on disk, you have both a
and A
, this will break matching them properly there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it should be case-insensitive on any platform. There's a bug about that here.
@@ -15,6 +15,9 @@ internal partial class LazyItemEvaluator<P, I, M, D> | |||
class UpdateOperation : LazyItemOperation | |||
{ | |||
private readonly ImmutableList<ProjectMetadataElement> _metadata; | |||
private ImmutableList<ItemBatchingContext>.Builder _itemsToUpdate = null; | |||
private ItemSpecMatchesItem _matchItemSpec = null; | |||
private bool? _needToExpandMetadataForEachItem = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double checking my own review of this file, it looks like the only changes made here are caching the items needed to update and reorganizing some code into functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: +10 points for general code cleanup of legacy code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, pretty much. I should note that I didn't do the caching part because of expected perf gains but because I wanted to access these from different functions.
// Remove items added before realizing we couldn't skip the item list | ||
for (int j = 0; j < i; j++) | ||
{ | ||
itemsWithNoWildcards.Remove(currentList._memoizedOperation.Operation.Spec.Fragments[j].TextFragment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a bit of trouble parsing why this is needed. Could you explain this with a small example of what happens here? You iterate through all fragments until you find one that's already going to be updated, then remove everything that came before it? I think my issue is I can't picture what exactly these fragments are. ie. is **
a fragment, is a filename a fragment, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a (minor) perf optimization and also a minor correctness fix. There can be lists of fragments with some having wildcards and some not having wildcards. If I see some without wildcards first, I add them to a larger batch to be processed later. The normal UpdateOperation, when applied, includes all fragments, so if we added some to a batch to be processed later, then discover we can't add the rest to a batch, we'll have to do all the fragments in that item anyway when we apply it. (I haven't added anything to only do a partial update at this point—that would require a more complicated change to UpdateOperation.) That means first that there's no reason to apply the fragments that we added if we're about to do it anyway, and it's also best to avoid applying a single update twice. This removes the fragments we added to the batch from the item operation currently being processed that, it turns out, can't be processed as part of a batch. Does that make sense?
@@ -359,9 +404,26 @@ private static ImmutableList<ItemData>.Builder ComputeItems(LazyItemList lazyIte | |||
currentList._memoizedOperation.Apply(items, currentGlobsToIgnore); | |||
} | |||
|
|||
ProcessNonWildCardItemUpdates(itemsWithNoWildcards, items); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this call necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely. If the last operation were an update operation without wildcards, omitting this would make it have no effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a comment saying as much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emphasizing this.
Co-authored-by: Ben Villalobos <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's just me but there's a lot of information to take in with this PR, as this area of newer to me. Particularly in taking a step back and seeing the bigger picture here. Where in the build this happens, what data is being passed through here, and how some of these classes interact with each other. Adding some comments now would go a long way in helping others understand this area.
{ | ||
if (itemsWithNoWildcards.TryGetValue(FileUtilities.GetFullPath(items[i].Item.EvaluatedInclude, items[i].Item.ProjectDirectory), out UpdateOperation op)) | ||
{ | ||
items[i] = op.UpdateItem(items[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does applying the operation here prevent it from eventually getting called on line 404?
Some comments in the code would really make things easier to review, this area in particular seems to be lacking some comments. If you have a fresh understanding of some of these functions/classes (ie, MemoizedOperation, what currentList._memoizedOperation.Apply(items, currentGlobsToIgnore);
does vs op.UpdateItem(items[i])
and how they're similar/different they would be greatly appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorta, but it's a little more complicated. (I'm sure that's exactly what you wanted to hear, right? 😄 )
If it's determined that the operation doesn't have things that expand like wildcards, it will hit the continue statement on line 386. That prevents line 404 from running for that operation. Additionally, at that point, it would have been "addedToBatch," meaning it's in the dictionary that will be passed to this function next time we determine that we can't just add more. This function should always be called if the dictionary is nonempty at such a point (or, in reality, at all other times, but it does ~nothing if it's empty), so the real dependency here is that if an item has no fragments with wildcards, and it's an update operation, it will eventually get here, and line 404 doesn't run, but they don't affect each other directly.
N.B.: One of the restrictions here is that it only affects UpdateOperations. That's mostly because that's the easy way to do it—they get replaced in a 1:1 way. Other operations (Remove) could be included as well, but I wanted to try to keep this fairly simple. I didn't see a good reason to make it more complicated without any obvious benefit.
Within an ItemGroup, you can have things like Any particular points you think need comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of my suggestions here are about comments. Obviously you don't have full context as to all of the code here, so I'd focus on "correct, and just enough context" comments.
We should toss this into an exp/
branch to make sure this doesn't have any side effects.
return clonedData; | ||
} | ||
|
||
private void SetMatchItemSpec() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a quick description here or at the call site. Preferably here.
{ | ||
var itemData = listBuilder[i]; | ||
|
||
var matchResult = _matchItemSpec(_itemSpec, itemData.Item); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have while you're here: A comment here suggesting what's happening, ideally with a small example. Something like: "see if, for example, *.cs matches somefilename.cs" ( don't use this example 🙂)
DecorateItemsWithMetadata(_itemsToUpdate.ToImmutableList(), _metadata, _needToExpandMetadataForEachItem); | ||
} | ||
|
||
internal ItemData UpdateItem(ItemData item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely summarize this method since it's internal.
A very rough idea:
"Set the item spec, do X with _itemsToUpdate, and return the cloned matching item"
MatchResult matchResult = _matchItemSpec(_itemSpec, item.Item); | ||
if (matchResult.IsMatch) | ||
{ | ||
ItemData clonedData = UpdateItem(item, matchResult.CapturedItemsFromReferencedItemTypes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you call the private UpdateItem
method here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? That's what it's doing.
@@ -151,15 +151,15 @@ public ItemData Clone(IItemFactory<I, I> itemFactory, ProjectItemElement initial | |||
|
|||
private class MemoizedOperation : IItemOperation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a summary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's out of scope in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you gained enough context on this class I'd still suggest writing a summary, but it is out of scope. I don't feel too strongly about it.
@@ -315,13 +315,9 @@ private static ImmutableList<ItemData>.Builder ComputeItems(LazyItemList lazyIte | |||
|
|||
// If this is a remove operation, then add any globs that will be removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a high level comment summarizing what ComputeItems
does.
Because this is a beefier function, here's an idea. For the summary:
// Compute items does some stuff.
// 1. Does A because B
// 2. Does C because D
Then in the code, when A or C happens you can place another comment:
// 1. Here is some more detail on what A is.
and later when C is happening
//2. Perform C to do a thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emphasizing this comment.
{ | ||
bool addToBatch = true; | ||
int i; | ||
for (i = 0; i < op.Spec.Fragments.Count; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a comment for an example of what a fragment would be here would be awesome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emphasizing this.
} | ||
if (!addToBatch) | ||
{ | ||
// Remove items added before realizing we couldn't skip the item list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a slightly more descriptive idea.
// Remove items added before realizing we couldn't skip the item list | |
// We found an item with a wildcard, remove all fragments from the current list so we don't process them yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emphasizing this one.
@@ -359,9 +404,26 @@ private static ImmutableList<ItemData>.Builder ComputeItems(LazyItemList lazyIte | |||
currentList._memoizedOperation.Apply(items, currentGlobsToIgnore); | |||
} | |||
|
|||
ProcessNonWildCardItemUpdates(itemsWithNoWildcards, items); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a comment saying as much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I'd like to see an exp/
run on this if it hasn't hasn't already been through one.
{"m1", "eighth" } | ||
}; | ||
|
||
ObjectModelHelpers.AssertItemHasMetadata(expectedAfterFirsta, items[3]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should AssertItemHasMetadata have an overload that accepts a tuple name-value, so you can eliminate all these dictionaries in the test?
@@ -2842,6 +2976,25 @@ public void UpdateFromReferencedItemShouldBeCaseInsensitive() | |||
ObjectModelHelpers.AssertItemHasMetadata(expectedMetadataA, items[1]); | |||
} | |||
|
|||
[Fact] | |||
public void UpdateFromReferencedItemShouldBeCaseInsensitive2() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you name this more explicitly?
@@ -3180,28 +3333,15 @@ public void UpdateAndRemoveShouldUseCaseInsensitiveMatching() | |||
|
|||
IList<ProjectItem> items = ObjectModelHelpers.GetItemsFromFragment(content); | |||
|
|||
if (FileUtilities.GetIsFileSystemCaseSensitive()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about this removal. Isn't it a behavior change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but we decided this was best. See #5888 (comment)
// Another update will already happen on this path. Make that happen before evaluating this one. | ||
addToBatch = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reasoning feels fine to me--leave as is.
eng/Packages.props
Outdated
@@ -9,7 +9,7 @@ | |||
<PackageReference Update="Microsoft.Net.Compilers.Toolset" Version="$(MicrosoftNetCompilersToolsetVersion)" /> | |||
<PackageReference Update="Microsoft.VisualStudio.SDK.EmbedInteropTypes" Version="15.0.15" /> | |||
<PackageReference Update="Microsoft.VisualStudio.Setup.Configuration.Interop" Version="1.16.30" /> | |||
<PackageReference Update="Microsoft.Win32.Registry" Version="4.3.0" /> | |||
<PackageReference Update="Microsoft.Win32.Registry" Version="4.6.0" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's up with this change?
This PR changed the process of applying consecutive UpdateOperations on distinct fragments without wildcards.
Previously, each one was applied individually, so if there were a large number of items and a large number of updates, this would take time proportional to the product of the two. Now, those update operations are batched together in a dictionary such that we only have to make one pass to apply all the update operations at once. In other words, time would then be proportional to the number of items but not the number of updates.
Note that this is very specific to update operations in sequence without wildcards or other characters indicating that we might need to expand them. Interleaving RemoveOperations, for instance, would effectively run the older, unoptimized code. This is because the UpdateOperations are added to the batch in sequence, but if an operation that cannot be added to the batch is next, it pauses to evaluate all the operations in the batch. This ensures we respect ordering of operations.
Updating the same fragment twice isn't permitted in a single batch because then we would have to keep track of whether the second update would overwrite the first or not, and if so, apply them in the correct order.
Time to add/delete a class decreased to 1-3 seconds, thus fixing #5776