Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of projects with large numbers of consecutive item updates without wildcards Fixes #5776 #5853

Merged
merged 15 commits into from
Jan 9, 2021

Conversation

Forgind
Copy link
Member

@Forgind Forgind commented Nov 3, 2020

This PR changed the process of applying consecutive UpdateOperations on distinct fragments without wildcards.

Previously, each one was applied individually, so if there were a large number of items and a large number of updates, this would take time proportional to the product of the two. Now, those update operations are batched together in a dictionary such that we only have to make one pass to apply all the update operations at once. In other words, time would then be proportional to the number of items but not the number of updates.

Note that this is very specific to update operations in sequence without wildcards or other characters indicating that we might need to expand them. Interleaving RemoveOperations, for instance, would effectively run the older, unoptimized code. This is because the UpdateOperations are added to the batch in sequence, but if an operation that cannot be added to the batch is next, it pauses to evaluate all the operations in the batch. This ensures we respect ordering of operations.

Updating the same fragment twice isn't permitted in a single batch because then we would have to keep track of whether the second update would overwrite the first or not, and if so, apply them in the correct order.

Time to add/delete a class decreased to 1-3 seconds, thus fixing #5776

…ards and perform them in mini-batches.

Performing all item updates at the end breaks the ordering guarantee, both between update operations (if some included globs or wild cards) and between update operations and other operations. This retains that by pausing to perform the relevant updates between non-batchable updates and other operations.
Tests that mixing items with globs as they're updated either as a group of individually applies all updates correctly (even when mixed with Includes) and follows the order in which they're updated.
@@ -16,7 +16,6 @@
using Shouldly;
using InvalidProjectFileException = Microsoft.Build.Exceptions.InvalidProjectFileException;
using Xunit;
using System.Runtime.InteropServices;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted to avoid merge conflict.

Copy link
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general comment on PRs: it would be really helpful if you could describe how you did what you did in the PR description (as well as why). For this PR, I'd love to see explicit mention of

  • That you changed the process to batch up consecutive Updates of the same item that don't have wildcards
  • What changes you had to make to the lazy evaluator to identify and execute on those batches.

This does two things: it helps reviewers follow the changes, and it helps reviewers take a step back from the details of the change and think about whether there might be a better approach to solve the problem.

@@ -30,7 +30,7 @@ private abstract class LazyItemOperation : IItemOperation
// This is used only when evaluating an expression, which instantiates
// the items and then removes them
protected readonly IItemFactory<I, I> _itemFactory;

internal ItemSpec<P, I> ISpec => _itemSpec;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a very strong convention in .NET that names like IWhatever are interfaces.

Suggested change
internal ItemSpec<P, I> ISpec => _itemSpec;
internal ItemSpec<P, I> ItemSpec => _itemSpec;

Or even just Spec?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried ItemSpec when I first introduced ISpec but quickly realized that's a type hence not reusable. I can change it to Spec.

@@ -151,15 +151,15 @@ public ItemData Clone(IItemFactory<I, I> itemFactory, ProjectItemElement initial

private class MemoizedOperation : IItemOperation
{
public IItemOperation Operation { get; }
public LazyItemOperation Operation { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this need to change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially made the change when I wanted to batch different types of operations, since that would mean I could have a map from string to LazyItemOperation more easily. When I switched to only batching Update, I didn't see any reason to change it back. Is there one?

Comment on lines +366 to +367
// Another update will already happen on this path. Make that happen before evaluating this one.
addToBatch = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right approach, or should it be a dictionary of lists of operations? This basically falls back to the current behavior, right?

Copy link
Member Author

@Forgind Forgind Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a dictionary of lists of operations would be asymptotically better, but with exactly 0 evidence, it feels unusual to have a large number of item updates to the same fragment in a row with nothing between them. Note that each time this happens, it resets, so we evaluate the current batch and move on; it doesn't prevent another mini-batch later. Having lists of operations would also be more complicated and noticeably increase allocations (a list for every fragment without a wildcard in an update operation), so I don't think it's worth it. I could do something in-between and immediately evaluate the mini-batch when I see a duplicate and add it to the next mini-batch, but I'm not sure it's worth it. (If I'm wrong about how common this sort of thing is, I'd be happy to make the change.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reasoning feels fine to me--leave as is.

@@ -342,15 +338,64 @@ private static ImmutableList<ItemData>.Builder ComputeItems(LazyItemList lazyIte

ImmutableHashSet<string> currentGlobsToIgnore = globsToIgnoreStack == null ? globsToIgnore : globsToIgnoreStack.Peek();

Dictionary<string, UpdateOperation> itemsWithNoWildcards = new Dictionary<string, UpdateOperation>(FileUtilities.GetIsFileSystemCaseSensitive() ? StringComparer.Ordinal : StringComparer.OrdinalIgnoreCase);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? Is a check like this elsewhere what causes #5749, which seems to be a regression?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you mention it, I'm actually surprised this works. I put this in after this test failed, but I initially misread it as saying it should be case-sensitive on linux and case-insensitive on windows, and then it apparently passed. I should add another test to make this fail, then correct this line.

{
bool addToBatch = true;
int i;
for (i = 0; i < op.ISpec.Fragments.Count; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these fragments coming from multiple different elements, or a single one? The way I read it this applies to like <I Update="a;b;c*;d" /> so I don't see how it applies to <I Update="a" /><I Update="b" /><I Update="c*" /><I Update="d" />. Misunderstanding on my part?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you're right. It applies to <I Update="a;b;c*;d" />, but then if there aren't any globs, it doesn't do anything else, just goes to the next operation. If the next operation also happens to be an Update and not have any globs, then the two are batched together in the dictionary. For your second case, this loop would run once for a, then run for b, then try to run for c* but realize it has a glob, quickly do the update for (a and b), do the update for c*, then proceed with d, possibly doing the update immediately or including it in a future batch depending on if there are other Updates without wildcards after it.

@@ -97,6 +97,7 @@ internal static class MSBuildConstants
internal static readonly char[] ForwardSlash = { '/' };
internal static readonly char[] ForwardSlashBackslash = { '/', '\\' };
internal static readonly char[] WildcardChars = { '*', '?' };
internal static readonly string[] CharactersForExpansion = { "*", "?", "$(", "@(", "%" };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels off to me but I can't articulate why so it's probably fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be more robust to explicitly (try to) expand every item spec fragment and see if it changed from the original, but at that point, I'm doing a nontrivial amount of the work for the whole operation, which felt undesirable from a perf perspective.

This isn't the cleanest change; if it works, I should change it.
@@ -338,7 +338,7 @@ private static ImmutableList<ItemData>.Builder ComputeItems(LazyItemList lazyIte

ImmutableHashSet<string> currentGlobsToIgnore = globsToIgnoreStack == null ? globsToIgnore : globsToIgnoreStack.Peek();

Dictionary<string, UpdateOperation> itemsWithNoWildcards = new Dictionary<string, UpdateOperation>(FileUtilities.GetIsFileSystemCaseSensitive() ? StringComparer.Ordinal : StringComparer.OrdinalIgnoreCase);
Dictionary<string, UpdateOperation> itemsWithNoWildcards = new Dictionary<string, UpdateOperation>(StringComparer.OrdinalIgnoreCase);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can possibly break linux here. If on disk, you have both a and A, this will break matching them properly there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it should be case-insensitive on any platform. There's a bug about that here.

@@ -15,6 +15,9 @@ internal partial class LazyItemEvaluator<P, I, M, D>
class UpdateOperation : LazyItemOperation
{
private readonly ImmutableList<ProjectMetadataElement> _metadata;
private ImmutableList<ItemBatchingContext>.Builder _itemsToUpdate = null;
private ItemSpecMatchesItem _matchItemSpec = null;
private bool? _needToExpandMetadataForEachItem = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checking my own review of this file, it looks like the only changes made here are caching the items needed to update and reorganizing some code into functions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: +10 points for general code cleanup of legacy code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, pretty much. I should note that I didn't do the caching part because of expected perf gains but because I wanted to access these from different functions.

// Remove items added before realizing we couldn't skip the item list
for (int j = 0; j < i; j++)
{
itemsWithNoWildcards.Remove(currentList._memoizedOperation.Operation.Spec.Fragments[j].TextFragment);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a bit of trouble parsing why this is needed. Could you explain this with a small example of what happens here? You iterate through all fragments until you find one that's already going to be updated, then remove everything that came before it? I think my issue is I can't picture what exactly these fragments are. ie. is ** a fragment, is a filename a fragment, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a (minor) perf optimization and also a minor correctness fix. There can be lists of fragments with some having wildcards and some not having wildcards. If I see some without wildcards first, I add them to a larger batch to be processed later. The normal UpdateOperation, when applied, includes all fragments, so if we added some to a batch to be processed later, then discover we can't add the rest to a batch, we'll have to do all the fragments in that item anyway when we apply it. (I haven't added anything to only do a partial update at this point—that would require a more complicated change to UpdateOperation.) That means first that there's no reason to apply the fragments that we added if we're about to do it anyway, and it's also best to avoid applying a single update twice. This removes the fragments we added to the batch from the item operation currently being processed that, it turns out, can't be processed as part of a batch. Does that make sense?

@@ -359,9 +404,26 @@ private static ImmutableList<ItemData>.Builder ComputeItems(LazyItemList lazyIte
currentList._memoizedOperation.Apply(items, currentGlobsToIgnore);
}

ProcessNonWildCardItemUpdates(itemsWithNoWildcards, items);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this call necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. If the last operation were an update operation without wildcards, omitting this would make it have no effect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a comment saying as much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emphasizing this.

Copy link
Member

@benvillalobos benvillalobos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's just me but there's a lot of information to take in with this PR, as this area of newer to me. Particularly in taking a step back and seeing the bigger picture here. Where in the build this happens, what data is being passed through here, and how some of these classes interact with each other. Adding some comments now would go a long way in helping others understand this area.

src/Build/Evaluation/LazyItemEvaluator.cs Show resolved Hide resolved
{
if (itemsWithNoWildcards.TryGetValue(FileUtilities.GetFullPath(items[i].Item.EvaluatedInclude, items[i].Item.ProjectDirectory), out UpdateOperation op))
{
items[i] = op.UpdateItem(items[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does applying the operation here prevent it from eventually getting called on line 404?

Some comments in the code would really make things easier to review, this area in particular seems to be lacking some comments. If you have a fresh understanding of some of these functions/classes (ie, MemoizedOperation, what currentList._memoizedOperation.Apply(items, currentGlobsToIgnore); does vs op.UpdateItem(items[i]) and how they're similar/different they would be greatly appreciated.

Copy link
Member Author

@Forgind Forgind Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorta, but it's a little more complicated. (I'm sure that's exactly what you wanted to hear, right? 😄 )

If it's determined that the operation doesn't have things that expand like wildcards, it will hit the continue statement on line 386. That prevents line 404 from running for that operation. Additionally, at that point, it would have been "addedToBatch," meaning it's in the dictionary that will be passed to this function next time we determine that we can't just add more. This function should always be called if the dictionary is nonempty at such a point (or, in reality, at all other times, but it does ~nothing if it's empty), so the real dependency here is that if an item has no fragments with wildcards, and it's an update operation, it will eventually get here, and line 404 doesn't run, but they don't affect each other directly.

N.B.: One of the restrictions here is that it only affects UpdateOperations. That's mostly because that's the easy way to do it—they get replaced in a 1:1 way. Other operations (Remove) could be included as well, but I wanted to try to keep this fairly simple. I didn't see a good reason to make it more complicated without any obvious benefit.

@Forgind
Copy link
Member Author

Forgind commented Nov 20, 2020

Within an ItemGroup, you can have things like <whatever Include="thing" />, and Include can be replaced by Remove or Update. This is at that step when figuring out, after putting them all together, what should be in the Item and what its metadata should be.

Any particular points you think need comments?

Copy link
Member

@benvillalobos benvillalobos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my suggestions here are about comments. Obviously you don't have full context as to all of the code here, so I'd focus on "correct, and just enough context" comments.

We should toss this into an exp/ branch to make sure this doesn't have any side effects.

return clonedData;
}

private void SetMatchItemSpec()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a quick description here or at the call site. Preferably here.

{
var itemData = listBuilder[i];

var matchResult = _matchItemSpec(_itemSpec, itemData.Item);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to have while you're here: A comment here suggesting what's happening, ideally with a small example. Something like: "see if, for example, *.cs matches somefilename.cs" ( don't use this example 🙂)

DecorateItemsWithMetadata(_itemsToUpdate.ToImmutableList(), _metadata, _needToExpandMetadataForEachItem);
}

internal ItemData UpdateItem(ItemData item)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely summarize this method since it's internal.
A very rough idea:
"Set the item spec, do X with _itemsToUpdate, and return the cloned matching item"

MatchResult matchResult = _matchItemSpec(_itemSpec, item.Item);
if (matchResult.IsMatch)
{
ItemData clonedData = UpdateItem(item, matchResult.CapturedItemsFromReferencedItemTypes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you call the private UpdateItem method here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? That's what it's doing.

@@ -151,15 +151,15 @@ public ItemData Clone(IItemFactory<I, I> itemFactory, ProjectItemElement initial

private class MemoizedOperation : IItemOperation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a summary here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's out of scope in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you gained enough context on this class I'd still suggest writing a summary, but it is out of scope. I don't feel too strongly about it.

@@ -315,13 +315,9 @@ private static ImmutableList<ItemData>.Builder ComputeItems(LazyItemList lazyIte

// If this is a remove operation, then add any globs that will be removed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a high level comment summarizing what ComputeItems does.

Because this is a beefier function, here's an idea. For the summary:
// Compute items does some stuff.
// 1. Does A because B
// 2. Does C because D

Then in the code, when A or C happens you can place another comment:
// 1. Here is some more detail on what A is.

and later when C is happening
//2. Perform C to do a thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emphasizing this comment.

{
bool addToBatch = true;
int i;
for (i = 0; i < op.Spec.Fragments.Count; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a comment for an example of what a fragment would be here would be awesome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emphasizing this.

}
if (!addToBatch)
{
// Remove items added before realizing we couldn't skip the item list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a slightly more descriptive idea.

Suggested change
// Remove items added before realizing we couldn't skip the item list
// We found an item with a wildcard, remove all fragments from the current list so we don't process them yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emphasizing this one.

@@ -359,9 +404,26 @@ private static ImmutableList<ItemData>.Builder ComputeItems(LazyItemList lazyIte
currentList._memoizedOperation.Apply(items, currentGlobsToIgnore);
}

ProcessNonWildCardItemUpdates(itemsWithNoWildcards, items);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a comment saying as much.

src/Build/Evaluation/LazyItemEvaluator.cs Outdated Show resolved Hide resolved
src/Build/Evaluation/LazyItemEvaluator.cs Show resolved Hide resolved
src/Build/Evaluation/LazyItemEvaluator.cs Show resolved Hide resolved
src/Build/Evaluation/LazyItemEvaluator.cs Outdated Show resolved Hide resolved
Copy link
Member

@benvillalobos benvillalobos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'd like to see an exp/ run on this if it hasn't hasn't already been through one.

{"m1", "eighth" }
};

ObjectModelHelpers.AssertItemHasMetadata(expectedAfterFirsta, items[3]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should AssertItemHasMetadata have an overload that accepts a tuple name-value, so you can eliminate all these dictionaries in the test?

@Forgind
Copy link
Member Author

Forgind commented Dec 30, 2020

@@ -2842,6 +2976,25 @@ public void UpdateFromReferencedItemShouldBeCaseInsensitive()
ObjectModelHelpers.AssertItemHasMetadata(expectedMetadataA, items[1]);
}

[Fact]
public void UpdateFromReferencedItemShouldBeCaseInsensitive2()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you name this more explicitly?

@@ -3180,28 +3333,15 @@ public void UpdateAndRemoveShouldUseCaseInsensitiveMatching()

IList<ProjectItem> items = ObjectModelHelpers.GetItemsFromFragment(content);

if (FileUtilities.GetIsFileSystemCaseSensitive())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about this removal. Isn't it a behavior change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we decided this was best. See #5888 (comment)

Comment on lines +366 to +367
// Another update will already happen on this path. Make that happen before evaluating this one.
addToBatch = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reasoning feels fine to me--leave as is.

@@ -9,7 +9,7 @@
<PackageReference Update="Microsoft.Net.Compilers.Toolset" Version="$(MicrosoftNetCompilersToolsetVersion)" />
<PackageReference Update="Microsoft.VisualStudio.SDK.EmbedInteropTypes" Version="15.0.15" />
<PackageReference Update="Microsoft.VisualStudio.Setup.Configuration.Interop" Version="1.16.30" />
<PackageReference Update="Microsoft.Win32.Registry" Version="4.3.0" />
<PackageReference Update="Microsoft.Win32.Registry" Version="4.6.0" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with this change?

@Forgind Forgind merged commit 74eb87c into dotnet:master Jan 9, 2021
@Forgind Forgind deleted the winform-perf branch January 9, 2021 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants