Project Cache initial implementation #5936

Merged: 25 commits merged Jan 15, 2021

@cdmihai (Contributor) commented Dec 7, 2020

Ready for review. Start with the documentation, which should describe all the big changes here.

Todos:

  • Unit tests
  • RPS run
  • Create issue: an async LoggingService misses events logged by Plugin.EndBuildAsync when Plugin.EndBuildAsync also throws an exception

Issues addressed in future PRs:

  • Constrain proxy target builds to inproc nodes, as a perf optimization.
  • Parent plugin log events under the queried project's logging context so they show up nicely in the binlog.
  • Add an option to query the plugin against all graph nodes without building anything (cache warmup).
  • Pipe MSBuildFileSystem to the plugin for both graph and non-graph scenarios.

Since this is a large PR, you might consider using CodeFlow to make it easier to review; it has a Chrome extension: https://www.1eswiki.com/wiki/CodeFlow_integration_with_GitHub_Pull_Requests (msft internal link)

@cdmihai added the WIP label Dec 7, 2020
@cdmihai changed the title from "[WIP][Project cache] Initial implementation" to "[Project cache] Initial implementation" Dec 7, 2020
@cdmihai changed the title from "[Project cache] Initial implementation" to "Project Cache initial implementation" Dec 7, 2020

CacheHit,
CacheMiss,
CacheNotApplicable,
CacheError

Contributor Author

Is it worth keeping CacheError? I'm thinking of dropping it in favor of just checking PluginLoggerBase.HasLoggedErrors. If we keep both, what should the user or MSBuild do if the plugin returns CacheError but no errors are logged? It seems simpler to keep just one mechanism for signaling errors.

Contributor

Yeah, I agree; the usual convention is that it's a success unless an error was logged.

Member

+1 for using PluginLoggerBase.HasLoggedErrors

Contributor Author

K, I'll remove CacheError

Contributor Author (@cdmihai, Dec 29, 2020)

So it turns out CacheError may have a good use: when the plugin is queried for a project but encounters and logs one or more errors, what should it return? It could return either a null CacheResult or a CacheResult of type CacheError (returning CacheMiss or CacheNotApplicable doesn't seem a good fit for the error case). But that brings back the ambiguity above, where a plugin could return CacheError without logging any errors, which I guess could be fine.

An alternative is to remove the enum entirely. On cache hits the plugin returns a CacheResult with the build results (or proxy targets and whatnot). On anything else (cache miss, cache not applicable, cache error) the plugin returns a null CacheResult and logs why it couldn't satisfy the request. Worst case we get a plugin that returns null and does not log anything.

I am inclined now to keep CacheError in order to make the return modes explicit.

Let me know what you think.

Member

I'm slightly more inclined to return null on any failure case. The main question in my mind is whether you'd react differently to CacheMiss, CacheNotApplicable, and CacheError, but the right response, from what I can tell, is to pretend the cache doesn't exist and build normally for all three cases. May as well just have a single "build everything" return value.

Contributor Author

I decided to just replace CacheError with a None value. Hit, Miss, and NotApplicable are domain-level concepts that essentially define what a plugin means, so I'd like to keep them explicit. Without them, we'd rely on each plugin implementation's own way of expressing these three outcomes, rather than MSBuild giving a standardized message for each type of response. Another concrete use case is letting MSBuild report the cache hit ratio in the build summary: hits / (hits + misses), which needs NotApplicable as a separate value so those results aren't counted as misses. That's a very useful metric for cacheability health, which, just like perf, tends to degrade over time.
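To make the hit-ratio point concrete, here is a minimal C# sketch. `CacheResultType` and `CacheStats.HitRatio` are hypothetical names for illustration, not the API merged in this PR. Excluding `None` from the ratio, the same way as `CacheNotApplicable`, is an assumption; the thread only says NotApplicable must not count as a miss.

```csharp
using System.Collections.Generic;

// Hypothetical mirror of the result kinds discussed above. The member names
// follow the thread; the order and the real type in
// Microsoft.Build.Experimental.ProjectCache may differ.
public enum CacheResultType
{
    None,               // replaces CacheError: the plugin could not answer
    CacheHit,
    CacheMiss,
    CacheNotApplicable
}

public static class CacheStats
{
    // Hit ratio = hits / (hits + misses).
    // CacheNotApplicable is excluded so uncacheable projects don't count as
    // misses; excluding None as well is an assumption of this sketch.
    // Returns null when there are no hits or misses to measure.
    public static double? HitRatio(IEnumerable<CacheResultType> results)
    {
        int hits = 0;
        int misses = 0;

        foreach (var result in results)
        {
            if (result == CacheResultType.CacheHit)
            {
                hits++;
            }
            else if (result == CacheResultType.CacheMiss)
            {
                misses++;
            }
        }

        int measured = hits + misses;
        return measured == 0 ? (double?)null : (double)hits / measured;
    }
}
```

With two hits, one miss, one NotApplicable, and one None result, this sketch reports 2/3. Counting NotApplicable as a miss would drag the ratio down to 2/4 even though nothing cacheable was missed.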

@@ -962,117 +1000,300 @@ internal void ExecuteSubmission(BuildSubmission submission, bool allowMainThread
ErrorUtilities.VerifyThrowArgumentNull(submission, nameof(submission));
ErrorUtilities.VerifyThrow(!submission.IsCompleted, "Submission already complete.");

lock (_syncLock)
if (ProjectCacheIsPresent())

Contributor Author

Start the review here. This method is the entry point where the logic starts to diverge when a cache is available.

src/Build/Microsoft.Build.csproj (outdated, resolved)
src/Build/BackEnd/BuildManager/BuildParameters.cs (outdated, resolved)
src/Build/BackEnd/BuildManager/BuildSubmission.cs (outdated, resolved)
src/Build/BackEnd/Components/ProjectCache/CacheContext.cs (outdated, resolved)
src/Build/BackEnd/Components/ProjectCache/CacheResult.cs (outdated, resolved)

@benvillalobos previously approved these changes Dec 23, 2020

@benvillalobos (Member) left a comment:

WIP review

@@ -10,7 +10,9 @@
using Microsoft.Build.BackEnd;
using Microsoft.Build.Collections;
using Microsoft.Build.Evaluation;
using Microsoft.Build.Experimental.ProjectCache;

Member

How long do we plan on keeping this in the Experimental namespace?

Contributor Author

I'd like to keep it there at least until we validate that it works with two of our internal build accelerators.

})
.ToArray());

var cacheItems = nodeToCacheItems.Values.SelectMany(i => i).ToHashSet();

Member

What's the purpose of SelectMany here?

Member

I'm not sure I follow why this converts to an array and then to a hash set. Can't the items be stored in a hash set to begin with?

Contributor Author

It's inconsequential, I think. Each node can declare zero or more plugins, so first I construct a dictionary from a node to the collection of plugins it declares. But the contract is that there can be a single plugin (path + plugin settings), and all nodes must declare that plugin. So in order to find that single plugin I flatten (SelectMany) the collections of plugins from each node into a single set (ProjectCacheItem implements Equals and GetHashCode to make this correct). I could have skipped the dictionary by flattening everything from the start, but I also want to give a nice error message with all the nodes that may be missing the plugin. And given that this method is not on a hot path, I went with a lot of LINQ :)

Member

I'm not sure I follow everything here. Wouldn't this be a union rather than an intersection, i.e. find all the plugins declared by any node? How does that help you find the one plugin declared by all of them? And I don't follow how this turns into an error.

Contributor Author

This part constructs a dictionary from every node to all the declared plugins in that node:
https://github.com/cdmihai/msbuild/blob/acb4b7e2e0ffbd4a9df13c6fba11dd6aa5f37944/src/Build/BackEnd/BuildManager/BuildManager.cs#L1866-L1882

This part flattens (via SelectMany) all the declared plugins into a single set, which removes duplicates according to the overridden Equals and GetHashCode in ProjectCacheItem:
https://github.com/cdmihai/msbuild/blob/acb4b7e2e0ffbd4a9df13c6fba11dd6aa5f37944/src/Build/BackEnd/BuildManager/BuildManager.cs#L1884

The set should contain a single item if all nodes declare a single plugin (plugin path + plugin settings).

Error cases:

  • the set contains more than one plugin (the error lists all declared plugins)
  • the set contains a single plugin but some nodes do not declare it (the error lists all nodes that do not declare the plugin)
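The validation described above can be sketched as follows. This is a minimal illustration, not the code in BuildManager.cs: `PluginDescriptor` is a hypothetical stand-in for ProjectCacheItem with the settings simplified to one string, `GetSinglePlugin` is an invented name, and rejecting the case where no node declares a plugin is an assumption. The per-node dictionary is kept because the set alone cannot tell which nodes lack the plugin.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ProjectCacheItem: a plugin is identified by its
// path plus its settings (simplified here to a single string).
public sealed class PluginDescriptor : IEquatable<PluginDescriptor>
{
    public PluginDescriptor(string path, string settings)
    {
        Path = path;
        Settings = settings;
    }

    public string Path { get; }
    public string Settings { get; }

    public bool Equals(PluginDescriptor other) =>
        other != null && Path == other.Path && Settings == other.Settings;

    public override bool Equals(object obj) => Equals(obj as PluginDescriptor);

    public override int GetHashCode() =>
        (Path?.GetHashCode() ?? 0) * 31 + (Settings?.GetHashCode() ?? 0);

    public override string ToString() => $"{Path} ({Settings})";
}

public static class PluginValidation
{
    // Returns the single plugin declared by every node, or throws with a
    // message naming the conflicting plugins or the nodes missing the plugin.
    public static PluginDescriptor GetSinglePlugin(
        IReadOnlyDictionary<string, IReadOnlyCollection<PluginDescriptor>> nodeToPlugins)
    {
        // Flatten every node's declarations into one set; duplicates collapse
        // because PluginDescriptor overrides Equals and GetHashCode.
        var plugins = nodeToPlugins.Values.SelectMany(p => p).ToHashSet();

        if (plugins.Count > 1)
        {
            throw new InvalidOperationException(
                "Multiple plugins declared: " + string.Join(", ", plugins));
        }

        // Assumption of this sketch: no plugin at all is also an error here.
        if (plugins.Count == 0)
        {
            throw new InvalidOperationException("No plugin declared.");
        }

        var plugin = plugins.Single();

        // Second error case: some nodes do not declare the single plugin.
        var missing = nodeToPlugins
            .Where(kvp => !kvp.Value.Contains(plugin))
            .Select(kvp => kvp.Key)
            .ToList();

        if (missing.Count > 0)
        {
            throw new InvalidOperationException(
                "Nodes missing the plugin: " + string.Join(", ", missing));
        }

        return plugin;
    }
}
```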

Member

Neat! Didn't realize SelectMany also flattened. If I were writing it, I might have said nodeToCacheItems.Values.Aggregate((x, y) => x.Union(y));, but I don't think that's better (or worse) than what you currently have.

I was originally confused in thinking that a given node could declare multiple plugins, and it was ok as long as only one of those was declared by all nodes, but I see that was wrong. Makes more sense now.
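For comparison, here is a small sketch of both flattening styles over plain strings. `FlattenComparison` is a hypothetical helper, not code from the PR. One practical difference is that the unseeded `Aggregate` overload throws `InvalidOperationException` on an empty sequence, so this sketch uses the seeded overload with `Enumerable.Empty<string>()`.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FlattenComparison
{
    // The style used in the PR: flatten every node's collection, then let
    // the HashSet drop duplicates.
    public static HashSet<string> ViaSelectMany(IEnumerable<IEnumerable<string>> perNode) =>
        perNode.SelectMany(items => items).ToHashSet();

    // The reviewer's alternative, seeded so an empty input yields an empty
    // set instead of throwing.
    public static HashSet<string> ViaAggregateUnion(IEnumerable<IEnumerable<string>> perNode) =>
        perNode.Aggregate(Enumerable.Empty<string>(), (acc, items) => acc.Union(items)).ToHashSet();
}
```

Both produce the same set. SelectMany does a single pass, while the Aggregate version chains one Union per node, so neither is clearly better for a method that is not on a hot path.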

@cdmihai (Contributor Author) commented Dec 29, 2020

/azp run

@azure-pipelines bot commented Dec 29, 2020

Azure Pipelines successfully started running 1 pipeline(s).

@Forgind (Member) left a comment:

So far I've essentially looked at the documentation and BuildManager. I know we don't agree about all the OmniSharp styling changes, but they still show up in my diff even with whitespace changes hidden, which makes it more confusing.

documentation/specs/project-cache.md (outdated, resolved)
documentation/specs/project-cache.md (outdated, resolved)
documentation/specs/project-cache.md (outdated, resolved)
documentation/specs/project-cache.md (outdated, resolved)
documentation/specs/project-cache.md (outdated, resolved)
src/Build/BackEnd/BuildManager/BuildManager.cs (outdated, resolved)

var solutionPath = config.Project.GetPropertyValue(SolutionProjectGenerator.SolutionPathPropertyName);

ErrorUtilities.VerifyThrow(
solutionPath != null && !string.IsNullOrWhiteSpace(solutionPath) && solutionPath != "*Undefined*",

Member

If someone opens a single project rather than a solution, we should be able to use that as an "entrypoint" rather than the solution, right? Good fallback?

Contributor Author

Good point. I'll check and see what global properties VS sets in that case.

Contributor Author

VS lies when opened on a single project and reports a nonexistent solution path. For now I'll leave it as is and reconsider if it turns out to be a common case.

Member

😢
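If the single-project case ever becomes common, the fallback suggested above might look roughly like this. It is a hypothetical sketch, not something this PR implements: `EntryPointSelection.ChooseEntryPoint` is an invented name, and the `File.Exists` check is an assumption intended to catch the nonexistent solution path VS reports. The `*Undefined*` sentinel is the one the check above already tests for.

```csharp
using System.IO;

public static class EntryPointSelection
{
    // Sentinel that the solution path property has when it is not set,
    // as tested by the VerifyThrow check quoted above.
    private const string Undefined = "*Undefined*";

    // Hypothetical helper: prefer the solution as the entry point, but fall
    // back to the project when the solution path is missing, undefined, or
    // points at a file that does not exist.
    public static string ChooseEntryPoint(string solutionPath, string projectPath)
    {
        bool solutionUsable =
            !string.IsNullOrWhiteSpace(solutionPath) &&
            solutionPath != Undefined &&
            File.Exists(solutionPath);

        return solutionUsable ? solutionPath : projectPath;
    }
}
```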

@Forgind (Member) left a comment:

This is a little large to review too deeply, but I think I now have a reasonable surface-level understanding, thanks! I recognize that you wanted to imitate what an actual build returns (which clearly allocates a lot), but @ladipro might want to take a look, since he's been working to reduce allocations (even temporary ones).

src/Samples/ProjectCachePlugin/MockCacheFromAssembly.cs (outdated, resolved)
src/Shared/CollectionHelpers.cs (outdated, resolved)
src/Shared/UnitTests/MockLogger.cs (outdated, resolved)
src/Shared/UnitTests/TestEnvironment.cs (resolved)
@cdmihai force-pushed the projectCache branch 2 times, most recently from ff9d2fa to eca9219 on December 31, 2020 21:20
@azure-pipelines bot

No pipelines are associated with this pull request.

@cdmihai (Contributor Author) commented Jan 7, 2021

/azp run

@benvillalobos (Member) left a comment:

Posting comments as I go. Haven't quite dug into the "meat and potatoes" yet.

src/Build/Resources/Strings.resx (outdated, resolved)

using (var buildManagerSession = new Helpers.BuildManagerSession(_env, _buildParametersPrototype))
using (var buildManagerSession = new Helpers.BuildManagerSession(_env, buildParameters))

Member

Why do we need to use a copy of the BuildParameters here?

Contributor Author (@cdmihai, Jan 13, 2021)

Because BuildManagerSession mutates it, adding some loggers and other BuildManager cleanup options; using a copy avoids impacting other tests.
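The copy-before-mutate reasoning can be illustrated with a small sketch. `SessionParameters` and `Session.Start` are hypothetical stand-ins for BuildParameters and BuildManagerSession, not their real APIs; they only show why letting a session mutate a shared prototype would leak its loggers into every later test.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a shared, mutable parameters object such as the
// test class's prototype; not the real BuildParameters API.
public sealed class SessionParameters
{
    public List<string> Loggers { get; } = new List<string>();

    // Copies the logger list so mutations of the copy don't reach the original.
    public SessionParameters Clone()
    {
        var copy = new SessionParameters();
        copy.Loggers.AddRange(Loggers);
        return copy;
    }
}

public static class Session
{
    // Hypothetical session setup that mutates the parameters it is given,
    // the way BuildManagerSession adds its own loggers.
    public static SessionParameters Start(SessionParameters parameters)
    {
        parameters.Loggers.Add("MockLogger");
        return parameters;
    }
}
```

Passing `prototype.Clone()` to `Session.Start` leaves the prototype's logger list unchanged, so the next test starts from the same state.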

@rainersigwald (Member) left a comment:

Haven't gotten through everything here but no objection. Merge away while I'm out :)

<CopyNuGetImplementations>false</CopyNuGetImplementations>
<GenerateAssemblyInfo>false</GenerateAssemblyInfo>

<TargetFrameworks>netcoreapp2.1</TargetFrameworks>

Member

Note the collision here, @benvillalobos.

@Forgind (Member) left a comment:

Not 100% confident, but I don't think any of the new code will run unless the user specifies a project cache plugin.

@cdmihai merged commit ab9a839 into dotnet:master Jan 15, 2021

5 participants