Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify MainNodeSdkResolverService #7244

Merged
merged 3 commits into from
Mar 4, 2022

Conversation

jeffkl
Copy link
Contributor

@jeffkl jeffkl commented Jan 10, 2022

Fixes #7137

Context

The MainNodeSdkResolverService process incoming requests from out-of-proc nodes to handle SDK resolution. The main node has a queue of all incoming packets that are processed by appropriate handlers. The MainNodeSdkResolverService currently takes SdkResolverRequests and places them in a queue for processing. This seems unnecessary to have two queues and two threads draining the queues.

Changes Made

Remove queue and thread that process it in favor of launching a fire-and-forget task that resolves the SDK and sends the response.

Testing

Existing tests cover this area

Notes

@Forgind
Copy link
Member

Forgind commented Jan 10, 2022

I saw your commit is called WIP, but this isn't a draft, and it doesn't have the WIP label. Do you want reviews?

@jeffkl
Copy link
Contributor Author

jeffkl commented Jan 10, 2022

@Forgind Yeah sorry my original commit was a work in progress and I ran the tests locally with some failures. So I pushed the commit and remembered that some tests just don't pass locally so I opened the PR and forgot my commit name was WIP. Should I update the commit message or will we just update it if/when this is merged?

@Forgind
Copy link
Member

Forgind commented Jan 10, 2022

Not a problem; we can update it when it's merged. I just wasn't sure whether you'd wanted it reviewed 🙂

Copy link
Member

@Forgind Forgind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this still react properly to shutdown events? I didn't see anything about that. Otherwise, looks great! Love taking out excess complication.

@jeffkl
Copy link
Contributor Author

jeffkl commented Jan 11, 2022

Does this still react properly to shutdown events? I didn't see anything about that.

Good question, I think the packets will just stop coming in but existing requests will still be running. Let me see if I can simulate a shutdown an see what actually happens. If it doesn't respond well, I'll add cancellation support to the Task.Run().

@jeffkl
Copy link
Contributor Author

jeffkl commented Jan 18, 2022

Does this still react properly to shutdown events?

I looked into this and shutdown events aren't really applicable anymore since the tasks that respond are fire-and-forget and there's nothing to shutdown. Cancellation doesn't work, but it never has before since cancellation tokens aren't passed into evaluator.

_requestReceivedEvent.Reset();
nodeManager.SendData(request.NodeId, response);
}
}).ConfigureAwait(continueOnCapturedContext: false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Does continueOnCapturedContext: false make a difference when the task has no continuation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I look into it, it seems like this method is already running on a thread pool thread. The communication receives packets and runs these with a worker queue. Maybe I don't even need to use Task.Run() at all. Let me look into it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest version doesn't call Task.Run() this this method is already running in a thread pool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, it is running in the thread pool but it doesn't mean that it can be synchronous. Here's the callstack I see with my test build:

 	Microsoft.Build.dll!Microsoft.Build.BackEnd.SdkResolution.MainNodeSdkResolverService.PacketReceived(int node, Microsoft.Build.BackEnd.INodePacket packet) Line 81	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.NodePacketFactory.PacketFactoryRecord.RoutePacket(int nodeId, Microsoft.Build.BackEnd.INodePacket packet) Line 110	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.NodePacketFactory.PacketFactoryRecord.DeserializeAndRoutePacket(int nodeId, Microsoft.Build.BackEnd.ITranslator translator) Line 102	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.NodePacketFactory.DeserializeAndRoutePacket(int nodeId, Microsoft.Build.BackEnd.NodePacketType packetType, Microsoft.Build.BackEnd.ITranslator translator) Line 58	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.NodeManager.DeserializeAndRoutePacket(int nodeId, Microsoft.Build.BackEnd.NodePacketType packetType, Microsoft.Build.BackEnd.ITranslator translator) Line 261	C#
 	Microsoft.Build.dll!Microsoft.Build.BackEnd.NodeProviderOutOfProcBase.NodeContext.ReadAndRoutePacket(Microsoft.Build.BackEnd.NodePacketType packetType, byte[] packetData, int packetLength) Line 1090	C#
>	Microsoft.Build.dll!Microsoft.Build.BackEnd.NodeProviderOutOfProcBase.NodeContext.BodyReadComplete(System.IAsyncResult result) Line 1144	C#

The frame at the bottom has:

                    // Read the next packet.
                    BeginAsyncPacketRead();

as the next statement to execute. Can we really afford to block processing of further packets from the node until the SDK is resolved?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looked at it. With current state of code we can afford to block packet processing until SDK is resolved. Nodes are single threaded, and node is blocked and wait for response while it sends request to main node for SDK resolution. So it is guaranteed, if I understand it correctly, that there is actually no other packet to be processed - on that particular namedpipe.

@rainersigwald
Copy link
Member

Also, how do you assess risk here? Would it be possible to preserve the old behavior under a changewave Just In Case™️?

@jeffkl
Copy link
Contributor Author

jeffkl commented Feb 1, 2022

Also, how do you assess risk here? Would it be possible to preserve the old behavior under a changewave Just In Case™️?

I don't think there's much risk here. From what I can tell, fundamental flaws in this code would almost immediately show up for any build that's using out-of-proc nodes. There could be a performance impact but I expect things to be faster since the new code should be launching less threads. I think we're early enough in the release cycle to catch any big problems. I do want to try an experimental insertion and install the build just to test out the perf. My local builds aren't NGen'd and are always slower...

@jeffkl jeffkl force-pushed the simplify-mainnodesdkresolverservice branch 2 times, most recently from ff1aab1 to ab20718 Compare February 2, 2022 16:43
@jeffkl
Copy link
Contributor Author

jeffkl commented Feb 2, 2022

@rainersigwald can I get an experimental insertion to install and try out please?

@Forgind
Copy link
Member

Forgind commented Feb 2, 2022

(Resolved offline)

@jeffkl
Copy link
Contributor Author

jeffkl commented Feb 3, 2022

Results are in, everything works! I cloned https://github.com/msbuild/msbuildsdks which uses a lot of MSBuild project SDKs to test

Current version (17.0)

D:\msbuildsdks>msbuild /version
Microsoft (R) Build Engine version 17.0.0+c9eb9dd64 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

17.0.0.52104

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:08.96

My Version

D:\msbuildsdks>msbuild /version
Microsoft (R) Build Engine version 17.2.0-preview-22102-01+ab20718e2 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

17.2.0.10201

Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:00:05.59

So my change doesn't make anything slower. I'd imagine I'm also benefiting from other unshipped performance gains so I won't take any credit there. But nothing is broken!

@jeffkl
Copy link
Contributor Author

jeffkl commented Feb 3, 2022

Also forgot to mention, builds in Visual Studio work just fine as well!

@Forgind
Copy link
Member

Forgind commented Feb 3, 2022

I'd imagine I'm also benefiting from other unshipped performance gains so I won't take any credit there.

Very optional, but you can try downloading a recent main build (ideally from right before your change, but eh) to see if it changes anything, I wouldn't be surprised if that was entirely your change!

Copy link
Member

@rokonec rokonec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job. If I understand code correctly, this changes shall not cause regression in MSBuild command line builds. In VS I recommend to measure and compare at some high CPU cores count machine and large projects.

@jeffkl jeffkl force-pushed the simplify-mainnodesdkresolverservice branch from 0431f45 to 7ba2fe9 Compare February 14, 2022 18:53
@jeffkl
Copy link
Contributor Author

jeffkl commented Feb 18, 2022

@rainersigwald anything we're waiting on for this PR?

Copy link
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit worried about not having a way to opt back into the old behavior, but if you're super confident that this is better in all cases let's get it in for preview 2.

@jeffkl
Copy link
Contributor Author

jeffkl commented Feb 18, 2022

I'm still a bit worried about not having a way to opt back into the old behavior, but if you're super confident that this is better in all cases let's get it in for preview 2.

I did a lot of testing and couldn't find any cases where it was causing issues. I loaded a large solution in Visual Studio, did a large command-line build, I added delays to a test SDK resolver, and DDRITs passed. The sooner this goes in the better for sure, I'm really hoping we don't need an escape hatch since this code path is used so much, we should find if there are any problems with it very soon.

@Forgind Forgind added the merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now. label Feb 28, 2022
@AR-May AR-May merged commit 2e82c9f into dotnet:main Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-branch-open PRs that are approved, except that there is a problem that means we are not merging stuff right now.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MainNodeSdkResolverService does not need to process requests with a long running thread and queue
6 participants