Add Reverse Diagnostics Server #33307
I couldn't add an area label to this PR. Check out this page to find out which area owner to ping, or please add exactly one area label to help train me in the future.
Added a few comments and questions. Holding off on a more detailed review since I expect things are still changing :)
It's shaping up, but it will still need some more work to be robust. Happy to chat if you've got questions about anything.
const BOOL fSuccessCloseHandle = ::CloseHandle(_hPipe);
assert(fSuccessCloseHandle != 0);
_ASSERTE(fSuccessCloseHandle != 0);
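For comparison, here is a minimal POSIX sketch of the same pattern, assuming a plain file descriptor in place of the Windows pipe handle (ClosePipeEnd is a hypothetical name). The close call is made first and only its saved result is asserted on, because assert(), like _ASSERTE, is a debug-only check; wrapping the close itself inside the assert would skip it in release builds.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical POSIX analogue of the reviewed Windows snippet.
// The side-effecting call happens unconditionally; only the check on its
// result is debug-only, so release builds still close the descriptor.
inline bool ClosePipeEnd(int fd) {
    const int rc = ::close(fd);
    assert(rc == 0);
    return rc == 0;
}
```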
I would add an assert that the OVERLAPPED operation is complete. I would also test to confirm that DisconnectNamedPipe() finishes a ConnectNamedPipe() operation that is still pending. Destroying an OVERLAPPED struct before the operation ends usually causes painful-to-debug memory corruption.
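The lifetime rule in this comment can be illustrated portably. The sketch below is a model built on assumptions, not the PR's Win32 code: a std::async task stands in for a pending ConnectNamedPipe, and member state stands in for the OVERLAPPED struct it writes into. PendingAccept and CancelAndWait are hypothetical names. What matters is the ordering: cancel, wait for the operation to actually finish, and only then free the state it references.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical model: the async task keeps touching members of this object
// while "pending", the way an overlapped connect writes into its OVERLAPPED.
class PendingAccept {
public:
    PendingAccept()
        : _op(std::async(std::launch::async, [this] { return Run(); })) {}

    // Never free the state while the operation may still use it.
    ~PendingAccept() { CancelAndWait(); }

    // Cancel, then block until the operation has really finished
    // (on Windows this would be e.g. GetOverlappedResult with bWait = TRUE).
    void CancelAndWait() {
        _cancel.store(true);
        _op.wait();
    }

    // The kind of check the reviewer suggests asserting before teardown.
    bool IsComplete() const {
        return _op.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }

private:
    bool Run() {
        while (!_cancel.load()) {
            _ticks.fetch_add(1);  // stands in for the OS writing into OVERLAPPED
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return false;  // cancelled, never connected
    }

    // Declared before _op so they exist before the task starts using them.
    std::atomic<bool> _cancel{false};
    std::atomic<int> _ticks{0};
    std::future<bool> _op;
};
```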
9f108d8 to 20f3745
* no select on unix
* untested on unix

* Change DOTNET_DiagnosticsServerAddress to DOTNET_DiagnosticsClientModeAddress
* works for original connection mode
* untested for client mode

* fix array alloc
* properly cast array size

* Adds ConnectionState class for hiding server/client diff
* simplifies code for easier reading

* makes advertisement not block for more than 100 ms
* TODO: implement on non-windows
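The ConnectionState commit describes hiding the server/client difference behind one type. A minimal sketch of that idea follows; every class and method name here is hypothetical, and the PR's actual interface is not reproduced. Strings stand in for real streams so the dispatch can be exercised without a transport.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an IPC stream.
struct Stream {
    std::string origin;
};

// The server loop only ever asks for "the next connected stream".
class ConnectionState {
public:
    virtual ~ConnectionState() = default;
    // Returns a connected stream, or nullptr if none is available yet.
    virtual std::unique_ptr<Stream> GetConnectedStream() = 0;
};

// Server mode: real code would Accept() on the transport the runtime created.
class ServerConnectionState : public ConnectionState {
public:
    std::unique_ptr<Stream> GetConnectedStream() override {
        return std::make_unique<Stream>(Stream{"accepted"});
    }
};

// Client (reverse) mode: real code would Connect() to the configured address.
class ClientConnectionState : public ConnectionState {
public:
    explicit ClientConnectionState(std::string address) : _address(std::move(address)) {}
    std::unique_ptr<Stream> GetConnectedStream() override {
        if (_address.empty()) return nullptr;  // not configured
        return std::make_unique<Stream>(Stream{"connected:" + _address});
    }
private:
    std::string _address;
};

// Identical caller code for both modes.
inline std::string DescribeNextStream(ConnectionState& state) {
    auto stream = state.GetConnectedStream();
    return stream ? stream->origin : "none";
}
```

Because the loop depends only on the base class, adding the reverse mode does not change the loop itself.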
/azp run runtime
Azure Pipelines successfully started running 1 pipeline(s).
Alright, things are starting to look shipshape 👍 I did find a few more minor potential issues; comments inline.
_pIpc->Close(callback);
if (_pStream != nullptr)
    _pStream->Close(callback);
}
_pIpc and _pStream need to be deleted (not that a leak on shutdown would really matter, but we may as well clean up properly).
I intentionally leaked these because the PAL is unaware of shutdown paths and still has access to these pointers. This method is called from a different thread than the server thread, so deleting them on this code path can lead to AVs on the server thread inside the PAL (specifically in IpcStream::DiagnosticsIpc::Poll). Rather than give the PAL some knowledge of runtime state, or add locks to protect access, I opted to just leak the memory on shutdown, since that is the only place this method is called. I should add a comment that calls out this assumption, though.
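The trade-off in this reply can be modeled in a few lines. In this hypothetical sketch (IpcTransport, ServerLoop and Shutdown are illustrative names, not the runtime's), shutdown runs on a different thread and only closes the transport. It never deletes it, because the server thread may still be inside Poll() at that moment.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical transport shared by the server thread and the shutdown path.
struct IpcTransport {
    std::atomic<bool> closed{false};
    std::atomic<int> polls{0};

    // Returns false once the transport has been closed.
    bool Poll() {
        polls.fetch_add(1);
        return !closed.load();
    }
    void Close() { closed.store(true); }
};

// Runs on the server thread; holds a raw pointer it does not own.
inline void ServerLoop(IpcTransport* ipc) {
    while (ipc->Poll()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

// Runs on a different thread during runtime shutdown.
inline void Shutdown(IpcTransport* ipc) {
    ipc->Close();
    // Intentionally no `delete ipc;` here: the server thread may still be
    // inside Poll(). The memory is leaked because this only runs at shutdown.
}
```

Deleting is only safe once the server thread is known to be finished, for example after joining it, which is exactly the knowledge the shutdown path does not have.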
The remaining failure appears to be an AzDO package feed or NuGet manifest error. All test runs passed.
This PR adds the ability for the runtime to connect to a pre-existing IPC Transport for the Diagnostics Server. This enables several interesting scenarios, including:
* dotnet trace run <executable> ...
This is achieved by adding the non-blocking Accept (think listen on Linux), Connect, and Poll APIs to the diagnostics PAL. Using these, the runtime is able to listen on both the original IPC transport it creates ($TMPDIR/dotnet-diagnostics-<pid>) and this reversed connection, if configured. This reversed connection is opt-in and only activates when a path is specified in the DOTNET_DiagnosticsMonitorAddress environment variable.

If a user configures this mechanism, the runtime will retry indefinitely to connect to the specified transport and is resilient to that transport closing and reopening. The retry logic is as follows:
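On Linux, a non-blocking Poll over several transports maps naturally onto POSIX ::poll(). The sketch below is a minimal illustration, assuming each transport is exposed as a file descriptor; PollTransports and its return convention are hypothetical, and a production version would also retry on EINTR.

```cpp
#include <cstddef>
#include <poll.h>
#include <unistd.h>  // ::pipe and ::write, used in the usage example
#include <vector>

// Waits up to timeoutMs for any descriptor to become readable or hang up.
// Returns the index of the first ready descriptor, -1 on timeout, -2 on error.
inline int PollTransports(const std::vector<int>& fds, int timeoutMs) {
    std::vector<pollfd> pfds;
    pfds.reserve(fds.size());
    for (int fd : fds) {
        pfds.push_back(pollfd{fd, POLLIN, 0});
    }
    const int rc = ::poll(pfds.data(), static_cast<nfds_t>(pfds.size()), timeoutMs);
    if (rc < 0) return -2;  // error (including EINTR in this sketch)
    if (rc == 0) return -1; // timed out; caller can retry or reconnect
    for (std::size_t i = 0; i < pfds.size(); ++i) {
        if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

In the usage below, two pipes stand in for the listening transport and the reverse connection; writing to the second one makes index 1 ready.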
The runtime caches reverse connections so that it does not reconnect to the reverse transport every time Poll times out or the server connection is used. These cache entries are cleared when a connection is used or hung up.
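A minimal sketch of the caching behavior, under stated assumptions: connections are represented as plain int handles, the connect step is injected as a callback, and ReverseConnectionCache, Peek, Take and OnHangup are hypothetical names. Only the DOTNET_DiagnosticsMonitorAddress variable name comes from the description above; in this sketch an unset or empty value disables the reverse connection.

```cpp
#include <cstdlib>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical cache for the reverse connection.
class ReverseConnectionCache {
public:
    using ConnectFn = std::function<std::optional<int>(const std::string&)>;

    ReverseConnectionCache(std::string address, ConnectFn connect)
        : _address(std::move(address)), _connect(std::move(connect)) {}

    // Opt-in: an unset or empty variable means "no reverse connection".
    static std::string AddressFromEnvironment() {
        const char* value = std::getenv("DOTNET_DiagnosticsMonitorAddress");
        return value != nullptr ? std::string(value) : std::string();
    }

    bool Enabled() const { return !_address.empty(); }

    // Called on every poll iteration; only connects when nothing is cached.
    // A failed attempt leaves the cache empty, so the next poll retries.
    std::optional<int> Peek() {
        if (!Enabled()) return std::nullopt;
        if (!_cached) _cached = _connect(_address);
        return _cached;
    }

    // The connection is handed to the server for use: clear the cache.
    std::optional<int> Take() {
        std::optional<int> conn = _cached;
        _cached.reset();
        return conn;
    }

    // The peer hung up: clear the cache so the next poll reconnects.
    void OnHangup() { _cached.reset(); }

private:
    std::string _address;
    ConnectFn _connect;
    std::optional<int> _cached;
};
```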
This PR is currently missing … The following will be a separate PR: … -1 (infinite), or 0 to n milliseconds … before an EventPipe session is started … 0 (no block)

CC - @tommcdon @shirhatti @noahfalk @sywhang
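Only fragments of the deferred item survive above, but they describe a timeout with three cases: -1 waits forever, 0 does not block, and any other n waits up to n milliseconds before an EventPipe session is started. A hypothetical sketch of those semantics with a condition variable follows; StartupGate and its methods are illustrative names, and the actual setting and the point where the runtime waits are not recoverable from this text.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical gate: Signal() marks "session started"; Wait() applies the
// -1 / 0 / n-millisecond semantics. Returns true if the signal was observed.
class StartupGate {
public:
    void Signal() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _signaled = true;
        }
        _cv.notify_all();
    }

    bool Wait(int timeoutMs) {
        std::unique_lock<std::mutex> lock(_mutex);
        auto ready = [this] { return _signaled; };
        if (timeoutMs < 0) {   // -1: block until signaled
            _cv.wait(lock, ready);
            return true;
        }
        if (timeoutMs == 0) {  // 0: never block
            return _signaled;
        }
        // n > 0: wait up to n ms; returns the predicate's final value.
        return _cv.wait_for(lock, std::chrono::milliseconds(timeoutMs), ready);
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _signaled = false;
};
```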