Test failure: tracing/eventpipe/reverse/reverse/reverse.sh #38156
Comments
dotnet#38156 tracks work for solution
I'm not positive this is the same issue, but I've managed to recreate an issue locally using a modified version of the test that infinitely loops under a debugger. My conjecture is that the target process is exhausting its limit of file descriptors. My local repro infinitely loops the one test case without recycling the app (which may exacerbate the problem slightly) and eventually the monitoring app will pause indefinitely waiting for an advertisement from the target app. Looking in the stresslog for the target app shows that the pause is caused by the target app failing to Specifically, the socket(7) man page says this about the
The main point I take from that snippet is that if I'm currently looking into ways to change the calls to
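For anyone who wants to check the file descriptor conjecture on their own repro, here is a minimal sketch of how descriptor usage could be watched from inside a process. It is not taken from the runtime or from the test; the helper names (`open_fd_count`, `fd_headroom`, `hold_sockets`) are hypothetical, and it assumes Linux, since it reads `/proc/<pid>/fd`.

```python
import os
import resource
import socket


def open_fd_count(pid="self"):
    """Count entries in /proc/<pid>/fd (Linux only).

    For pid="self" the listing itself briefly holds one descriptor for the
    directory, so treat the number as approximate; differences between two
    calls are still meaningful.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom():
    """Return roughly how many more descriptors this process may open.

    Returns None when the soft RLIMIT_NOFILE limit is unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - open_fd_count()


def hold_sockets(n):
    """Create n Unix domain sockets and return them without closing.

    Hypothetical stand-in for a code path that keeps sockets alive longer
    than intended; the caller is responsible for closing them.
    """
    return [socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) for _ in range(n)]


if __name__ == "__main__":
    before = open_fd_count()
    held = hold_sockets(10)
    print("descriptors before:", before, "after:", open_fd_count())
    print("approximate headroom:", fd_headroom())
    for s in held:
        s.close()
    print("descriptors after close:", open_fd_count())
```

Calling `open_fd_count` with the target's pid instead of `"self"` (given permission to read its `/proc` entry) would show whether the count climbs on every loop iteration, which is what descriptor exhaustion would look like.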
On further reflection, I'm not convinced that the potential issue I found in the previous comment is the same as the one from the CI log messages, or that it is necessarily an issue at all. The presentation is similar, but the steps for reproducing the mentioned error consistently require ~155 seconds of constantly deleting and recreating the unix domain socket without recycling the target app. I think that by not recycling the app and constantly deleting and recreating the socket, I was abusing the The error in CI seems to reproduce inconsistently, occurs only in the
Managed to repro the failure. I ran the test that is in CI, with the timeouts taken out, in an infinite loop in 3 separate terminals. One of the instances eventually hit an infinite pause with identical symptoms to the CI failures. I managed to take a core dump of the subprocess, but attempting to run
Happened again: https://helix.dot.net/api/2019-06-17/jobs/24babe78-7797-4a21-b7d0-6157ef54bf62/workitems/PayloadGroup0/console
Any update on this?
Originally posted by @safern in #35270 (comment)