Pants 2.15.0rc4 can hang under pantsd #18211
Comments
So my blundering reading leads here: pants/src/rust/engine/task_executor/src/lib.rs, lines 173-187 (at ad53134).
That is implicated in a non-main thread in the backtrace capture attached in the OP.

- pants/src/python/pants/engine/internals/scheduler.py, lines 454-465 (at ad53134)
- pants/src/python/pants/engine/internals/scheduler.py, lines 502-507 (at ad53134)
- pants/src/python/pants/engine/internals/scheduler.py, lines 543-567 (at ad53134)

Which is used by:
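For concreteness, here is a minimal, hypothetical sketch of the general shape under discussion: an Executor that owns a tokio Runtime and blocks the dropping thread while the runtime shuts down. The names, fields, and timeout below are illustrative and are not copied from task_executor/src/lib.rs.

```rust
use std::time::Duration;
use tokio::runtime::Runtime;

/// Illustrative stand-in for the engine's Executor: it owns a tokio Runtime.
struct Executor {
    runtime: Option<Runtime>,
}

impl Drop for Executor {
    fn drop(&mut self) {
        if let Some(runtime) = self.runtime.take() {
            // Blocks the dropping thread until the runtime's tasks are torn down.
            // If this runs on a thread that holds a lock those tasks still need
            // (for example the GIL), teardown can never finish and the drop hangs.
            runtime.shutdown_timeout(Duration::from_secs(10));
        }
    }
}

fn main() {
    let executor = Executor {
        runtime: Some(Runtime::new().expect("failed to build runtime")),
    };
    drop(executor); // fine here: main holds no lock the runtime threads need
}
```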
Thanks John. The only thing that jumps out at me there is that there are 4 filesystem watching tasks (threads 11, 9, 7, 5), when I would expect 1. I don't see any lock interleavings/inversions in the stacktraces of any of the threads. It's possible that the 4 filesystem watching tasks are using up enough blocking threads on the tokio runtime to cause a deadlock (depending on the computed value of rule_threads_max), but I'll need to investigate further.
This was a red herring: the filesystem watching threads are dedicated threads, running outside the runtime. They just confusingly don't have thread names assigned. Will continue looking.
It looks like this is actually a … Patch forthcoming.
Give dedicated threads names to assist in debugging. Otherwise, the OS will occasionally give them the names of _other_ threads (see #18211), which is the opposite of helpful.
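A small sketch of what that change amounts to, assuming the dedicated threads are spawned via std::thread; the thread name and function below are illustrative, not what the patch actually uses.

```rust
use std::thread;

fn spawn_fs_watcher_thread() -> thread::JoinHandle<()> {
    // Naming the thread means debuggers and backtrace dumps show a stable,
    // meaningful name instead of whatever the OS happens to report.
    thread::Builder::new()
        .name("fs-watcher".to_owned())
        .spawn(|| {
            // ... dedicated filesystem-watching loop would run here ...
        })
        .expect("failed to spawn filesystem watching thread")
}
```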
By "obviously weird" do you mean the non-main thread trying to get the GIL? That is not obviously weird to me. I think it might be good to record your analysis here since - unless I'm the only one to whom this isn't obvious - you appear to be a man on the bus.
Always explicitly shutdown executors, to avoid them being dropped on arbitrary threads (including under the GIL). Fixes #18211.
Yes, but not just "trying to get the GIL" - rather "blocked forever trying to get the GIL". The reason it can't get the GIL is that the main thread didn't release it before trying to shut down the Executor. The Executor is shutting down due to … As mentioned on the ticket: …
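A hedged sketch of the shape that fix takes (not the actual Pants patch): shut the executor down explicitly, and release the GIL while the blocking shutdown runs so that runtime threads which still need the GIL can finish. The `shutdown` method and the pyo3 wiring below are illustrative stand-ins.

```rust
use std::time::Duration;
use pyo3::prelude::*;

/// Illustrative stand-in for the engine's Executor (see the earlier sketch).
struct Executor {
    runtime: Option<tokio::runtime::Runtime>,
}

impl Executor {
    /// Explicit shutdown: called deliberately, from a known thread, instead of
    /// relying on Drop to run on whatever thread releases the last reference.
    fn shutdown(mut self, timeout: Duration) {
        if let Some(runtime) = self.runtime.take() {
            runtime.shutdown_timeout(timeout);
        }
    }
}

/// Called while holding the GIL: release it for the duration of the blocking
/// shutdown, so tasks that are waiting to acquire the GIL can complete.
fn scheduler_shutdown(py: Python<'_>, executor: Executor) {
    py.allow_threads(|| executor.shutdown(Duration::from_secs(5)));
}
```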
Thanks - that's much better. I'd think it's obvious to probably no one working on Pants right now except for you.
I have suffered hangs over in scie-pants using Pants 2.15.0rc4, see:
The context is an integration test where Pants is launched (in `--pantsd` mode) in a subprocess with stdout piped for use in test assertions. The code is here: https://github.com/pantsbuild/scie-pants/blob/3d269eb21ab7d764054496cd33530aaa364c6252/package/src/main.rs#L863-L892

The hang happens in the `assert_pants_bin_name` calls: per the links above, on the 1st or 2nd call; in the example below, on the 3rd or 4th call (I can't say for sure whether the hang is on pantsd bring-down or bring-up, but I think it's on the bring-down, which would make these the 1st and 3rd calls). I used this ssh rig to debug: pantsbuild/scie-pants#114
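Roughly how the test drives Pants (a simplified sketch of the linked main.rs region, with illustrative names, not the actual scie-pants code): the binary is run as a subprocess with stdout captured for assertions, so a hang in the child shows up as the call never returning.

```rust
use std::process::{Command, Stdio};

/// Run a pants command with the daemon enabled and capture its stdout so the
/// test can assert on it. If pantsd hangs on bring-up or bring-down, this
/// call never returns and the test hangs with it.
fn run_pants(pants_exe: &str, args: &[&str]) -> std::io::Result<String> {
    let output = Command::new(pants_exe)
        .arg("--pantsd") // the issue reproduces with the pants daemon enabled
        .args(args)
        .stdout(Stdio::piped())
        .output()?; // blocks until the subprocess exits
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```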
The commands run in the ssh session were:
log-slim.txt
log-slim2.txt