Scheduler transition error from erred to memory #6283
Labels: deadlock (The cluster appears to not make any progress)
Found while trying to hack together a script for users to work around #6228. The goal was to find workers that seemed like they might be stuck and force them to shut down.
This is probably not very important; I'm just writing it down. It is not minimized at all, and we shouldn't look into it much until at least #6272 is fixed. I'm not sure how related the two are, and I think that fix changes some of the scheduler's worker-reconnect logic, which might also be problematic here.
The error

Error transitioning 'break_worker-d005ab4b7e4707de0ddc4d926ecb510f' from 'erred' to 'memory'

comes from the transition receiving kwargs when it thinks it shouldn't. My guess is that this is something odd related to erroring suspicious tasks after 3 worker failures, and then a worker that actually has that task in memory somehow rejoining?

When you run this script, it'll cycle through closing the workers a couple of times, then wait until you press Enter to shut down (it's not deadlocked).
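I haven't traced the real code path, so here's only a toy model of the symptom described above: a transition that gets keyword arguments it doesn't accept. Everything in it is hypothetical and is not taken from distributed. The names `erred_to_released`, `released_to_memory`, `TRANSITIONS` and `transition`, the routing through 'released', and the `nbytes`/`worker` kwargs are all made up. The point is just that if erred -> memory has no direct handler and is routed through intermediate steps, and the caller forwards the same kwargs (e.g. details from a worker reporting the result) to every step, a step with no kwargs raises a TypeError.

```python
import logging
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger("toy_scheduler")


def erred_to_released(key: str) -> str:
    # Hypothetical step: accepts no kwargs, since an erred task is not
    # expected to arrive together with a result.
    return "released"


def released_to_memory(key: str, *, nbytes: int, worker: str) -> str:
    # Hypothetical step: needs the details a worker sends when it holds the result.
    return "memory"


# Hypothetical transition table; note there is no direct ("erred", "memory") entry.
TRANSITIONS: Dict[Tuple[str, str], Callable[..., str]] = {
    ("erred", "released"): erred_to_released,
    ("released", "memory"): released_to_memory,
}


def transition(key: str, start: str, finish: str, **kwargs: Any) -> List[str]:
    """Move `key` from `start` to `finish` and return the states visited.

    Routes through 'released' when there is no direct handler. On a TypeError,
    logs a message shaped like the one in this issue and re-raises.
    """
    if (start, finish) in TRANSITIONS:
        path = [(start, finish)]
    else:
        path = [(start, "released"), ("released", finish)]

    visited = [start]
    try:
        for a, b in path:
            # The toy "bug": every step receives the same kwargs, including
            # steps whose handler does not accept any.
            visited.append(TRANSITIONS[(a, b)](key, **kwargs))
    except TypeError:
        logger.error("Error transitioning %r from %r to %r", key, start, finish)
        raise
    return visited
```

In this toy, `transition("x", "released", "memory", nbytes=8, worker="w1")` works, while `transition("x", "erred", "memory", nbytes=8, worker="w1")` logs the error and raises, because `erred_to_released` gets `nbytes` and `worker`. One possible fix in the toy would be to pass kwargs only to the final step; whether anything like that applies to the real scheduler is exactly what still needs investigating.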