Waiting on tasks on workers that no longer exist #6198

Closed
mrocklin opened this issue Apr 25, 2022 · 3 comments · Fixed by #6585
Labels
deadlock The cluster appears to not make any progress

Comments

@mrocklin
Member

I was chatting with @bnaul today. He's run into a stuck cluster and has an interesting situation.

The cluster is mostly done with everything, but about 100 tasks have yet to complete, and nothing is currently running them.

Screen Shot 2022-04-25 at 1 52 36 PM

Looking at the info pages, there are a few tasks in the processing state.

Screen Shot 2022-04-25 at 1 53 04 PM

Interestingly these tasks already know their type, so presumably they've run to completion before.

Also interestingly, if I click through to the page for the worker processing that task, I get a 404, meaning that the worker is no longer in the scheduler state.

Somehow, the scheduler thinks that a task is running on a worker that no longer exists.
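
To make the symptom concrete, here is a minimal sketch of a check for it. It assumes the scheduler exposes a `tasks` mapping whose values have `state` and `processing_on` attributes, and a `workers` mapping keyed by worker address. The helper name `find_orphaned_tasks` is hypothetical, and the objects below are stand-ins rather than a live scheduler.

```python
from types import SimpleNamespace


def find_orphaned_tasks(scheduler):
    """Return {task_key: worker_address} for every task that is in the
    'processing' state on a worker missing from scheduler.workers.

    Hypothetical helper: assumes `scheduler.tasks` maps keys to objects with
    `state` and `processing_on`, and `scheduler.workers` is keyed by address.
    """
    orphaned = {}
    for key, ts in scheduler.tasks.items():
        ws = ts.processing_on
        if ts.state == "processing" and ws is not None:
            if ws.address not in scheduler.workers:
                orphaned[key] = ws.address
    return orphaned


# Stand-in objects (not real scheduler state) that mimic the situation above.
alive = SimpleNamespace(address="tcp://10.0.0.1:4000")
gone = SimpleNamespace(address="tcp://10.0.0.2:4000")  # worker already removed

fake_scheduler = SimpleNamespace(
    workers={alive.address: alive},
    tasks={
        "x": SimpleNamespace(state="processing", processing_on=alive),
        "y": SimpleNamespace(state="processing", processing_on=gone),
        "z": SimpleNamespace(state="memory", processing_on=None),
    },
)

orphans = find_orphaned_tasks(fake_scheduler)
print(orphans)
```

Against a live cluster, something like this could presumably be run on the scheduler through `client.run_on_scheduler`, but that depends on the attribute names above matching the installed version.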

I tried getting a story out of the scheduler, but got no results. I suspect that this is because we've run past the deque length. I've asked @bnaul to increase the length to infinity and we'll try again the next time there is a failure.
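
As a rough illustration of that suspicion (a toy model, not the scheduler's actual code): if the transition log is a bounded `deque`, old entries are silently dropped once the limit is reached, so a story lookup for an early key comes back empty. The `story` function and the tuple layout here are assumptions for the sketch.

```python
from collections import deque


def story(log, key):
    """Return every log entry that mentions `key` (toy version of a story lookup)."""
    return [entry for entry in log if key in entry]


transitions = [
    ("task-a", "waiting", "processing"),
    ("task-b", "waiting", "processing"),
    ("task-b", "processing", "memory"),
    ("task-c", "waiting", "processing"),
]

bounded = deque(maxlen=3)       # old entries fall off once 3 are stored
unbounded = deque(maxlen=None)  # "infinite" length keeps everything

for t in transitions:
    bounded.append(t)
    unbounded.append(t)

print(story(bounded, "task-a"))    # the task-a entry has been evicted
print(story(unbounded, "task-a"))  # still available
```

With the bounded log, the only record of `task-a` is gone by the time anyone asks for it, which would match getting no results.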

cc @fjetter

@gjoseph92
Collaborator

My guess is that this is the same as #6263 (comment).

@fjetter
Member

fjetter commented Jun 15, 2022

@gjoseph92 IIUC we currently assume this should be closed by #6356?

@gjoseph92
Collaborator

By whatever fixes #6356, yes
