Flaky test_missing_data_errant_worker #5932

Closed
crusaderky opened this issue Mar 11, 2022 · 3 comments · Fixed by #5961 or #5910
Labels: flaky test (intermittent failures on CI), regression

Comments

@crusaderky
Collaborator

cc @fjetter
Since #5883, test_missing_data_errant_worker has become highly flaky and frequently hangs.
The issue is easily reproducible on a fast desktop host, where it hangs 30-50% of the time.

crusaderky added the "flaky test" label on Mar 11, 2022
@crusaderky
Collaborator Author

Big shoutout to @ian-r-rose and his tool https://dask.org/distributed/test_report.html, which made this kind of bisect work a breeze!

@crusaderky
Collaborator Author

test_worker_stream_died_during_comm also shows a suspiciously correlated pattern on https://dask.org/distributed/test_report.html, but I could not reproduce the issue locally.

crusaderky changed the title from "Regression: flaky test_missing_data_errant_worker" to "Flaky test_missing_data_errant_worker" on Mar 11, 2022
mrocklin pushed a commit that referenced this issue Apr 29, 2022
…6091)

This reinstates #5883
which was reverted in #5961 / #5932

I could confirm the flakiness of `test_missing_data_errant_worker` after this change and am reasonably certain this is caused by #5910, which causes a closing worker to be restarted such that, even after `Worker.close` is done, the worker still appears to be partially up.

The only reason I can see why this change promotes this behaviour is that, if we no longer block the event loop while the threadpool is closing, there is a much larger window for incoming requests to arrive and be processed while close is running.

Closes #6239
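
For readers following along, the timing argument in the commit message can be illustrated with a small, self-contained asyncio sketch. This is not the code from `distributed`: the names `close`, `incoming_request` and `scenario` are hypothetical stand-ins, and the sleep durations are arbitrary. It only shows that a blocking `ThreadPoolExecutor.shutdown` on the event loop thread keeps other coroutines from running, while an awaited shutdown lets them run while close is still in progress.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def close(executor, log, block_loop):
    """Hypothetical stand-in for a close routine that shuts down a threadpool."""
    log.append("close started")
    if block_loop:
        # Blocks the event loop thread until the pool's threads finish,
        # so no other coroutine can run in the meantime.
        executor.shutdown(wait=True)
    else:
        # Runs the blocking shutdown in a helper thread; the event loop
        # stays free to process other work while we wait.
        await asyncio.to_thread(executor.shutdown, wait=True)
    log.append("close finished")


async def incoming_request(log):
    """Hypothetical stand-in for a message arriving during close."""
    await asyncio.sleep(0.01)
    log.append("request handled")


async def scenario(block_loop):
    log = []
    executor = ThreadPoolExecutor(max_workers=1)
    executor.submit(time.sleep, 0.2)  # a task still running when close begins
    request = asyncio.create_task(incoming_request(log))
    await close(executor, log, block_loop)
    await request
    return log


if __name__ == "__main__":
    print(asyncio.run(scenario(block_loop=True)))
    # ['close started', 'close finished', 'request handled']
    print(asyncio.run(scenario(block_loop=False)))
    # ['close started', 'request handled', 'close finished']
```

In the non-blocking variant the request is handled between "close started" and "close finished", which is the wider window described above. Whether that window is what makes the test hang is the commit author's hypothesis, not something this sketch demonstrates.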
@fjetter
Member

fjetter commented Apr 29, 2022

We just merged #6091, which will make this test flaky again until #5910 is merged.
