Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test_missing_data_errant_worker #5961

Merged
merged 1 commit into from
Mar 18, 2022
Merged

Conversation

crusaderky
Copy link
Collaborator

#5883 introduced a (to my superficial opinion) genuine use case where Worker.close() would never return, as demonstrated by test_missing_data_errant_worker which was hanging very frequently on the line await w1.close() after it.

@graingert could you review and merge today?
@fjetter feel free to reinstate and tweak after the release

@crusaderky crusaderky requested a review from graingert March 18, 2022 17:04
@crusaderky crusaderky self-assigned this Mar 18, 2022
Copy link
Member

@graingert graingert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a clean revert

#5883 introduced a (to my superficial opinion) genuine use case where Worker.close() would never return

I'm not sure what use case this is though

@crusaderky
Copy link
Collaborator Author

I'm not sure what use case this is though

w1 is being closed while there's a gather_dep in progress to copy data from w1 to w3

@github-actions
Copy link
Contributor

Unit Test Results

       12 files  +       2         12 suites  +2   5h 54m 16s ⏱️ + 1h 41m 20s
  2 665 tests  -        1    2 579 ✔️  -        4    80 💤  -     1  6 +4 
13 029 runs  +2 644  12 385 ✔️ +2 506  637 💤 +134  7 +4 

For more details on these failures, see this check.

Results for commit d72d87a. ± Comparison against base commit 2d3fddc.

@crusaderky crusaderky merged commit 0dab526 into dask:main Mar 18, 2022
@crusaderky crusaderky deleted the revert_5883 branch March 18, 2022 20:20
mrocklin pushed a commit that referenced this pull request Apr 29, 2022
…6091)

This reinstates #5883
which was reverted in #5961 / #5932

I could confirm the flakyness of `test_missing_data_errant_worker` after this change and am reasonably certain this is caused by #5910 which causes a closing worker to be restarted such that, even after `Worker.close` is done, the worker still appears to be partially up. 

The only reason I can see why this change promotes this behaviour is that if we no longer block the event loop while the threadpool is closing, this opens a much larger window for incoming requests to come in and being processed while close is running.

Closes #6239
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test_missing_data_errant_worker
2 participants