Pull request: Fix regressions in #4651 (#4719)
```diff
@@ -2,7 +2,7 @@ name: Cancel
 on:
   workflow_run:
-    workflows: ["Tests"]
+    workflows: [Tests]
     types:
       - requested
```
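For context, the trigger section of the Cancel workflow after this change would read roughly as follows. This is a sketch reconstructed from the hunk above; the full file is not shown on this page.

```yaml
name: Cancel

on:
  workflow_run:
    # Quoted and unquoted strings are equivalent in a YAML flow sequence,
    # so dropping the quotes around Tests is purely cosmetic
    workflows: [Tests]
    types:
      - requested
```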
```diff
@@ -2330,6 +2330,10 @@ def assert_memory(scheduler_or_workerstate, attr: str, min_, max_, timeout=10):
         sleep(0.1)


+# This test is heavily influenced by hard-to-control factors such as memory management
+# by the Python interpreter and the OS, so it occasionally glitches
+@pytest.mark.flaky(reruns=3, reruns_delay=5)
+# ~33s runtime, or distributed.memory.recent_to_old_time + 3s
 @pytest.mark.slow
 def test_memory():
     pytest.importorskip("zict")
```
```diff
@@ -2379,34 +2383,45 @@ def test_memory():
     ]
     sleep(2)
     assert_memory(s, "managed_spilled", 1, 999)
+    # Wait for the spilling to finish. Note that this does not make the test take
+    # longer as we're waiting for recent_to_old_time anyway.
+    sleep(10)
```
Review comment: Is there something more direct we can probe here instead of sleeping for 10 seconds?

Reply: Not really, because the unmanaged memory is very volatile, so we don't know exactly how many keys are going to be spilled out. Also, as noted, it doesn't slow the test down.
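For reference, the `assert_memory` helper being discussed (its signature appears in the first hunk) is essentially a polling loop, which is why fixed sleeps are only needed where no condition can be probed directly. A minimal hypothetical sketch, assuming it compares the given attribute of the memory snapshot, converted to MiB, against the bounds:

```python
from time import sleep, time


def assert_memory(state, attr: str, min_mib, max_mib, timeout=10):
    """Poll state.memory.<attr> until it falls within [min_mib, max_mib] MiB,
    or fail after ``timeout`` seconds.

    Hypothetical sketch of the test helper; the real implementation is not
    shown on this page.
    """
    deadline = time() + timeout
    while True:
        nmib = getattr(state.memory, attr) / 2 ** 20  # bytes -> MiB
        if min_mib <= nmib <= max_mib:
            return
        if time() > deadline:
            raise AssertionError(f"{attr}={nmib} not in [{min_mib}, {max_mib}]")
        sleep(0.1)
```

Polling with a timeout like this keeps the test fast on the happy path while tolerating the volatile memory readings the reply mentions.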
```diff
     # Delete spilled keys
     prev = s.memory
     del f1
     del f2
-    assert_memory(s, "managed_spilled", 0, prev.managed_spilled / 2 ** 20 - 1)
     assert s.memory.managed_in_memory == prev.managed_in_memory
+    assert_memory(s, "managed_spilled", 0, prev.managed_spilled / 2 ** 20 - 19)

     # Empty the cluster, with the exception of leaked memory
     del more_futs
     assert_memory(s, "managed", 0, 0)

-    orig_unmanaged = s_m0.unmanaged / 2 ** 20
-    orig_old = s_m0.unmanaged_old / 2 ** 20
-
     # Wait until 30s have passed since the spill to observe unmanaged_recent
     # transition into unmanaged_old
     c.run(gc.collect)
+    orig_unmanaged = s_m0.unmanaged / 2 ** 20
+    orig_old = s_m0.unmanaged_old / 2 ** 20
-    assert_memory(s, "unmanaged_old", orig_old + 90, orig_old + 190, timeout=40)
     assert_memory(s, "unmanaged_recent", 0, 90, timeout=40)
+    assert_memory(
+        s,
+        "unmanaged_old",
+        orig_old + 90,
+        # On MacOS, the process memory of the Python interpreter does not shrink as
+        # fast as on Linux/Windows
+        9999 if MACOS else orig_old + 190,
+        timeout=40,
+    )

     # When the leaked memory is cleared, unmanaged and unmanaged_old drop
-    # This doesn't happen on MacOS, where the process memory of the Python
-    # interpreter does not shrink (or takes much longer to shrink)
+    # On MacOS, the process memory of the Python interpreter does not shrink as fast
+    # as on Linux/Windows
     if not MACOS:
         c.run(clear_leak)
         assert_memory(s, "unmanaged", 0, orig_unmanaged + 95)
         assert_memory(s, "unmanaged_old", 0, orig_old + 95)
-        assert_memory(s, "unmanaged_recent", 0, 95)
+        assert_memory(s, "unmanaged_recent", 0, 90)


 @gen_cluster(client=True, worker_kwargs={"memory_limit": 0})
```
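The `clear_leak` helper (and whatever allocates the leaked memory) is not shown on this page. A hypothetical pair, matching the common pattern of holding memory alive via a module-level list so it survives garbage collection until explicitly released:

```python
# Hypothetical stand-ins for the leak helpers referenced in the diff: memory is
# "leaked" by keeping references reachable in a module-level list, and "cleared"
# by dropping those references so the interpreter can free the pages.
leaked = []


def leak(nbytes: int) -> None:
    # Allocate a buffer and keep it reachable indefinitely
    leaked.append(bytearray(nbytes))


def clear_leak() -> None:
    # Drop all references; a subsequent gc.collect() can then release memory
    leaked.clear()
```

Note that even after `clear_leak`, the process RSS may not drop promptly, which is exactly the MacOS behaviour the diff works around.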
Review comment: I can't see a way to write a unit test for this short of monkey-patching SystemMonitor?

Reply: It seems like that's going to be the only way; another test does exactly that:
distributed/distributed/tests/test_worker.py, lines 1697 to 1717 at e4b534a
Reply: There is `distributed.admin.system-monitor.interval`, which controls how often the monitor runs. You could set it to an incredibly high value so that it is never executed during the test runtime. Another option, using monkeypatch: you could remove the PeriodicCallback before it is even started. Something like:
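The snippet that followed "Something like" did not survive in this capture. A hypothetical illustration of the monkeypatching idea, using a stand-in class rather than the real `distributed.system_monitor.SystemMonitor`, could be:

```python
from unittest.mock import patch


class SystemMonitorStub:
    """Hypothetical stand-in for distributed.system_monitor.SystemMonitor."""

    def update(self):
        # Pretend to sample process metrics
        return {"memory": 123}


# Patch update() to a no-op for the duration of the block, so the periodic
# callback contributes nothing and the test controls when metrics change.
with patch.object(SystemMonitorStub, "update", lambda self: {}):
    assert SystemMonitorStub().update() == {}
```

`pytest`'s `monkeypatch` fixture would achieve the same effect with automatic undo at the end of the test.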
Reply: Setting `distributed.admin.system-monitor.interval` to a very high value before I create the Scheduler has no effect (I can see data arriving in the heartbeat from `SystemMonitor.update`). The other variants I tried either fail or likewise have no effect.