
Cache root-ish-ness for consistency #7262

Draft: gjoseph92 wants to merge 12 commits into main
Conversation

gjoseph92 (Collaborator)

This is a way of avoiding the consistency issues in #7259 with less thinking. If root-ish-ness can't change, things are simpler.

I don't love having to do this. But hopefully this will be determined statically (and likely cached) anyway: #6922.

  • Tests added / passed
  • Passes pre-commit run --all-files

It's possible for tasks to not be root-ish when they go into no-worker, but to be root-ish when they come out.
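For context, here is a minimal sketch (a reconstruction, not this PR's exact code) of what a cached is_rootish could look like on the scheduler. It assumes the size-vs-cluster heuristic distributed used at the time, plus the _rootish attribute on TaskState that the diff below references:

def is_rootish(self, ts) -> bool:
    # Return the decision pinned while the task sits in `queued`/`no-worker`
    if ts._rootish is not None:
        return ts._rootish
    # Tasks with restrictions are never root-ish
    if ts.resource_restrictions or ts.worker_restrictions or ts.host_restrictions:
        return False
    tg = ts.group
    # A group much larger than the cluster, with few dependencies, is root-ish
    return (
        len(tg) > self.total_nthreads * 2
        and len(tg.dependencies) < 5
        and sum(map(len, tg.dependencies)) < 5
    )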
@@ -7052,6 +7083,8 @@ def get_metadata(self, keys: list[str], default=no_default):
     def set_restrictions(self, worker: dict[str, Collection[str] | str]):
         for key, restrictions in worker.items():
             ts = self.tasks[key]
+            if ts._rootish is not None:
gjoseph92 (Collaborator, Author)

I don't like that set_restrictions is a public API at all. Doesn't seem like something you should be able to do post-hoc.

gjoseph92 (Collaborator, Author)

This fails test_reschedule_concurrent_requests_deadlock, which sets restrictions on a processing task.

@gen_cluster(
    client=True,
    nthreads=[("", 1)] * 3,
    config={
        "distributed.scheduler.work-stealing-interval": 1_000_000,
    },
)
async def test_reschedule_concurrent_requests_deadlock(c, s, *workers):
    # https://github.com/dask/distributed/issues/5370
    steal = s.extensions["stealing"]
    w0 = workers[0]
    ev = Event()
    futs1 = c.map(
        lambda _, ev: ev.wait(),
        range(10),
        ev=ev,
        key=[f"f1-{ix}" for ix in range(10)],
        workers=[w0.address],
        allow_other_workers=True,
    )
    while not w0.active_keys:
        await asyncio.sleep(0.01)
    # ready is a heap but we don't need last, just not the next
    victim_key = list(w0.active_keys)[0]
    victim_ts = s.tasks[victim_key]
    wsA = victim_ts.processing_on
    other_workers = [ws for ws in s.workers.values() if ws != wsA]
    wsB = other_workers[0]
    wsC = other_workers[1]
    steal.move_task_request(victim_ts, wsA, wsB)
    s.set_restrictions(worker={victim_key: [wsB.address]})
    s._reschedule(victim_key, stimulus_id="test")
    assert wsB == victim_ts.processing_on
    # move_task_request is not responsible for respecting worker restrictions
    steal.move_task_request(victim_ts, wsB, wsC)
    # Let tasks finish
    await ev.set()
    await c.gather(futs1)
    assert victim_ts.who_has != {wsC}
    msgs = steal.story(victim_ts)
    msgs = [msg[:-1] for msg in msgs]  # Remove random IDs
    # There are three possible outcomes
    expect1 = [
        ("stale-response", victim_key, "executing", wsA.address),
        ("already-computing", victim_key, "executing", wsB.address, wsC.address),
    ]
    expect2 = [
        ("already-computing", victim_key, "executing", wsB.address, wsC.address),
        ("already-aborted", victim_key, "executing", wsA.address),
    ]
    # This outcome appears only in ~2% of the runs
    expect3 = [
        ("already-computing", victim_key, "executing", wsB.address, wsC.address),
        ("already-aborted", victim_key, "memory", wsA.address),
    ]
    assert msgs in (expect1, expect2, expect3)

Member

> I don't like that set_restrictions is a public API at all. Doesn't seem like something you should be able to do post-hoc.

We can change it. The first step is a deprecation warning.
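A minimal sketch (an assumption, not a change in this PR) of what that first deprecation step could look like:

import warnings

def set_restrictions(self, worker):
    # Hypothetical first step: warn, then keep the existing behavior unchanged.
    warnings.warn(
        "Scheduler.set_restrictions is deprecated; pass worker restrictions "
        "when submitting tasks instead.",  # message text is illustrative
        DeprecationWarning,
        stacklevel=2,
    )
    ...  # existing restriction handling, unchanged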

github-actions bot (Contributor) commented Nov 5, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

  15 files ±0        15 suites ±0     6h 43m 28s ⏱️ +13m 3s
  3 171 tests +2     3 084 ✔️ ±0      83 💤 -1       4 failed +3
  23 464 runs +16    22 543 ✔️ +2     903 💤 ±0      18 failed +14

For more details on these failures, see this check.

Results for commit 6f188bf. ± Comparison against base commit 4b00be1.

♻️ This comment has been updated with latest results.

fjetter (Member) commented Nov 7, 2022

I don't like caching this. I don't consider it user-friendly to pin the behavior of our decision logic to the time a computation is dispatched. I think this will be a hard-to-debug problem if users run into weird scheduling just because their cluster was still scaling up.

mrocklin (Member) commented Nov 7, 2022

I'm curious, what are some situations where rootish-ness can change?

gjoseph92 (Collaborator, Author)

Broadly, it changes when either the size of the TaskGroup changes (add/cancel tasks) or the cluster size changes. Some specific examples:

  • With client.submit in a for-loop, the first nthreads * 2 tasks will be non-root-ish, the rest will be root-ish. But when the loop is done, is_rootish on every task is True, even the ones we originally scheduled as though they weren't root-ish.
  • You submit 100 tasks to a 40-thread cluster with queuing on, so they are all root-ish and get queued. They are very slow tasks. You scale up the cluster beyond 50 threads while some tasks are still queued. Now, when tasks come out of the queue, they are no longer root-ish.
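To make the second example concrete, here is a small numeric sketch. It assumes the root-ish threshold is roughly "task group larger than total_nthreads * 2" (the real heuristic also checks the number of dependencies):

group_size = 100  # tasks submitted to the cluster

total_nthreads = 40
print(group_size > total_nthreads * 2)  # True: 100 > 80, tasks are root-ish and get queued

total_nthreads = 56  # cluster scales up past 50 threads while tasks are still queued
print(group_size > total_nthreads * 2)  # False: 100 <= 112, dequeued tasks are no longer root-ish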

gjoseph92 (Collaborator, Author)

We're going to take a different approach; see #7259 (comment). Rather than trying to make is_rootish static, we'll accept that it's dynamic and switch behavior and state when it changes.

gjoseph92 (Collaborator, Author)

> I think this will be a hard-to-debug problem if users run into weird scheduling just because their cluster was still scaling up.

I think we're only worried about not queuing tasks when we should. That is, is_rootish gets cached as False when it later should be True. Queueing tasks when we "shouldn't" doesn't seem like a big deal to me; as we've seen, this makes little performance difference. If queuing is default scheduling behavior, or you've turned it on explicitly, it shouldn't be surprising when things get queued.

is_rootish getting cached as False when it later should be True can only happen when:

  • the TaskGroup grows
  • the cluster shrinks

So it won't happen when the cluster is scaling up. Just when submitting tasks before scaling down. And in that case:

  1. Tasks are non-root-ish, so they all get assigned to workers at once.
  2. Workers leave. Their processing tasks are released, which resets the _rootish cache.
  3. Tasks are rescheduled, now seen as root-ish (the cluster shrank), and queued as expected.

So that case is fine. The only remaining case is the TaskGroup growing. That happens most commonly through client.submit in a for-loop:

  1. first nthreads * 2 tasks are submitted immediately
  2. rest are queued

...and that's the same behavior as if we weren't caching. Having the group split like this isn't ideal, but there's nothing we can do about it, and caching doesn't change things.
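For reference, the submit-in-a-loop pattern being discussed (ordinary client API usage; inc and the cluster size are stand-ins):

from distributed import Client

def inc(x):
    return x + 1

client = Client()  # say, a cluster with 40 threads total
futures = [client.submit(inc, i) for i in range(1_000)]
# While the loop runs, the growing TaskGroup crosses the root-ish threshold:
# the first ~2 * nthreads tasks were scheduled as non-root-ish,
# the rest are treated as root-ish and queued.
results = client.gather(futures)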

gjoseph92 reopened this Nov 8, 2022
gjoseph92 (Collaborator, Author) commented Nov 8, 2022

Also, to clarify: the definition of caching in this approach is that is_rootish(ts) is guaranteed to return the same value only while the task is in queued or no-worker. Upon leaving those states, the cache is invalidated.

We don't need to cache root-ish-ness for the lifetime of a task; that's overkill. I only care that is_rootish is the same when a task gets put into the queued or unrunnable sets as when it's removed from those sets.

(To be clear, I haven't fully implemented that here yet, but that's the intent.) Edit: now implemented.
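A minimal sketch of that cache lifetime, with hypothetical helper names (only the _rootish attribute itself appears in the diff above):

def pin_rootish(scheduler, ts):
    # Called when ``ts`` transitions into "queued" or "no-worker":
    # freeze the decision that put it there.
    ts._rootish = scheduler.is_rootish(ts)

def unpin_rootish(ts):
    # Called when ``ts`` leaves "queued" or "no-worker":
    # the next scheduling pass re-evaluates root-ish-ness from scratch.
    ts._rootish = None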
