Remove wrong assert in handle compute #6370
This has just been changed in tornado git tip: tornadoweb/tornado#3117
Unnecessary after #6348
await wait(f3)
f4 = c.submit(inc, f3, key="f4", workers=[w2.address])

await enter_get_data_1.wait()
Post #6371, you can:
1. Replace
await enter_get_data_1.wait()
with
await wait_for_state(f1.key, "flight", w2)
await wait_for_state(f2.key, "flight", w2)
- get rid of the BlockGetDataWorker subclass
- initialise the worker with gen_cluster
- use
event = asyncio.Event()
w1.rpc = _LockedCommPool(w1.rpc, write_event=event)
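To illustrate the idea behind gating communication on an event, here is a minimal, self-contained sketch. It does not use distributed's `_LockedCommPool`; `GatedWriter` and `demo` are hypothetical names, and the sketch only assumes that a write is held back until `write_event` is set, which gives the test a window to assert on intermediate state.

```python
import asyncio


class GatedWriter:
    """Hypothetical stand-in for a comm whose write blocks until an event is set."""

    def __init__(self, write_event):
        self.write_event = write_event
        self.sent = []

    async def write(self, msg):
        # Hold the message back until the test releases the event.
        await self.write_event.wait()
        self.sent.append(msg)


async def demo():
    event = asyncio.Event()
    comm = GatedWriter(event)
    task = asyncio.ensure_future(comm.write({"op": "get_data"}))
    await asyncio.sleep(0)  # let the write start and block on the event
    blocked = not task.done() and comm.sent == []
    # A test could inspect intermediate worker state here.
    event.set()
    await task
    return blocked, comm.sent
```

The trade-off discussed in this thread is that such a gate sits at the comm layer, so a test using it needs to know which messages pass through which pool.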
I actually like my version with the worker subclass better. The _LockedCommPool requires much more low-level knowledge and is more brittle, in my opinion. I think it should only be used if nothing else is possible.
I understand the use case now. I really don't think it's healthy, but I understand it. Below is the full story (I renamed the stimulus IDs for the sake of readability; broken down into one paragraph per stimulus ID).
What I understand is happening in a real life scenario:
What should happen, in my opinion:
I do not think it should fetch the tasks right away and I honestly don't think it would reduce complexity.
This has been brought up very frequently. As it stands right now, we cannot cancel a gather_dep request for a single key. As long as this is not possible, we cannot get rid of the cancelled/resumed mechanism.
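A simplified sketch of why a per-key "cancelled" marker is needed when one request fetches several keys. This is not distributed's worker state machine; `gather_batch` and `demo` are hypothetical, and the state names are borrowed only for illustration. The assumption is that cancelling the in-flight request would also drop keys that are still wanted, so the unwanted key is flagged and its result discarded on arrival instead.

```python
import asyncio


async def gather_batch(keys, remote):
    """Hypothetical batched fetch: one request returns all requested keys."""
    await asyncio.sleep(0.01)
    return {k: remote[k] for k in keys}


async def demo():
    remote = {"f1": 1, "f2": 2}
    state = {"f1": "flight", "f2": "flight"}
    data = {}
    request = asyncio.ensure_future(gather_batch(["f1", "f2"], remote))

    # f1 is no longer needed while the request is in flight.
    # request.cancel() would also lose f2, so only flag f1.
    state["f1"] = "cancelled"

    for key, value in (await request).items():
        if state[key] == "cancelled":
            state[key] = "released"  # discard the result on arrival
        else:
            data[key] = value
            state[key] = "memory"
    return state, data
```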
2a24f3e to 3e9134b
This removes an erroneous assert statement introduced in #6327
See #6327 (comment) for details
The added test condition triggers this exact assert statement. However, the test passes properly if the assert is removed. All transitions happen as expected.
While debugging this, I noticed that the find-missing PC is actually running concurrently. I was surprised by this because the docs of PeriodicCallback specifically mention that an iteration is skipped if it takes too long; see https://github.com/tornadoweb/tornado/blob/43ae5839a56e445dd2d10539718f1e0c8053d995/tornado/ioloop.py#L863-L864. I'll break this out into a dedicated PR.
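For illustration, a stdlib-only sketch of how a periodic scheduler that does not await coroutine callbacks can end up running them concurrently, and how a simple guard skips overlapping runs. This is not tornado's implementation; `SkipIfRunning`, `fire_and_forget_periodic`, and `measure_max_overlap` are hypothetical names, and the intervals are arbitrary.

```python
import asyncio


class SkipIfRunning:
    """Hypothetical guard: skip a new call while a previous one is in flight."""

    def __init__(self, func):
        self._func = func
        self._running = False
        self.skipped = 0

    async def __call__(self):
        if self._running:
            self.skipped += 1
            return
        self._running = True
        try:
            await self._func()
        finally:
            self._running = False


async def fire_and_forget_periodic(callback, interval, ticks):
    """Start `callback` every `interval` seconds without awaiting it."""
    tasks = []
    for _ in range(ticks):
        tasks.append(asyncio.ensure_future(callback()))
        await asyncio.sleep(interval)
    await asyncio.gather(*tasks)


async def measure_max_overlap(guarded):
    """Return the highest number of callback runs active at the same time."""
    active = 0
    max_active = 0

    async def slow_callback():
        nonlocal active, max_active
        active += 1
        max_active = max(max_active, active)
        await asyncio.sleep(0.05)  # deliberately longer than the interval
        active -= 1

    cb = SkipIfRunning(slow_callback) if guarded else slow_callback
    await fire_and_forget_periodic(cb, interval=0.01, ticks=5)
    return max_active
```

Without the guard, several runs of the slow callback are active at once; with it, at most one run is active and the remaining ticks are skipped.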