Create a CANCELLING state type #7794
...grations/versions/sqlite/2022_12_06_161948_50ab89b8fb35_add_cancelling_to_state_type_enum.py
Does this feel like the correct move?
I think it's the correct move; it makes more sense than the RUNNING -> RUNNING transition we discussed.
Is there a specific interaction the CANCELLING state helps us spell out? I'm curious, since I implemented something similar for
@anticorrelator I'd have to know more about the 'marker to a run' to say whether that would work here. This is specifically about the interaction between the agent and running flows: the agent needs to know which flow runs to cancel, and also which flow runs have already been cancelled. Currently this is accomplished by setting the flow run into a CANCELLED state named Cancelling. This works, but, as stated above, it causes two issues: concurrency slots are released as soon as that transition happens, even though the flow run is still technically running, and an operation as important as cancellation depends on a state name.
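To make the agent interaction concrete, here is a minimal sketch contrasting the MVP's name-based selection of runs to cancel with selection by a dedicated state type. It is not Prefect's agent code: `StateType`, `FlowRun`, `runs_to_cancel_by_name`, and `runs_to_cancel_by_type` are hypothetical stand-ins defined only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class StateType(str, Enum):
    # Hypothetical subset of run state types, for illustration only
    RUNNING = "RUNNING"
    CANCELLING = "CANCELLING"
    CANCELLED = "CANCELLED"


@dataclass
class FlowRun:
    id: str
    state_type: StateType
    state_name: str


def runs_to_cancel_by_name(runs: List[FlowRun]) -> List[FlowRun]:
    # MVP approach: a terminal CANCELLED type plus a special state name
    return [
        r for r in runs
        if r.state_type == StateType.CANCELLED and r.state_name == "Cancelling"
    ]


def runs_to_cancel_by_type(runs: List[FlowRun]) -> List[FlowRun]:
    # Proposed approach: a dedicated, non-terminal CANCELLING state type
    return [r for r in runs if r.state_type == StateType.CANCELLING]
```

With the name-based approach, a run the agent has not yet torn down is already in a terminal CANCELLED state, which is why its concurrency slot is freed early; with a dedicated type, the run stays non-terminal until the agent finishes.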
There's definitely an ick factor in doing client operations based on state names. I was thinking of a mechanism that looked like attaching metadata to the
I'm also a little worried about adding CANCELLING if it's always treated like RUNNING, but I think there are some important differences, like:
I think retaining concurrency slots may be one of the few cases where they are identical.
@anticorrelator I'd be more than happy to discuss alternatives. In the cancellation doc, we outlined much of the discussion we had before implementing the current version: https://www.notion.so/prefect/Run-Cancellation-6ccd41c867aa4abf903e23db42a3f287#3536e0384bc94aebbe27b2a3cb43d543
@anticorrelator and I discussed this and agreed that this is probably the best way forward given the needs of the agent.
LGTM. As mentioned async, we would benefit from an extension to `PreventRedundantTransitions`, or a new rule that only allows CANCELLING -> CANCELLED (and maybe CRASHED/FAILED?) transitions.
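The suggested rule boils down to a small decision function. The sketch below shows only that decision logic, as a standalone function rather than Prefect's orchestration-rule classes; `StateType`, `ALLOWED_FROM_CANCELLING`, and `is_transition_allowed` are hypothetical names, and including CRASHED and FAILED reflects the open question in the comment, not a settled design.

```python
from enum import Enum
from typing import FrozenSet


class StateType(str, Enum):
    # Hypothetical subset of state types, for illustration only
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    CANCELLING = "CANCELLING"
    CANCELLED = "CANCELLED"
    CRASHED = "CRASHED"
    FAILED = "FAILED"
    COMPLETED = "COMPLETED"


# Assumed set of allowed exits from CANCELLING (CRASHED/FAILED are tentative)
ALLOWED_FROM_CANCELLING: FrozenSet[StateType] = frozenset(
    {StateType.CANCELLED, StateType.CRASHED, StateType.FAILED}
)


def is_transition_allowed(initial: StateType, proposed: StateType) -> bool:
    """Reject any transition out of CANCELLING that is not explicitly allowed."""
    if initial == StateType.CANCELLING:
        # Also rejects CANCELLING -> CANCELLING, the redundant-transition case
        return proposed in ALLOWED_FROM_CANCELLING
    # This sketch only constrains transitions that start in CANCELLING
    return True
```

Under these assumptions, a run that has been asked to cancel can no longer be moved back to RUNNING or marked COMPLETED by a late-reporting client.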
What kind of migration strategy do we have in mind for Cloud? How can we make it clear that this feature requires changes and that client and server versions can't be mixed in OSS?
After discussion with Zach and Dustin, it appears that we'll need to:
Isn't that what the bump to the Orion API version signals?
@bunchesofdonald Bumping the API version isn't particularly user-facing. I'm honestly often confused about when it causes errors to be raised these days, though. What happens when I have a server with this code and a client with the old code calls
@madkinsz After discussing this with @abrookins, we decided that feature flagging was the way to go here. This change should be good to go as it is, but I'll follow up with another PR to make the CLI/Agent check that flag and change the state it's using.
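The follow-up described here means the client (which requests cancellation) and the agent (which acts on it) must agree on one representation, selected by the flag. This sketch assumes the flag is simply available as a boolean; how it is actually exposed is not stated here, and `cancellation_state` and `is_cancellation_request` are hypothetical helpers.

```python
from typing import Tuple


def cancellation_state(flag_enabled: bool) -> Tuple[str, str]:
    """Return the (state type, state name) a client would propose to request cancellation."""
    if flag_enabled:
        # New behavior: dedicated non-terminal type that keeps concurrency slots
        return ("CANCELLING", "Cancelling")
    # Legacy MVP behavior: terminal CANCELLED type with a special name
    return ("CANCELLED", "Cancelling")


def is_cancellation_request(state_type: str, state_name: str, flag_enabled: bool) -> bool:
    """Return True if the agent should treat this state as a request to cancel."""
    if flag_enabled:
        return state_type == "CANCELLING"
    return state_type == "CANCELLED" and state_name == "Cancelling"
```

Gating both sides on the same flag keeps an older client and a newer agent (or the reverse) from silently disagreeing about which runs are waiting to be cancelled.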
-    if not context.validated_state.is_running():
+    if context.validated_state.type not in [
+        states.StateType.RUNNING,
+        states.StateType.CANCELLING,
+    ]:
This will need a matching update in Cloud.
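The diff above widens a single condition. Assuming, as the PR description suggests, that this check decides whether a run's concurrency slots are released, the sketch below contrasts the old and new predicates. `StateType`, `SLOT_HOLDING_TYPES`, and both `should_release_slot_*` functions are hypothetical stand-ins, not Prefect code.

```python
from enum import Enum


class StateType(str, Enum):
    # Hypothetical subset of state types, for illustration only
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    CANCELLING = "CANCELLING"
    CANCELLED = "CANCELLED"
    COMPLETED = "COMPLETED"


# State types that keep holding a concurrency slot under the new check
SLOT_HOLDING_TYPES = frozenset({StateType.RUNNING, StateType.CANCELLING})


def should_release_slot_old(state_type: StateType) -> bool:
    # Old condition: release as soon as the state is no longer RUNNING
    return state_type != StateType.RUNNING


def should_release_slot_new(state_type: StateType) -> bool:
    # New condition: CANCELLING keeps the slot until the run actually stops
    return state_type not in SLOT_HOLDING_TYPES
```

The only input on which the two predicates disagree is CANCELLING, which is exactly the case the new state type is meant to cover.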
Co-authored-by: Zach Angell <[email protected]>
The MVP for cancellation used a CANCELLED state with a name of Cancelling for the agent to decide which flows need to be cancelled. The main issue with this is that concurrency slots are released immediately when that transition happens, even though the tasks/flows are still technically running. There is also some general 'ick' factor in relying on a state name for an operation as important as cancellation.

This introduces a CANCELLING state type and sets it up so that it maintains/occupies concurrency limit slots, both for tag-based task concurrency limits and for work queues.

Related to #7735
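As an illustration of "occupies concurrency limit slots", the sketch below counts slots in use per tag and per work queue, treating CANCELLING like RUNNING. It makes simplifying assumptions: a single hypothetical `Run` type stands in for both task runs (tag limits) and flow runs (work-queue limits), and the exact set of counted state types per limit kind is assumed, not taken from Prefect. `OCCUPYING_TYPES`, `tag_slots_in_use`, and `queue_slots_in_use` are likewise hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional

# Assumed: state types that occupy a concurrency slot after this change
OCCUPYING_TYPES: FrozenSet[str] = frozenset({"RUNNING", "CANCELLING"})


@dataclass
class Run:
    state_type: str
    tags: List[str] = field(default_factory=list)
    work_queue: Optional[str] = None


def tag_slots_in_use(runs: List[Run]) -> Dict[str, int]:
    # Each occupying run consumes one slot for every distinct tag it carries
    counts: Counter = Counter()
    for run in runs:
        if run.state_type in OCCUPYING_TYPES:
            counts.update(set(run.tags))
    return dict(counts)


def queue_slots_in_use(runs: List[Run]) -> Dict[str, int]:
    # Each occupying run consumes one slot in its work queue, if it has one
    counts: Counter = Counter()
    for run in runs:
        if run.state_type in OCCUPYING_TYPES and run.work_queue is not None:
            counts[run.work_queue] += 1
    return dict(counts)
```

Under the MVP, the second run below would already be CANCELLED and would drop out of both counts while it was still shutting down.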