Subprocess Killed Task is Not Marked as Failed #6024
Comments
I can confirm this issue: when an agent is SIGKILLed (or one of its subprocesses receives SIGKILL), the agent has no chance to update the database. @madkinsz, can the server mark such flows as failed when it loses the heartbeat from an agent? This is particularly common in a Kubernetes environment, as k8s reserves the right to SIGKILL any subprocess without warning to prevent the main process from going OOM. I also reported this here: #7948 (comment). Related to, but different from, #8270.
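To make the heartbeat idea concrete, here is a minimal sketch of how a server-side check could decide which flow runs have gone silent. It is not Prefect's implementation: `HeartbeatMonitor`, `record`, `expired`, and `timeout_seconds` are hypothetical names chosen for illustration, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical tracker: remembers the last heartbeat seen per flow run."""

    timeout_seconds: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, flow_run_id: str, now: float) -> None:
        # Called whenever a heartbeat arrives for a flow run.
        self.last_seen[flow_run_id] = now

    def expired(self, now: float) -> List[str]:
        # Flow runs whose last heartbeat is older than the timeout.
        # A real server would transition these runs to a failed state.
        return sorted(
            run_id
            for run_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

With a 30-second timeout, a run last seen at `t=0` would be reported as expired at `t=60`, while a run last seen at `t=50` would not. Because the check runs on the server, it still works when the agent is killed before it can report anything.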
This issue is stale because it has been open 30 days with no activity. To keep this issue open remove stale label or comment.
ping
Hi @desertaxle 👋 taking the liberty to ping you for this one :) There are two scenarios where a flow needs to be marked as failed:
Where in the code should this be orchestrated? Should both be up to the server? Or should 1. be up to the agent and 2. up to the server?
@billpalombi could you triage this so it doesn't keep getting stale botted? @ddelange your best bet for moving this forward is to include an MRE.
Thanks @zanieb! Not sure how we kept missing this.
Description
When Prefect's subprocess is killed (in this case due to exceeding container memory limits), Prefect does not properly mark the task as failed.
Prefect appears to be aware the task has failed, but fails to mark it:
In this case, I would expect Prefect to mark the task as failed; otherwise, it is left stuck in pending.
Reproduction / Example
It is difficult to give an exact repro. The issue is encountered when the subprocess exceeds the container's memory limit. Issuing a SIGKILL to the subprocess while it is executing should reproduce the issue.
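As a rough illustration of the signal side of this, the sketch below starts a child process, sends it SIGKILL, and checks how the parent sees the result. It uses only the Python standard library and does not involve Prefect; `run_and_kill` and `killed_by_sigkill` are hypothetical helper names, and the sleeping child is a stand-in for the real task subprocess. It assumes a POSIX system, since `signal.SIGKILL` is not available on Windows.

```python
import signal
import subprocess
import sys
import time


def run_and_kill(delay: float = 0.5) -> int:
    """Start a long-running child, SIGKILL it, and return its exit code."""
    # Stand-in for the task's subprocess: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    time.sleep(delay)  # give the child a moment to start
    # SIGKILL cannot be caught, so the child gets no chance to clean up
    # or report its own failure.
    child.send_signal(signal.SIGKILL)
    return child.wait()


def killed_by_sigkill(returncode: int) -> bool:
    # On POSIX, Popen reports death by signal as the negative signal number.
    return returncode == -signal.SIGKILL
```

Calling `killed_by_sigkill(run_and_kill())` shows that the parent can still observe the kill through the return code, even though the child itself could not report anything. That is the information I would expect to be turned into a failed state.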