This repository has been archived by the owner on Jun 20, 2024. It is now read-only.
We have an older Dagr pipeline that has been run many times (though it has since been updated to use commit 040d12e).
In very rare non-reproducible cases we appear to hit a deadlock that causes the pipeline to halt or creep to a glacial pace.
Conditions that may relate to the issue, or could simply be coincidences:
- Some tasks failed, were rescheduled for a retry, and eventually succeeded
- Some tasks have been started, but others are unknown to the task manager
- In one unbounded case, a job estimated to take a few hours ran for days before we terminated it
The final log lines (before we cancelled the job early) looked like this:
TaskManager | Warning] ********************************************************************************
TaskManager | Warning] A single step in execution was > 30s (31s).
| Warning] Found 14 tasks with status: is unknown
TaskManager | Warning] Found 6 tasks with status: has been started
TaskManager | Warning] Found 49 tasks with status: has succeeded
Because this is rare, and we can enforce TTL (time-to-live) policies on runs of this pipeline, it isn't critical that we fix the underlying issue.
Simply posting the issue in case anyone else hits something similar, and wants to feel less alone!