-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: send alerts can wait forever and fail for broken workflows #9638
Conversation
✅ Deploy Preview for determined-ui canceled.
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #9638 +/- ##
=======================================
Coverage 52.99% 53.00%
=======================================
Files 1255 1255
Lines 152884 152884
Branches 3233 3234 +1
=======================================
+ Hits 81015 81029 +14
+ Misses 71718 71704 -14
Partials 151 151
Flags with carried forward coverage won't be shown. Click here to find out more. |
8ebd932
to
ff129d7
Compare
for w in workflows["items"]: | ||
if w["name"] in workflows_to_skip: | ||
continue | ||
|
||
workflow_id = w["id"] | ||
if not workflows_are_running and w["stopped_at"] is None: | ||
print(f"waiting for at least workflow {w['name']} to finish") | ||
workflows_are_running = True | ||
created_at = datetime.datetime.strptime(w["created_at"], "%Y-%m-%dT%H:%M:%SZ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about this
created_at = datetime.datetime.strptime(w["created_at"], "%Y-%m-%dT%H:%M:%SZ")
if created_at < earliest_accepted_time:
print(f"workflow {w['name']} timed out.")
# TODO: add support for reporting as a timeout or failure.
continue
print(f"waiting for at least workflow {w['name']} to finish")
workflows_are_running = True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went to change it but then I thought about it again and IMO the existing one makes it easier to show no behavior has changed other than not setting the workflow as running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, it's just style at the end of the day
why don't you make an issue for this? the issue content should explain the problem or "why" you're doing this work. it's your current pr description.
|
thanks for the review! |
Ticket
https://hpe-aiatscale.atlassian.net/browse/RM-372
Description
adding a condition to skip over workflows that have elapsed their timeout instead of reporting them as active.
make sure send alerts does not:
why
send-alert job:
this job has been staying active for 5hours
https://app.circleci.com/pipelines/github/determined-ai/determined/57843/workflows/1a75f6bf-77b2-45e5-8155-9f2f5dcc96c4/jobs/2742238
testing the timeout https://app.circleci.com/pipelines/github/determined-ai/determined/58023/workflows/53afd482-abd0-403a-9346-0de85400e533/jobs/2756232
context https://hpe-aiatscale.slack.com/archives/C04C9JXB1C2/p1720448458815989
Test Plan
Checklist
docs/release-notes/
See Release Note for details.