ci: keep waiting on failing workloads for sending slack alerts #9371

Merged: 6 commits into main on May 15, 2024

Conversation

NicholasBlaskey (Contributor) commented May 15, 2024

Ticket

Description

We sometimes miss alerts because the alert job stops before all jobs have completed.

https://hpe-aiatscale.slack.com/archives/C9LFPNA3Y/p1715639036613519

Update the script to keep waiting while a workflow is in the "failing" state, not only the "running" state.
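
As a rough sketch of the idea (not the actual script), the waiting condition can treat a workflow as still in progress when its status is either "running" or "failing". The names `IN_PROGRESS_STATUSES` and `any_workflow_in_progress` are hypothetical, and the workflow dicts are only assumed to carry a "status" key like the one checked in the diff.

```python
from typing import Any, Dict, Iterable

# Assumption: a workflow reported as "failing" still has jobs in flight,
# so it is treated like a "running" workflow when deciding whether to keep waiting.
IN_PROGRESS_STATUSES = frozenset({"running", "failing"})


def any_workflow_in_progress(workflows: Iterable[Dict[str, Any]]) -> bool:
    """Return True if at least one workflow has an in-progress status."""
    return any(w.get("status") in IN_PROGRESS_STATUSES for w in workflows)
```

With a check like this, the alert loop would keep polling until every workflow has left both states, so failures that land late are still seen.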

Test Plan

See whether we keep having this issue.

Checklist

  • Changes have been manually QA'd
  • User-facing API changes need the "User-facing API Change" label.
  • Release notes should be added as a separate file under docs/release-notes/.
    See Release Note for details.
  • Licenses should be included for new code which was copied and/or modified from any external code.

@cla-bot cla-bot bot added the cla-signed label May 15, 2024

netlify bot commented May 15, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit 1e47068
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/6644f8b61cb9bf0007631244

@NicholasBlaskey NicholasBlaskey changed the title from "test state" to "ci: keep waiting on failing workloads for sending slack alerts" May 15, 2024

codecov bot commented May 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.29%. Comparing base (3b1d0df) to head (1e47068).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9371   +/-   ##
=======================================
  Coverage   45.28%   45.29%           
=======================================
  Files        1227     1227           
  Lines      154048   154048           
  Branches     2404     2403    -1     
=======================================
+ Hits        69767    69773    +6     
+ Misses      84089    84083    -6     
  Partials      192      192           
Flag      Coverage Δ
backend   41.76% <ø> (+0.01%) ⬆️
harness   64.07% <ø> (ø)
web       36.33% <ø> (ø)

Flags with carried forward coverage won't be shown.

see 4 files with indirect coverage changes

@NicholasBlaskey NicholasBlaskey marked this pull request as ready for review May 15, 2024 15:52
@NicholasBlaskey NicholasBlaskey requested a review from a team as a code owner May 15, 2024 15:52
@@ -41,7 +41,7 @@ def send_alerts_for_failed_jobs(sent_alerts: Set[str]) -> bool:
             continue
 
         workflow_id = w["id"]
-        if not workflows_are_running and w["status"] == "running":
+        if not workflows_are_running and w["status"] in ["running", "failing"]:

Maybe w["stopped_at"] would be better to use?
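
For illustration only, the suggestion might look something like the sketch below. It assumes each workflow dict has a "stopped_at" field that is absent or None until the workflow has fully stopped. That assumption is not confirmed anywhere in this thread, and the helper name `workflow_is_in_progress` is hypothetical.

```python
from typing import Any, Dict


def workflow_is_in_progress(w: Dict[str, Any]) -> bool:
    """Treat a workflow as in progress until it reports a stop time."""
    # Assumption: "stopped_at" is missing or None while any job is still running.
    return w.get("stopped_at") is None
```

The possible advantage over matching status strings is that the check would not need updating if other in-progress statuses exist, though that depends on the assumption above holding.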

@NicholasBlaskey NicholasBlaskey enabled auto-merge (squash) May 15, 2024 18:10
@NicholasBlaskey NicholasBlaskey merged commit 93c8d81 into main May 15, 2024
81 of 95 checks passed
@NicholasBlaskey NicholasBlaskey deleted the alert_failing_state branch May 15, 2024 18:23
3 participants