Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ui] Fixes an issue where system jobs' status were set to Scaled Down when their allocs get garbage collected #24620

Merged
merged 2 commits into from
Dec 17, 2024

Conversation

philrenaud
Copy link
Contributor

Description

Found a fun UI bug today: my system job had allocations fail, and then I worked on something else for several hours, and the garbage collector subsequently removed my terminal allocs.

When viewing the job in the UI, the job showed up as "Scaled Down". This is a new status we added this summer to help show when an operator manually set a job's group counts to 0 to let it sit dormant.

While there are problems with showing an alloc-less system job as "Failed" (it may in fact have been deliberately "scaled down" by virtue of taking all eligible nodes offline, for example), this seems like it's by far the most common reason you'd have an un-GC'd system job with no allocs, and so "Scaled Down" shouldn't be the default label for it. This PR makes it so system/sysbatch jobs cannot be labelled "Scaled Down" as such.

@philrenaud philrenaud self-assigned this Dec 6, 2024
@philrenaud philrenaud force-pushed the b-ui/scaled-down-status-fix-for-system-jobs branch from 1c77b2b to c521fc5 Compare December 6, 2024 21:40
@philrenaud philrenaud requested review from tgross and gulducat December 6, 2024 21:41
@philrenaud philrenaud marked this pull request as ready for review December 6, 2024 21:41
@philrenaud philrenaud requested review from a team as code owners December 6, 2024 21:41
@@ -246,7 +246,7 @@ export default class JobStatusPanelSteadyComponent extends Component {
}

const healthyAllocs = this.allocBlocks.running?.healthy?.nonCanary;
if (healthyAllocs?.length === totalAllocs) {
if (healthyAllocs?.length && healthyAllocs?.length === totalAllocs) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying that "0 allocs running of your 0 expected allocations!" doesn't mean "healthy"

// which is a best-guess-based-on-whats-running number. This means that if there are no current allocs,
// because they've been GC'd, we don't know if they were deliberately scaled down or failed.
// Safer in this case to show as failed rather than imply a deliberate scale-down.
if (totalAllocs === 0 && !this.hasClientStatus) {
Copy link
Contributor Author

@philrenaud philrenaud Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jobs-index level status comes from this job model computed property

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@@ -224,7 +224,7 @@ export default class JobStatusPanelSteadyComponent extends Component {
};
}

if (this.totalAllocs === 0) {
if (this.totalAllocs === 0 && !this.job.hasClientStatus) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

single-job-index-page level status comes from this component computed property

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Member

@gulducat gulducat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could add a feature for the user to configure an optimism rating, so we can show them what they want to believe in ambiguous situations 🙃

  • optimism=0 : it failed.
  • optimism=5 : it's scaled down!

(to be clear, this is just a joke)

@philrenaud philrenaud added the backport/1.9.x backport to 1.9.x release line label Dec 17, 2024
@philrenaud philrenaud merged commit 71e3716 into main Dec 17, 2024
23 of 26 checks passed
@philrenaud philrenaud deleted the b-ui/scaled-down-status-fix-for-system-jobs branch December 17, 2024 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.9.x backport to 1.9.x release line
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants