Backport of CSI: no early return when feasibility check fails on eligible nodes into release/1.2.x #13276

hc-github-team-nomad-core

Backport

This PR is auto-generated from #13274 to be assessed for backporting due to the inclusion of the label backport/1.2.x.

The below text is copied from the body of the original PR.


Fixes #11778

As a performance optimization in the scheduler, feasibility checks
that apply to an entire class are only checked once for all nodes of
that class. Other feasibility checks are "available" checks because
they rely on more ephemeral characteristics and don't contribute to
the hash for the node class. This currently includes only CSI.

We have a separate fast path for "available" checks when the node has
already been marked eligible on the basis of class. This fast path has
a bug where it returns early rather than continuing the loop. This
causes the entire task group to be rejected.

Fix the bug by not returning early in the fast path and instead
jumping to the top of the loop, like all the other code paths in this
method. This includes a new test exercising topology at the
whole-scheduler level and a fix for an existing test that should have
caught this previously.
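
To illustrate the control-flow difference described above, here is a minimal sketch. It is not the Nomad scheduler code: the `Node` type, the `HasVolume` field (standing in for an "available" check such as CSI), and the `nextBuggy`/`nextFixed` functions are all hypothetical names chosen for the example.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for a scheduler node.
type Node struct {
	ID        string
	Class     string
	HasVolume bool // stands in for an ephemeral "available" check such as CSI
}

// nextBuggy mimics the early return: once a class-eligible node fails the
// available check, the search stops and no later node is considered.
func nextBuggy(nodes []Node, eligibleClass map[string]bool) *Node {
	for i := range nodes {
		n := &nodes[i]
		if eligibleClass[n.Class] {
			if n.HasVolume {
				return n
			}
			return nil // bug: gives up on every remaining node
		}
	}
	return nil
}

// nextFixed continues the loop instead, so later nodes of the same
// eligible class still get their own available check.
func nextFixed(nodes []Node, eligibleClass map[string]bool) *Node {
	for i := range nodes {
		n := &nodes[i]
		if !eligibleClass[n.Class] {
			continue
		}
		if !n.HasVolume {
			continue // fix: move on to the next candidate
		}
		return n
	}
	return nil
}

func main() {
	nodes := []Node{
		{ID: "n1", Class: "c1", HasVolume: false},
		{ID: "n2", Class: "c1", HasVolume: true},
	}
	eligible := map[string]bool{"c1": true}

	fmt.Println(nextBuggy(nodes, eligible) == nil) // true: n2 is never reached
	fmt.Println(nextFixed(nodes, eligible).ID)     // n2
}
```

In this sketch both nodes share an eligible class, but only the second passes the available check. The buggy variant rejects the whole candidate set as soon as the first node fails, which corresponds to the task group being rejected.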


Note that the test here is potentially limited in reliability because of the random iterator, but it passes when run 100 times:

$ go test -v ./scheduler -run TestServiceSched_CSITopology -count=100
...
--- PASS: TestServiceSched_CSITopology (0.00s)
PASS
ok      github.com/hashicorp/nomad/scheduler    0.424s

@hc-github-team-nomad-core hc-github-team-nomad-core force-pushed the backport/b-csi-feasibility-check/remotely-keen-chigger branch from b25cc95 to e8c332b Compare June 7, 2022 17:31
@hc-github-team-nomad-core hc-github-team-nomad-core merged commit 7bca4f9 into release/1.2.x Jun 7, 2022
@hc-github-team-nomad-core hc-github-team-nomad-core deleted the backport/b-csi-feasibility-check/remotely-keen-chigger branch June 7, 2022 17:31
github-actions bot commented Oct 7, 2022

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 7, 2022