Log client node plan rejection at INFO instead of DEBUG #11191

Closed
wants to merge 2 commits into from

Conversation

davemay99
Contributor

When a Nomad client node rejects a plan to place an allocation, for example because of a port collision, Nomad currently logs the plan rejection at DEBUG level. To improve visibility and debuggability, this PR raises the log level to INFO.

Member

@schmichael schmichael left a comment

What action do you expect users to take in response to this log line?

Plan rejection should be an infrequent but unexceptional outcome of Nomad's optimistically concurrent scheduling: when two scheduler workers generate plans simultaneously, the plans may conflict. The plan applier's job is to process plans one at a time and reject conflicting plans (that is, the second one processed).

When a plan is rejected the evaluation is reprocessed against the new state of the cluster and a new plan is submitted.
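
The reject-and-reprocess cycle described above can be illustrated with a minimal Go sketch. This is not Nomad's actual plan applier: `Plan`, `Applier`, `Apply`, and `FreePort` are hypothetical names invented for the example, and cluster state is reduced to a per-node set of used ports.

```go
package main

import "fmt"

// Plan is a hypothetical, simplified stand-in for a scheduler plan:
// place one allocation on one node using one port.
type Plan struct {
	EvalID string
	Node   string
	Port   int
}

// Applier holds the authoritative state and processes plans one at a time.
type Applier struct {
	usedPorts map[string]map[int]bool // node -> port -> in use
}

func NewApplier() *Applier {
	return &Applier{usedPorts: make(map[string]map[int]bool)}
}

// Apply commits the plan if it still fits the current state and returns
// false (a rejection) if an earlier plan already claimed the port.
func (a *Applier) Apply(p Plan) bool {
	if a.usedPorts[p.Node][p.Port] {
		return false
	}
	if a.usedPorts[p.Node] == nil {
		a.usedPorts[p.Node] = make(map[int]bool)
	}
	a.usedPorts[p.Node][p.Port] = true
	return true
}

// FreePort is a toy scheduler step: it returns the lowest free port on the
// node at or above start, based on the current state.
func (a *Applier) FreePort(node string, start int) int {
	port := start
	for a.usedPorts[node][port] {
		port++
	}
	return port
}

func main() {
	a := NewApplier()

	// Two workers planned against the same snapshot and chose the same port.
	p1 := Plan{EvalID: "eval-1", Node: "node-a", Port: 8080}
	p2 := Plan{EvalID: "eval-2", Node: "node-a", Port: 8080}

	fmt.Println("eval-1 applied:", a.Apply(p1))
	fmt.Println("eval-2 applied:", a.Apply(p2)) // conflicts with eval-1

	// The rejected evaluation is reprocessed against the new state.
	p2.Port = a.FreePort(p2.Node, 8080)
	fmt.Println("eval-2 retried:", a.Apply(p2), "on port", p2.Port)
}
```

In this sketch the rejection resolves itself on the retry without operator involvement, which is the behavior the comment describes as the normal case.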

Therefore there's really nothing user actionable about plan rejections: they're generated by Nomad and resolved by Nomad. A user has no control over them.

If a plan is being rejected erroneously due to a bug in the scheduler logic, we should fix that. If we wish to increase visibility into these bugs, I think ticking a metric every time a plan is rejected would be easier to notice than increasing log verbosity. If the metric spikes, especially in the absence of high scheduler load, then the nomad operator debug or nomad monitor commands can be used to see exactly what is being rejected and why.
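
As a rough illustration of the metric idea, the sketch below counts rejections in process using only the Go standard library. It does not use Nomad's real telemetry code; `RejectionCounter` and its methods are hypothetical, and the per-node breakdown is an assumption added for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// RejectionCounter is a hypothetical in-memory metric that is ticked once
// for every rejected plan, with a per-node breakdown.
type RejectionCounter struct {
	mu     sync.Mutex
	total  int
	byNode map[string]int
}

func NewRejectionCounter() *RejectionCounter {
	return &RejectionCounter{byNode: make(map[string]int)}
}

// Incr records one plan rejection for the given node.
func (c *RejectionCounter) Incr(nodeID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.total++
	c.byNode[nodeID]++
}

// Total returns the number of rejections recorded so far.
func (c *RejectionCounter) Total() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.total
}

// ForNode returns the number of rejections recorded for one node.
func (c *RejectionCounter) ForNode(nodeID string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.byNode[nodeID]
}

func main() {
	c := NewRejectionCounter()

	// Simulated rejections; in a real system the plan applier would call Incr.
	for _, node := range []string{"node-a", "node-b", "node-a", "node-a"} {
		c.Incr(node)
	}

	fmt.Println("total rejections:", c.Total())
	fmt.Println("node-a rejections:", c.ForNode("node-a"))
}
```

A counter like this is cheap to export and alert on, so a spike can be noticed without raising log verbosity on every server.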

@schmichael
Member

> What action do you expect users to take in response to this log line?

Sadly we now know the answer to this: restart the node in the log line. By the time we verified this bug and discovered the workaround I had forgotten about this issue entirely and submitted basically the same thing in #11416. Closing this as dupe.

Dave was right! We'll get to the root cause of #9506 ASAP.

@schmichael schmichael closed this Nov 3, 2021
@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 11, 2022
2 participants