-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: emit node evals only for sys jobs in dc #12955
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This feels like a good place to look for additional optimizations in reducing evals too!
@tgross I was hoping to just get a PR up, but an issue is probably a good idea: #12981 I'm tempted to base further work off this PR and get them all merged in a group (hence the Draft status)... but that might just be complicating things for no reason. Let me know if you have a preference for how to handle a batch of small eval improvements. |
Yeah sorry I may have jumped the gun in reviewing here knowing that you've got an active investigation underway. But batching them up seems fine. Ideally each fix is a discrete commit within the PR to help reviewability? I'll dismiss my review for now and re-review as needed. |
Whenever a node joins the cluster, either for the first time or after being `down`, we emit a evaluation for every system job to ensure all applicable system jobs are running on the node. This patch adds an optimization to skip creating evaluations for system jobs not in the current node's DC. While the scheduler performs the same feasability check, skipping the creation of the evaluation altogether saves disk, network, and memory.
9f70ede
to
e7d9773
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Hooray! Christmas in July! |
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Whenever a node joins the cluster, either for the first time or after
being
down
, we emit a evaluation for every system job to ensure allapplicable system jobs are running on the node.
This patch adds an optimization to skip creating evaluations for system
jobs not in the current node's DC. While the scheduler performs the same
feasability check, skipping the creation of the evaluation altogether
saves disk, network, and memory.