document expected evaluation counts #13468
Comments
Fantastic write-up. Add in some charts and diagrams, then you've got a stew going.
I've added some missing bits about deployments and periodic jobs. There's a lot more detail to some of these that we'll want to capture in the actual docs, but I want to make sure we're not missing any of the big ones here.
Amazing write-up, Tim. I spent a quality 90 minutes carefully reading through it and comparing it with actual cluster sizes, allocations, etc. Follow-up on one item. ChrisL asked:
"I was wondering if this applies to all our jobs, given that we use the spread (instead of binpack) scheduler for our jobs. So, if we had a node with 62 allocations, and that node went down, and there were 10,000 nodes remaining in the cluster, does that mean that 62 * 10,000, or 620,000, evaluations will occur? And, for completeness, if 1,000 nodes restarted, each with 62 allocations on them, would we see 620M evaluations occur if 10,000 nodes remained in the cluster?"
No, they work differently and have different purposes:
This is why adding the …
Your question made me remember that we recently changed the behavior so that …
Fixed by #14750
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
The number of evaluations Nomad generates is an important metric for operators trying to understand its performance. We should improve the documentation on the expected number of evaluations we create and what that means for how many nodes the scheduler processes.
Some example events that create Evaluations:
- … system job in the cluster, plus... (see node_endpoint.go#L1445-L1548)
For each Evaluation in raft:
- … spread, the scheduler will evaluate (process) nodes until it finds 2 feasible nodes to score.
- … spread, the scheduler will evaluate (process) a number of nodes equal to the task group count or 100, whichever is less.

Some concrete examples:
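The two per-evaluation rules listed above can be illustrated with a small Go sketch (Go because Nomad itself is written in Go). This is not Nomad's scheduler code: `scanTarget` and `nodesProcessed` are hypothetical helpers, and the boolean `countRule` only selects which of the two listed rules applies, because the exact condition attached to each rule is not preserved here. The second helper shows why "until it finds 2 feasible nodes" can mean processing more than two nodes when some candidates are infeasible.

```go
package main

import "fmt"

// maxCountScan is the cap of 100 nodes quoted in the rule above.
const maxCountScan = 100

// scanTarget is a hypothetical helper (not a Nomad function).
// countRule == false: first rule, the target is 2 feasible nodes to score.
// countRule == true:  second rule, the target is the task group count,
// capped at 100 nodes.
func scanTarget(countRule bool, taskGroupCount int) int {
	if !countRule {
		return 2
	}
	if taskGroupCount < maxCountScan {
		return taskGroupCount
	}
	return maxCountScan
}

// nodesProcessed is a hypothetical illustration of the first rule: it walks
// candidate nodes in order (true = feasible) and returns how many nodes were
// processed before `target` feasible nodes were found, or len(feasible) if
// the candidates run out first.
func nodesProcessed(feasible []bool, target int) int {
	found := 0
	for i, ok := range feasible {
		if ok {
			found++
			if found == target {
				return i + 1
			}
		}
	}
	return len(feasible)
}

func main() {
	// The first two candidates are infeasible, so reaching 2 feasible
	// nodes requires processing 4 nodes in total.
	candidates := []bool{false, false, true, true, true}
	fmt.Println(nodesProcessed(candidates, scanTarget(false, 10))) // 4

	fmt.Println(scanTarget(true, 10))  // 10
	fmt.Println(scanTarget(true, 500)) // 100
}
```

Because the rules apply to each Evaluation in raft, the total node-processing work under this model grows with the number of evaluations, each of which repeats its own scan.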