Changing affinity or spreads prevents in-place upgrade #6988
Comments
@michaeldwan not an edge case, we have the same usage :)
Do you have any thoughts on this issue? We're considering a downgrade to 0.10.1 or running a fork, but if you're open to a fix I'm happy to help. Thanks!
Hey @michaeldwan, sorry for the delay. Making changes to spreads, affinities, and constraints destructive rather than in-place was intentional. Currently, when a spread or affinity changes, we recompute scores without correlating them with existing allocs, so we can't just compute the diff and leave running allocs alone. In 0.10.4 we are removing the penalty for an allocation's previous node, which will bias placement a bit towards keeping new allocs on the same node, but each will unfortunately still be a new alloc. As a future improvement, I think we can treat changes to spreads/affinities similarly to count upgrades, so that allocs that already adhere stay as in-place upgrades and only the few remaining allocs are rebalanced to achieve the goal of the job change. Our team plans to discuss alternative placement algorithm designs in the near future, and we'll keep this ticket updated.
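To make the "adhering allocs stay in place" idea concrete, here is a rough Python sketch. It is not Nomad's scheduler code: `plan_spread_change`, the `(alloc_id, target)` tuples, and the per-target quota dict are all hypothetical names chosen for illustration, and it assumes a spread can be reduced to a hard per-target count, which is a simplification of score-based placement.

```python
def plan_spread_change(allocs, desired):
    """Split running allocs into those that can stay and those that must move.

    allocs:  list of (alloc_id, target) pairs for the running allocations.
    desired: dict mapping target name -> number of allocs wanted there.
    Returns (in_place, placements), where placements pairs each displaced
    alloc with a target that still has an unfilled slot.
    """
    remaining = dict(desired)  # slots still open per target
    in_place, to_move = [], []
    for alloc_id, target in allocs:
        if remaining.get(target, 0) > 0:
            # This alloc already satisfies the new spread: keep it where it is.
            remaining[target] -= 1
            in_place.append(alloc_id)
        else:
            # Its target is over quota under the new spread.
            to_move.append(alloc_id)
    # Expand the open slots into a flat list and hand them to displaced allocs.
    open_slots = [t for t, n in remaining.items() for _ in range(n)]
    placements = list(zip(to_move, open_slots))
    return in_place, placements


running = [(f"east-{i}", "us-east") for i in range(50)] + \
          [(f"west-{i}", "us-west") for i in range(50)]
kept, moved = plan_spread_change(running, {"us-east": 49, "us-west": 51})
print(len(kept), moved)
```

Under these assumptions, a 50/50 to 49/51 change on 100 allocs keeps 99 in place and reschedules one, which is the kind of diff-based behavior the comment above describes as a possible future improvement.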
@drewbailey This seems like a pretty big philosophical regression. One of the things that makes Nomad easy to work with is its bias toward not disrupting jobs.
This PR reverts changes introduced in #6703 that made changes to affinities and spreads cause destructive updates. #6988 outlines good reasons to rethink this until we have scheduler functionality in place that treats an update to a spread/affinity similarly to how count changes are handled (rebalancing the difference instead of rescheduling all allocs).
Hi all, after discussing with the team, we have decided not to revert this functionality for 0.10.4. We understand that the current behavior, where changes to spreads/affinities cause all allocations to be rescheduled, is not ideal. Prior to #6703, changes to spreads/affinities completely ignored running allocations, which was incorrect behavior. Ideally, Nomad's scheduling would take into account that a certain number of allocations already satisfy the changed spread/affinity, and make only the minimal changes to running allocations necessary to be correct and complete. This is something the team is currently investigating and researching for a future release. We understand that this leaves a very valid use case in the not-so-great position of needing to have all allocations rescheduled. In the meantime, would it be possible to manually spread the east/west groups by using two separate jobs? This is far from ideal, but it should allow you to tune the count of each job without rescheduling all allocations when adjusting small numbers. We would love to hear from the community about what changes to affinities and spreads you would like to see in a future release as we continue to think about an ideal solution for Nomad. Please comment below with your ideas and use cases.
Hey there. Since this issue hasn't had any activity in a while, we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this. Thanks!
This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem 👍
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
0.10.2
Operating system and Environment details
ubuntu + custom firecracker task driver
Issue
0.10.2 stopped doing in-place upgrades when only spreads, affinity, or constraints change. The PR (#6703) and issue (#6334) for the change make sense, though for our use case it’s a regression.
We’re using spread+affinity+counts to help place allocs in regions near traffic and adjusting them as traffic changes. For example, a task group with a count of 100 is spread 50/50 between us-east and us-west. If we changed the spread to 49/51, the old behavior would upgrade 100 allocs in-place and prefer us-west for the next allocation, whereas now 100 allocs are stopped only to start 98 in likely the same place as before.
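As a back-of-the-envelope check on the numbers in this example, the following Python sketch turns spread percentages into per-target alloc counts and counts how many allocs would have to leave an over-full target. The helper names `desired_counts` and `moves_needed` are made up for illustration, and the largest-remainder rounding is an assumption; Nomad scores placements rather than enforcing hard quotas, so this only approximates what a spread asks for.

```python
from math import floor


def desired_counts(count, percents):
    """Split `count` allocs across targets in proportion to `percents`.

    Uses largest-remainder rounding so the per-target counts always
    sum to `count` (an assumption made for this sketch).
    """
    total = sum(percents.values())
    exact = {t: count * p / total for t, p in percents.items()}
    counts = {t: floor(v) for t, v in exact.items()}
    leftover = count - sum(counts.values())
    # Give the leftover allocs to the targets with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for t in by_fraction[:leftover]:
        counts[t] += 1
    return counts


def moves_needed(current, desired):
    """Number of allocs sitting in targets that are over their desired count."""
    targets = set(current) | set(desired)
    return sum(max(0, current.get(t, 0) - desired.get(t, 0)) for t in targets)


before = {"us-east": 50, "us-west": 50}
after = desired_counts(100, {"us-east": 49, "us-west": 51})
print(after, moves_needed(before, after))
```

For the 50/50 to 49/51 change, at most one alloc is on the wrong side of the new spread, which is why stopping and restarting all 100 feels disproportionate for this use case.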
I’m fairly sure we’re an edge case, but the new behavior doesn’t seem ideal. Would you be open to making this behavior configurable? Are custom scheduler plugins possible?