Allow wildcard datacenters to be specified in job file #11170
Hi @jmwilkinson and thanks so much for this. I have marked this for our next major release, meaning we won't review it immediately, but will do so once that development cycle is open to merging into main.
We would love to have this ASAP; is there a date set for the 1.2 release?
The Nomad team loves this feature but have some concerns that need to be addressed before we merge it:
filepath.Match
We should not use Another factor is that we will likely need to implement this same logic in the UI so we can render the valid datacenters for a job even if it is using globbing. What if we only allowed
Client Validation
I think we should restrict the use of special characters in datacenter names. We could allow escaping them to disambiguate between Sadly we currently only restrict datacenter from using null bytes (
Upgrading
We need to inform users of the datacenter restrictions in our upgrade guide documentation: https://www.nomadproject.io/docs/upgrade/upgrade-specific While we normally try to avoid breaking backward compatibility in this way, I think it's very unlikely people use
multiregion.region[*].datacenters
Since this is an Enterprise Only feature the Nomad team would have to implement it. This isn't a problem and shouldn't prevent the merging of this improvement.
Testing
In particular we should test that a system job scheduled against
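For illustration, the datacenter-name restriction raised under Client Validation could look roughly like the sketch below. The helper name validateDatacenterName and the exact reserved character set are assumptions made for this example; the thread does not settle on a final list, and this is not Nomad's actual validation code.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedDatacenterChars is an ASSUMED set of characters reserved for
// wildcard syntax, plus the null byte. The final list is not fixed in the
// discussion above.
const reservedDatacenterChars = "*?[]\x00"

// validateDatacenterName is a hypothetical helper. It rejects names that
// contain reserved characters and names each offending character once.
func validateDatacenterName(name string) error {
	var bad []string
	seen := map[rune]bool{}
	for _, r := range name {
		if strings.ContainsRune(reservedDatacenterChars, r) && !seen[r] {
			seen[r] = true
			bad = append(bad, fmt.Sprintf("%q", r))
		}
	}
	if len(bad) > 0 {
		return fmt.Errorf("datacenter %q contains reserved characters: %s",
			name, strings.Join(bad, ", "))
	}
	return nil
}
```

Listing the offending characters in the error keeps the check cheap while telling the operator exactly what to rename.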
That's a good point about As an alternative, what about this linear time glob function which Russ Cox based the I've dropped it in and it works well. It's also easy enough to remove the single character matching block, but I think that is useful, especially as it keeps it in line with the original glob behavior, which devs are most familiar with. I added the validation as well. I'm trying to change as little as possible, but with respect to the validation, I think it would be valuable to report on precisely which characters are invalid. Of course, that requires more changes and more work. I've also tried to add some documentation, but I'm not quite sure how it should look or what the process is for that, so I'll remove it or update it as necessary.
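For readers unfamiliar with the approach, here is a minimal sketch of a linear-time glob matcher supporting only `*` and `?`, in the style described in Russ Cox's article "Glob Matching Can Be Simple And Fast Too". The function name globMatch is chosen for this example, and the code is not necessarily what was dropped into the PR.

```go
package main

// globMatch reports whether name matches pattern, where '*' matches any
// run of bytes (including none) and '?' matches exactly one byte.
// Instead of recursing on every '*', it remembers only the most recent
// '*' and restarts from there on a mismatch, which avoids exponential
// backtracking.
func globMatch(pattern, name string) bool {
	px, nx := 0, 0         // current positions in pattern and name
	nextPx, nextNx := 0, 0 // restart positions after the last '*'
	for px < len(pattern) || nx < len(name) {
		if px < len(pattern) {
			switch c := pattern[px]; c {
			case '?':
				// Single-character wildcard.
				if nx < len(name) {
					px++
					nx++
					continue
				}
			case '*':
				// First try matching zero bytes; on failure, retry with
				// the star consuming one more byte of name.
				nextPx = px
				nextNx = nx + 1
				px++
				continue
			default:
				// Literal byte.
				if nx < len(name) && name[nx] == c {
					px++
					nx++
					continue
				}
			}
		}
		// Mismatch: restart at the last '*' if one can still grow.
		if 0 < nextNx && nextNx <= len(name) {
			px = nextPx
			nx = nextNx
			continue
		}
		return false
	}
	return true
}
```

Removing the `case '?'` arm is all it takes to drop single-character matching, which is the trade-off mentioned in the comment above.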
This change is quite useful, could a specific target milestone be set to know when to expect it? :)
Hey @dpogorzelski, sorry about the delay on this. Internally we agree it's a good feature, but there are some tweaks we have to make to get this merged: the UI and client updates that @schmichael mentioned. Since this would include breaking changes, we'll need to include this in a major release, and unfortunately I don't think we'll be able to sneak it into 1.3. Once we ship 1.3, I'll revisit this to see if we can queue it up early in the 1.4 cycle. I'll update publicly if we're committing to it for that release.
Sorry for the slow replies. We still want to get this in, but probably only in a 1.X.0 release like 1.3.0 or 1.4.0. We need to make sure we have time to add UI support.
Are there any updates?
Hey @kinnalru, this update is still the latest: #11170 (comment). Unfortunately, this won't make 1.3 (beta coming in a few weeks!), but we'll try to queue it up early in the 1.4 cycle so it can definitely make that major release.
It seems like this one is still not part of the 1.4.0 milestone. Is that intentional? :)
Hi folks! 👋 We've had a few requests to try to ship this PR. Because it's got some related backwards compatibility warnings, we're going to ship this in the upcoming Nomad 1.5.0. I've rebased this PR on main and today I'll be addressing the remaining comments to land the great work that @jmwilkinson has already contributed.
I've updated the PR to pick up all the review comments, added documentation, fixed up some tests, and made sure that we can detect node updates correctly. I've had a look at the multiregion bits and it doesn't look like there's anything to actually do there, but I'll verify that once we've merged this PR and had the OSS->ENT sync run. There's one open discussion we're having internally about whether or not we should have a default
Minor note on the upgrade path, but I'm not sure we need that much detail.
Overview
This PR addresses issue #9024.
It uses filepath.Match from the Go standard library to accomplish the matching, which is somewhat simpler and more intuitive than using a regex. Because of that simplicity, it should be minimally impactful.
It is not completely backwards compatible, as datacenters with "*", "]", "?" in their name may be impacted. I do not believe the requested feature could be implemented without a theoretical compatibility break, or an entirely new property (which feels worse).
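As a rough sketch of the matching described above (not the PR's actual code), a job's datacenter patterns can be checked against a node's datacenter with filepath.Match. The helper name datacenterMatches is hypothetical.

```go
package main

import "path/filepath"

// datacenterMatches is a hypothetical helper: it reports whether dc
// matches any of the job's datacenter patterns.
func datacenterMatches(patterns []string, dc string) bool {
	for _, p := range patterns {
		// filepath.Match returns ErrBadPattern for malformed patterns
		// such as "["; this sketch treats those as non-matching.
		if ok, err := filepath.Match(p, dc); err == nil && ok {
			return true
		}
	}
	return false
}
```

For example, the pattern "us-*" would match "us-east-1", and a plain name such as "dc1" still matches only itself. This also shows where the compatibility break comes from: a datacenter literally named "dc*" is now read as a pattern.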
Notes
The docs do not appear to be part of the project, at least not that I could see, so they would need to be updated as well.
The returned list of datacenters will no longer have keys with a node count of 0. This is because we cannot a priori determine the number of datacenters based on wildcard dc specs. So we have to allocate a map, then add and increment each dc as we match it.
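The change in the returned counts can be sketched as follows. The node type and countReadyNodesByDC are simplified stand-ins invented for this example, not Nomad's real structs or the actual readyNodesInDCs implementation.

```go
package main

import "path/filepath"

// node is a simplified, hypothetical stand-in for a Nomad client node.
type node struct {
	Datacenter string
	Ready      bool
}

// countReadyNodesByDC counts ready nodes per concrete datacenter name.
// Because patterns may contain wildcards, the set of datacenters is not
// known up front, so keys are created only when a node matches. A
// datacenter with no ready, matching nodes never appears with a count of 0.
func countReadyNodesByDC(nodes []node, patterns []string) map[string]int {
	counts := make(map[string]int)
	for _, n := range nodes {
		if !n.Ready {
			continue
		}
		for _, p := range patterns {
			if ok, err := filepath.Match(p, n.Datacenter); err == nil && ok {
				counts[n.Datacenter]++
				break // count each node once even if several patterns match
			}
		}
	}
	return counts
}
```

Callers that previously relied on a zero-valued key to detect an empty datacenter would need to check for a missing key instead.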
I do not know what, if any, impact this will have.
Example
Interactions with other features
Spread
There was a comment in the feature request thread about the interaction with the spread stanza. As far as I can tell, the nodes are set on the stack based on the results of the readyNodesInDCs function, and the stack then uses those nodes when computing selections, both for spread and for bin packing. So it seems like it should just work, and my limited testing with spread has done what was expected, but I could well be missing something.
multiregion.region[*].datacenters
There was another comment about parity with multiregion.region[*].datacenters for interpolation reasons. I'd imagine that the job spec received by the scheduler, which has the Datacenters property used by readyNodesInDCs, has that property set to the applicable region value, or the default job value, by the time it reaches the function, but I have been unable to confirm that.
As multiregion requires Nomad Enterprise, I have also been unable to test it.
, has that property set to the applicable region value, or the default job value, by the time it reaches the function, I have been unable to confirm that.As multiregion requires Nomad Enterprise, I have also been unable to test it.