core: merge reserved_ports into host_networks #13651
Conversation
Force-pushed from 7933add to d1b652c.
```go
// Node object handling by servers. Users should not be able to cause SetNode
// to error. Data that cause SetNode to error should be caught upstream such as
// a client agent refusing to start with an invalid configuration.
func (idx *NetworkIndex) SetNode(node *Node) error {
```
It's easy to miss in the diff, so I want to call this refactoring out: `SetNode` used to return `(collide bool, reason string)`, just like `AddAllocs`. Functionally, returning an error isn't any different, but I wanted to signal something different to developers: `SetNode` should never have a "collision"!

`AddAllocs` can have collisions. That's a normal part of trying to make placements and preemptions and failing; there's nothing "erroneous" about it.

`SetNode`, on the other hand, should never return an error at runtime. A better way to put it: anything that `SetNode` considers an error could have been caught upstream (like on the client agent) through validation. The old call signature sent the wrong message that `SetNode` collisions could Just Happen as a normal part of an optimistically concurrent scheduler.
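A minimal sketch of the signature change (toy types and bodies, not Nomad's actual code):

```go
package main

import (
	"errors"
	"fmt"
)

type node struct{ valid bool }

// Old shape (sketch): a collision was reported as a normal, expected
// outcome, mirroring AddAllocs' signature.
func setNodeOld(n *node) (collide bool, reason string) {
	if !n.valid {
		return true, "reserved port collision"
	}
	return false, ""
}

// New shape (sketch): anything wrong with the node is a programmer or
// validation error, not a routine scheduling outcome.
func setNode(n *node) error {
	if !n.valid {
		return errors.New("invalid node data")
	}
	return nil
}

func main() {
	if err := setNode(&node{valid: false}); err != nil {
		fmt.Println("fails loudly:", err) // fails loudly: invalid node data
	}
}
```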
This is great. I wonder now what kind of actions we can take when an error is returned here. Maybe force a leadership transition to see if a new leader is able to handle the node? Or have a way to flush node state and request new fingerprint data from the node?
The implication of "`SetNode`, on the other hand, should never return an error at runtime" is that the node itself is giving us bad data because of a programmer error, so I'd expect the node to give us the same bad fingerprint again. At that point our invariants are broken, so I'm not sure it's a good idea to try to recover rather than throw an error that fails scheduling -- we want this to be noisy.
Yeah, a very sad +1 to Tim. Although in cases we've encountered, such as #13505 and #11830, it is possible for an operator to change their configuration to fix or work around the "collision." Getting folks from this obscure error message to inspecting things like reserved ports seems extremely difficult, though. The silver lining is that folks are filing issues, and these errors are much easier to track down than collisions caused by allocations.
```diff
@@ -51,9 +56,9 @@ type NetworkIndex struct {
 // NewNetworkIndex is used to construct a new network index
 func NewNetworkIndex() *NetworkIndex {
 	return &NetworkIndex{
-		AvailAddresses: make(map[string][]NodeNetworkAddress),
-		AvailBandwidth: make(map[string]int),
+		HostNetworks: make(map[string][]NodeNetworkAddress),
```
At first I was thinking it would be safer to create the `TaskNetworks` and `GroupNetworks` slices here too, but I'm realizing that by leaving them out we effectively enforce that `SetNode` has been called before use (on pain of panic). So 👍 here.
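As a generic illustration of that fail-loudly pattern (toy types, not Nomad's actual `NetworkIndex`; a nil map stands in here, since writes to a nil Go map panic):

```go
package main

import "fmt"

// index is a toy stand-in: the constructor initializes only some fields,
// leaving others nil until setNode populates them.
type index struct {
	avail   map[string]int // created by newIndex
	perNode map[string]int // intentionally nil until setNode runs
}

func newIndex() *index {
	return &index{avail: make(map[string]int)}
}

func (idx *index) setNode() {
	idx.perNode = make(map[string]int)
}

func main() {
	idx := newIndex()
	idx.avail["ok"] = 1 // fine: initialized by the constructor

	defer func() { fmt.Println("recovered:", recover()) }()
	// Using per-node state before setNode surfaces the use-before-init
	// bug immediately instead of hiding it.
	idx.perNode["boom"] = 1 // panic: assignment to entry in nil map
}
```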
```diff
 		NetIndex: netIdx.Copy(),
 		Node:     option.Node,
 	})
-	iter.ctx.Metrics().ExhaustedNode(option.Node, "network: port collision")
+	iter.ctx.Metrics().ExhaustedNode(option.Node, "network: invalid node")
```
This is great: the refactoring makes the reasoning more obvious to us as developers, but changing the message here will also make this case stand out loudly in plan metrics when we try to debug it.
Force-pushed from 6c266f9 to 30584d4.
I spent a long time playing with this today and wanted to share some findings. A key aspect of Nomad's networking is that ports are reserved by IPs, not "networks", whether those networks are defined with `-network-interface` or `host_network`. So the most correct approach would probably be to apply reservations per-address, which means port reservations from different networks may or may not overlap, depending on whether the networks share addresses.

But this isn't how the existing code works, and all of the comments and docs seem to treat reservations as per-network settings. That being said, the output of …
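A small sketch of that per-IP model (hypothetical types and helper, not Nomad's code): networks that share an address also share that address's reservations.

```go
package main

import "fmt"

// hostNetwork is a toy model: a named network spanning some addresses.
type hostNetwork struct {
	name  string
	addrs []string
}

// reservedByIP folds each network's reserved ports onto every address it
// contains, so overlapping networks accumulate reservations per IP.
func reservedByIP(nets []hostNetwork, reserved map[string][]int) map[string][]int {
	byIP := make(map[string][]int)
	for _, n := range nets {
		for _, ip := range n.addrs {
			byIP[ip] = append(byIP[ip], reserved[n.name]...)
		}
	}
	return byIP
}

func main() {
	nets := []hostNetwork{
		{name: "default", addrs: []string{"10.0.0.1"}},
		{name: "public", addrs: []string{"10.0.0.1", "192.168.1.1"}}, // shares 10.0.0.1
	}
	reserved := map[string][]int{"default": {22}, "public": {80}}
	fmt.Println(reservedByIP(nets, reserved))
	// map[10.0.0.1:[22 80] 192.168.1.1:[80]]
}
```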
Once the `must` in tests are fixed up this LGTM.
Nice changes! The added comments in the structs are very helpful.

I added the `release/1.1.x` and `release/1.2.x` labels for backport.
Fixes #13505

This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (`client.reserved.reserved_ports`) "down" into more specific stanzas (`client.host_networks[].reserved_ports`).

As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility:

1. Treat overlapping `reserved_ports` on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their `-network-interface` and `host_network` addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly.
2. Use the global `client.reserved.reserved_ports` value as the default and treat `host_network[].reserved_ports` as overrides. This was my first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by `-network-interface`) may overlap with `host_networks`, so you'd need to remove the global reserved_ports from addresses shared with a host network?! This seemed really confusing and subtle for users to me.

So I think "merging down" creates the most expressive yet understandable approach (see the sketch below). I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.
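For illustration, a minimal sketch of the "merge down" idea (hypothetical helper and types, not the PR's actual implementation): each host network ends up with the union of the global reserved ports and its own.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeReservedPorts merges globally reserved ports "down" into each host
// network's own reservations, deduplicating the result.
func mergeReservedPorts(global []int, perNetwork map[string][]int) map[string][]int {
	merged := make(map[string][]int, len(perNetwork))
	for name, ports := range perNetwork {
		seen := make(map[int]bool, len(global)+len(ports))
		out := make([]int, 0, len(global)+len(ports))
		for _, p := range append(append([]int{}, global...), ports...) {
			if !seen[p] {
				seen[p] = true
				out = append(out, p)
			}
		}
		sort.Ints(out)
		merged[name] = out
	}
	return merged
}

func main() {
	global := []int{22, 80} // stand-in for client.reserved.reserved_ports
	perNet := map[string][]int{
		"public":  {80, 8080}, // stand-in for host_networks[].reserved_ports
		"private": nil,
	}
	fmt.Println(mergeReservedPorts(global, perNet))
	// map[private:[22 80] public:[22 80 8080]]
}
```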
Sorry, I snuck in a couple other refactorings. I really want to make this code more maintainable, so I tried to move it in that direction where I didn't think it would be a huge distraction (e.g. the `interface{} -> string` switch). I can back out any of that if you think it's best to keep this tight and focused.
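As a generic illustration of why that kind of change helps (hypothetical code, not the PR's actual refactoring):

```go
package main

import "fmt"

// Before (sketch): an interface{} parameter forces a runtime type switch,
// and bad callers are only caught when the code actually runs.
func describeOld(network interface{}) string {
	switch n := network.(type) {
	case string:
		return "network " + n
	default:
		return fmt.Sprintf("unexpected type %T", n)
	}
}

// After (sketch): a plain string parameter lets the compiler reject bad
// callers outright and removes the dead default branch.
func describe(network string) string {
	return "network " + network
}

func main() {
	fmt.Println(describeOld(42))    // unexpected type int
	fmt.Println(describe("public")) // network public
}
```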