
core: merge reserved_ports into host_networks #13651

Merged: 13 commits merged into main, Jul 12, 2022

Conversation

schmichael (Member) commented Jul 8, 2022

Fixes #13505

This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports).

As discussed in #13505, there are other options, and since it's totally broken right now we have some flexibility:

  1. Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly.
  2. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserved_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a host_network?! This seemed really confusing and subtle for users to me.

So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.

Sorry, I snuck in a couple of other refactorings. I really want to make this code more maintainable, so I tried to move it in that direction where I didn't think it would be a huge distraction (e.g. the interface{} -> string switch). I can back out any of that if you think it's best to keep this tight and focused.

schmichael added this to the 1.3.2 milestone Jul 8, 2022
schmichael requested review from lgfa29 and tgross July 8, 2022 00:19
// Node object handling by servers. Users should not be able to cause SetNode
// to error. Data that cause SetNode to error should be caught upstream such as
// a client agent refusing to start with an invalid configuration.
func (idx *NetworkIndex) SetNode(node *Node) error {
schmichael (Member Author):

It's easy to miss in the diff, so I want to call this refactoring out:

SetNode used to return (collide bool, reason string) just like AddAllocs. Functionally returning an error really isn't any different, but I wanted to signal something different to developers: SetNode should never have a "collision!"

AddAllocs can have collisions. It's a normal part of trying to make placements and preemptions and failing. There's nothing "erroneous" about that.

SetNode, on the other hand, should never return an error at runtime. A better way to put it is: anything that SetNode considers an error could have been caught upstream (like on the client agent) through validation. I thought the old call signature really sent the wrong message that SetNode collisions could Just Happen as a normal part of an optimistically concurrent scheduler.
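A minimal before/after sketch of the signature change described here; the types and bodies below are placeholders so the snippet stands alone, not Nomad's real definitions.

package structs

// Placeholder types so this sketch compiles on its own.
type Node struct{}
type NetworkIndex struct{}

// Old shape: a collision was reported as ordinary return data, just like
// AddAllocs, implying it could happen during normal scheduling.
func (idx *NetworkIndex) setNodeOld(node *Node) (collide bool, reason string) {
	return false, ""
}

// New shape: any failure is an error that upstream validation (e.g. a client
// agent refusing to start with an invalid configuration) should have
// prevented, so it is never an expected scheduling outcome.
func (idx *NetworkIndex) SetNode(node *Node) error {
	return nil
}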

lgfa29 (Contributor):

This is great. I wonder now what kind of actions we can take when an error is returned here. Maybe force a leadership transition to see if a new leader is able to handle the node? Or have a way to flush node state and request new fingerprint data from the node?

tgross (Member):

The implication of "SetNode, on the other hand, should never return an error at runtime" is that the node itself is giving us bad data because of a programmer error, so I'd expect the node to give us the same bad fingerprint again. At that point our invariants are broken, so I'm not sure it's a good idea to try to recover rather than throw an error that fails scheduling -- we want this to be noisy.

schmichael (Member Author):

Yeah, a very sad +1 to Tim. Although in the cases we've encountered, such as #13505 and #11830, it is possible for an operator to change their configuration to fix or work around the "collision." How to get folks from this obscure error message to inspecting things like reserved ports seems extremely difficult, though. The silver lining is that folks are filing issues, and these errors are much easier to track down than collisions caused by allocations.

@@ -51,9 +56,9 @@ type NetworkIndex struct {
// NewNetworkIndex is used to construct a new network index
func NewNetworkIndex() *NetworkIndex {
	return &NetworkIndex{
		AvailAddresses: make(map[string][]NodeNetworkAddress),
		AvailBandwidth: make(map[string]int),
		HostNetworks:   make(map[string][]NodeNetworkAddress),
tgross (Member):

At first I was thinking it would be safer to create the TaskNetworks and GroupNetworks slices here too, but I'm realizing that by leaving them out we effectively enforce that SetNode has been called before use (on pain of panic). So 👍 here

scheduler/propertyset.go (outdated review thread, resolved)
		NetIndex: netIdx.Copy(),
		Node:     option.Node,
	})
-	iter.ctx.Metrics().ExhaustedNode(option.Node, "network: port collision")
+	iter.ctx.Metrics().ExhaustedNode(option.Node, "network: invalid node")
tgross (Member):

This is great: the refactoring makes the reasoning more obvious to us as developers, but changing the message here will also make this case stand out loudly in plan metrics when we try to debug it.

nomad/structs/network.go (outdated review thread, resolved)
nomad/structs/network.go (outdated review thread, resolved)
schmichael (Member Author) commented:

I spent a long time playing with this today and wanted to share some findings:

A key aspect of Nomad's networking is that ports are reserved by IPs, not "networks", whether those are defined with network_interface or host_networks. This makes sense, as (IP:Port) tuples are what matter at the end of the day. A network device can't have a port reservation, but IP addresses on that device can.

So I realized what probably would be the most correct approach to reserved_port handling is:

  1. client.reserved.reserved_ports reserves ports on the IP addresses associated with the default network (this is the network defined by network_interface)
  2. client.host_networks[].reserved_ports reserves ports on the IP addresses defined by that host network's CIDR or device.

This means port reservations may or may not overlap. For example:

  • The default network is 192.168.1.1 and reserves port 22
  • Host network eth0 is 192.168.1.0/24 and reserves port 80

The end result would be:

  • 192.168.1.1 has ports 22 and 80 reserved!
  • The rest of the 192.168.1.0/24 subnet would have only port 80 reserved.

But this isn't how existing code works, and all of the comments and docs seem to treat client.reserved.reserved_ports the way this PR does: as the global port reservations. This is conceptually simpler but does mean you can't reserve a port on the default network and not on a host_network. "Fixing" that behavior would be a backward compatibility issue.

That being said, the output of nomad node status -verbose lists Host Networks without their global port reservations! So if the intended behavior is that they inherit reserved ports, we probably need to fix that as well.
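To make the per-IP example above concrete, here is a small, self-contained Go sketch of that alternative model (the one described in this comment, not what this PR implements): reservations attach to individual IP addresses, so an address that belongs to both the default network and a host_network ends up with the union of both reservation sets. The addresses and ports are the illustrative values from the example.

package main

import "fmt"

func main() {
	// ip -> set of reserved ports
	reserved := map[string]map[int]bool{}

	reserve := func(ip string, ports ...int) {
		if reserved[ip] == nil {
			reserved[ip] = map[int]bool{}
		}
		for _, p := range ports {
			reserved[ip][p] = true
		}
	}

	// The default network (-network-interface) address reserves port 22.
	reserve("192.168.1.1", 22)

	// Host network eth0 covers 192.168.1.0/24 and reserves port 80 on every
	// address it contains, including 192.168.1.1.
	for _, ip := range []string{"192.168.1.1", "192.168.1.2"} {
		reserve(ip, 80)
	}

	fmt.Println(reserved["192.168.1.1"]) // map[22:true 80:true]: ports 22 and 80 reserved
	fmt.Println(reserved["192.168.1.2"]) // map[80:true]: only port 80 reserved
}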

command/agent/agent.go (outdated review thread, resolved)
command/agent/command.go (outdated review thread, resolved)
nomad/structs/network.go (outdated review thread, resolved)
nomad/structs/network.go (outdated review thread, resolved)
nomad/structs/network_test.go (outdated review thread, resolved)
tgross (Member) left a comment:

Once the must in tests are fixed up, this LGTM

website/content/docs/configuration/client.mdx (outdated review thread, resolved)

lgfa29 (Contributor) left a comment:

Nice changes! The added comments in the structs are very helpful.

I added the release/1.1.x and release/1.2.x labels for backport.

lgfa29 added the backport/1.1.x (backport to 1.1.x release line) and backport/1.2.x (backport to 1.2.x release line) labels Jul 12, 2022
schmichael merged commit f998a2b into main Jul 12, 2022
schmichael deleted the b-fix-res-ports branch July 12, 2022 21:40
lgfa29 pushed two commits that referenced this pull request on Jul 13, 2022. Both carry the PR description above (Fixes #13505, merging client.reserved.reserved_ports down into client.host_networks[].reserved_ports); the second is co-authored by Michael Schurter <[email protected]>.
github-actions (bot) commented:

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2022
Labels
backport/1.1.x (backport to 1.1.x release line), backport/1.2.x (backport to 1.2.x release line), backport/1.3.x (backport to 1.3.x release line)
Development

Successfully merging this pull request may close these issues.

Reserved port duplicates/conflicts block all allocations
3 participants