Reserved port duplicates/conflicts block all allocations #13505
Hi @groggemans, and thanks for raising this. I was able to reproduce this locally by running the agent in dev mode with an additional config snippet and then attempting to run a minimal jobspec.
@groggemans, would you be able to expand a little on the use cases you mentioned in the original report?
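Reproduced with the following agent configuration:

```hcl
client {
  reserved {
    reserved_ports = "22"
  }

  host_network "eth" {
    cidr           = "192.168.0.0/16"
    reserved_ports = "22"
  }
}
```

Simply run a dev agent with this configuration. I think there are 2 possible fixes here:

1. Use the global `client.reserved.reserved_ports` value as the default and treat `host_network` `reserved_ports` as overrides.
2. Treat overlapping `reserved_ports` as invalid and refuse to start the agent.

I think the first option is the friendliest, and brief testing suggests both options are viable. If there aren't any other ideas, I'll probably push a PR tomorrow implementing the first.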
I agree that simply failing to start might not be ideal, and I'd also prefer the first option in that case. However, the first option doesn't address the case where multiple aliases for the same network define the same ports.

The use case I see for multiple aliases for the same network is a cluster with mixed network configurations, where you might not want to adapt all the jobs. Some nodes might have separate management and LAN networks, while others have a single interface. From a configuration standpoint, the single interface on such a node is then identified as both LAN and MGMT, while the other nodes have actual separate interfaces. My local cluster is set up like that: all or most machines have the same networks defined, but not all use the same network config, because some are on Wi-Fi or don't have multiple interfaces/VLANs.

My use case would probably be better solved by being able to attach "tags" to interfaces and filter on those. But the current alias system offers a similar experience, albeit with a bit more config, so it's probably not worth the development effort at this point.
Fixes #13505

This fixes #13505 by treating `reserved_ports` like we treat a lot of jobspec settings: merging settings from more global stanzas (`client.reserved.reserved_ports`) "down" into more specific stanzas (`client.host_networks[].reserved_ports`).

As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility:

- Treat overlapping `reserved_ports` on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now, since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their `-network-interface` and `host_network` addresses overlapped, they could only specify `reserved_ports` in one place or the other?! It gets ugly.
- Use the global `client.reserved.reserved_ports` value as the default and treat `host_network[].reserved_ports` as overrides. This was my first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by `-network-interface`) may overlap with host networks, so you'd need to remove the global `reserved_ports` from addresses shared with a host network?! This seemed really confusing and subtle for users.

So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However, that's a job for another PR.
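As a sketch of what "merging down" is meant to allow, the configuration below repeats port 22 both globally and in a `host_network`. Assuming the behavior described in this PR, addresses in the `eth` host network would end up with the union of both reservations (22 and 8080) instead of a conflict; port 8080 is an illustrative value, not one from the original report.

```hcl
client {
  # Global stanza: merged "down" into each host_network's reservations.
  reserved {
    reserved_ports = "22"
  }

  host_network "eth" {
    cidr = "192.168.0.0/16"
    # Assumed effective reservation for "eth" after merging: 22 and 8080.
    # The repeated 22 is merged rather than treated as a port conflict.
    reserved_ports = "22,8080"
  }
}
```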
@groggemans, the fix will be going out in 1.3, 1.2, and 1.1 soon! The way the agent configuration maps to reserved ports on IPs is still a little harder to observe than I'd like, but we have plans for more networking work in the future! Feel free to open new issues with any bugs or ideas you might have.
Co-authored-by: Michael Schurter
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v1.3.1 (2b054e3)
Operating system and Environment details
Linux
Issue
Any duplicate port reservations or conflicts within the agent config will result in all job allocations failing with a port conflict. Even jobs that don't have any port specification will fail.
It's debatable whether this is a bug, since only very specific use cases involve multiple aliases for the same network, or reserved port ranges defined at both the global and the host_network level. You could argue that in this case it's acceptable to require that a port reservation be defined either on only one of the network aliases or at the global level, with only network-specific ports in the alias config. But that feels rather unnatural to me, and if it's the intended behavior, the documentation should carry a prominent warning about it.
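As a rough illustration of that workaround (shared ports reserved only at the global level, network-specific ports only in the `host_network` block), a configuration might look like the following. This is a sketch, not a verified fix for v1.3.1, and port 8080 is a hypothetical network-specific port.

```hcl
client {
  # Shared reservation lives only in the global stanza.
  reserved {
    reserved_ports = "22"
  }

  host_network "eth" {
    cidr = "192.168.0.0/16"
    # Only network-specific ports here; 22 is deliberately not repeated.
    # 8080 is a hypothetical example port.
    reserved_ports = "8080"
  }
}
```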
Reproduction steps
Either have a global port reservation overlap a host_network reservation or have multiple aliases for the same address range reserve the same ports.
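The second variant can be sketched with two aliases covering the same address range, as on a single-interface node whose one interface serves as both the LAN and the management network. The alias names `lan` and `mgmt` and the CIDR are hypothetical.

```hcl
client {
  # Two aliases for the same address range on a single-interface node.
  host_network "lan" {
    cidr           = "192.168.0.0/16"
    reserved_ports = "22"
  }

  # Same range and the same reserved port, declared under a second alias.
  host_network "mgmt" {
    cidr           = "192.168.0.0/16"
    reserved_ports = "22"
  }
}
```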
Expected Result
In the past the port reservations merged and didn't cause any conflicts.
Actual Result
Allocations fail because of port conflicts, even if the job isn't reserving any ports.
Job file (if appropriate)
Any job will do; I've tested locally with the example job (`nomad job init`).
Agent config snippet
The conflict also happens if there's a `reserved` block including a port that is also declared in one of the `host_network` definitions.
Nomad Server logs (if appropriate)
Task Group "cache" (failed to place 1 allocation):