Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cni: add DNS set by CNI plugins to task configuration #20007

Merged
merged 3 commits into from
Feb 20, 2024
Merged

cni: add DNS set by CNI plugins to task configuration #20007

merged 3 commits into from
Feb 20, 2024

Conversation

tgross
Copy link
Member

@tgross tgross commented Feb 16, 2024

CNI plugins may set DNS configuration, but this isn't threaded through to the task configuration so that we can write it to the /etc/resolv.conf file as needed. Add the AllocNetworkStatus to the alloc hook resources so they're accessible from the taskrunner, which will prepend the DNS entries to any entries provided by the user.

Fixes: #11102, which I bumped up against while working on #10628

CNI plugins may set DNS configuration, but this isn't threaded through to the
task configuration so that we can write it to the `/etc/resolv.conf` file as
needed. Add the `AllocNetworkStatus` to the alloc hook resources so they're
accessible from the taskrunner, which will prepend the DNS entries to any
entries provided by the user.

Fixes: #11102
@tgross tgross added backport/1.5.x backport to 1.5.x release line backport/1.6.x backport to 1.6.x release line backport/1.7.x backport to 1.7.x release line labels Feb 16, 2024
Options: netStatus.DNS.Options,
}
} else {
dns.Servers = append(netStatus.DNS.Servers, dns.Servers...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand the goal here, I am not sure if merging DNS is the proper way to go here. For instance if options has rotate set then there would be no guarantee that the first server is actually used. https://developer.hashicorp.com/nomad/docs/job-specification/network#dns also says that the defaults from the client host are used, so we don't necessarily know what is set etc… I fear that this might end up being a bit fragile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you've noted, there are three potential sources for DNS configuration here: the output of CNI plugins, the user-provided DNS config, and whatever is on the client. This code path only involves the first two. The task driver is then responsible for what happens after that... the built-in task drivers use GenerateDNSMount to create the /etc/resolv.conf. (So for example, you could have a third-party task driver that requires a particular CNI plugin be used, in which case it'd use that value alone and not get anything from the client host.)

All that being said, any DNS configuration from the CNI plugins is frankly likely to conflict with any jobspec network.dns configuration. Our default network configuration for bridge networking doesn't generate any DNS values. So if you're intentionally using a CNI plugin that does, you should know not to set conflicting values on the network.dns block. Otherwise yes, it's fragile... we want to give users knobs here but its up to them to use them responsibly.

Could definitely do with some warnings around network.dns usage in the docs though.

Copy link
Member Author

@tgross tgross Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having second thoughts on the order though here... lemme get some input from the team on that. (And happy to hear your thoughts as well!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So since DNS is usually at fault when there are issues I'd prefer a "stable" configuration like the following:

  • By default use the resolv.conf like now (ie I have a custom DNS set in docker which nicely ends up in the resolv.conf). Now here being as in current master before this patch.
  • If DNS is set by CNI, override the default.
  • if DNS is set in the task, override also whatever was set before. I would consider setting DNS on the task level a rather specific and advanced configuration, people doing that can as well set a DNS server that knows how to handle .consul

In a perfect world I'd love to be able to customize the template for the default bridge CNI configuration (with and without tproxy) so I could globally override that for all bridge users without having to go into every task and change mode to cni/my_cni.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye, that's pretty much what the rest of the Nomad team suggested we do here, so I was definitely off the rails a bit 😀

In a perfect world I'd love to be able to customize the template for the default bridge CNI configuration (with and without tproxy) so I could globally override that for all bridge users without having to go into every task and change mode to cni/my_cni.

Definitely an interesting idea. A little challenging from the perspective of fingerprinting, because if there's behavior the application needs from that specific bridge configuration there's no reasonable way for us to tell the server which clients have that configuration set.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this would require fingerprinting. In it's easiest implementation one could just specify an alternate template in the client-configuration for

const nomadCNIConfigTemplate = `{
(we probably need templating to play nicely with subnets etc…).

We could even add cni_consul to that list by default (assuming nomad) ships it or so and have it activate/deactivate based on cni args. This way the same template could be used independent of whether tproxy is used or not.

Copy link
Member

@gulducat gulducat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only minor questions and nits. LGTM!

client/allocrunner/alloc_runner.go Outdated Show resolved Hide resolved
client/allocrunner/taskrunner/task_runner.go Outdated Show resolved Hide resolved
client/allocrunner/taskrunner/task_runner.go Outdated Show resolved Hide resolved
client/allocrunner/taskrunner/task_runner_test.go Outdated Show resolved Hide resolved
@tgross tgross merged commit 45b2c34 into main Feb 20, 2024
20 checks passed
@tgross tgross deleted the b-cni-dns branch February 20, 2024 15:17
tgross added a commit that referenced this pull request Feb 20, 2024
CNI plugins may set DNS configuration, but this isn't threaded through to the
task configuration so that we can write it to the `/etc/resolv.conf` file as
needed. Add the `AllocNetworkStatus` to the alloc hook resources so they're
accessible from the taskrunner. Any DNS entries provided by the user will
override these values.

Fixes: #11102
tgross added a commit that referenced this pull request Feb 20, 2024
CNI plugins may set DNS configuration, but this isn't threaded through to the
task configuration so that we can write it to the `/etc/resolv.conf` file as
needed. Add the `AllocNetworkStatus` to the alloc hook resources so they're
accessible from the taskrunner. Any DNS entries provided by the user will
override these values.

Fixes: #11102
tgross added a commit that referenced this pull request Mar 22, 2024
In #20007 we fixed a bug where the DNS configuration set by CNI plugins was not
threaded through to the task configuration. This resulted in a regression where
a DNS override set by `dockerd` was not respected for `bridge` mode
networking. Our existing handling of CNI DNS incorrectly assumed that the DNS
field would be empty, when in fact it contains a single empty DNS struct.

Handle this case correctly by checking whether the DNS struct we get back from
CNI has any nameservers, and ignore it if it doesn't. Expand test coverage of
this case.

Fixes: #20174
tgross added a commit that referenced this pull request Mar 22, 2024
In #20007 we fixed a bug where the DNS configuration set by CNI plugins was not
threaded through to the task configuration. This resulted in a regression where
a DNS override set by `dockerd` was not respected for `bridge` mode
networking. Our existing handling of CNI DNS incorrectly assumed that the DNS
field would be empty, when in fact it contains a single empty DNS struct.

Handle this case correctly by checking whether the DNS struct we get back from
CNI has any nameservers, and ignore it if it doesn't. Expand test coverage of
this case.

Fixes: #20174
tgross added a commit that referenced this pull request Mar 22, 2024
In #20007 we fixed a bug where the DNS configuration set by CNI plugins was not
threaded through to the task configuration. This resulted in a regression where
a DNS override set by `dockerd` was not respected for `bridge` mode
networking. Our existing handling of CNI DNS incorrectly assumed that the DNS
field would be empty, when in fact it contains a single empty DNS struct.

Handle this case correctly by checking whether the DNS struct we get back from
CNI has any nameservers, and ignore it if it doesn't. Expand test coverage of
this case.

Fixes: #20174
tgross added a commit that referenced this pull request Mar 22, 2024
In #20007 we fixed a bug where the DNS configuration set by CNI plugins was not
threaded through to the task configuration. This resulted in a regression where
a DNS override set by `dockerd` was not respected for `bridge` mode
networking. Our existing handling of CNI DNS incorrectly assumed that the DNS
field would be empty, when in fact it contains a single empty DNS struct.

Handle this case correctly by checking whether the DNS struct we get back from
CNI has any nameservers, and ignore it if it doesn't. Expand test coverage of
this case.

Fixes: #20174
tgross added a commit that referenced this pull request Mar 22, 2024
…rd into release/1.6.x (#20192)

In #20007 we fixed a bug where the DNS configuration set by CNI plugins was not
threaded through to the task configuration. This resulted in a regression where
a DNS override set by `dockerd` was not respected for `bridge` mode
networking. Our existing handling of CNI DNS incorrectly assumed that the DNS
field would be empty, when in fact it contains a single empty DNS struct.

Handle this case correctly by checking whether the DNS struct we get back from
CNI has any nameservers, and ignore it if it doesn't. Expand test coverage of
this case.

Fixes: #20174

Co-authored-by: Tim Gross <[email protected]>
tgross added a commit that referenced this pull request Mar 22, 2024
…rd into release/1.5.x (#20191)

In #20007 we fixed a bug where the DNS configuration set by CNI plugins was not
threaded through to the task configuration. This resulted in a regression where
a DNS override set by `dockerd` was not respected for `bridge` mode
networking. Our existing handling of CNI DNS incorrectly assumed that the DNS
field would be empty, when in fact it contains a single empty DNS struct.

Handle this case correctly by checking whether the DNS struct we get back from
CNI has any nameservers, and ignore it if it doesn't. Expand test coverage of
this case.

Fixes: #20174

Co-authored-by: Tim Gross <[email protected]>
philrenaud pushed a commit that referenced this pull request Apr 18, 2024
In #20007 we fixed a bug where the DNS configuration set by CNI plugins was not
threaded through to the task configuration. This resulted in a regression where
a DNS override set by `dockerd` was not respected for `bridge` mode
networking. Our existing handling of CNI DNS incorrectly assumed that the DNS
field would be empty, when in fact it contains a single empty DNS struct.

Handle this case correctly by checking whether the DNS struct we get back from
CNI has any nameservers, and ignore it if it doesn't. Expand test coverage of
this case.

Fixes: #20174
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.5.x backport to 1.5.x release line backport/1.6.x backport to 1.6.x release line backport/1.7.x backport to 1.7.x release line theme/cni type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CNI DNS config ignored for Docker task
3 participants