DNS responses are cached #135888
Do you have systemd-resolved or unbound enabled? What do your /etc/nsswitch.conf and /etc/resolv.conf look like?
I'm testing it in a qemu VM with a minimal NixOS config, with no systemd-resolved/unbound:
{ lib, ... }: {
  boot.initrd.availableKernelModules = [ "virtio_net" "virtio_pci" "virtio_mmio" "virtio_blk" "virtio_scsi" ];
  boot.initrd.kernelModules = [ "virtio_balloon" "virtio_console" "virtio_rng" ];
  boot.growPartition = true;
  boot.loader.grub.device = "/dev/vda";
  fileSystems."/".device = "/dev/disk/by-label/nixos";
  fileSystems."/".fsType = "ext4";
  fileSystems."/".autoResize = true;
  services.getty.autologinUser = lib.mkForce "root";
}
/etc/nsswitch.conf:
/etc/resolv.conf:
cc @arianvp (who dug into nscd and its caching behaviour)
Confirmed. After changing the nameserver in /etc/resolv.conf, nslookup and dig resolve through the new server, but other applications return a cached result. There are no systemd-resolved/local DNS services and no entries in /etc/hosts. Restarting nscd yields the new entry.
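The split between the two groups of tools is expected: nslookup and dig query the nameserver themselves, while most applications resolve through glibc's NSS, which consults nscd when it is running. A minimal sketch for comparing both paths follows. The helper name `compare_lookup` is made up for illustration, and it assumes `getent`, `dig` and `awk` are installed.

```shell
# Sketch: compare the NSS lookup path with a direct DNS query.
# compare_lookup is a hypothetical helper, not part of any tool.
compare_lookup() {
    name="$1"
    # getent resolves through glibc NSS, which asks nscd when it is running.
    nss_addr=$(getent ahostsv4 "$name" | awk 'NR == 1 { print $1 }')
    # dig talks to the nameserver directly and bypasses NSS entirely.
    dns_addrs=$(dig +short A "$name")
    printf 'nss=%s\n' "${nss_addr:-none}"
    printf 'dns=%s\n' "$(printf '%s' "$dns_addrs" | tr '\n' ' ')"
    # Succeed only if the NSS answer is among the addresses dig returned.
    [ -n "$nss_addr" ] && printf '%s\n' "$dns_addrs" | grep -qxF -- "$nss_addr"
}
```

Calling `compare_lookup github.com` right after switching to a nameserver that returns a different answer should show the mismatch for as long as the stale entry is served. A non-zero exit status only means the NSS answer was not in dig's current answer set, which can also happen for other reasons.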
Thanks for the digging! In that case, one more reason to work on #55276 :-)
NSS modules are now globally provided (by providing a `/run/nss-modules` symlink), similar to how we handle OpenGL drivers. This removes the need for nscd as a proxy for all NSS requests, and avoids DNS requests leaking across network namespaces. While doing this upgrade, existing applications need to be restarted, so they know how to pick up NSS modules from `/run/nss-modules`. If you want to defer application restart to a later time, explicitly enable `nscd` via `services.nscd.enable` until the application restart. We can mix NSS modules from any version of glibc according to https://sourceware.org/legacy-ml/libc-help/2016-12/msg00008.html, so glibc upgrades shouldn't break old userland loading more recent NSS modules (and most likely, NSS modules are already loaded) Fixes: NixOS#55276 Fixes: NixOS#135888 Fixes: NixOS#105353 Cc: NixOS#52411 (comment)
NSS modules are now globally provided (by providing a `/run/nss-modules` symlink). See the text added to `rl-2111.section.md` for further details. Fixes: NixOS#55276 Fixes: NixOS#135888 Fixes: NixOS#105353 Cc: NixOS#52411 (comment)
NSS modules are now globally provided by a symlink in `/run`. See the description in `add-extra-module-load-path.patch` for further details. Fixes: NixOS#55276 Fixes: NixOS#135888 Fixes: NixOS#105353 Cc: NixOS#52411 (comment) Co-authored-by: Erik Arvstedt <[email protected]>
A quick note, as we have a similar issue: the caching is bad enough that we're affected by negative cache entries with an unknown but longish TTL (at least multiple minutes) and have to restart nscd. We're debugging an annoyance where one upstream DNS server sometimes (but rarely) misbehaves and either responds with a negative answer or times out. I'm a bit suspicious that nscd is caching that timeout as a negative entry (although that doesn't match glibc's timeout fallback behaviour).
I spent some time reading the glibc/nscd code. There was a "recent" change (5e74e6f85842892bc25da8e8c70d8dadd485941a) where the shared cache caused a problem. I'm going to try running with the shared cache disabled...
Enabling and disabling the shared flag did not change anything. However, I noticed that negative lookups are actually not cached, but positive ones are (based on our current nscd.conf). I wonder what TTL it is assuming. The code internally has some defaults, like 3600 for positive values. Digging deeper.
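One way to check which TTLs the running daemon reports is to ask nscd for its statistics. The sketch below is an assumption-laden helper, not something from this thread: it assumes `nscd` is on PATH (on NixOS it may have to be called by its store path), that `nscd -g` is run with sufficient privileges, and that each section of the output starts with a line such as `hosts cache:`. The name `hosts_cache_ttls` is hypothetical.

```shell
# Sketch: print the TTL lines of the "hosts" section from `nscd -g`.
# hosts_cache_ttls is a hypothetical helper; the output layout is assumed.
hosts_cache_ttls() {
    nscd -g | awk '
        # A section header looks like "hosts cache:"; track whether we are in it.
        /^[a-z]+ cache:$/ { in_hosts = ($1 == "hosts") }
        # Inside the hosts section, keep only the time-to-live lines.
        in_hosts && /time to live/ { sub(/^[ \t]+/, ""); print }
    '
}
```

If the reported values differ from what nscd.conf sets, that would point at the configuration not being applied. `nscd -i hosts` invalidates the hosts cache and can serve as a lighter alternative to restarting the service while testing.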
Check The semantics might have changed recently, though…
Yeah, I'm aware of that. Any specific thing that you think I'm missing? I'm a bit worried that we do not have tests for this behaviour and that either the previous change was bogus or glibc changed its behaviour without us noticing.
I'm a bit worried that we do not have tests for this behaviour and that either the previous change was bogus or glibc changed its behaviour without us noticing.
Yes, that.
Alright. I set up a test case that shows how to reproduce this, based on the original commit where nscd caching was supposedly disabled. It didn't work even back then: f9a5a65801889df5848eff0d90b2edeee0fe390a I guess the next step would be debugging nscd?!? Le sigh. Anyone got a better idea?
I've been iterating a bit with @erikarvstedt on how to accomplish #55276 in a non-breaking fashion, which would take nscd out of the loop for most requests. This is still WIP though.
Yeah, I've seen that. We're seeing some relevant breakage and I need to come up with a short-term fix, though.
If you don't need any custom NSS modules (not even nss-systemd for dynamic user resolution), you should be able to disable it already today - with the caveats mentioned in the docs.
Yeah, unfortunately we just started using the container integration with mymachines ... perfect timing ;)
I did some more digging and the whole dance of how nscd works with timeouts and pruning the cache just seems off. I found a piece of code in the |
Ok, so here's a patch to glibc that simply deactivates the cache function in nscd completely. We likely would not want to ship it this way to the general userbase, but I could run it this way on our platform. If there's interest in using this upstream until your work on ripping nscd out is done, we could add it as a configuration option to nscd or so.
Here's the commit; I forgot that this is in a separate repo and won't be picked up through the issue ID references:
https://udrepper.livejournal.com/16362.html suggests:
i.e. perhaps modifying the As an aside, the default |
From my perspective (I'm interested in the |
@Stale not stale
Me and @NinjaTrappeur took a closer look at the nscd protocol and codebase. It's not really possible to run it in a pure "no caching" mode. However, we found a good replacement: nsncd. Try the following snippet to switch your system(s) to a version containing all the PRs: https://gist.github.com/flokli/b1b0a1d2c0b7ba6e73101e1447812114 I hope this gets included upstream soon. On top of #194916, I also added an integration test for nsncd to It would be nice if more people could test this!
I've been running this patch today. So far I have not hit any bugs. That said, I don't use any dodgy/segfaulting NSS modules. The post-boot/resume Firefox name resolution issues I was experiencing are gone.
I sent a proper PR to nixpkgs, see #196917.
Enabling nsncd through the new option fixed it for me too! I hated that issue...
Good to hear! Let's see if we hear any negative reports; otherwise we should probably default to this after some more testing…
I think that this issue can be closed, nsncd is now used by default: #214153
Yes, thanks for the ping.
According to #89274, NixOS uses nscd only for dispatching NSS modules, and the caching functionality of nscd is disabled by default. But when I run any application that resolves the same DNS name in a loop on a clean NixOS system, I observe that DNS packets are not sent on each request; they are only sent after the TTL elapses. This means that the other requests are served from some local cache. Only if I stop the nscd service do I see packets being sent on each request.
A simple script to reproduce this:
while true; do getent ahosts github.com; sleep 1; done
Is there some component other than nscd that does this caching, or does nscd itself need some extra configuration to actually disable caching?
cc @flokli
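To make the reproduction above easier to line up with a packet capture, here is a slightly extended sketch of the same loop that timestamps each lookup and stops after a fixed number of iterations. The function name `lookup_loop` and its parameters are made up for illustration; it assumes only `getent`, `awk`, `date` and `sleep`.

```shell
# Sketch: timestamped variant of the getent loop from the report.
# lookup_loop NAME [COUNT] [INTERVAL_SECONDS] is a hypothetical helper.
lookup_loop() {
    name="$1"
    count="${2:-10}"
    interval="${3:-1}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # First address returned through NSS (and nscd, if it is running).
        addr=$(getent ahosts "$name" | awk 'NR == 1 { print $1 }')
        printf '%s %s\n' "$(date +%T)" "${addr:-<no answer>}"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Running `lookup_loop github.com 30 1` in one terminal and `tcpdump -n -i any port 53` (as root) in another should show whether each printed lookup has a matching DNS query on the wire, or whether queries only appear once the TTL has elapsed.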