Nomad 1.4.0-rc1 user lookup fails with NSS #14737
Comments
It looks like in 1.3.x this issue only affected the exec driver (#13047), but in 1.4 it affects all jobs, from what I can see.
#13047 mentions that https://pkg.go.dev/os/user#Lookup is being used.
It looks like the Makefile is setting the
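For anyone following along, here is a minimal sketch of the lookup being discussed. It only uses the standard library's os/user package; the helper name lookupStatus is made up for illustration and is not taken from Nomad's code. On Unix-like systems, a cgo-enabled build resolves users through the C library (and so through NSS), while a pure-Go build (CGO_ENABLED=0 or the osusergo build tag) parses /etc/passwd only, so the same program can give different answers depending on how it was built.

```go
package main

import (
	"errors"
	"fmt"
	"os/user"
)

// lookupStatus is a hypothetical helper (not part of Nomad) that reports
// whether os/user can resolve the given account name in this build.
// With cgo enabled, user.Lookup goes through the C library and NSS;
// in a pure-Go build it reads /etc/passwd only.
func lookupStatus(name string) string {
	u, err := user.Lookup(name)
	var unknown user.UnknownUserError
	switch {
	case errors.As(err, &unknown):
		return fmt.Sprintf("%s: unknown user", name)
	case err != nil:
		return fmt.Sprintf("%s: lookup failed: %v", name, err)
	default:
		return fmt.Sprintf("%s: uid=%s gid=%s", name, u.Uid, u.Gid)
	}
}

func main() {
	fmt.Println(lookupStatus("nobody"))
}
```

Building this once with CGO_ENABLED=1 and once with CGO_ENABLED=0 on an affected host should be a quick way to check whether the build mode is what makes the difference.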
Hi @jdoss, so the reason we switched to Switching off the CGO implementation was intended to get us shipping, but we never circled back to it (see my unfortunate comment here: #14235 (comment) 😀). We'd done some smoke-testing with some typical "production" distros and didn't run into any problems, but obviously we've missed some cases. This leaves us with a few options:
Exposing users to crashes isn't going to be an option, but I'd rather not ship 1.4.0 with broken support for folks with NSS either. Seeing as how we're in the release candidate window for Nomad 1.4.0, I'm going to spend a bit of time today seeing how feasible option (3) is on short notice.
@jdoss we've just merged #14742 with what we think will fix the underlying bug, and will remove the Thanks for trying out the RC and catching this before we went GA!
Thanks for the very clear bug report, @jdoss!
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Output from nomad version:
Nomad v1.4.0-rc.1 (6aa153c)
Operating system and Environment details
Fedora CoreOS 36.20220906.3.2
Issue
Nomad 1.4 can no longer allocate tasks if the nobody user is missing from /etc/passwd. Nomad 1.3.x works just fine on Fedora CoreOS. When launching a job on 1.4 I get this error on the task:
Fedora CoreOS uses NSS with the https://github.com/aperezdc/nss-altfiles module so it can split the machine state of /etc, which is owned by users, and the OS-controlled state in /usr/lib/.
You can read more context on the FCOS issue here: coreos/fedora-coreos-tracker#1197 (comment)
You can see that the nobody user in fact does exist:
I am not sure what has changed in 1.4 that would cause Nomad to not use NSS. Manually copying nobody entries from /usr/lib/passwd to /etc/passwd gets things working as expected.
Nomad should use NSS rather than just reading /etc/passwd directly when checking for users.
Job file (if appropriate)
https://juicefs.com/docs/csi/csi-in-nomad/