[20.09] nixos/acme regression: chown: Operation not permitted #115976

Closed
bjornfor opened this issue Mar 11, 2021 · 12 comments · Fixed by #116369
Labels
0.kind: bug (Something is broken); 0.kind: regression (Something that worked before working no longer); 6.topic: nixos (Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS)

Comments

@bjornfor
Contributor

Describe the bug
A recent change on release-20.09 broke acme on my system:

acme-bforsman.name-start[22707]: + mkdir -p /var/www/challenges//.well-known/acme-challenge
acme-bforsman.name-start[22707]: + chown acme:acme /var/www/challenges//.well-known /var/www/challenges//.well-known/acme-challenge
acme-bforsman.name-start[22710]: chown: changing ownership of '/var/www/challenges//.well-known': Operation not permitted
systemd[1]: acme-bforsman.name.service: Main process exited, code=exited, status=1/FAILURE

It's trying to chown a path that's currently owned by root:

$ ls -ld /var/www/challenges/.well-known
drwxr-xr-x 3 root root 4096 May 28  2019 /var/www/challenges/.well-known

I track nixpkgs versions in my setup, so the working state is 2394284 and the broken state is 8d82c86.

To Reproduce
(Not sure if this is easily reproducible, but...)

Have a config like

  security.acme.certs = {
    "${domainName}" = {
      email = adminEmailAddr;
      webroot = "/var/www/challenges/";
      extraDomainNames = [ /* redacted */ ];
    };
  };

and do an upgrade from NixOS 20.09 @ git commit 2394284 to 8d82c86.

Expected behavior
Services that used to work should not start failing with permission errors on the stable branch.

Additional context
I rolled back to the known good version, but that now failed too(!). I then rolled forward again and manually fixed up the permissions. Now I'm getting this error:

Could not create client: get directory at 'https://acme-v02.api.letsencrypt.org/directory': Get "https://acme-v02.api.letsencrypt.org/directory": dial tcp: lookup acme-v02.api.letsencrypt.org: device or resource busy

And later: I manually restarted the acme service. Now everything seems fine.

Notify maintainers

CC @NixOS/acme
CC @m1cr0man due to their acme commits on release-20.09.

Metadata

$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 5.4.100, NixOS, 20.09.git.2394284 (Nightingale)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.10`
 - nixpkgs: `/etc/current-nixpkgs`

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:
  - nixos/modules/security/acme.nix
@bjornfor bjornfor added the 0.kind: bug Something is broken label Mar 11, 2021
@bjornfor
Contributor Author

1f05492 seems to have removed a log message that I would have found helpful here:

Please fix the permissions under ${data.webroot}/.well-known/acme-challenge

That changes the error from a "this module is broken" to "this module is guiding me to do a manual data migration". That's a big deal.

Luckily 1f05492 did add set -x, which was very helpful.

@m1cr0man
Contributor

I'm glad you found the old error message - that is indeed the issue here. .well-known + .well-known/acme-challenge need to be owned by acme + the configured group.

I will add back in the echo in my next PR. Thanks for pointing that out.

@bjornfor
Contributor Author

In a way it's weird that the code tries to do chown as a non-root user, because it'll always fail if the ownership is not already correct. But maybe the point is just to verify the ownership?

Thanks, adding back that message will help. Maybe even print the command the user should run ( sudo chmod ...)?

@m1cr0man
Contributor

> In a way it's weird that the code tries to do chown as a non-root user, because it'll always fail if the ownership is not already correct. But maybe the point is just to verify the ownership?

Verifying the owner is correct, but setting the group is the main intention since it can be changed between rebuilds (and will work in normal circumstances). I guess chgrp would be clearer here now that I think about it, and would still raise an appropriate error if the owner is not acme. I'll add it to the list 👍 Thanks!
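
To illustrate the point about chgrp: on Linux, changing a file's owner needs privilege, while the current owner may switch the group to one it is a member of. A minimal sketch of a check along those lines follows. The function name `set_challenge_group` is hypothetical and this is not the module's actual code; the first message line reuses the wording of the removed log message quoted earlier in this thread, and the second line is an assumed example of the kind of hint suggested above.

```shell
# Hypothetical helper, not taken from nixos/modules/security/acme.nix.
# Sets the group of a challenge directory, or explains what to fix.
set_challenge_group() {
  local dir="$1" group="$2"
  # -O is true only if the path exists and is owned by the effective user.
  if [ ! -O "$dir" ]; then
    echo "Please fix the permissions under $dir" >&2
    # Assumed hint text: suggest a command an administrator could run as root.
    echo "As root, for example: chown $(id -u) '$dir'" >&2
    return 1
  fi
  # The owner may change the group to any group it belongs to.
  chgrp "$group" "$dir"
}
```

Called as `set_challenge_group /var/www/challenges/.well-known/acme-challenge acme` by the service user, this would fail early with guidance when the directory belongs to someone else, and otherwise only touch the group.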

@bjornfor
Contributor Author

From an end-user perspective, I wonder why the data migration isn't happening automatically. Why can't NixOS run the needed command as root? Is there a risk of data loss?

@veprbl veprbl added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 0.kind: regression Something that worked before working no longer labels Mar 12, 2021
@Patagonicus
Contributor

I stumbled upon this today. I'm slightly confused: Why does the module need ownership of ${data.webroot}/.well-known? Shouldn't ownership of ${data.webroot}/.well-known/acme-challenge be enough?

I had the ownership of the acme-challenge directory correct, but .well-known was owned by root, so it broke for me. But my understanding of the challenge is that it should only write/delete files in the acme-challenge subdirectory.

@m1cr0man
Contributor

The reason .well-known is also chown'ed is that lego (the ACME client that the module uses) tries to create acme-challenge every time, and will throw a permissions error before checking if it exists. I thought I had a copy of the error in #106857 comments somewhere but I can't find it now, so I will test it once again to make sure.

As for a more permanent fix, I think I will move these ownership fixes into the acme-fixperms.service which runs as root.
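
A rough sketch of what such a root-side fixup could look like, purely as an illustration. The function name `fix_challenge_perms` and its arguments are assumptions, not the contents of acme-fixperms.service. It only touches the acme-challenge directory.

```shell
# Hypothetical fixup step, intended to be run as root before the renewal
# service starts. Not the actual acme-fixperms.service script.
fix_challenge_perms() {
  local webroot="${1%/}"   # drop one trailing slash to avoid "//" in paths
  local owner="$2" group="$3"
  local dir="$webroot/.well-known/acme-challenge"
  mkdir -p "$dir"
  # Root can reassign ownership; an unprivileged user generally cannot.
  chown -R "$owner:$group" "$dir"
}
```

For example, `fix_challenge_perms /var/www/challenges/ acme acme` run as root would leave the challenge directory owned by acme regardless of who created it originally.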

@m1cr0man
Contributor

After some testing and some source code reading, I found that I don't actually need to fix .well-known any more. I'm almost certain this wasn't the case at one point, but alas removing it will resolve issues here.

I have stripped the problematic part of the renewal script down to its bare minimum - it now only runs the following commands:

mkdir -p '${data.webroot}/.well-known/acme-challenge' && chgrp '${data.group}' ${data.webroot}/.well-known/acme-challenge

The chgrp is required as the configured group can be changed between runs. Arguably this could be removed too if you assume that acme will always own this folder and that the changes I've made to solve #114751 (setting UMask to 0022 for the service) make it world readable. In an abundance of caution, I am going to keep it around for now.
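
For readers who want to try the effect outside the service, here is a small sketch of the reduced step with the umask applied explicitly. The function name `prepare_challenge_dir` is hypothetical, and its two arguments stand in for `${data.webroot}` and `${data.group}`. In the module the umask comes from the systemd unit rather than from the script.

```shell
# Hypothetical stand-alone version of the reduced renewal step.
prepare_challenge_dir() {
  local webroot="${1%/}" group="$2"
  local dir="$webroot/.well-known/acme-challenge"
  (
    # Run in a subshell so the umask does not leak into the caller.
    umask 0022   # new directories get mode 0755 (rwxr-xr-x)
    mkdir -p "$dir" && chgrp "$group" "$dir"
  )
}
```

With a umask of 0022 the created directories are readable and traversable by everyone, so a web server running under a different user or group can still serve the challenge files.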

m1cr0man added a commit to m1cr0man/nixpkgs that referenced this issue Mar 15, 2021
With the UMask set to 0023, the
mkdir -p command which creates the webroot
could end up unreadable if the web server
changes, as surfaced by the test suite in NixOS#114751
On top of this, the following commands
to chown the webroot + subdirectories were
mostly unnecessary. I stripped it back to
only fix the deepest part of the directory,
resolving NixOS#115976, and reintroduced a
human readable error message.
@arianvp
Member

arianvp commented Mar 18, 2021

I completely wiped my /var/lib/acme directory but starting acme fails with:

Mar 18 18:10:38 arianvp.me acme-arianvp.me-start[17709]: + mkdir -p /var/lib/acme/acme-challenge/.well-known/acme-challenge
Mar 18 18:10:38 arianvp.me acme-arianvp.me-start[17710]: mkdir: cannot create directory ‘/var/lib/acme/acme-challenge’: Permission denied
Mar 18 18:10:38 arianvp.me systemd[1]: acme-arianvp.me.service: Main process exited, code=exited, status=1/FAILURE
Mar 18 18:10:38 arianvp.me systemd[1]: acme-arianvp.me.service: Failed with result 'exit-code'.
Mar 18 18:10:38 arianvp.me systemd[1]: Failed to start Renew ACME certificate for arianvp.me.

This is on 20.09. /var/lib/acme does not exist yet when this is run.
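
A generic way to narrow down a failure like this is to find the deepest directory that already exists and check whether the current user may create entries in it. The sketch below assumes nothing about the acme module; the helper names `nearest_existing_ancestor` and `can_mkdir_p` are made up for illustration.

```shell
# Walk up from a target path to the nearest ancestor that exists.
nearest_existing_ancestor() {
  local path="$1"
  while [ ! -e "$path" ] && [ "$path" != "/" ] && [ "$path" != "." ]; do
    path=$(dirname "$path")
  done
  printf '%s\n' "$path"
}

# Succeeds if the current user could create the missing directories,
# i.e. the nearest existing ancestor is a writable, searchable directory.
can_mkdir_p() {
  local base
  base=$(nearest_existing_ancestor "$1")
  [ -d "$base" ] && [ -w "$base" ] && [ -x "$base" ]
}
```

Running `nearest_existing_ancestor /var/lib/acme/acme-challenge/.well-known/acme-challenge` as the service user, followed by `ls -ld` on the result, shows which directory's ownership or mode is blocking mkdir -p.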

Is this the same issue in a different flavor? I don't have any ACME-related state on the machine, and this is the entire config:

  services.nginx = {
    enable = true;
    virtualHosts = {
      "arianvp.me" = {
        forceSSL = true;
        enableACME = true;
        locations."/".root = pkgs.arianvp-website;
      };
    };
  };

@bjornfor
Contributor Author

Let's leave this open until the fix is backported to release-20.09.

@bjornfor bjornfor reopened this Mar 24, 2021
m1cr0man added a commit to m1cr0man/nixpkgs that referenced this issue Apr 2, 2021
With the UMask set to 0023, the
mkdir -p command which creates the webroot
could end up unreadable if the web server
changes, as surfaced by the test suite in NixOS#114751
On top of this, the following commands
to chown the webroot + subdirectories were
mostly unnecessary. I stripped it back to
only fix the deepest part of the directory,
resolving NixOS#115976, and reintroduced a
human readable error message.

(cherry picked from commit 920a3f5)
@m1cr0man
Contributor

m1cr0man commented Apr 2, 2021

Sorry it took me so long... busy week. PR is open. If someone like @arianvp could give it a test and see if it fixes their issues that would be great.

@m1cr0man
Contributor

Backports merged, closing issue.

5 participants