nixos/acme: force-renewing certificates is unreasonably difficult #81634

Closed
emilazy opened this issue Mar 3, 2020 · 11 comments
Labels
0.kind: bug Something is broken 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS

Comments

@emilazy
Member

emilazy commented Mar 3, 2020

Right now, the only way I've found that works is to set validMinDays = 999; to force the renewal. This wouldn't matter so much (it's usually only necessary if you e.g. tweak certificate options like OCSP Must-Staple) if not for the fact that Let's Encrypt screwed up and now a bunch of people have to do it by tomorrow, which they won't. Oops.

I'm filing this as an issue rather than a PR in part because I'm not really sure what a good interface would be here; the best thing I can imagine is something like nix run -f '<nixpkgs/nixos>' something -c force-renew-certs, which seems weird. Does anyone know of precedent for interfaces like this?

cc @aanderse @arianvp @m1cr0man @yegortimoshenko

If you just want to know how to force-renew your certificates this time

Add security.acme.validMinDays = 999; to your configuration and run a nixos-rebuild switch. This may or may not automatically renew the certificate depending on your nixpkgs version; to make sure, do systemctl start 'acme-*.service'. Make sure to remove the validMinDays option and run nixos-rebuild switch again afterwards, or you'll hammer the Let's Encrypt servers for a renewal every day!
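
The steps above can be sketched as a small shell script. This is only an illustration: the `run` helper, the `DRY_RUN` variable, and the function names are hypothetical and not part of NixOS, and the script assumes `security.acme.validMinDays = 999;` has already been added to the configuration by hand. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the force-renew workaround. DRY_RUN and run() are illustrative
# helpers, not part of NixOS. With DRY_RUN=1 (the default) commands are
# printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

force_renew() {
    # Assumes security.acme.validMinDays = 999; is already in the configuration.
    run nixos-rebuild switch
    # Start every ACME unit in case the rebuild did not trigger a renewal.
    run systemctl start 'acme-*.service'
}

finish_force_renew() {
    # Run this after removing the validMinDays line again, so that the
    # timer stops requesting a fresh certificate every day.
    run nixos-rebuild switch
}
```

Calling `force_renew` prints the two commands in order; setting `DRY_RUN=0` and running as root would execute them for real.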

@emilazy emilazy added the 0.kind: bug Something is broken label Mar 3, 2020
@emilazy
Member Author

emilazy commented Mar 3, 2020

(maybe systemctl start acme-force-renew? But that feels like a hack.)

@emilazy
Member Author

emilazy commented Mar 3, 2020

Interface/implementation sketch, after discussing with @yegortimoshenko: acme-force-renew-${domain}.service omits the validMinDays option handling so that lego always renews the certificate. We could then also hook things up so that it automatically runs when options like ocspMustStaple are changed.

This would require some reorganization in the acme module to avoid duplicating the service logic, so it would probably be a good idea to clean it up at the same time. I'll try and get around to doing it if nobody else does, but don't want to block anyone who feels like taking it on themselves.

@emilazy
Member Author

emilazy commented Mar 4, 2020

It would also be a good idea to check for revocation with OCSP on the timer and force a renewal if the certificate has been revoked; this would mitigate the impact of future mass revocations, in that the certificate would only be invalid for a day or so.
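
A rough sketch of what such a revocation check could look like with the `openssl` command-line tool. The function names are made up for illustration, and it assumes a certificate directory containing `cert.pem` and `chain.pem`, with `chain.pem` holding the issuer certificate; it is not how the module does or would implement this.

```shell
#!/bin/sh
# Sketch: decide whether a certificate needs a forced renewal based on OCSP.
# All function names here are illustrative, not part of the ACME module.

# Succeeds (exit 0) if the given `openssl ocsp` output reports a revoked
# certificate, i.e. contains a "<file>: revoked" status line.
ocsp_output_is_revoked() {
    printf '%s\n' "$1" | grep -q ': revoked'
}

# Query the OCSP responder named in the certificate itself.
# $1 is a directory assumed to contain cert.pem and chain.pem (the issuer).
query_ocsp() {
    dir="$1"
    url="$(openssl x509 -noout -ocsp_uri -in "$dir/cert.pem")"
    openssl ocsp -issuer "$dir/chain.pem" -cert "$dir/cert.pem" \
        -url "$url" 2>/dev/null
}

# Succeeds if the certificate in directory $1 is reported as revoked.
needs_force_renewal() {
    ocsp_output_is_revoked "$(query_ocsp "$1")"
}
```

A timer job could call `needs_force_renewal /var/lib/acme/example.org` (a placeholder domain) and trigger the forced renewal only when it succeeds.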

@veprbl veprbl added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Mar 4, 2020
@lukateras lukateras self-assigned this Mar 9, 2020
@dhess
Contributor

dhess commented Mar 13, 2020

There's another issue related to this: if you add an extraDomain (e.g., you just added a CNAME for the server), the acme service doesn't seem to notice the change, and it won't automatically renew as it should.

emilazy added a commit to emilazy/nixpkgs that referenced this issue Mar 22, 2020
Previously, the NixOS ACME module defaulted to using P-384 for
TLS certificates. I believe that this is a mistake, and that we
should use P-256 instead, despite it being theoretically
cryptographically weaker.

The security margin of a 256-bit elliptic curve cipher is substantial;
beyond a certain level, more bits in the key serve more to slow things
down than add meaningful protection. It's much more likely that ECDSA
will be broken entirely, or some fatal flaw will be found in the NIST
curves that makes them all insecure, than that the security margin
will be reduced enough to put P-256 at risk but not P-384. It's also
inconsistent to target a curve with a 192-bit security margin when our
recommended nginx TLS configuration allows 128-bit AES. [This Stack
Exchange answer][pornin] by cryptographer Thomas Pornin conveys the
general attitude among experts:

> Use P-256 to minimize trouble. If you feel that your manhood is
> threatened by using a 256-bit curve where a 384-bit curve is
> available, then use P-384: it will increases your computational and
> network costs (a factor of about 3 for CPU, a few extra dozen bytes
> on the network) but this is likely to be negligible in practice (in a
> SSL-powered Web server, the heavy cost is in "Web", not "SSL").

[pornin]: https://security.stackexchange.com/a/78624

While the NIST curves have many flaws (see [SafeCurves][safecurves]),
P-256 and P-384 are no different in this respect; SafeCurves gives
them the same rating. The only NIST curve Bernstein [thinks better of,
P-521][bernstein] (see "Other standard primes"), isn't usable for Web
PKI (it's [not supported by BoringSSL by default][boringssl] and hence
[doesn't work in Chromium/Chrome][chromium], and Let's Encrypt [don't
support it either][letsencrypt]).

[safecurves]: https://safecurves.cr.yp.to/
[bernstein]: https://blog.cr.yp.to/20140323-ecdsa.html
[boringssl]: https://boringssl.googlesource.com/boringssl/+/e9fc3e547e557492316932b62881c3386973ceb2
[chromium]: https://bugs.chromium.org/p/chromium/issues/detail?id=478225
[letsencrypt]: https://letsencrypt.org/docs/integration-guide/#supported-key-algorithms

So there's no real benefit to using P-384; what's the cost? In the
Stack Exchange answer I linked, Pornin estimates a factor of 3×
CPU usage, which wouldn't be so bad; unfortunately, this is wildly
optimistic in practice, as P-256 is much more common and therefore
much better optimized. [This GitHub comment][openssl] measures the
performance differential for raw Diffie-Hellman operations with OpenSSL
1.1.1 at a whopping 14× (even P-521 fares better!); [Caddy disables
P-384 by default][caddy] due to Go's [lack of accelerated assembly
implementations][crypto/elliptic] for it, and the difference there seems
even more extreme: [this golang-nuts post][golang-nuts] measures the key
generation performance differential at 275×. It's unlikely to be the
bottleneck for anyone, but I still feel kind of bad for anyone having
lego generate hundreds of certificates and sign challenges with them
with performance like that...

[openssl]: mozilla/server-side-tls#190 (comment)
[caddy]: https://github.com/caddyserver/caddy/blob/2cab475ba516fa725d012f53ca417c3e039607de/modules/caddytls/values.go#L113-L124
[crypto/elliptic]: https://github.com/golang/go/tree/2910c5b4a01a573ebc97744890a07c1a3122c67a/src/crypto/elliptic
[golang-nuts]: https://groups.google.com/forum/#!topic/golang-nuts/nlnJkBMMyzk

In conclusion, there's no real reason to use P-384 in general: if you
don't care about Web PKI compatibility and want to use a nicer curve,
then Ed25519 or P-521 are better options; if you're a NIST-fearing
paranoiac, you should use good old RSA; but if you're a normal person
running a web server, then you're best served by just using P-256. Right
now, NixOS makes an arbitrary decision between two equally-mediocre
curves that just so happens to slow down ECDH key agreement for every
TLS connection by over an order of magnitude; this commit fixes that.

Unfortunately, it seems like existing P-384 certificates won't get
migrated automatically on renewal without manual intervention, but
that's a more general problem with the existing ACME module (see NixOS#81634;
I know @yegortimoshenko is working on this). To migrate your
certificates manually, run:

    $ sudo find /var/lib/acme/.lego/certificates -type f -delete
    $ sudo find /var/lib/acme -name '*.pem' -delete
    $ sudo systemctl restart 'acme-*.service' nginx.service

(No warranty. If it breaks, you get to keep both pieces. But it worked
for me.)
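
Before deleting anything, it may help to check which curve an existing certificate actually uses. The following is a sketch under the assumption that `openssl` is available and that the certificate path is supplied by the caller; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report which curve a certificate's public key uses.
# Function names are illustrative only.

# Extract the curve name from `openssl x509 -text` output passed as $1.
# For EC keys that output contains a line such as "ASN1 OID: secp384r1".
curve_from_text() {
    printf '%s\n' "$1" | sed -n 's/^ *ASN1 OID: *//p' | head -n 1
}

# Print the curve of the certificate file given as $1.
cert_curve() {
    curve_from_text "$(openssl x509 -in "$1" -noout -text)"
}
```

OpenSSL names P-384 `secp384r1` and P-256 `prime256v1`, so a certificate that still reports `secp384r1` has not been migrated yet.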
@reanimus
Contributor

reanimus commented May 3, 2020

Changing the ACME server endpoint is another scenario that requires a forced renewal, because acme doesn't notice the change.

emilazy added a commit to emilazy/nixpkgs that referenced this issue May 24, 2020
(cherry picked from commit 62e34d1)
@stale

stale bot commented Oct 30, 2020

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 30, 2020
@m1cr0man
Contributor

This issue is effectively solved now. With #91121, you can run systemctl clean acme-$domain.service to clear all certificates, and the next start of the service will acquire new ones. This is also documented in the manual: https://nixos.org/manual/nixos/stable/index.html#module-security-acme-regenerate

@supermarin
Contributor

Ran into this recently, luckily on a personal website that's being rebuilt right now.
Not sure if I'm doing something wrong, but can't seem to get this working:

[root@nix:/home/git]# find / -name supermar.in
/var/www/supermar.in
/var/lib/acme/supermar.in
/var/lib/acme/.lego/supermar.in
^C

[root@nix:/home/git]# ls -la /var/lib/acme/supermar.in/
total 24
drwx------ 2 nginx nginx 4096 Nov 21 19:19 .
drwxr-xr-x 6 root  root  4096 Nov 21 19:25 ..
lrwxrwxrwx 1 nginx nginx   13 Nov 21 19:19 cert.pem -> fullchain.pem
-rw------- 1 nginx nginx 1680 Nov 21 19:19 chain.pem
-rw------- 1 nginx nginx 3262 Nov 21 19:19 fullchain.pem
-rw------- 1 nginx nginx 3489 Nov 21 19:19 full.pem
-rw------- 1 nginx nginx  227 Nov 21 19:19 key.pem

[root@nix:/home/git]# systemctl clean acme-supermar.in.service
Failed to clean unit acme-supermar.in.service: No matching resources found.
[root@nix:/home/git]# systemctl status acme-supermar.in.service
● acme-supermar.in.service - Renew ACME Certificate for supermar.in
   Loaded: loaded (/nix/store/srj75amm6pbgmxnil6r3yab66vp9gwjj-unit-acme-supermar.in.service/acme-supermar.in.service; enabled; vendor preset:>
   Active: inactive (dead) since Fri 2020-12-04 20:48:20 UTC; 6s ago
  Process: 23860 ExecStart=/nix/store/2n4f9v9bzv95hg3zip258g2nf5ahv5cs-acme-start (code=exited, status=0/SUCCESS)
  Process: 23866 ExecStartPost=/nix/store/h75f7si22692znj500v3gwbbzia08rr8-acme-post-start (code=exited, status=0/SUCCESS)
 Main PID: 23860 (code=exited, status=0/SUCCESS)
       IP: 4.6K in, 978B out
      CPU: 66ms

Dec 04 20:48:20 nix systemd[1]: Starting Renew ACME Certificate for supermar.in...
Dec 04 20:48:20 nix 2n4f9v9bzv95hg3zip258g2nf5ahv5cs-acme-start[23860]: 2020/12/04 20:48:20 [supermar.in] The certificate expires in 76 days, >
Dec 04 20:48:20 nix systemd[1]: acme-supermar.in.service: Succeeded.
Dec 04 20:48:20 nix systemd[1]: Started Renew ACME Certificate for supermar.in.
Dec 04 20:48:20 nix systemd[1]: acme-supermar.in.service: Consumed 66ms CPU time, received 4.5K IP traffic, sent 978B IP traffic.

@pkern
Contributor

pkern commented Dec 5, 2020

Same here. No matching resources when trying to clean.

@supermarin
Contributor

@pkern just for quickly unblocking, rm -rf /var/lib/acme* and nixos-rebuild should get you going

@m1cr0man
Contributor

m1cr0man commented Dec 6, 2020

Hey folks! Sorry for not responding sooner. I read over the generated service config and the systemctl docs, and I believe you need to add the --what=state argument like so:

systemctl clean --what=state acme-m1cr0man.com.service

I believe this isn't highlighted in the nixos acme docs so I'll fix that in my next PR.
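
Combining this with the earlier note that the next start of the service acquires new certificates, regenerating a single certificate could be sketched as follows. The `run` helper, `DRY_RUN`, and `regenerate_cert` are hypothetical names introduced for illustration; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: wipe the stored state of one ACME unit and start it again.
# DRY_RUN, run() and regenerate_cert() are illustrative helpers.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

regenerate_cert() {
    domain="$1"
    unit="acme-${domain}.service"
    # Clear the unit's state directory, which holds the certificates.
    run systemctl clean --what=state "$unit"
    # The next start of the service acquires fresh certificates.
    run systemctl start "$unit"
}
```

For example, `regenerate_cert m1cr0man.com` prints the clean and start commands for that domain; with `DRY_RUN=0` it would run them.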
