Restructure acme module #91121
ping @NixOS/acme
Everything is working now. Hash-based cert folders work really nicely for triggering cert refreshes when the configuration is changed. Before I take the PR out of WIP I want to write the new tests. I would also like to know if anyone with more experience can tidy up the submodule (security.acme.certs.) changed + removed option assertions. I couldn't find a way to make use of the functions in lib.modules, so I had to partially reimplement them, and I'm personally not happy with it.
Tests have now been rewritten. I incorporated some practical cert tests utilising openssl, and actually fixed a couple of my own bugs in the process :) What's left is to update the docs holistically, in particular adding instructions for complete cert renewal with systemctl clean. Beyond that, I would like feedback on whether I should do the following:
I'm opening this PR for review now, since it's in a production-ready form and I have made use of it myself already.
I have updated the PR description with some more change summaries.
nixos/modules/security/acme.nix
certToConfig = cert: data: let
  acmeServer = if data.server != null then data.server else cfg.server;
  useDns = data.dnsProvider != null;
  keyName = builtins.replaceStrings ["*"] ["_"] data.domain;
Why do we slugify this only here, and not for cert?
Is this there to make bash happy? Or is this some standard used by other clients as well? If it's only the former, I'd rather see our shell script being able to handle * (as we put it in double quotes).
This is actually a "feature" of minica and lego. They replace the star with an underscore, so in order to reference the output files I need to do this myself too.
An end user will never see this either; we rename all the certs in /var/lib/acme/ to something more easily referenced. I might add a test which covers having a star in the cert variable to ensure that the scripts always work in that scenario.
Thanks for the explanation! Please add a small comment about it next to keyName, so this is understandable.
After a bit of testing, I discovered that if ${cert} contains a * it breaks things in multiple places; most crucially, it broke the compilation of the systemd units. I have added an assertion recommending that users use the domain argument as we do in the tests.
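As a minimal sketch of the renaming discussed in this thread (lego and minica writing `_` where the domain contains `*`), the same substitution can be reproduced in plain shell. The helper name `key_name` and the example domains are hypothetical and are not part of the module; the module itself does this in Nix with builtins.replaceStrings.

```shell
# Hypothetical helper: shell equivalent of the Nix expression
#   builtins.replaceStrings ["*"] ["_"] data.domain
# It maps every literal "*" in the domain to "_", which is the on-disk
# naming the thread attributes to lego and minica.
key_name() {
  printf '%s\n' "$1" | tr '*' '_'
}

key_name '*.example.com'
key_name 'example.com'
```

A domain without a wildcard passes through unchanged, so the helper is safe to apply unconditionally.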
I thought of a few more cases where things may go wrong, and added tests to cover them. In particular, the web server reload services were depending on the target, which stays alive, meaning that the renewal timer wouldn't trigger a reload and old certs would stay on the web servers. I encountered some problems ensuring that the reload took place without accidentally triggering it as part of the test. The sync
I encountered an issue today in a production environment with NixOS where the certs served by my mail server (dovecot and postfix) expired even though the certs on disk were up to date. Obviously, this was because there was no trigger to reload them. It got me thinking: is a reload service on a per-dependency basis the right solution? For most users, simply specifying a postRun command was sufficient. Maybe we add support for running the postRun scripts as root in a separate service rather than as acme:${group} in the renewal service. Even better, we could add a reloadOnRenew array of services to reload; that way we can append to it from various other modules (nginx, httpd, etc.) when one becomes dependent on certs.
Thanks for the review @Mic92. If I could get someone else to give it a test run, and maybe if someone could help with the commented-out assertions (aka assertions on submodules), this should be good for a merge.
- Allow for key reuse when domains are the only thing that were changed.
- Fixed systemd service failure when preliminarySelfsigned was set to false
I have been using several iterations of this locally and I feel confident that this is a big step forward. Thanks @m1cr0man!
While switching a few of my systems to 20.09-alpha I experienced some weird behavior with this module: apparently my network crashed during a deploy (most likely not related to this change) and then I got the following error in my journald for each
Whenever I performed a retry with
This could be fixed by (1) erasing the existing account with Tbh I don't know the code enough to understand what exactly went wrong which is why I'm posting this observation here. We may want to document this in our manual (or catch this issue if possible in our code), but not sure if any of those suggestions is a good idea.
That definitely looks like a bug in Lego rather than in our script, given the accounts data was corrupt/broken. All the script/service does is create symlinks and copy files. It might be worth posting that in an issue over on go-acme/lego.
It's a known problem it seems: go-acme/lego#1006
@m1cr0man I forgot to update the DNS records first, so it failed to enable SSL for three subdomains (letsencrypt). After I fixed that I got "Error creating new order :: too many failed authorizations recently: see https://letsencrypt.org/docs/rate-limits/"
This doesn't make sense. Do you know whether there are multiple attempts and, if so, how you can disable that?
Can you check how many times the acme-$domain.service was run, by looking at
script = with builtins; concatStringsSep "\n" (mapAttrsToList (cert: data: ''
  for fixpath in /var/lib/acme/${escapeShellArg cert} /var/lib/acme/.lego/${escapeShellArg cert}; do
    if [ -d "$fixpath" ]; then
      chmod -R 750 "$fixpath"
Note that this makes regular files (e.g. /var/lib/acme/hydra.nixos.org/fullchain.pem) executable, which is a bit ugly.
chmod -R u=rwX,g=rX,o= "$fixpath"
Thanks for the feedback + the hint. I'll open a PR if no one beats me to it :)
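To make the difference between the two modes in this thread concrete, here is a small sketch that applies both to scratch directories. The helper `make_tree` and the variables `numeric` and `symbolic` are made up for the demonstration; the directories come from mktemp and only stand in for a cert directory under /var/lib/acme/.

```shell
# Hypothetical helper: create a scratch directory holding one
# non-executable file, standing in for a cert directory.
make_tree() {
  dir="$(mktemp -d)"
  touch "$dir/fullchain.pem"
  chmod 640 "$dir/fullchain.pem"
  printf '%s\n' "$dir"
}

# Numeric mode: every entry becomes rwxr-x---, including regular files.
numeric="$(make_tree)"
chmod -R 750 "$numeric"

# Symbolic mode: capital X grants execute/search only to directories and to
# files that already carry an execute bit, so the PEM file stays rw-r-----.
symbolic="$(make_tree)"
chmod -R u=rwX,g=rX,o= "$symbolic"

ls -l "$numeric/fullchain.pem" "$symbolic/fullchain.pem"

# Clean up afterwards with: rm -rf "$numeric" "$symbolic"
```

One caveat follows from how X is defined: a file that was already made executable by an earlier chmod -R 750 keeps its execute bits under u=rwX,g=rX,o=, so existing installations would need those bits cleared separately.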
ckSJ/EkxuwT/ZYLqCAKSFGMlFhad9g1Zyvd67XgfZq5p0pJTtGxtn5j8QHy6PM6m
NbjvWnP8lDU8j2l3eSG58S14iGs=
-----END CERTIFICATE-----
'';
Not that I care a lot anymore, but note that there is a reason why these were added here instead of being generated: If you substitute the test-certs derivation but build the ca-certificates.crt on a different host, you'll get mismatched certificates.
Hm, I don't really follow you on this one? The CA certificates and the test certs are generated in the same derivation now, so they can't drift from each other. My understanding of the cache is that for any one execution of the tests all nodes will use the same nix store and cache and thus it doesn't matter which it chooses from (local build or cache) when setting the security.pki.certificateFiles. Unless I am missing something?
The issue is that the CA certs are copied, so while your machine might have test-certs version A, it might use ca-certificates.crt built (or in this case substituted) using test-certs version B.
edit: To illustrate this a bit better, consider this situation:
local store:
  /nix/store/1234-test-certs - hash: abcd9876
remote store:
  /nix/store/1234-test-certs - hash: dcba9182
  /nix/store/5678-ca-certificates.crt
So if you're going to build the test, you substitute /nix/store/5678-ca-certificates.crt from the remote store and thus get the CA certificate from the test-certs derivation with content hash dcba9182 while the test is actually generating the server certificates from abcd9876.
Ok I follow you now. I had to revise my understanding of derivations, and as the docs highlight the hash is based on the buildInputs, name and output name. And then the chaos you have explained can happen since the output isn't deterministic based on those properties, and caCertificates can drift.
I'm not keen to hard code the certs back in, but there really is no other solution right? I already read back on your commits to see how you tried to solve it before. I was trying to think of something involving builtins.mkHash + builtins.readFile but that still depends on the "broken" certs derivation. If only outputHashMode + outputHashAlgo could be set without outputHash itself, resulting in a generated outputHash based on the cert files.
Ah hang on - cacerts has preferLocalBuild set to true, so surely setting preferLocalBuild for test certs too will ensure consistency?
Ok that's good to know. Putting them in separate files in the repo sounds like a good solution. I'll do that tonight, and I'll keep the minica-based generator there as a way to regenerate the SSL certs in case that becomes necessary in the future.
@m1cr0man: Hm, one way to address that would be if minica were fully deterministic when generating keys/certs, which would help in a lot of other scenarios when testing with TLS enabled.
edit: Lemme give this a try and make a PoC.
Okay, here is the working PoC (checked several times with nix-build --check). However, implementing this in a sane™ way is not so easy, since Go's crypto library tries to prevent determinism via randutil.MaybeReadByte.
So we either need to copy & paste a deterministic version (which just doesn't use randutil.MaybeReadByte) of rsa.GenerateMultiPrimeKey into minica or "simply" patch Go (which I did in the PoC). The latter has the advantage that it would also work if we wanted to have e.g. ECDSA certs, while we'd need to create another copy for the former.
Maybe a better way to implement this would be to find a crypto library which doesn't try to prevent determinism and use that to generate the certs/keys.
Of course, another question here is the way we expose this, since we certainly don't want people to use this for other purposes than testing, but on the other hand I think it's useful to have this generally available for all sorts of tests.
And the last issue, which certainly is a bigger one that's even more prevalent with the hardcoded certificates, is certificate expiry, which minica sets to 100 years for the CA root certificate and to 2 years and 30 days for every server certificate. One way out of this would be to run our VM tests with a fixed time in general, but that's something for another pull request... ;-)
We should really just go back to having a .nix to regenerate the private keys and storing them in the repository. Lots of other tests do the same already.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
Motivation for this change
Closes #86184, closes #62958, closes #18440, closes #89502
Triggered by #84633
Rewriting the acme module since there was too much stuff carried over from when simp-le was used and the requirements have changed dramatically for lego.
acme-finished-${cert}.target which can be relied on as an indicator that cert renewal is completed.
TODO:
- Solve for the fact that root currently owns all certs: Added a migration service to fix permissions
- Check validMinDays in the start script against cert mtime: Bad idea because selfsigned certs might be there
- Add a note about multiple account creation and emails: Done
- Migrate extraDomains to a list: Done
- Deprecate user option: Done, along with activationDelay
- Set up selfsigned certs, consider using minica: Done. Way nicer than the old method. Split into 2 services to avoid race conditions.
- Update tests