ACME fails with JWS verification error #101445
Comments
I hit the same bug as well and fixed it on my machines with the following workaround: #91121 (comment) |
I wiped /var/lib/acme/.lego/accounts and now all certificate services fail:
|
Right - the tests for whether to run |
Do you mean that I should nuke all of /var/lib/acme ? |
Pretty much, yeah. Personally I would move it somewhere else instead of flat-out delete ;) Then run |
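For anyone who prefers the "move it somewhere else" approach over deleting, here is a minimal Python sketch of archiving a state directory under a timestamped name. The function name `archive_dir` and the `.bak-` naming scheme are made up for illustration; only the `/var/lib/acme` path comes from this thread.

```python
import shutil
import time
from pathlib import Path


def archive_dir(path: str, suffix: str = "") -> Path:
    """Rename `path` to a timestamped sibling directory instead of deleting it.

    `suffix` overrides the timestamp (hypothetical helper, not part of NixOS or lego).
    """
    src = Path(path)
    if not src.is_dir():
        raise FileNotFoundError(f"{src} is not a directory")
    stamp = suffix or time.strftime("%Y%m%d-%H%M%S")
    # e.g. /var/lib/acme -> /var/lib/acme.bak-20201101-120000
    dest = src.with_name(f"{src.name}.bak-{stamp}")
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    shutil.move(str(src), str(dest))
    return dest
```

Calling `archive_dir("/var/lib/acme")` as root would leave the old state next to the original location, so it can be restored if the rebuild does not help.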
It works. |
Awesome. I have opened a PR there which will resolve this issue. In particular, I made it check if the accounts directory is empty, which hopefully negates the need to wipe out your entire /var/lib/acme directory. I also added some documentation on the process. |
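As an illustration of the "accounts directory is empty" check described above, a rough Python sketch follows. This is not the code from the PR; the function name is hypothetical, and treating a missing directory as empty is an assumption made here.

```python
from pathlib import Path


def accounts_dir_is_empty(accounts_dir: str) -> bool:
    """Return True if the directory is missing or contains no entries.

    Assumption: a missing directory counts as "empty" for this check.
    """
    path = Path(accounts_dir)
    if not path.is_dir():
        return True
    # next() with a default avoids listing the whole directory.
    return next(path.iterdir(), None) is None
```

With a check like this, account creation would only need to be retried when `accounts_dir_is_empty("/var/lib/acme/.lego/accounts")` is true, rather than after wiping everything.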
I am also running into this problem. The workaround works, but the account key was generated just yesterday and nothing looks corrupted (JSON and EC key). Today, I was trying to enroll a new domain using the same account. There may be something more fishy around this. Unsure how to debug this. |
This issue has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/release-notes-20-03-20-09-instructions-may-need-an-update-acme/9787/2 |
#102387 prevents renewal services from running at the same time. I'm unsure how to test whether it fixes the issue. |
@symphorien FWIW I don't think that renewing more than one cert at a time is a factor in this bug. I only had a single cert fail when upgrading to NixOS 20.09, and running the single systemd service by itself still failed. It appears this will be fixed by #102862. That said, linking all of the renewal services together would be nice so that if one fails the others won't be tried. If you have 5 or more failures you basically lock yourself out of LE for an hour. |
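The "stop after the first failure" idea from the comment above can be sketched in Python as follows. This is only an illustration of the fail-fast behaviour being asked for, not how the NixOS module or its systemd services work; `renew_until_first_failure` and the callables passed to it are hypothetical.

```python
from typing import Callable, Dict, List


def renew_until_first_failure(renewals: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run renewals in insertion order and stop at the first one that fails.

    Each callable returns True on success. The returned list holds the names
    that were attempted, including the one that failed.
    """
    attempted: List[str] = []
    for name, renew in renewals.items():
        attempted.append(name)
        if not renew():
            # Stop here so one broken account does not burn further attempts.
            break
    return attempted
```

The design choice is to trade completeness for safety: a single failing cert blocks the rest, but repeated failures against the same broken account are avoided.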
This PR does not do that. If a certificate fails, the next ones will be tried as well. |
This reverts commit a77d1a5.
@symphorien and others, I have done some work in #106857 to prevent multiple simultaneous account creations, and reduced the usage of tmpfiles so that file creation is more dependable/reliable. If you could give it a shot and report back I'd be interested to hear if it works :) You may need to run |
The issue still exists on 20.09 (I could not find any backport), and it happened to me this very morning. |
Also happened to me just now on 20.09. The workaround from #91121 (comment) did not fix it for me. |
Ok! I figured out how to reproduce this from a working setup 🔥
TL;DR If you end up in a situation where you have the wrong key for a given account, you get the exact error being shared around here.
Why this is happening is what really confuses me. This has been demonstrated to happen naturally even when you only have 1 certificate involved, which rules out a multi-cert race condition. By virtue of the hashed folders, you can never really end up with stale data in your accounts folder.
Anyone that has an accounts directory that has this issue - can you please cat the key in your accounts folder and check it is not malformed? Also, can you compare the modify times of the account.json and the key file (they should be fairly close)? That's my best guess right now as to what the cause is.
What's nicer is that I have found an elegant solution: if your key is not corrupt, then you can re-request the account ID that belongs to it. To quote the Let's Encrypt FAQ:
To do this, simply delete the |
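The two checks requested above (is the key malformed, and are the modify times of account.json and the key close together) could be scripted roughly like this. It is a sketch under assumptions: it assumes the key is stored as a PEM file, the 60-second threshold is arbitrary, and the function name is made up. Both paths are passed in by the caller because the exact layout under the accounts directory is not spelled out in this thread.

```python
from pathlib import Path
from typing import List


def check_account_files(account_json: str, key_file: str,
                        max_gap_seconds: float = 60.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    account = Path(account_json)
    key = Path(key_file)

    for p in (account, key):
        if not p.is_file():
            problems.append(f"missing: {p}")
    if problems:
        return problems

    # Assumption: the account key is PEM-encoded (BEGIN/END armor lines).
    text = key.read_text(errors="replace")
    if "-----BEGIN" not in text or "-----END" not in text:
        problems.append(f"{key} does not look like a PEM key")

    # The two files should have been written at about the same time.
    gap = abs(account.stat().st_mtime - key.stat().st_mtime)
    if gap > max_gap_seconds:
        problems.append(f"modify times differ by {gap:.0f} seconds")

    return problems
```

This only flags obvious problems. A key can be well-formed and still be the wrong key for the account, which is the situation the TL;DR above describes.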
Deleting the Here is the account directory before deleting the file:
Before deleting the
After deleting the file and running the service again:
|
Also did not work for me. |
Just straight up deleting Fortunately though, it did. |
That actually did work for me, though hopefully I was in the correct situation:
After this,
For some background, this is on EC2 and we provisioned this server today. We activated a configuration that requested 10 certificates in one go, and all went well there. Later, we had to change one of the nginx virtual hosts, which caused a new certificate request there. That one failed with the JWS verification error. So the ACME configuration was only changed twice: in the initial activation, and in the later change. (Other activations did cause the services to rerun, but they did nothing because the certs were up to date.)
Unfortunately, this is not a 1-certificate scenario, and so doesn't rule out parallelism. This machine is currently on nixpkgs aa5b9cd on branch nixos-20.09.
I would expect an actually corrupt key on disk to cause a different error, because Lego would not be able to sign the request? It's my guess that we can rule out a corrupt key file. |
Thanks for the background info. Fortunately I have definitely fixed any parallelism issues in #106857 :) If that was the cause in your case, it should be resolved soon. The fact that your ID was 3 lower confirms it IMO.
I would suspect the same too, but I haven't looked into how Lego uses that key exactly. I would hope it gives a different error. @pjones I came up with another possible scenario where this solution may not work (at least, deleting the account.json would not work). AFAIK the account keys are generated by lego. If the key is generated, written to disk, then account creation fails, then you end up with a key with no account associated with it. This could be the issue in your case. Try removing the key instead of the account.json and report back what happens :) |
Actually come to think of it there's no way for lego to get the key. I suspect it will still work, but your account ID will change. That's what happened when I tried it. |
@m1cr0man If that last comment was directed to me... If my account ID is going to change should I just archive |
@pjones I guess so, emphasis on the archive since right now I don't know what else to investigate as a possible cause. |
@m1cr0man My server with multiple certs, the one where all the acme services seemed to resolve themselves, started failing again. Nothing on this server changed.
So I decided to archive the
I've started receiving Let's Encrypt emails stating that some of my certs are about to expire. At this point all of my servers are failing to renew all certificates because of acme service errors. I don't know what to do and I'm only a few days away from a production outage due to expired certs. Please advise. |
Can you send the logs from the last good renewal of one of your certs? Also can you indicate:
Also you need to run |
@m1cr0man The services have been failing long enough that I no longer have logs that go back that far. As far as accounts go, I use one email address across all of my certs. Using Thank you for your quick response. |
Glad you got sorted. I am constantly surprised by people's encounters with this bug. I personally have a lot of hosts using the module; one with over 20 certs all using one account, and I have never hit some of these common issues - which I hope is a bit of good news. Once you rebuild your /var/lib/acme it tends to go away for good but I've not found a good reason why it breaks in the first place. Hopefully the work in the other PR will help people. |
Workaround for NixOS/nixpkgs#101445
Is there any solution or workaround for this "JWS verification error"? Even removing
Any help would be appreciated. |
Will there be any fix for this problem in NixOS 20.09? @zopieux Thanks. Can you tell me how to patch this? |
This doesn't make sense. I removed all tmpfiles usage in #106857, so that couldn't have an impact on any resolution. However, if that patch does solve the issue then I will look into getting a backport merged. |
Sorry if this is just a coincidence then. FYI I was running |
This is fixed now! |
To whom it may concern: |
There's not much that can be done about this, but I am hopeful that the latest set of PRs will stop the issue occurring again. Out of interest though, did you get this error specifically (JWS verification) or something else? |
Just had this issue today on unstable. To get it working again I did the following
Lots of good hints in this thread here. Thanks! |
To anyone else that comes across this thread: I also wrote docs in the manual which detail deleting only the accounts folder to resolve this issue. From previous testing in this thread, this was usually all that was required. Failing that, could you please try these commands before the other options in this thread, as I am curious whether they will work:
This will delete the certs + accounts data for the failing renewal. It's a little less destructive if you're only having a problem with one cert than recreating the whole of /var/lib/acme |
Describe the bug
When upgrading from 20.03 to 20.09, acme started failing for 5 of my domains (but not all of them).
To Reproduce
Expected behavior
No failure
Additional context
Add any other context about the problem here.
Notify maintainers
cc @NixOS/acme
Metadata
"x86_64-linux"
Linux 5.8.14, NixOS, 20.09beta1083.51aaa3fa1b6 (Nightingale)
yes
yes
nix-env (Nix) 2.3.7
"nixos-20.09beta1346.05334ad7852, nixos-unstable-21.03pre246543.24c9b05ac53"
"home-manager-20.09"
/nix/var/nix/profiles/per-user/root/channels/nixos
Maintainer information: