-
-
Notifications
You must be signed in to change notification settings - Fork 14.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos/acme: Backport account missing fixes #107769
Conversation
(cherry picked from commit 76401c9)
(cherry picked from commit 79ecf06)
This means that all systems running from master will trigger new certificate creation on next rebuild. Race conditions around multiple account creation are fixed in NixOS#106857, not this commit. (cherry picked from commit e312039)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert in ACME, but from #101482 (comment) I can confirm that applying these backports fixed the problem for me on 20.09.
Ah, will do. I was quick to just commit the suggested changes. Will revert that for commit consistency. |
(cherry picked from commit f71e439)
8fed28b
to
15822fb
Compare
Okay, this broke my ACME again with the following error:
Fix was - again - to recreate |
Just for the record, that's the same error messgae which this PR fixed for me. |
I can second @Ma27's result.. It just broke for me and when switching system generations it just hangs there :/
|
@andir Does @Ma27's fix also fix it for you? @m1cr0man Do you know if it's possible to make it such that we can somehow handle this case automatically? It's unfortunate that what fixes some system breaks it for others (at least once) -- and I imagine that even undoing the backport won't help, as people will still get the problem in the next NixOS release. |
I don't have a good guess as to how this is occurring. It could be an instance of the account creation race conditions I suppose (do you have multiple certs with the same account email?). The code in this PR explicitly checks if your account registration files exist - so unless they got deleted between the conditional and lego running then it should just be working. Hmm. I wonder if lego is creating files in the accounts folder even if it fails to create an account? This could happen if your very first renewal fails. Could you take a quick look at the accounts directory for the affected cert (if you still have it)?
I'm hopeful that #106857 will cover these cases, but failing that I don't have a root cause for this issue right now, so I can't fix it automatically Blowing away /var/lib/acme or any cert related files automatically would not be a sensible thing to do. |
Came here to report yet another ACME bug. After I finally had it all working upgrading to the latest NIxOS 20.09 broke my certs again!
Is there anything I can provide from the logs or whatnot that will help? |
Yes, can you please list the contents of |
This machine has two certificates for two different domains using the same email address. On this machine I also use DNS to verify with ACME and not HTTP. |
I half expect this to happen with multiple accounts using the same name. Since the check for accounts is a basic "ls" of the accounts folder, if multiple renews run at once (with the same account config) then one of them is bound to create something in the folder and make the other think the account already exists, even if it doesn't really. You might think a more advanced check for the account.json and keys would fix it but still you run into race conditions which have been recorded plenty of times in other tickets. The real solution is the "accounts" targets created in #106857 If you run the failed renew service again, does it work? |
@m1cr0man It looks like running the service again fixes the issue. I walked away from my computer for a bit and when I returned the services were fine. The normal timer must have kicked them off and resolved the issue. I'd like to offer whatever help I can provide to get this all nailed down. For my own sanity I really need updates from NixOS 20.09 to be stable. I'm happy to write tests, help work on the ACME module, or whatever else needs to be done. |
Honestly? I don't think there's much to worry about. The test suite is very comprehensive and in #106857 I have added a race condition test, which I'm now certain was the cause of your issue. At this point I fully plan to back port that PR too, so that the likes of yourself can benefit from it. I would appreciate a 👍 on that PR if you can :) |
I figured that since this ACME issue was annoying but harmless I could deploy to my production servers. Of course that was a mistake. On a server with a single certificate I'm getting this:
On servers that have more than one cert they are failing with the "account is not registered" issue. Trying to run them one at a time does not resolve the error. What info do you need to help me get this sorted out? |
JWS verification error is being tracked in #101445, and I just discovered a graceful fix (see here). Please check that your account key is not corrupt first, and that its modify time is close to the account.json, as it will help us investigate this issue further. As for the not registered error - if you are able to reproduce it (aka run one of the services and still get the same error) then please send me the contents of the same folders as last time. |
On the server with multiple certificates, trying to run a single renewal fails with the "account not registered error". This is what the account directory looks like for that service:
|
Motivation for this change
Backport of #101482
It was requested by at least one user, and the module is still causing issues for users so this will probably not be the last backport of acme-related changes.
Things done
sandbox
innix.conf
on non-NixOS linux)nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
./result/bin/
)nix path-info -S
before and after)