-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(AutoTLS): opt-in WSS certs from p2p-forge at libp2p.direct #10521
Conversation
…-forge and libp2p.direct
3d6cb4d
to
ecfc8d9
Compare
this matches convention from `./client/acme.go` in p2p-forge itself
Updated this PR, aiming at Kubo 0.32. It uses the latest commit from go-libp2p master branch that includes libp2p/go-libp2p#2854. |
512b48a
to
4889612
Compare
Cleanup to unify the way p2p-forge related things are logged. Debugging can be enabled by setting environment variable `GOLOG_LOG_LEVEL="error,p2p-forge/client=debug"`
Pushed:
Next steps:
|
Switched to go-libp2p 0.37 from master branch and resolved conflicts, I'll check on CI tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, left some feedback on naming. I can submit more suggestions thorough PR tomorrow.
Co-authored-by: Daniel Norman <[email protected]>
addressing UX/DX feedback from Daniel, switching to self-explanatory configuration section
core/node/libp2p/addrs.go
Outdated
p2pforge.WithCAEndpoint(cfg.CAEndpoint.WithDefault(config.DefaultCAEndpoint)), | ||
p2pforge.WithForgeAuth(cfg.RegistrationToken.WithDefault(os.Getenv(p2pforge.ForgeAuthEnv))), | ||
p2pforge.WithUserAgent(version.GetUserAgentVersion()), | ||
p2pforge.WithCertificateStorage(&certmagic.FileStorage{Path: storagePath})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a bug in cluster logs ("error while checking storage for updated ARI; updating ARI now"):
Oct 28 08:44:33 collab-cluster-sv15-1 ipfs[99759]: 2024-10-28T08:44:33.576Z ERROR p2p-forge/client.maintenance [email protected]/maintain.go:521 error while checking storage for updated ARI; updating ARI now {"identifiers": ["*.k51qzi5uqu5dljgcjt37azmfvj1u9b1v9okfd0vhd4gkbcs75cmy0shqjoml6q.libp2p.direct"], "cert_hash": "ac1b9af9ec6e49e5e445593e02620a25a4f6bf5f8013ac0ed7a0206cb9c5f9bd", "ari_unique_id": "kydGmAOpUWiOmNbEQkjbI79YlNI.A_zqjUpBrD6AJHDUgsoK9qj6", "cert_expiry": "2025-01-23T11:57:29.000Z", "error": "loading cert metadata: open /home/ipfs/.local/share/certmagic/certificates/acme-v02.api.letsencrypt.org-directory/wildcard_.k51qzi5uqu5dljgcjt37azmfvj1u9b1v9okfd0vhd4gkbcs75cmy0shqjoml6q.libp2p.direct/wildcard_.k51qzi5uqu5dljgcjt37azmfvj1u9b1v9okfd0vhd4gkbcs75cmy0shqjoml6q.libp2p.direct.json: no such file or directory"}
Oct 28 08:44:33 collab-cluster-sv15-1 ipfs[99759]: 2024-10-28T08:44:33.576Z DEBUG p2p-forge/client.acme_client acme/ari.go:134 getting renewal info {"names": ["*.k51qzi5uqu5dljgcjt37azmfvj1u9b1v9okfd0vhd4gkbcs75cmy0shqjoml6q.libp2p.direct"]}
Oct 28 08:44:33 collab-cluster-sv15-1 ipfs[99759]: 2024-10-28T08:44:33.656Z DEBUG p2p-forge/client.acme_client acme/http.go:275 http request {"method": "GET", "url": "https://acme-v02.api.letsencrypt.org/draft-ietf-acme-ari-03/renewalInfo/kydGmAOpUWiOmNbEQkjbI79YlNI.A_zqjUpBrD6AJHDUgsoK9qj6", "headers": {"User-Agent":["CertMagic acmez (linux; amd64)"]}, "response_headers": {"Cache-Control":["public, max-age=0, no-cache"],"Content-Length":["121"],"Content-Type":["application/json"],"Date":["Mon, 28 Oct 2024 08:44:33 GMT"],"Link":["<https://acme-v02.api.letsencrypt.org/directory>;rel=\"index\""],"Retry-After":["21600"],"Server":["nginx"],"Strict-Transport-Security":["max-age=604800"],"X-Frame-Options":["DENY"]}, "status_code": 200}
Oct 28 08:44:33 collab-cluster-sv15-1 ipfs[99759]: 2024-10-28T08:44:33.656Z INFO p2p-forge/client.acme_client acme/ari.go:215 got renewal info {"names": ["*.k51qzi5uqu5dljgcjt37azmfvj1u9b1v9okfd0vhd4gkbcs75cmy0shqjoml6q.libp2p.direct"], "window_start": "2024-12-23T12:16:59.333Z", "window_end": "2024-12-25T12:16:59.333Z", "selected_time": "2024-12-24T21:30:45.000Z", "recheck_after": "2024-10-28T14:44:33.656Z", "explanation_url": ""}
Oct 28 08:44:33 collab-cluster-sv15-1 ipfs[99759]: 2024-10-28T08:44:33.656Z ERROR p2p-forge/client.maintenance [email protected]/maintain.go:181 updating ARI {"error": "got new ARI from acme-v02.api.letsencrypt.org-directory, but failed loading stored certificate metadata: loading cert metadata: open /home/ipfs/.local/share/certmagic/certificates/acme-v02.api.letsencrypt.org-directory/wildcard_.k51qzi5uqu5dljgcjt37azmfvj1u9b1v9okfd0vhd4gkbcs75cmy0shqjoml6q.libp2p.direct/wildcard_.k51qzi5uqu5dljgcjt37azmfvj1u9b1v9okfd0vhd4gkbcs75cmy0shqjoml6q.libp2p.direct.json: no such file or directory"}
I suspect this is another race condition where certmanager things (like location .local/share/certmagic/
) are initialized with default config before our config is applied (same problem as the logger).
Will look into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential fix in e51d907 and ipshipyard/p2p-forge@f599f48.
TLDR is that certmagic.NewDefault
should be used only for prototyping, or if one is ok with default storage location and logger. Using dedicated cache and avoiding calling certmagic.NewDefault
should remove surface for race condition in setting storage paths AFTER maintenance job is started.
I'm going to deploy to collab cluster, let it run over night and see if ARI error is gone.
Update: so far the fix looks good, renewal check does not produce error anymore:
Oct 29 00:21:19 collab-cluster-sv15-1 ipfs[2893066]: 2024-10-29T00:21:19.893Z INFO autotls.acme_client acme/ari.go:215 got renewal info {"names": ["*.k51qzi5uqu5dljgcjt37azmfvj1u9b1v9okfd0vhd4gkbcs75cmy0shqjoml6q.libp2p.direct"], "window_start": "2024-12-23T12:16:59.333Z", "window_end": "2024-12-25T12:16:59.333Z", "selected_time": "2024-12-24T14:39:17.000Z", "recheck_after": "2024-10-29T06:21:19.893Z", "explanation_url": ""}
Oct 29 00:21:19 collab-cluster-sv15-1 ipfs[2893066]: 2024-10-29T00:21:19.929Z INFO autotls [email protected]/maintain.go:584 updated ACME renewal information {"identifiers": ["*.k51qzi5uqu5dljgcjt37azmfvj1u9b1v9okfd0vhd4gkbcs75cmy0shqjoml6q.libp2p.direct"], "cert_hash": "ac1b9af9ec6e49e5e445593e02620a25a4f6bf5f8013ac0ed7a0206cb9c5f9bd", "ari_unique_id": "kydGmAOpUWiOmNbEQkjbI79YlNI.A_zqjUpBrD6AJHDUgsoK9qj6", "cert_expiry": "2025-01-23T11:57:29.000Z", "selected_time": "2024-12-25T07:38:42.000Z", "next_update": "2024-10-29T06:21:19.893Z", "explanation_url": ""}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the error seems to be gone. Considering this resolved, but keeping this open for visibility.
this should fix race condition caused by using certmagic.Default directly before.
docs/config.md
Outdated
If enabled, it will detect when `.../tls/sni/.../ws` is present in [`Addresses.Swarm`](#addressesswarm) | ||
and SNI is matching `Swarm.AutoTLS.DomainSuffix`, and set up a trusted TLS certificate matching the domain name used in Secure WebSockets (WSS) listener. | ||
|
||
If you want to test this, add `/ip4/0.0.0.0/tcp/4082/tls/sni/*.libp2p.direct/ws` to [`Addresses.Swarm`](#addressesswarm). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is port 4082
used here?
Can you use the same port as the /ws
address: /ip4/0.0.0.0/tcp/4002/ws
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet, we need libp2p/go-libp2p#2984 for that.
The idea is that TCP port sharing will land in go-libp2p before we enable AutoTLS by default.
For opt-in, using separate port is fine, just need to remember to remind people that they need to safelist additional port on their firewall.
we have AutoNAT there, and this feature could be extended in the future to cover user-provided domains and Gateway, so moving it out of Swarm makes it more future-proof.
include catch-all multiaddrs as prominent examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been running this in collab cluster and no new errors were found.
Configuration and docs are in place. This is likely as good as it can be without asking wider community for feedback.
Merging to include it in tomorrow's Kubo 0.32.0-rc1 (#10547)
This PR is WIP but the intention is to allow any publicly reachable node on the network to be reachable via Secure WebSockets and thereby improve reachability from browsers. For anyone who reads this and is unfamiliar with this work, don't worry plenty of documentation to come before we're ready to merge.
TLDR
TODO
Some outstanding items here are:
the libp2p.direct service needs to be ready for production usetracked in Enabling AutoTLS feature by default #10560multiplex WSS on the same port as TCP (e.g. by muxing on the ALPN in the TLS cert) so people don't need to open new portsERROR p2p-forge/client.maintenance [email protected]/maintain.go:521 error while checking storage for updated ARI; updating ARI now
(fromipfs
atcollab-cluster-sv15-1
)replace github.com/libp2p/go-libp2p
from all threego.mod