security: figure out cert/key rotation #1674
Comments
Throwing a basic outline of requirements and future improvements: Requirements:
Future improvements:
Specifying multiple CA certs or cert/key pairs can be done in a number of ways:
Reloading certs is probably best done through issuing the
Cc @bdarnell
Let's separate CA cert rotation from the other cases - it's significantly easier because no cockroach process needs to touch the CA private key (or should even have access to it), and it's trivial to use multiple CA certs at once. In fact, when we have multiple CA certs, I don't think we ever need to use multiple keypairs at any node or client. On the other hand, clients need the CA cert too, so we must consider clients in the key rotation story. The rotation process (assuming all the certs are expiring at the same time, although pieces of this process also work on their own) would be:
Yeah, I think we may want to decouple CA and server/client cert lifetimes. It's insane to have them expire at the same time, potentially forcing a refresh of everything. Ultimately, we won't be able to monitor client CA expiration; the best we can do is figure out which CA was used to sign the client cert. I propose we keep things reasonably simple for now. Specifically:

Cert expiration:
The actual expiration time can be set at cert creation time, but the defaults should differ for CA/server/client certs.

CA rotation:
Admins can clean up old CAs quickly, but don't have to.

Server cert rotation:
I don't think we want to go much more complex than this. Any sort of cluster-wide gating system will fail if certs expire and nodes can't talk to each other.

Concurrent CA/server cert rotation:
If this really needs to happen, there's very little we can do to make sure clients have the updated CA. Even gating server cert rollout is tricky: any type of global switch requires functional communication between nodes. We can signal a node to load new certs, but we can't control when a node restarts, picking up new certs in the process. The gating process is effectively SIGHUP/restart after pushing new certs.

Loading multiple certs:
We really want to support multiple files, especially for CA certs: having more than one is perfectly reasonable and allows for slow rollout. Multiple node cert files are debatable. If we always use the most recent valid one, we could overwrite the file, but if we want to allow not-yet-valid certs, we need more than one file. How to specify multiple files is open for debate. Both globs and auto-detection within a directory are fragile in the face of odd filenames, but still doable. We could make things simpler by having our PEM contain both certificate and private key; we currently move both around together anyway.

Rotating certs for new connections:
Refreshing CAs and node certs is easy through the standard tls.Config:
We need to make sure that those are properly used by grpc. We may need to go through grpc's
Draft RFC for online certificate rotation can be found in #14254
This is the first step towards #1674.

Introduces two new security objects:
* certificate_loader: scans certsDir for certs/keys
* certificate_manager: long-lived: builds tls.Config objects

The certificate_manager is currently only used by the `cert debug-list` command and the current cert system remains in place.

A few small notes:
* embedded certs were renamed to the new naming scheme
* most test changes are to use the new embedded assets setter
Part of #1674. The `--certs-dir` flag replaces all the certs flags, with a naming scheme used to determine what type each file is. The user (needed for `client.<user>.{crt,key}`) comes from `base.Config.User`.
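To make the naming-scheme idea concrete, here is a small Go sketch of classifying files found in a certs directory. Only the `client.<user>.{crt,key}` pattern comes from the commit message above; the `ca.*` and `node.*` names and the `classify` function itself are assumptions made for illustration, not the actual loader code.

```go
package main

import (
	"fmt"
	"strings"
)

// classify maps a file name from the certs directory to a role and, for
// client files, the user name. The ca.* and node.* names are assumed for
// this sketch; client.<user>.{crt,key} is the pattern named in the commit.
func classify(name string) (role, user string, ok bool) {
	switch name {
	case "ca.crt", "ca.key":
		return "ca", "", true
	case "node.crt", "node.key":
		return "node", "", true
	}
	if !strings.HasPrefix(name, "client.") {
		return "", "", false
	}
	rest := strings.TrimPrefix(name, "client.")
	for _, ext := range []string{".crt", ".key"} {
		// TrimSuffix returns rest unchanged when the suffix is absent.
		if u := strings.TrimSuffix(rest, ext); u != rest && u != "" {
			return "client", u, true
		}
	}
	return "", "", false
}

func main() {
	for _, name := range []string{"ca.crt", "node.key", "client.root.crt", "client.crt", "notes.txt"} {
		role, user, ok := classify(name)
		fmt.Printf("%-16s role=%q user=%q recognized=%v\n", name, role, user, ok)
	}
}
```

Deriving the file type from the name is what lets a single `--certs-dir` flag replace the per-file flags: the loader never needs to be told which file is which.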
Part of #1674. Watch for SIGHUP and rotate the embedded server TLS config.
Part of #1674.
* use `--certs-dir` for all commands accepting certs
* accept old certs flags for `start` command only, but log deprecation warning
* rework certs creation logic
* make `--insecure` always default to false and independent of `--host`
The certificate directory can now be re-read by issuing a SIGHUP signal to the server. Part of #1674.
Back in CockroachDB v1.1 (v17.2 in the new calver scheme), we introduced a certificate rotation mechanism. To help teach/troubleshoot that feature, we also provided a way for the operator to view the certificate details in the DB Console (expiration time, addresses, etc.) This work was done in PR cockroachdb#16087, to solve issues cockroachdb#15027/cockroachdb#1674. However, as part of that PR, the implementation of the back-end API also included the *data* of the cert (including the cert signature and the signature chain) in the response payload. This additional payload was never used in a user-facing feature: the DB Console does not display it nor does it contain a link to "download the cert file". The back-end API is not public either, so we are not expecting end-users to have legitimate uses for this feature. Meanwhile, leaking cert data through an API runs dangerously close to violating PCI guidelines (not quite, since keys are not exposed, but still...). So in order to avoid a remark on this during PCI review cycles, and to remove the chance this will be misused, this patch removes the data payload from the cert response. The DB Console screen corresponding to the original work remains unaffected. Release note: None
83616: sql, sqlstats: create temporary stats container for all txns r=xinhaoz a=xinhaoz

# Commit 1
### sql: add txn_fingerprint_id to node_statement_statistics

This commit adds the `txn_fingerprint_id` column to `crdb_internal.node_statement_statistics`.

Release note (sql): `txn_fingerprint_id` has been added to `crdb_internal.node_statement_statistics`. The type of the column is NULL or STRING.

------------------------------

# Commit 2
### sql, sqlstats: create temporary stats container for all txns

Fixes: #81470

Previously, the stats collector followed different procedures for stats collection depending on whether or not the txn was explicit.

For explicit transactions, all the stmts in the txn must be recorded with the same `transactionFingerprintID`, which is only known after all stmts in the txn have been executed. In order to record the correct txnFingerprintID, a temporary stats container was created for stmts in the current transaction. The `transactionFingerprintID` was then populated for all stmts in the temp container, and the temp container was merged with the parent.

For implicit transactions, the assumption was there would only be a single stmt in the txn, and so no temporary container was created, with stmts being written directly to the application stats. This assumption was incorrect, as it is possible for implicit txns to have multiple stmts, such as stmts sent in a batch.

This commit ensures that stats are properly collected for implicit txns with multiple stmts. The stats collector now follows the same procedure for both explicit and implicit txns, creating a temporary container for local txn stmts and merging on txn finish.

Release note (bug fix): Statement and transaction stats are now properly recorded for implicit txns with multiple stmts.

83762: docs: add MVCC range tombstones tech note r=sumeerbhola,jbowens,nicktrav a=erikgrinaker

[Rendered version](https://github.com/erikgrinaker/cockroach/blob/mvcc-range-tombstones-tech-note/docs/tech-notes/mvcc-range-tombstones.md)

---

This describes the current state, but the details will likely change as we iterate on the implementation.

Resolves #83406.

Release note: None

83873: tenantsettingswatcher: remove the version gate r=yuzefovich a=yuzefovich

This commit removes the version gate for the tenant settings.

Release note: None

83902: server: remove TLS cert data retrieval over HTTP r=catj-cockroach a=knz

Back in CockroachDB v1.1 (v17.2 in the new calver scheme), we introduced a certificate rotation mechanism. To help teach/troubleshoot that feature, we also provided a way for the operator to view the certificate details in the DB Console (expiration time, addresses, etc.)

This work was done in PR #16087, to solve issues #15027/#1674.

However, as part of that PR, the implementation of the back-end API also included the *data* of the cert (including the cert signature and the signature chain) in the response payload.

This additional payload was never used in a user-facing feature: the DB Console does not display it nor does it contain a link to "download the cert file". The back-end API is not public either, so we are not expecting end-users to have legitimate uses for this feature.

Meanwhile, leaking cert data through an API runs dangerously close to violating PCI guidelines (not quite, since keys are not exposed, but still...).

So in order to avoid a remark on this during PCI review cycles, and to remove the chance this will be misused, this patch removes the data payload from the cert response.

The DB Console screen corresponding to the original work remains unaffected.

For reference here's how the console screen looks:

![image](https://user-images.githubusercontent.com/642886/177591040-f554fdf0-2b04-48f6-af36-0b94c0bcaf4c.png)

Co-authored-by: Xin Hao Zhang
Co-authored-by: Erik Grinaker
Co-authored-by: Yahor Yuzefovich
Co-authored-by: Raphael 'kena' Poss
We need a good way to handle cert/key rotation without having to restart all the nodes.
Pretty low priority.