security: support certificate revocation #29641

mberhault · 2018-09-06T14:22:39Z

Certificate are not currently checked for revocation, the only ways to revoke a specific certificate are:

drop the user, for client certs only (we don't have a login attribute either)
rotate the CA, re-issue all "desired" certificates and re-distributed

The former means that the user cannot be reused. The latter is incredibly disruptive.

We should support checking all certificates against CRLs (certificate revocation list, see RFC 5280, section 5).

There are a few issues:

the cockroach cert commands do not keep track of issued certificates, making it impossible to revoke them
online certificate (OCSP) checking is prone to many issues, including: latency, reliance on OCSP server being responsive
certificate revocation lists (CRL) must still be pushed/fetched and periodically renewed.
any system for revocation should work with certificates not issued using the cockroach cert tools
fail-open (assume certificate is still valid) when revocation status cannot be checked is incredibly dangerous. fail-hard (assume certificate is invalid) requires reliable mechanisms and good monitoring/alerting

Summary: RFC required.

edit: RFC here 50602

CC: @kannanlakshmi, @bdarnell

The text was updated successfully, but these errors were encountered:

bdarnell · 2018-09-09T23:54:02Z

We probably need to support the standard protocols for compatibility with arbitrary user CAs, but we could also build in a system table to store revoked cert IDs similar to the web_sessions table.

knz · 2020-04-01T14:42:19Z

It is time to revive this issue since we're planning on this for 20.2.

Here are the step forwards:

@aaron-crl to provide advice on best practices, and minimum requirements coming from the compliance roadmap.
@DuskEagle @vladdy to provide recommendations on how this feature should best look & feel from the CC SRE's perspective. Do we prefer reading from files? From a URL?

Strawman: if we place the rev certs in the same dir as --certs and require a server restart to re-load the revocation list, is that sufficient? If not, why not?
@vladdy to describe any additional monitoring / metric requirements needed for SLI purposes.

Once we have some input we can move forward with a preliminary RFC.

knz · 2020-04-01T14:43:43Z

@vladdy says: "I want my certs inside the cluster's logical data. If the revocation data becomes unavailable, it's OK if it doesn't work. I prefer not to use k8s configmaps, and I prefer ~~not~~ to fetch rev lists from a URL."

vladdy · 2020-04-01T14:47:13Z

"prefer to fetch rev lists". The rest clearly captures my thoughts on the subject. Additionally, it is required to have a metric around how long ago the successful rev list update was (if the db will handle that) so that I can be aware if there are any problems with regular updates.

knz · 2020-04-01T14:53:21Z

The danger with URL-based revocation is that the HTTP request to retrieve the revocation list must be secured too. Then there's a chicken-and-egg problem. How does the server ascertain if the http request is secure?

Consider the hypothetical scenario:

rev list server is XXX
a first request retrieves a (mistaken) rev certificate against XXX's TLS cert
everything after that gets blocked because now all requests against XXX are refused as insecure

The option to disable TLS on that url is unpalatable because it introduces security vulnerabilitites. I wonder what the best practices are?

DuskEagle · 2020-04-01T14:56:52Z

I have a couple of issues with fetching the revocation list from a URL:

Whichever services are used to host this revocation list become critical dependencies for the database. This increases the complexity of deploying CockroachDB.
If the URL hosting the certificate revocation list is unavailable, CockroachDB can only rely on the most recent revocation list it fetched from that URL. New certificates cannot be revoked until access is restored.

I would prefer CockroachDB expose a SQL interface for revoking certificates, and stores that data within the cluster.

knz · 2020-04-01T15:03:06Z

@DuskEagle assuming the SQL interface was available in either case, any motivation to prefer in-cluster storage over files in a directory outside of the cluster?

DuskEagle · 2020-04-01T15:04:46Z

Strawman: if we place the rev certs in the same dir as --certs and require a server restart to re-load the revocation list, is that sufficient? If not, why not?

A server restart is too heavy-handed. We support online certificate reloading (#1674), so let's suppose we used that. What are the potential problems with this approach?

Each node's certs directory isn't guaranteed to contain the same contents. Even if CRDB is deployed using K8s, we cannot rely on a single configmap to store the certificate revocation list, because a multi-region CRDB cluster is deployed across multiple K8s clusters, each of which will have their own separate configmap. You will need to merge the certificate revocation lists in each individual node's into one master list within CockroachDB.
This solution has increased operator complexity. Even if the certificate revocation lists from each node are merged within CockroachDB, an operator intent on keeping the --certs directory the same on each node will have to write each certificate revocation to multiple places.

vladdy · 2020-04-01T15:05:04Z

@knz "The danger with URL-based revocation is that the HTTP request to retrieve the revocation list must be secured too" I don't think so. The list is signed by CA.

I like the idea of having the list inside of the db as well as that allows us to have one process which handles the update for entire cluster instead of each node being updated separately.

@DuskEagle the problem with storing static data reliable has been more or less solved. To begin we could start with some object storage as the hosting.

DuskEagle · 2020-04-01T15:19:04Z

@DuskEagle assuming the SQL interface was available in either case, any motivation to prefer in-cluster storage over files in a directory outside of the cluster?

I'm not sure I understand this. Are you proposing using the SQL interface to write data to a directory outside of the cluster? Whatever this data directory location is, it needs to be available to all nodes, including nodes that are launched in the future. That seems complicated compared to storing it within the cluster.

@DuskEagle the problem with storing static data reliable has been more or less solved. To begin we could start with some object storage as the hosting.

I'm thinking not just of the Cockroach Cloud team, but also of regular users of CockroachDB, who will also have to set up this static hosting for their CRL. This is also more complicated for them. CockroachDB is already a database for storing data, what's the advantage of relying on blob storage for this?

vladdy · 2020-04-01T15:23:57Z

I'm thinking not just of the Cockroach Cloud team, but also of regular users of CockroachDB, who will also have to set up this static hosting for their CRL. This is also more complicated for them. CockroachDB is already a database for storing data, what's the advantage of relying on blob storage for this?

If you host a real CA (i.e. an enterprise environment), you already publish your CRL anyway. This usually happens via HTTP(S). That's also the point why CRDB needs to just fetch it instead of relying on something what would be updating the db. This approach is easier to implement and integrate with.

vladdy · 2020-04-01T15:29:18Z

Just wanted to add that for something without the distribution point, we could implement something like nodelocal://my.crl as the source of the list to be picked up locally regularly.

DuskEagle · 2020-04-01T16:30:04Z

Okay, that makes sense. How would you feel if CRDB supported fetching the CRL from an HTTP(S)/s3/nodelocal endpoint, and wrote the contents to a system table like @bdarnell suggested? This handles integrating with existing CA setups, "split-brain" problems where different nodes see different CRLs, and allows us, if needed, to revoke certificates even when the CRL location is unavailable?

vladdy · 2020-04-01T16:34:46Z

@DuskEagle that's what I fully support, if it is possible.

aaron-crl · 2020-04-02T19:46:16Z

@aaron-crl to provide advice on best practices, and minimum requirements coming from the compliance roadmap.

From a industry standard position, the most widely deployed case is that of web browsers which generally reach out for CRLs but fail open if they don't hear back in time.

Chrome addressed this issue by pushing CRLSets (https://dev.chromium.org/Home/chromium-security/crlsets) via application updates to ensure they could ensure key certificate revocation occurs in a manner that does not depend on the certs that might themselves be revoked.

I like the approach @DuskEagle collected with the caveat that the operator should be able to configure if they want to always query the CRL or have that occur as a scheduled event to reduce operational load by shedding some security.

knz · 2020-04-17T12:29:33Z

Example Go code found here:
https://github.com/cloudflare/cfssl/blob/master/revoke/revoke.go

knz · 2020-06-24T12:52:08Z

RFC here: #50602

This adds support for certificate revocation for organizations who run an OCSP server for their CA. Closes cockroachdb#29641 Release note (security update): Certificate revocation is now supported via OSCP and the setting `security.ocsp.mode`.

53218: security: Add support for OCSP r=knz a=bdarnell This adds support for certificate revocation for organizations who run an OCSP server for their CA. Closes #29641 Release note (security update): Certificate revocation is now supported via OSCP and the setting `security.ocsp.mode`. Still a WIP: includes the plumbing and the implementation, but I'm still working on tests. Co-authored-by: Ben Darnell <[email protected]>

mberhault self-assigned this Sep 6, 2018

mberhault added C-wishlist A wishlist feature. A-security labels Sep 6, 2018

bdarnell mentioned this issue Apr 10, 2019

pgwire: verifying user passwords is too aggressive #36160

Closed

mberhault assigned rolandcrosby and unassigned mberhault Dec 18, 2019

rolandcrosby removed their assignment Jan 16, 2020

knz added A-cli C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed C-wishlist A wishlist feature. labels Apr 1, 2020

bdarnell mentioned this issue Aug 21, 2020

security: Add support for OCSP #53218

Merged

craig bot closed this as completed in 56c0cb7 Aug 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

security: support certificate revocation #29641

security: support certificate revocation #29641

mberhault commented Sep 6, 2018 •

edited by knz

Loading

bdarnell commented Sep 9, 2018

knz commented Apr 1, 2020 •

edited

Loading

knz commented Apr 1, 2020 •

edited

Loading

vladdy commented Apr 1, 2020

knz commented Apr 1, 2020

DuskEagle commented Apr 1, 2020

knz commented Apr 1, 2020

DuskEagle commented Apr 1, 2020

vladdy commented Apr 1, 2020 •

edited

Loading

DuskEagle commented Apr 1, 2020

vladdy commented Apr 1, 2020

vladdy commented Apr 1, 2020

DuskEagle commented Apr 1, 2020

vladdy commented Apr 1, 2020

aaron-crl commented Apr 2, 2020

knz commented Apr 17, 2020

knz commented Jun 24, 2020

security: support certificate revocation #29641

security: support certificate revocation #29641

Comments

mberhault commented Sep 6, 2018 • edited by knz Loading

bdarnell commented Sep 9, 2018

knz commented Apr 1, 2020 • edited Loading

knz commented Apr 1, 2020 • edited Loading

vladdy commented Apr 1, 2020

knz commented Apr 1, 2020

DuskEagle commented Apr 1, 2020

knz commented Apr 1, 2020

DuskEagle commented Apr 1, 2020

vladdy commented Apr 1, 2020 • edited Loading

DuskEagle commented Apr 1, 2020

vladdy commented Apr 1, 2020

vladdy commented Apr 1, 2020

DuskEagle commented Apr 1, 2020

vladdy commented Apr 1, 2020

aaron-crl commented Apr 2, 2020

knz commented Apr 17, 2020

knz commented Jun 24, 2020

mberhault commented Sep 6, 2018 •

edited by knz

Loading

knz commented Apr 1, 2020 •

edited

Loading

knz commented Apr 1, 2020 •

edited

Loading

vladdy commented Apr 1, 2020 •

edited

Loading