Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

security: support certificate revocation #29641

Closed
mberhault opened this issue Sep 6, 2018 · 17 comments · Fixed by #53218
Closed

security: support certificate revocation #29641

mberhault opened this issue Sep 6, 2018 · 17 comments · Fixed by #53218
Labels
A-security C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@mberhault
Copy link
Contributor

mberhault commented Sep 6, 2018

Certificate are not currently checked for revocation, the only ways to revoke a specific certificate are:

  • drop the user, for client certs only (we don't have a login attribute either)
  • rotate the CA, re-issue all "desired" certificates and re-distributed

The former means that the user cannot be reused. The latter is incredibly disruptive.

We should support checking all certificates against CRLs (certificate revocation list, see RFC 5280, section 5).

There are a few issues:

  • the cockroach cert commands do not keep track of issued certificates, making it impossible to revoke them
  • online certificate (OCSP) checking is prone to many issues, including: latency, reliance on OCSP server being responsive
  • certificate revocation lists (CRL) must still be pushed/fetched and periodically renewed.
  • any system for revocation should work with certificates not issued using the cockroach cert tools
  • fail-open (assume certificate is still valid) when revocation status cannot be checked is incredibly dangerous. fail-hard (assume certificate is invalid) requires reliable mechanisms and good monitoring/alerting

Summary: RFC required.

edit: RFC here 50602

CC: @kannanlakshmi, @bdarnell

@mberhault mberhault self-assigned this Sep 6, 2018
@mberhault mberhault added C-wishlist A wishlist feature. A-security labels Sep 6, 2018
@bdarnell
Copy link
Contributor

bdarnell commented Sep 9, 2018

We probably need to support the standard protocols for compatibility with arbitrary user CAs, but we could also build in a system table to store revoked cert IDs similar to the web_sessions table.

@mberhault mberhault assigned rolandcrosby and unassigned mberhault Dec 18, 2019
@rolandcrosby rolandcrosby removed their assignment Jan 16, 2020
@knz knz added A-cli C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed C-wishlist A wishlist feature. labels Apr 1, 2020
@knz
Copy link
Contributor

knz commented Apr 1, 2020

It is time to revive this issue since we're planning on this for 20.2.

Here are the step forwards:

  • @aaron-crl to provide advice on best practices, and minimum requirements coming from the compliance roadmap.

  • @DuskEagle @vladdy to provide recommendations on how this feature should best look & feel from the CC SRE's perspective. Do we prefer reading from files? From a URL?

    Strawman: if we place the rev certs in the same dir as --certs and require a server restart to re-load the revocation list, is that sufficient? If not, why not?

  • @vladdy to describe any additional monitoring / metric requirements needed for SLI purposes.

Once we have some input we can move forward with a preliminary RFC.

@knz
Copy link
Contributor

knz commented Apr 1, 2020

@vladdy says: "I want my certs inside the cluster's logical data. If the revocation data becomes unavailable, it's OK if it doesn't work. I prefer not to use k8s configmaps, and I prefer not to fetch rev lists from a URL."

@vladdy
Copy link
Contributor

vladdy commented Apr 1, 2020

"prefer to fetch rev lists". The rest clearly captures my thoughts on the subject. Additionally, it is required to have a metric around how long ago the successful rev list update was (if the db will handle that) so that I can be aware if there are any problems with regular updates.

@knz
Copy link
Contributor

knz commented Apr 1, 2020

The danger with URL-based revocation is that the HTTP request to retrieve the revocation list must be secured too. Then there's a chicken-and-egg problem. How does the server ascertain if the http request is secure?

Consider the hypothetical scenario:

  1. rev list server is XXX
  2. a first request retrieves a (mistaken) rev certificate against XXX's TLS cert
  3. everything after that gets blocked because now all requests against XXX are refused as insecure

The option to disable TLS on that url is unpalatable because it introduces security vulnerabilitites. I wonder what the best practices are?

@DuskEagle
Copy link
Member

I have a couple of issues with fetching the revocation list from a URL:

  • Whichever services are used to host this revocation list become critical dependencies for the database. This increases the complexity of deploying CockroachDB.

  • If the URL hosting the certificate revocation list is unavailable, CockroachDB can only rely on the most recent revocation list it fetched from that URL. New certificates cannot be revoked until access is restored.

I would prefer CockroachDB expose a SQL interface for revoking certificates, and stores that data within the cluster.

@knz
Copy link
Contributor

knz commented Apr 1, 2020

@DuskEagle assuming the SQL interface was available in either case, any motivation to prefer in-cluster storage over files in a directory outside of the cluster?

@DuskEagle
Copy link
Member

Strawman: if we place the rev certs in the same dir as --certs and require a server restart to re-load the revocation list, is that sufficient? If not, why not?

A server restart is too heavy-handed. We support online certificate reloading (#1674), so let's suppose we used that. What are the potential problems with this approach?

  1. Each node's certs directory isn't guaranteed to contain the same contents. Even if CRDB is deployed using K8s, we cannot rely on a single configmap to store the certificate revocation list, because a multi-region CRDB cluster is deployed across multiple K8s clusters, each of which will have their own separate configmap. You will need to merge the certificate revocation lists in each individual node's into one master list within CockroachDB.

  2. This solution has increased operator complexity. Even if the certificate revocation lists from each node are merged within CockroachDB, an operator intent on keeping the --certs directory the same on each node will have to write each certificate revocation to multiple places.

@vladdy
Copy link
Contributor

vladdy commented Apr 1, 2020

@knz "The danger with URL-based revocation is that the HTTP request to retrieve the revocation list must be secured too" I don't think so. The list is signed by CA.

I like the idea of having the list inside of the db as well as that allows us to have one process which handles the update for entire cluster instead of each node being updated separately.

@DuskEagle the problem with storing static data reliable has been more or less solved. To begin we could start with some object storage as the hosting.

@DuskEagle
Copy link
Member

@DuskEagle assuming the SQL interface was available in either case, any motivation to prefer in-cluster storage over files in a directory outside of the cluster?

I'm not sure I understand this. Are you proposing using the SQL interface to write data to a directory outside of the cluster? Whatever this data directory location is, it needs to be available to all nodes, including nodes that are launched in the future. That seems complicated compared to storing it within the cluster.

@DuskEagle the problem with storing static data reliable has been more or less solved. To begin we could start with some object storage as the hosting.

I'm thinking not just of the Cockroach Cloud team, but also of regular users of CockroachDB, who will also have to set up this static hosting for their CRL. This is also more complicated for them. CockroachDB is already a database for storing data, what's the advantage of relying on blob storage for this?

@vladdy
Copy link
Contributor

vladdy commented Apr 1, 2020

I'm thinking not just of the Cockroach Cloud team, but also of regular users of CockroachDB, who will also have to set up this static hosting for their CRL. This is also more complicated for them. CockroachDB is already a database for storing data, what's the advantage of relying on blob storage for this?

If you host a real CA (i.e. an enterprise environment), you already publish your CRL anyway. This usually happens via HTTP(S). That's also the point why CRDB needs to just fetch it instead of relying on something what would be updating the db. This approach is easier to implement and integrate with.

@vladdy
Copy link
Contributor

vladdy commented Apr 1, 2020

Just wanted to add that for something without the distribution point, we could implement something like nodelocal://my.crl as the source of the list to be picked up locally regularly.

@DuskEagle
Copy link
Member

Okay, that makes sense. How would you feel if CRDB supported fetching the CRL from an HTTP(S)/s3/nodelocal endpoint, and wrote the contents to a system table like @bdarnell suggested? This handles integrating with existing CA setups, "split-brain" problems where different nodes see different CRLs, and allows us, if needed, to revoke certificates even when the CRL location is unavailable?

@vladdy
Copy link
Contributor

vladdy commented Apr 1, 2020

@DuskEagle that's what I fully support, if it is possible.

@aaron-crl
Copy link

@aaron-crl to provide advice on best practices, and minimum requirements coming from the compliance roadmap.

From a industry standard position, the most widely deployed case is that of web browsers which generally reach out for CRLs but fail open if they don't hear back in time.

Chrome addressed this issue by pushing CRLSets (https://dev.chromium.org/Home/chromium-security/crlsets) via application updates to ensure they could ensure key certificate revocation occurs in a manner that does not depend on the certs that might themselves be revoked.

I like the approach @DuskEagle collected with the caveat that the operator should be able to configure if they want to always query the CRL or have that occur as a scheduled event to reduce operational load by shedding some security.

@knz
Copy link
Contributor

knz commented Apr 17, 2020

@knz
Copy link
Contributor

knz commented Jun 24, 2020

RFC here: #50602

bdarnell added a commit to bdarnell/cockroach that referenced this issue Aug 21, 2020
This adds support for certificate revocation for organizations who run
an OCSP server for their CA.

Closes cockroachdb#29641

Release note (security update): Certificate revocation is now
supported via OSCP and the setting `security.ocsp.mode`.
bdarnell added a commit to bdarnell/cockroach that referenced this issue Aug 23, 2020
This adds support for certificate revocation for organizations who run
an OCSP server for their CA.

Closes cockroachdb#29641

Release note (security update): Certificate revocation is now
supported via OSCP and the setting `security.ocsp.mode`.
craig bot pushed a commit that referenced this issue Aug 24, 2020
53218: security: Add support for OCSP r=knz a=bdarnell

This adds support for certificate revocation for organizations who run
an OCSP server for their CA.

Closes #29641

Release note (security update): Certificate revocation is now
supported via OSCP and the setting `security.ocsp.mode`.

Still a WIP: includes the plumbing and the implementation, but I'm still working on tests. 

Co-authored-by: Ben Darnell <[email protected]>
@craig craig bot closed this as completed in 56c0cb7 Aug 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-security C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants