certs: Rotate certs in 1-year #1856

Closed
stevesloka opened this issue Nov 5, 2019 · 11 comments
Labels
area/deployment Issues or PRs related to deployment tooling or infrastructure.

Comments

@stevesloka
Member

We should have Contour automatically rotate the generated certs before they expire, for users who rely on the provided job to set up secure gRPC communication between Envoy<>Contour.

@annismckenzie

annismckenzie commented Jan 10, 2020

To start, it'd be really nice if the job could be re-run without breaking anything when one upgrades to a newer Contour version. We had to disable the job after the first apply because it won't succeed again. It'd be reasonable to assume that one upgrades Contour at least a couple times a year so that job could transparently rotate the certs.

@davecheney
Contributor

> We had to disable the job after the first apply because it won't succeed again.

@youngnick could you please look into this? AFAIR this is not the expected behaviour.

@youngnick
Member

youngnick commented Jan 12, 2020

So, there were actually two problems there:

  • Jobs have an immutable spec, so when we did an upgrade, if you didn't delete the job, the apply would fail.
  • Deleting the secret is not enough to have the certificates regenerated; you also need to delete the job and reapply.

@stevesloka has addressed these already in #2084. The certgen job will not rotate the certs at the moment, because it currently requires restarting all the Envoys and Contour to do it.

It turns out that this is pretty tricky to resolve and requires a clean drain mechanism for Envoy (#145) as well as storing the full CA keypair inside the cluster, which needs to be done carefully to ensure it's not easily reachable.

Right now, the rotation process if you're running Kubernetes 1.12+ is:

  • Delete the secret that holds the gRPC TLS keypair
  • Delete the contour-certgen job if it exists
  • Reapply the contour-certgen job (either by itself or as part of the whole YAML)
  • Restart all Contour pods
  • Restart all Envoy pods

With the newest job spec, Kubernetes will delete the completed job after it finishes.
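As a rough illustration of the manual steps above, here is a minimal Go sketch that only builds the corresponding `kubectl` command lines in order and prints them; it does not run anything. The namespace (`projectcontour`), the secret names (`contourcert`, `envoycert`), the manifest file name (`contour.yaml`), and the `deployment/contour` and `daemonset/envoy` workload names are assumptions for illustration, not taken from this thread, so adjust them to your install. It also assumes a `kubectl` new enough to have `rollout restart` (1.15+); on older clients, deleting the pods would be the equivalent step.

```go
package main

import (
	"fmt"
	"strings"
)

// rotationPlan returns the kubectl commands for the manual rotation steps,
// in the order listed in the comment above. It only builds strings;
// nothing is executed. All resource names passed in are assumptions.
func rotationPlan(namespace, manifest string, secrets []string) []string {
	ns := "--namespace=" + namespace
	return []string{
		// 1. Delete the secret(s) holding the gRPC TLS keypair.
		"kubectl delete secret " + strings.Join(secrets, " ") + " " + ns + " --ignore-not-found",
		// 2. Delete the contour-certgen job if it exists.
		"kubectl delete job contour-certgen " + ns + " --ignore-not-found",
		// 3. Reapply the certgen job (here as part of the whole YAML).
		"kubectl apply -f " + manifest,
		// 4. Restart all Contour pods (assumes a Deployment named "contour").
		"kubectl rollout restart deployment/contour " + ns,
		// 5. Restart all Envoy pods (assumes a DaemonSet named "envoy").
		"kubectl rollout restart daemonset/envoy " + ns,
	}
}

func main() {
	plan := rotationPlan("projectcontour", "contour.yaml", []string{"contourcert", "envoycert"})
	for i, cmd := range plan {
		fmt.Printf("%d. %s\n", i+1, cmd)
	}
}
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; the ordering matters because the job has to be deleted before it can be reapplied, and the pods only pick up the new keypair after a restart.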

There's also #2020, which covers documenting this process properly as the first step towards automating it.

@jpeach added the area/deployment label (Issues or PRs related to deployment tooling or infrastructure.) Feb 9, 2020
@stevesloka
Member Author

What if we move the cert generation to the Contour serve process as a new workgroup item? It could generate certs automatically (the generation code is already in the Contour binary) if certs do not exist in the current namespace. There would need to be a flag to tell Contour whether to generate certs or not. Maybe an arg like --generate-secret-name=myname and it would generate secrets named myname automatically, otherwise it would wait for the secret to exist.
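To make the proposed startup behaviour concrete, here is a minimal Go sketch of the decision it describes. The `--generate-secret-name` flag is the hypothetical one suggested above and does not exist in Contour; the `decide` and `parseArgs` helpers and the action names are likewise made up for illustration, and the standard library `flag` package is used only to keep the example self-contained.

```go
package main

import (
	"flag"
	"fmt"
)

// certAction describes what the serve process would do at startup.
type certAction string

const (
	useExisting certAction = "use existing secret"
	generate    certAction = "generate secret"
	waitFor     certAction = "wait for secret"
)

// decide mirrors the proposal: an existing secret is used as-is; if it is
// missing and the (hypothetical) flag is set, generate it; otherwise wait.
func decide(generateSecretName string, secretExists bool) certAction {
	switch {
	case secretExists:
		return useExisting
	case generateSecretName != "":
		return generate
	default:
		return waitFor
	}
}

// parseArgs reads the hypothetical --generate-secret-name flag.
func parseArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("serve", flag.ContinueOnError)
	name := fs.String("generate-secret-name", "", "hypothetical: name of the secrets to generate if missing")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *name, nil
}

func main() {
	name, err := parseArgs([]string{"--generate-secret-name=myname"})
	if err != nil {
		panic(err)
	}
	fmt.Println(decide(name, false))
}
```

Whether the secret exists would in practice come from a Kubernetes API lookup; it is passed in as a plain boolean here so the decision logic can be read on its own.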

It could also handle regenerating certs automatically before the 1-year window. Contour has support for automatically rotating certs thanks to #2198, but still needs work to allow Envoy to rotate its certs (#2143 (comment)).

  1. The old secret should be backed up in case the regeneration fails, since we don't want a broken Contour, but re-using the name should auto-refresh
  2. Until Envoy cert rotation is complete, Contour could generate the certs for Envoy under a new name and change the Envoy DS to use this new secret, which would cause a clean rollout of Envoy
  3. We'd want a way to control when this change happens such that it fits into specific windows that users could buy into

@youngnick
Member

If we do that, we are building a poor copy of cert-manager though. Adding this functionality to Contour is not necessary: there are other solutions for it, and if you have clusters running for a year, you should be using something else.

This should not be automated by Contour.

@davecheney
Contributor

Strongly seconded.

@stevesloka
Member Author

Yup, I don't disagree, short of the complexity of getting the other system integrated.

OK to close this issue then? Or should this issue really be about solving this through integration with cert-manager?

@youngnick
Member

I think between the changes @tsaarni has brought in and the doc updates you've already landed, this issue is actually redundant.

@davecheney
Contributor

Personally I think we should document how to rotate a cert generated by cert-gen. That might be "delete the certificate, job, secret, and pods" noting that there may be a brief outage.

Then, as future work, we document how to integrate Contour with certs generated by Vault or cert-manager.

@youngnick
Member

@stevesloka already has a PR, #2282, for the rotation docs.

@stevesloka
Member Author

I'm going to close this issue then. I think we have the other bits documented and called out in other PRs.


5 participants