docs: Add info about client side certificate rotation best practices. #1168

Open
bwplotka opened this issue Jan 2, 2019 · 13 comments
Labels: help wanted, kind/documentation, lifecycle/frozen, priority/backlog

Comments

@bwplotka

bwplotka commented Jan 2, 2019

Hi, and Happy New Year, all!

Thanks for a great product. We have used it in production for a long time, but now we want to improve automation and avoid manual intervention during certificate renewal for our services. How can we ensure that a Pod's server actually reloads the certificate? Particularly:

  • Certificate is close to expiry time.
  • cert-manager renews the certificate and updates the Kubernetes Secret.
  • Kubernetes refreshes the Secret in the affected pods, but they are still using the old certificate.

It's definitely not a cert-manager issue, but it would be nice for cert-manager to include potential solutions to this problem as best practices.

There are several options:
A) Ensure the application can reload the certificate "hitless", i.e. without disruption. For example, you can implement this in a Golang HTTP server, or hope that the service you use supports it (most don't). Envoy, for example, recently added this option: envoyproxy/envoy#1194
B) A generic cert-rotation operator that performs a rolling restart of stateless Deployments so they load the new certificates? Maybe logic like this would make sense in cert-manager?
C) Have your rollout tooling handle it (e.g. ensure pods are restarted frequently)?

What is the common way of solving this problem? I guess A, for the least disruptive rotation possible, but what if it's a third-party tool that does not support hot reload? I have searched GitHub issues, but haven't found a relevant answer.

Do you agree that best-practice docs on this would be a good fit for the cert-manager documentation?

Environment details (if applicable):

  • Kubernetes version (e.g. v1.10.2): v1.9+
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): GKE

/kind feature

@jetstack-bot jetstack-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 2, 2019
@munnerz
Member

munnerz commented Jan 10, 2019

Happy new year! 🎉

I think this sort of thing would be great to add to our documentation - or at least notes summarising what you've put above, so that users can understand what they need to do and what their options are 😄

/kind documentation

@jetstack-bot jetstack-bot added the kind/documentation Categorizes issue or PR as related to documentation. label Jan 10, 2019
@munnerz munnerz added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed kind/feature Categorizes issue or PR as related to a new feature. labels Feb 7, 2019
@paultiplady
Contributor

This is one of those subtle issues that isn't apparent from reading the intro docs, and it will cause a full outage when it bites you. I think it's worth at least a "here be dragons" callout: whatever solution you choose, if you haven't picked one, you are probably going to have an outage at some point (usually when your whole team is on vacation, since that's when code and deploy velocity drops off).

(Not being overly-specific because this is exactly what happened to me or anything like that...) :)

@bwplotka
Author

bwplotka commented Mar 4, 2019

@paultiplady, I don't quite get the takeaway of your comment (: Are you just ranting about the fact that nothing works 100% of the time? Sure, but can we focus on resolving this issue by recommending or explaining a solution that gets closer to 100% than the others?

@paultiplady
Contributor

I'm adding a user use-case emphasizing that this is important to document, as it produces outages if it's not handled.

@rmb938
Contributor

rmb938 commented Mar 6, 2019

I just found https://github.com/pusher/wave. It watches for changes to the ConfigMaps and Secrets used by Deployments and performs a rolling deploy when they are updated. Following the example from the original issue, this would happen:

  1. Create a Certificate
  2. Create a deployment with the wave annotation and use the certificate's secret in the deployment
  3. cert-manager renews the certificate and updates the Kubernetes Secret
  4. Wave sees that the Secret was updated and performs a rolling deployment.
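Controllers like this can rely on the fact that any change to a Deployment's pod template triggers a rolling update. A minimal sketch of the general checksum-annotation pattern (not Wave's actual code): hash the Secret's data deterministically and store the digest in an annotation under `spec.template.metadata`; when cert-manager rewrites the Secret, the digest changes, and patching the annotation makes the Deployment controller roll the pods. The annotation key `example.com/tls-secret-hash` and the helper names are hypothetical, and the Kubernetes API calls (reading the Secret, patching the Deployment) are omitted to keep the example standard-library only.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"hash"
	"sort"
)

// hashAnnotation is a hypothetical pod-template annotation key.
const hashAnnotation = "example.com/tls-secret-hash"

// writeField writes a length-prefixed byte slice so that adjacent fields
// cannot run together (e.g. "ab"+"c" vs "a"+"bc").
func writeField(h hash.Hash, b []byte) {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(b)))
	h.Write(n[:])
	h.Write(b)
}

// configHash returns a deterministic SHA-256 hex digest of a Secret's data.
// Keys are sorted because Go map iteration order is randomized.
func configHash(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		writeField(h, []byte(k))
		writeField(h, data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// updateHashAnnotation stores the current digest in the pod-template
// annotations and reports whether it changed, i.e. whether patching the
// Deployment with the returned annotations would trigger a rolling update.
func updateHashAnnotation(annotations map[string]string, data map[string][]byte) (map[string]string, bool) {
	if annotations == nil {
		annotations = map[string]string{}
	}
	sum := configHash(data)
	if annotations[hashAnnotation] == sum {
		return annotations, false
	}
	annotations[hashAnnotation] = sum
	return annotations, true
}
```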

@bwplotka
Author

bwplotka commented Mar 6, 2019

Nice, if that is production-ready, it looks really promising!

@retest-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

@jetstack-bot jetstack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 4, 2019
@retest-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle rotten
/remove-lifecycle stale

@jetstack-bot jetstack-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 4, 2019
@retest-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to jetstack.
/close

@jetstack-bot
Contributor

@retest-bot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@PHameete

I currently have to implement a solution for this as well and saw the recommendation for Wave above. I also came across https://github.com/stakater/Reloader, which does the same thing but has more stars and looks easier to install.

@munnerz
Member

munnerz commented Mar 20, 2020

/reopen
/remove-lifecycle rotten
/lifecycle frozen

@jetstack-bot
Contributor

@munnerz: Reopened this issue.


@jetstack-bot jetstack-bot reopened this Mar 20, 2020
@jetstack-bot jetstack-bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Mar 20, 2020
@munnerz munnerz added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Apr 23, 2020

7 participants