Helm Chart changes recreate gateway/ingress resources causing downtime #12554
Comments
Hi @slim-bean, thanks for the interest. I wonder whether it is worth doing anything here: if we rolled the change back, I would have to face a small downtime again. I am open to discussion, though, since I can be quick with the upgrade and a few seconds of downtime are fine for me.
I'm trying to understand why it recreated the ingress; there weren't actually any recent changes to that template. What version of the chart were you upgrading from?
Oh, you explained why... sorry, still drinking coffee.
@tete17 what is your Release.Name in helm? I'm wondering what we should do here. On the one hand, I think it's more consistent/correct for us to use this However, I don't know if there is any graceful way to make a change like this 😬
I render out the yaml using The ingresses name switched from I kind of agree that the consistency is a bonus and is good to have. For me, the bare minimum would be to amend the release notes/upgrade guide to mention this change so people can expect it, but I understand that writing into a guide that this upgrade requires downtime on the most popular ingress controller out there is tough. The problem to me is that even if you disable the validation webhook for a moment during this upgrade, it is still not zero downtime: you need at least one deployment pod ready before applying the ingress, or there won't be any pod serving traffic for the new service. Alternatively, if you are like me and use ArgoCD, you can selectively sync/apply only the new deployment and then perform the full sync. But that is a complicated option that not many people will be able to use, as selectively syncing pieces of a helm release is probably complicated, and you still need to instruct people to momentarily disable the admission webhook. Not a single easy choice, if you ask me.
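To make the rename problem concrete, here is a minimal Python sketch of the comparison described above: given the (kind, name) pairs from a render before and after the chart change, it lists the kinds whose object name changed. The resource names (`old-gateway`, `new-gateway`, `shared-config`) and the helper functions are hypothetical and for illustration only; they are not the names the chart actually produces, and extracting the pairs from the rendered yaml is left out.

```python
def diff_resources(old, new):
    """Return (removed, added) sets of (kind, name) pairs between two renders."""
    old_set, new_set = set(old), set(new)
    return old_set - new_set, new_set - old_set


def recreated_kinds(old, new):
    """Kinds present on both the removed and the added side, i.e. likely renames."""
    removed, added = diff_resources(old, new)
    return sorted({kind for kind, _ in removed} & {kind for kind, _ in added})


# Hypothetical (kind, name) pairs standing in for two rendered manifests.
before = [
    ("Deployment", "old-gateway"),
    ("Service", "old-gateway"),
    ("PodDisruptionBudget", "old-gateway"),
    ("Ingress", "old-gateway"),
    ("ConfigMap", "shared-config"),
]
after = [
    ("Deployment", "new-gateway"),
    ("Service", "new-gateway"),
    ("PodDisruptionBudget", "new-gateway"),
    ("Ingress", "new-gateway"),
    ("ConfigMap", "shared-config"),
]

for kind in recreated_kinds(before, after):
    # A changed name means a different Kubernetes object, not an in-place update.
    print(f"{kind}: name changed, expect the old object to be removed and a new one created")
```

Running a check like this against the two renders before upgrading would at least flag that the Ingress is among the renamed objects, which is the case that causes the downtime discussed here.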
@tete17 how did it originally fail? You applied the chart and got an error applying the ingress? Were things still working at that point? Then the problem is that you have to do a delete and recreate, and the delete causes 404s, which are very bad. But there is a way to disable the validation webhook, which would allow the change without a delete? Working on some updates to the upgrade guide now.
As far as I understand there could probably be a solution like this:
Yes, for sure this is a big issue, and it needs introducing the "waitperiod" thing for all components, which is (I suppose) big. This is just an idea, but perhaps the experts might be interested :-)
This change 79b876b#diff-89f4fd98934eb0f277b921d45e4c223e168490c44604e454a2192d28dab1c3e2R4 forced the recreation of all the gateway resources: Deployment, Service, PodDisruptionBudget and, most critically, Ingress.
This is problematic for 2 reasons:
Originally posted by @tete17 in #12506 (comment)