[Bug] All Kafka resources unexpectedly removed when upgrading using helm3 #3877
The error (404 Not Found) suggests that your CRDs were removed. The Helm Charts are mostly contributed by users, so I'm not sure what the problem is. |
I'm using the strimzi helm repo without any changes on my side. |
I just encountered the same situation. I'm running AWS EKS with Kubernetes 1.18, and when running an update using helm, all the Strimzi CRDs were removed and not installed back. |
Same here, using the chart from https://strimzi.io/charts/. Looks related to this: I don't see a resolution to this that doesn't involve a Kafka outage. Perhaps dropping helm for this install is the best approach for me. |
Well, in 0.19.0 everyone complained that the Helm Chart index had the BTW: The Helm2 chart ( |
I've just hit the same issue on a k3s single-node deployment. The CRDs seem to be managed by the helm chart, so I guess there's something off there. Below are the upgrade logs from the helm-operator:
Afterwards, there are no CRDs to be found, but it's weird that helm doesn't throw any errors and straight off starts removing all Kafka components. So I think this is more for the helm chart maintainers than anything else, and these upgrades have to be thoroughly tested, as no one wants to inadvertently kill their entire Kafka cluster when upgrading the operator. |
Deleting and re-creating the HelmRelease object helps with this, but the biggest problem here is that upgrades are not just broken but actually cause all Kafka objects to be deleted, which has quite an impact. |
flux user here as well. My solution was to stop using the helmrelease operator and switch to using kustomize with the strimzi release yaml. The only pain is you have to patch 5 different ClusterRoleBinding / RoleBinding resources to use a non-default namespace. This could be made a bit easier by providing a set of kustomize manifests to users, but the overall process wasn't that bad. You WILL take an outage doing this, but at least the change can be pushed with flux to multiple clusters. Not so with removing helmrelease and then re-adding. You have to wait for every cluster to quiesce. That's a big outage depending on how large your setup is / how many clusters you have. This isn't really a Strimzi issue, but a Helm one. How hard is it to change helm3 to do the CRD check AFTER removing the templated resources from the previous install? I don't understand why this can't be done, and I've read a ton of helm tickets. |
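For reference, a minimal kustomization along those lines might look like the sketch below. The file name, namespace, and binding name here are placeholders, and the real release YAML contains several ClusterRoleBinding/RoleBinding objects that would each need the same kind of patch:

```yaml
# kustomization.yaml -- illustrative sketch only; the resource file name, namespace
# and binding name are assumptions, not the exact objects from the release YAML.
namespace: kafka-operator
resources:
  - strimzi-cluster-operator-0.20.0.yaml   # Strimzi release YAML downloaded locally
patches:
  - target:
      kind: ClusterRoleBinding
      name: strimzi-cluster-operator
    patch: |-
      # point the ServiceAccount subject at the non-default namespace
      - op: replace
        path: /subjects/0/namespace
        value: kafka-operator
```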
As I understand from the related issues, there is no way to upgrade using helm3? |
I assume you can use the Helm2 Chart from https://github.com/strimzi/strimzi-kafka-operator/releases/tag/0.20.0 which probably doesn't change the CRDs? |
But Helm2 is marked as deprecated; what will happen when we try to migrate to helm3 or upgrade to Strimzi 0.2x in the future? |
Well, I assume that is what happened now. I do not know how Helm plans or does not plan to solve it TBH. |
It's not Helm that has to address this, it's the Strimzi helm3 chart implementation. While I understand that most of the chart is contributed by users, it's important to first acknowledge where the problem lies. So far the evidence points to the Strimzi helm3 charts and not helm itself. Also, until this is fixed, I don't see how anyone can really run it via helm in production, knowing that an upgrade might erase their entire cluster. It should be treated as high priority or adoption will slow down. Sure, one can manage the deployment differently, but that defeats the purpose a little bit. |
@aneagoe I think the previous comments and linked issues suggested something else. But if this is a bug in the chart as you say, can you explain what the fix is, or open a PR? |
I've reviewed the linked issues and the behavior observed does seem to be in line with helm functionality. It was an informed decision to move the CRD definitions from templates/ to crds/ in the Strimzi helm3 chart, but unfortunately this had quite an adverse effect on people upgrading. However, contrary to my initial assumption, this is a one-off incident that won't happen going forward. The ideal scenario would be one where an upgrade from versions that have CRDs defined in templates/ to versions that have them defined in crds/ would fail hard and require explicit consent, pointing out that all resources would be wiped. Or maybe simply prevent upgrading from 0.19 or earlier to 0.20. My experience with the helm3 chart implementation is minimal, so I can't help with a PR, I'm afraid. |
There is no hands-off way to do this unfortunately, and I really feel this is a helm3 issue to resolve. Certainly Strimzi is not the only helm chart to manage CRDs in this way and require migration? |
Any updates? |
A quick workaround we found with our team: |
There is also the option to edit the data in the helm release secret instead of deleting it.
Then remove the CRD data inside the
Then the upgrade to 0.20.0 will leave the CRDs alone. |
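For anyone trying this, a rough sketch of inspecting and rewriting that secret is shown below; the release name, namespace and revision are placeholders, and it assumes the usual Helm 3 storage layout where the release is gzipped JSON that ends up base64-encoded twice inside the Secret:

```sh
# Decode the stored release (secret name, namespace and revision are placeholders).
kubectl -n kafka get secret sh.helm.release.v1.strimzi-kafka-operator.v1 \
  -o jsonpath='{.data.release}' | base64 -d | base64 -d | gunzip > release.json

# Edit release.json to drop the CRD documents from the stored manifest, then
# re-encode in the reverse order and patch the result back into the Secret.
gzip -c release.json | base64 -w0 | base64 -w0 > release.b64
kubectl -n kafka patch secret sh.helm.release.v1.strimzi-kafka-operator.v1 \
  --type merge -p "{\"data\":{\"release\":\"$(cat release.b64)\"}}"
```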
That's more precise, thanks! |
After the update, I get an unsupported cluster and operator, since I use the new listener configuration while the old CRDs from version 0.19 remain. I tried updating with the 2to3 plugin, but nothing came of it. Link |
Just replace the old CRDs with the new ones. They should be compatible. |
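As a sketch, assuming the crds/ directory from the chart mentioned later in this thread (the path below is an assumption), that could look like:

```sh
# Replace the existing CRDs with the new definitions from the chart's crds/
# directory; existing custom resources are preserved, only the
# CustomResourceDefinition objects are updated.
kubectl replace -f helm-charts/helm3/strimzi-kafka-operator/crds/
```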
@driosalido Thanks, this helped to solve the problem. |
Yes, that's the problem with removing the CRDs from the YAML manifest. Helm no longer controls what to do with them. |
Should it provide some backward-compatible helm chart between helm2 and helm3? In the linked post https://www.infracloud.io/blogs/helm-2-3-migration/ it just adds extra |
The CRDs are still part of the Helm Chart: https://github.com/strimzi/strimzi-kafka-operator/tree/main/helm-charts/helm3/strimzi-kafka-operator/crds They should not need to be installed separately, and at least in the case of a clean install that is not needed. |
@scholzj But I tried with a clean install of 0.20, then updated the operator from 0.20 -> 0.21 -> 0.22, and the output of kubectl get crd kafkas.kafka.strimzi.io -o yaml has not changed relative to version 0.20 |
But helm3 treats CRDs as external resources (https://helm.sh/docs/chart_best_practices/custom_resource_definitions/). I think the helm chart needs to follow these methods? |
I guess that explains why the upgrade does not upgrade the CRDs (and makes me wonder even more than before why people are using Helm). But I'm not sure what it expects us to do - it sounds like all we can do is update the docs and add a note that people should update the CRDs manually? |
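If the docs go that way, the manual step would presumably look something like the sequence below (repo alias, release name and namespace are assumptions, not prescribed values):

```sh
# Pull the new chart locally, refresh the CRDs by hand, then run the normal upgrade.
helm repo update
helm pull strimzi/strimzi-kafka-operator --version 0.20.0 --untar
kubectl replace -f strimzi-kafka-operator/crds/
helm upgrade strimzi-kafka-operator strimzi/strimzi-kafka-operator \
  --version 0.20.0 --namespace kafka
```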
Probably it should be documented in Helm3 README |
Have you removed it manually? It's still not clear to me. |
hotfix snippet:
then just upgrade the helm release 0.19 -> 0.20.1 |
* Update Helm upgrade docs - closes #3877 (Signed-off-by: Jakub Scholz; Co-authored-by: PaulRMellor)
Describe the bug
I'm using Strimzi operator v0.19.0 and tried to upgrade to 0.20.0. When I ran the helm upgrade procedure, all my resources (users, topics, clusters) were removed.
I tried to reproduce the problem with a freshly installed cluster and the situation was reproduced again.
To Reproduce
Steps to reproduce the behavior:
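A minimal sequence matching the scenario described (repo alias, release name and namespace are assumptions):

```sh
# Assumed reproduction sequence: install the 0.19.0 chart, create resources,
# then upgrade the release to 0.20.0.
helm repo add strimzi https://strimzi.io/charts/
helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
  --version 0.19.0 --namespace kafka
# ... create a Kafka cluster plus KafkaTopic/KafkaUser resources ...
helm upgrade strimzi-kafka-operator strimzi/strimzi-kafka-operator \
  --version 0.20.0 --namespace kafka
```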
After the steps above, my cluster and users/topics were removed. The operator pod tries to start and crashes with the following error:
Expected behavior
The operator should be updated without removing resources.
Environment (please complete the following information):