Terraform helm provider is extremely slow to read helm_releases when many are used #1156
Comments
Please suggest at least a workaround if there is no solution as of yet. Regards,
Thanks for opening this @pranavnateri. Refreshing the state can take some time if you have many helm_release resources.
Hi @jrhouston, that's exactly what I mentioned in the steps-to-reproduce section: I am doing a targeted apply, which I do not want to do. Also, the issue is the same with or without -refresh=false when applying. Increasing the timeout of the helm provider resource to 600 is also not helping. Taking this much time when there are multiple helm releases doesn't work, because it will time out after a while, which is a bug. Please suggest any alternatives, at least until this is fixed (PS: I'm already doing a targeted apply because of this issue). Regards,
Hi @jrhouston, can you please give an update on this? Why is this an issue? Was this not tested? Regards,
@jrhouston ??
@jrhouston @pranavnateri Are there any updates on this? I am hitting the same issue.
Yes, I still have the issue and no one is replying. There is no proper support. @oscardalmau-r3
I also noticed that this changed at some version; it used to be fast. I can also see that for the first 30-40 seconds it does not even create a namespace. It is hard to say what is causing it: it is not installing any CRDs, and the Kubernetes cluster is on the local network (AWS EKS). Currently using version 2.10.1.
I did some testing. How I called the resource:
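Something along these lines (the names, chart path, and values file here are placeholders):

```hcl
# Placeholder example of a helm_release installed from a chart on the
# local filesystem (no registry download), mirroring the test setup.
resource "helm_release" "test" {
  name      = "test-app"
  chart     = "./charts/test-app" # local chart with dependencies, no CRDs
  namespace = "test-app"

  create_namespace = true
  timeout          = 600

  values = [
    file("${path.module}/values.yaml")
  ]
}
```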
The helm_release uses a helm chart on the local filesystem (no chart registry download), and the package does not install any CRDs, though the chart does have chart dependencies. The time described for resources appearing is taken from the resource creation messages shown. The time shown as total is the total time for Terraform (init, apply, destroy, ...); it also includes a few AWS resources (which took 1-2 seconds) and the time needed to download the providers. For example, provider 2.1.2 took ~20s for the namespace and pods to appear (153s total for apply+destroy). Since version 2.5.0 it went from 20 seconds to 2 minutes for any resources to appear in Kubernetes. As a workaround I am now using 2.4.1. If you have upgraded to a higher version and can't delete the resource, I am not sure how to downgrade. I hope finding the version where it broke helps to locate the issue; there is certainly some sort of problem here.
Massive +1 for this issue, with more background on why reverting to 2.4.1 is painful or impossible: the bundled Helm client there is ancient and makes modern charts incompatible (e.g. Traefik has required Helm > 3.9.0 since November 2022, and such an old Helm does not seem to work with modern Kubernetes versions for some reason). I use just a couple of charts, but my cluster is very remote, so the slowness makes it time out very often. I'd be happy to help with the issue, but the diff between 2.4.1 and 2.5.0 is so large I have no idea where to start. If needed, I can try to tweak my deployment to get rid of everything that requires a newer Helm and verify that the problem lies with this particular version. For the moment, I can confirm that ... BTW, the Traefik Helm chart v24.0.0 seems to be a good test candidate, as it creates a large number of CRDs.
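For anyone who wants to reproduce with that chart, a minimal helm_release might look roughly like this (release name and namespace are arbitrary; the repository URL is assumed to be the public Traefik charts repo):

```hcl
# Reproduction candidate: the Traefik chart installs a large number of CRDs.
resource "helm_release" "traefik" {
  name             = "traefik"
  repository       = "https://traefik.github.io/charts"
  chart            = "traefik"
  version          = "24.0.0"
  namespace        = "traefik"
  create_namespace = true
}
```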
+1
@pranavnateri have you seen/tried the option exposed in this PR? It specifically mentions slowness due to excessive CRDs from Crossplane, but it may be worth trying even if that's not your specific issue. I'll be trying this option out as well.
Thank you! This solved it for us. We do have Crossplane and many other CRDs. Tested with provider 2.11.0.
It worked for me too! That default value of 100 from Helm itself seems comically low for any real deployment. However, it looks like the Terraform provider cannot interpret valid throttling messages from the Helm library and crashes instead of providing useful information (or attempting to retry the operation).
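For reference, assuming the option discussed above is the provider-level burst_limit argument, raising it from Helm's default of 100 looks roughly like this:

```hcl
# Assumption: the PR added a provider-level burst_limit argument;
# 100 is Helm's own default client-side throttling burst.
provider "helm" {
  burst_limit = 300

  kubernetes {
    config_path = "~/.kube/config"
  }
}
```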
+1 for this issue, I'm getting timeouts with only 2 helm releases being created for the first time.
+1, in my case the behaviour seems quite random: sometimes it takes just a few seconds, other times minutes.
This seemed to be caused by a large number of CRDs on the server. I notice that ...
We are running into this issue in vcluster, so I am linking it to this issue: loft-sh/vcluster#1588
A configurable timeout parameter would be nice to address issues when the /openapi/v3 endpoint takes > 30 seconds to respond: #463. Edit: unfortunately, this seems not to be possible in the base Helm library: helm/helm#9805. My current workaround for this issue is to use Terragrunt as a wrapper and rely on its auto-retry feature: https://terragrunt.gruntwork.io/docs/features/auto-retry/. Usually on the 2nd pass the /openapi/v3 endpoint is fast to respond.
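For illustration, a terragrunt.hcl using auto-retry might look roughly like this; the module source, error patterns, and retry settings below are examples rather than values from this thread:

```hcl
# Example terragrunt.hcl relying on auto-retry for transient API slowness.
terraform {
  source = "../modules/helm-releases"
}

# Retry applies whose output matches these (illustrative) error patterns.
retryable_errors = [
  "(?s).*context deadline exceeded.*",
  "(?s).*timed out waiting for the condition.*",
]

retry_max_attempts       = 3
retry_sleep_interval_sec = 30
```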
This is becoming more of a major issue even with fewer than 80 CRDs in the cluster and setting ...
Terraform, Provider, Kubernetes and Helm Versions
Affected Resource(s)
helm_release
Terraform Configuration Files
See below for an example; I have posted a few, but put around 40 helm releases in place and check: with a single variable change in tfvars to apply, the provider just takes its own time to refresh and eventually times out.
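A reduced sketch of that kind of setup, with many releases driven from a tfvars map (the variable name, repository URL, and versions are placeholders):

```hcl
# Placeholder reduction of the setup described: ~40 releases driven by tfvars.
variable "releases" {
  type = map(object({
    chart   = string
    version = string
  }))
}

resource "helm_release" "this" {
  for_each = var.releases

  name       = each.key
  repository = "https://charts.example.com"
  chart      = each.value.chart
  version    = each.value.version
  timeout    = 600
}
```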
Debug Output
NOTE: In addition to Terraform debugging, please set HELM_DEBUG=1 to enable debugging info from helm.
Panic Output
Steps to Reproduce
Actual Behavior
Important Factoids
NO
References
NO
Community Note