Removing state of dependent providers (partial fix for progressive apply) #27728
Comments
Hi @dak1n1! Thanks for sharing this use-case and proposal. Based on your write-up so far I'm afraid I'm having trouble following how this proposal differs in the implementation requirements from the existing partial apply proposal, but I suspect that my imagination is being limited by my understanding of the original proposal I wrote. If you'd be willing, it would be helpful to see a fuller example of what the user workflow would look like in the situation you're considering were we to implement your proposal, including what new configuration the user might write (if any) and what sequence of Terraform commands the user would run in order to achieve the goal you've described of replacing an existing Kubernetes cluster. Thanks again!
Thanks for looking at my proposal! I'll show you what I have in mind. My proposal differs from progressive apply in that it only solves a part of the issue, specifically for cases where you have stacked resources, like an EKS cluster with Kubernetes resources stacked on top. Since Kubernetes users hit this issue so often, I wanted to support their use case; it seems like a smaller subset of the larger problem, and may have a simpler solution than the one proposed in the Progressive Apply issue. My goal is to achieve this in the simplest manner possible, without making changes to the user's workflow or any big architectural changes to Terraform. I didn't intend to impose my own implementation idea here... but I'll give it a try. Though, disclaimer: I don't know the Terraform Core architecture at all; I'm only familiar with provider development.

Here is the scenario: a user has two modules, an eks-cluster module that creates the cluster, and a kubernetes-config module containing the Kubernetes/Helm resources that run on it. The user makes an update to their eks-cluster module which will cause the cluster to be recreated, which in turn causes the apply to fail, because the Kubernetes provider is initialized with the old cluster's credentials. But if we were to establish a kind of dependency between the Kubernetes provider and the underlying cluster, such as in the example below, we could tell Terraform to delete the state of this provider any time the dependent resource is re-created.
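As a rough sketch (not a concrete design), such a dependency could be expressed on the provider block. `replace_on_change` here is only the option proposed in this issue, and the eks-cluster module outputs are illustrative:

```hcl
# Hypothetical sketch: "replace_on_change" does not exist in Terraform today,
# and the eks-cluster module outputs shown here are illustrative.
provider "kubernetes" {
  host                   = module.eks-cluster.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks-cluster.cluster_ca_cert)
  token                  = module.eks-cluster.token

  # Proposed behavior: when the referenced cluster is planned for replacement,
  # remove every resource owned by this provider from state, so the resources
  # are re-created on the new cluster instead of being read with outdated
  # credentials.
  replace_on_change = module.eks-cluster
}
```

With something like this, the same single `terraform apply` that replaces the cluster would also re-create all of the Kubernetes resources from a clean state.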
This new field would trigger the deletion of the state for this provider when the resource it depends on is marked for deletion. By deleting the state of the Kubernetes resources, the provider will never initialize with outdated credentials. This will solve the issue for anyone who is replacing a Kubernetes cluster: the apply will succeed the same as if it were the initial apply.

Right now, the initial apply works reliably. But if you try to replace the underlying cluster, it will always fail, because during the plan phase, Terraform tries to read the Kubernetes resources using outdated credentials. My proposal is to create the conditions of an "initial apply" during resource deletion. This will also solve the problem of deleting a cluster that has outdated credentials. By deleting the state of the dependent resources, we can cover an edge case that is not solved by #27408 (specifically, the case of long-running applies where the token expires before the Kubernetes resources can be deleted). The idea of stacking resources is also used by AWS CloudFormation.
This proposal is a similar idea to a CloudFormation stack (though not identical). It's just giving users the option to tell Terraform "don't worry about deleting these resources; they will disappear when the underlying cluster is deleted". It's similar to deleting an RDS instance that hosts a database: the database itself automatically disappears once the underlying RDS VM is removed. There's no need to call delete on all those dependent resources explicitly; removing them from state is adequate. I know this isn't a fix for the whole issue faced with Progressive Apply, but I figured it might bring some faster relief to users who are struggling. Thanks for reading!
I actually would like to close this issue, after doing some further reflection about this approach. While it seemed useful to break down this large, complex problem into a tiny chunk that could be solved, I'm not happy with the approach I'm proposing here. Specifically, it's because I ran into a scenario where simply deleting the Kubernetes provider's state prior to deleting the EKS cluster wasn't an adequate solution. One of the Kubernetes resources had created other cloud infrastructure, which was left orphaned with this approach. (Potentially, both Load Balancers and cloud storage volumes could be orphaned when the associated Kubernetes resources are not deleted properly.) So I have a different idea that I think could be more effective. But it will take quite some time to prioritize collecting the information needed. TL;DR: I'll be back with better data at a later time. Thanks!
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
Current Terraform Version
Use-cases
I'm one of the maintainers of the Kubernetes and Helm providers. All Kubernetes and Helm resources are built on top of a Kubernetes cluster. I would like to be able to express this dependency in a Terraform config, so that users have a more intuitive experience with our providers. Many users want to do this in a single apply, which is currently not possible.
A Kubernetes module or resource depends on up-to-date credentials from a cloud-managed cluster such as EKS, GKE, or AKS. The goal is to have all Kubernetes resources created on the new cluster when the underlying cluster is replaced, and to avoid initializing the provider with outdated credentials during this replacement process.
Attempted Solutions
I have some example Terraform modules for AKS, EKS, and GKE. Any one of them can be used as a reproducer for this issue. They each have a README which describes how to replace the cluster.
In order to successfully replace a cluster without hitting a progressive apply issue, the user needs to manually run `terraform state rm module.kubernetes-config`. This removes all resources owned by the Kubernetes and Helm providers. Starting with a clean state like this allows the `terraform apply` to replace the underlying cluster and create all the Kubernetes/Helm resources from scratch on the new cluster. Without this work-around, credentials from the old cluster are loaded into the Kubernetes/Helm providers (or perhaps omitted entirely), and the apply fails with authentication-related errors. This is one of the most common issues faced by our users.

Currently, the work-around mentioned above is the only way to get a single-apply scenario to work. Alternatively, users can apply the Kubernetes/Helm changes separately from the underlying cluster. But targeting the underlying cluster module during apply (`terraform apply -target=module.aks-cluster`) does not seem to be adequate when there are Kubernetes/Helm resources in state already. It doesn't work for cluster replacement, specifically.

We also can't work around it by adding a "recreate trigger" like the null provider has, because the problem comes into existence the moment the provider is initialized with the outdated credentials. So basically we're looking for a way to completely defer reading the Kubernetes/Helm resources until the new cluster exists. Removing the Kubernetes/Helm resources from state has been the only way to accomplish this so far.
Proposal
Have an option to remove all existing state for a provider or module.
My goal with this proposal is to make sure that the information being passed into the dependent provider is fully up-to-date before initializing the dependent provider. If that is impossible, then removing the state for the dependent provider seems to be a sufficient mechanism for accomplishing the same thing.
The new config option could look like `replace_on_change`. This would allow us to mark the Kubernetes resources as having been destroyed by the change to the EKS/GKE/AKS cluster. (This is literally what happens on the cluster... the delete/recreate of the cluster destroys all dependent resources on that cluster.) So in effect, the Kubernetes provider wouldn't initialize until after the new cluster exists. It would remove all of the dependent resources from state, so that they can be re-created on the new cluster.
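Since the option could apply to either a provider or a module, a module-level variant might look like the following hypothetical sketch (`replace_on_change` is only the proposed option, and the module layout mirrors the example modules mentioned above):

```hcl
# Hypothetical sketch: "replace_on_change" is the option proposed in this
# issue, not an existing Terraform feature; module and variable names are
# illustrative.
module "kubernetes-config" {
  source       = "./kubernetes-config"
  cluster_name = module.eks-cluster.cluster_name

  # Proposed behavior: if the cluster this module depends on is replaced,
  # drop all of this module's resources from state so they are re-created
  # on the new cluster rather than read with the old cluster's credentials.
  replace_on_change = module.eks-cluster
}
```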
Some amount of provider dependency exists already. That's how we can create the Kubernetes cluster and then pass the authentication credentials from the cloud provider (aws, google, azure, etc) into the Kubernetes provider during the initial create.
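For context, a minimal sketch of that existing credential-passing pattern for EKS (the data sources are real, but the cluster name here is illustrative):

```hcl
# A sketch of the existing credential-passing pattern described above; the
# cluster name is illustrative.
data "aws_eks_cluster" "cluster" {
  name = "example-cluster"
}

data "aws_eks_cluster_auth" "cluster" {
  name = "example-cluster"
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}
```

The trouble described in this issue arises when those values come from a cluster that is about to be replaced, or from a token that has since expired.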
This pattern also works for destroys, but only if the information being passed into the dependent provider hasn't changed (host, certs, token, etc.). Otherwise, we actually see failures with that too (the GKE or EKS token expires, and then `terraform destroy` fails). So having a way to express this dependency would also benefit us during destroys.

References
This seems like an issue that impacts a lot of users. Here's what I found by just browsing for a couple of hours; I'm sure there are many more:
helm_resource state refresh terraform-provider-helm#315 (comment)