Research alternative lifecycles to support resources which can't work with create_before_destroy
#35763
Comments
Hi @natedogith1, there are going to be some problems we first need to overcome to do something like this, which would probably require major changes not just to Terraform, but to providers and the protocol they use. The biggest hurdle is that there's no value to use in the first operation. Given the amount of work it would take to implement a new resource lifecycle in core, and to create a new protocol to implement across providers and the frameworks they use, it's generally far less friction to update providers to work best with the current protocol. With few exceptions, resources should all be designed with the ability to handle create_before_destroy.
I would expect most resources have a user-specified unique id (e.g. google_compute_instance's name) that would cause create_before_destroy to fail with a name conflict.
Many resources do require a unique name, but have an input like name_prefix to generate a unique name so they can work with create_before_destroy.
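As an illustration of the name_prefix pattern, here is a sketch using the AWS provider's aws_iam_role (one resource that does expose name_prefix); treat the specific attribute values as illustrative:

```hcl
resource "aws_iam_role" "example" {
  # Terraform appends a unique suffix to the prefix, so the replacement
  # role created during create_before_destroy never collides with the
  # name of the role that is about to be destroyed.
  name_prefix = "app-role-"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })

  lifecycle {
    create_before_destroy = true
  }
}
```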
From my relatively limited experience, it seems only a few resources have something like name_prefix. And switching over to a randomly generated name because some dependent might need create_before_destroy isn't ideal. I don't know Terraform's internals, but something like create_before_destroy that instead only applies on deletion, and thus shouldn't need to propagate, might be easier to implement and would likely handle most of the use cases. Actually, update -> destroy instead of the current default of destroy -> update seems logically sound at first glance, so it might be possible to have that as a default. Though I don't know if there'd be issues with re-creation causing the order of update/destroy to be swapped.
The orderings we have for dependency operations in Terraform are not arbitrary; they are the result of requiring a deterministic, consistent ordering with no dependency cycles. If you make the destroy operation depend on the update, then you have a cycle, because the create also depends on the destroy. This is why you need create_before_destroy. I made some notes here with simplified examples to explain the basics of destroy ordering: https://github.com/hashicorp/terraform/blob/main/docs/destroying.md
I don't think there would be a cycle if destroy depending on update were conditional on whether it was a re-deploy. Though it would mean the ordering depends on whether it's a redeploy, which might not be desirable as a default. Could there be an option that behaves like create_before_destroy except it also removes the dependency between the creation and deletion of the replaced resource? That seems like it would allow consistent edges in the dependency graph without introducing cycles and without requiring the change to propagate.
Having an order of operations which is conditional is not desirable precisely because it is not consistent. The ordering is important for correct resource behavior, and if it differed based on combinations of changes it would not be possible to write a configuration with predictable results. You can't just rely on example situations with only a couple of operations; you have to imagine these combined with hundreds of other arbitrary operations on resources, locals, variables, outputs, modules, providers, data sources, etc., all interconnected and all working in all valid permutations. Removing the dependency between the creation and deletion isn't possible, both because we guarantee that order, and because removing the order means we don't know which one will happen first. That leaves things in a nondeterministic state, where what you want may work, but only some of the time, while possibly breaking something else.
Terraform currently guarantees an ordering between creation and deletion, but lets the user specify which comes first. I don't see what the issue is with not knowing the order: if the resource supports create_before_destroy, I would expect either order to work, similar to how I assume resources without dependencies between them are already handled.
All related actions need a defined order, because they do not happen in isolation. Not being ordered might work in small examples, but it causes nondeterministic behavior in complex ones. As I mentioned before, you have to incorporate this within arbitrarily large configurations, where other things depend on both the destroy and the create operations, and where any generic resource type may be designed with whatever constraints were possible within the ordering we have guaranteed.
If all actions need a defined order, how does Terraform order things that have no detected dependencies between each other? I would have assumed they have no defined order and Terraform would do them in parallel. |
Yes, if there is no dependency, direct or transitive, then the operations are concurrent.
If Terraform supports running actions concurrently, I'm not sure what the problem would be with letting the user tell Terraform that a specific resource can be created and destroyed at the same time (similar to how the user can already tell Terraform that a resource can be created before it's destroyed).
Working through an a <- b <- c dependency graph where b has "create_before_destroy except unlink b's create and destroy so it doesn't need to propagate", I see how that could cause a loop (c -> b -> a -> a_destroy -> b_destroy -> c). I think lifecycle.replace_triggered_by could be used to work around this issue without propagating the changes. I still think splitting update into optional pre-delete and post-create steps would be a good idea, rather than requiring a redeploy (the provider could either assume all removals are deletions and need to happen in pre-delete, or it could be informed of what Terraform is deleting (identified using something like the import id) so it could handle removal of non-destroyed items in post-create).
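For reference, a minimal sketch of the replace_triggered_by workaround mentioned above, using null_resource purely as a stand-in for real resource types:

```hcl
resource "null_resource" "a" {
  lifecycle {
    create_before_destroy = true
  }
}

resource "null_resource" "b" {
  lifecycle {
    # Force b to be replaced whenever a is replaced, instead of relying
    # on create_before_destroy propagating from b's dependents up to a.
    replace_triggered_by = [null_resource.a.id]
  }
}
```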
For google_compute_backend_service, it looks like there's an in-progress issue in the provider to work around it by updating the url map from the backend service resource: hashicorp/terraform-provider-google#4543.
If you make the destroy operation depend on the update, then you have an impossible ordering; the cycle is a side effect of the undefined order, and most of the time just "breaking" that cycle is not the correct resolution. If you then remove the edge between the create and destroy, while that may be unordered in isolation, you have implicitly recreated the same problem. Changing the order also has ramifications for all upstream resource operations, because it reverses an edge between the destroy and create subgraphs in the same way create_before_destroy does. There may be other valid patterns to use for certain resources, but Terraform needs to work with any resource which follows the lifecycle we've defined. Use cases outside of the defined lifecycle may also be possible, but that requires redefining what Terraform can do and the contract it has with providers.
I think the issue is that update is logically a combination of create and destroy, but Terraform doesn't include the destroy dependencies. I expect it can get away with this either because dependencies are often on produced values rather than on the underlying resource, or because providers work around the problem by making changes to resources not mentioned in the plan (e.g. google_compute_disk seems to look for all VMs that use it and detaches itself from them).
I'm not sure where you are going now, but it sounds like we are back in the realm of "redesign Terraform" rather than adding a specific enhancement. We have an API contract with providers to maintain the order we have defined already; changing that is a breaking change for all providers, so relatively small conceptual changes may actually require major architectural work. There will always be some use cases which can't fit into a generically defined workflow, but we've found that most resources can be made to fit this workflow, which has enabled Terraform to interoperate across so many provider types. Also take into consideration that this is a system which has been refined over the course of nearly a decade, so there is an immense body of work which has already been thoroughly considered (sometimes even mistakenly implemented and then removed). Not to say that there aren't changes that can still be made (I have #35553 and #35481 in mind here), but as time goes on, novel alterations which fit into the existing system become very few and far between.
Could an optional pre-destroy update be added? That doesn't seem like it should be a breaking change unless all API changes are breaking changes.
Is it an issue if they always re-write such changes? The provider might even be able to detect whether a change needs to be re-written to happen in pre-destroy if it's told which resources are being destroyed (e.g. google_compute_instance could detach, in pre-destroy, only the disks that Terraform indicates are going to be destroyed).
We don't have a mechanism to plan for multiple independent updates. You can see our replies to a similar question starting here: #31707 (comment) The issue with providers being re-written to adopt such a change is getting developers to rewrite the providers to support a new protocol. As we've learned from supporting other complex use cases which can already be done with the current lifecycle, providers will be written via the path of least resistance, and complex code to support rarer edge cases via a new part of the protocol is unlikely to be adopted en masse (a new part of the protocol which is strictly hypothetical at this point and not proven to be possible with any appreciable number of resources). This is all hinging on the argument that some resources don't support create_before_destroy.
I think lots of resources couldn't support create_before_destroy.
While in our experience most resources can be made to work with create_before_destroy, some cannot. This means that to make use of something like this in practice, it would require multiple conditions:
So yes, I agree there may be an issue for a small subset of resources, but it's not a simple problem, and there is no way to just drop in a new order of operations. Rewriting large portions of the most intricate parts of Terraform, along with any of the resources which could be downstream from problematic resources in all the providers, is a major undertaking. We can leave this here as an open request for research ideas, but there is unlikely to be anything we can do in the near term.
Terraform Version
Use Cases
Some resources need to be removed from their dependents before they can be deleted.
I've encountered this issue with google_monitoring_alert_policy referencing google_monitoring_notification_channel, and with google_compute_region_url_map referencing google_compute_region_backend_service. There are also several GitHub issues from people encountering this problem.
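A minimal sketch of the first case; attribute names follow the google provider's documented schema, but the specific values are illustrative:

```hcl
resource "google_monitoring_notification_channel" "email" {
  display_name = "ops-email"
  type         = "email"
  labels = {
    email_address = "ops@example.com"
  }
}

resource "google_monitoring_alert_policy" "cpu" {
  display_name = "high-cpu"
  combiner     = "OR"

  conditions {
    display_name = "cpu > 90%"
    condition_threshold {
      filter          = "metric.type=\"compute.googleapis.com/instance/cpu/utilization\""
      comparison      = "COMPARISON_GT"
      threshold_value = 0.9
      duration        = "60s"
    }
  }

  # Destroying the channel fails while the policy still references it,
  # which is exactly the ordering problem described above.
  notification_channels = [
    google_monitoring_notification_channel.email.id,
  ]
}
```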
Attempted Solutions
create_before_destroy has frequently been offered as a solution. This can work if there are no name conflicts. The problem is that, by design, this setting propagates to the entire dependency graph, which means it can't be used if any dependency doesn't support create_before_destroy.
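The propagation can be seen in a small sketch (null_resource stands in for arbitrary resource types): setting create_before_destroy on one resource implicitly applies it to the resources it depends on, since any other ordering would produce a cycle.

```hcl
resource "null_resource" "base" {
  # No create_before_destroy here, but because "dependent" below sets it,
  # Terraform implicitly treats this resource as create_before_destroy
  # too when both must be replaced in the same plan.
}

resource "null_resource" "dependent" {
  triggers = {
    base_id = null_resource.base.id
  }

  lifecycle {
    create_before_destroy = true
  }
}
```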
Proposal
Add support for removing resources from dependents before destroying. The create_before_destroy documentation should probably also be updated to mention how it affects ordering with other resources being updated.
References