-
Notifications
You must be signed in to change notification settings - Fork 544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Null resources are an anti-pattern #373
Comments
I'm happy to have a discussion on this, but please do some background research on the many failures which have happened before. See these issues to start: https://github.com/terraform-google-modules/terraform-google-project-factory/blob/master/docs/TROUBLESHOOTING.md#common-issues Terraform will not fail cleanly in cases where preconditions are not met. You can easily be left in a catastrophic and unrecoverable state where the project is half-created but cannot be. (In fact, based on your email thread, I'm pretty sure you've already encountered this.) Preconditions exist for a reason to prevent you from getting to that state. This is the only module we have it on because creating projects is the most likely to encounter these. |
Following up after offline discussion: If partial state on creation is not possible then the next best alternative is to move as many checks to the provider as possible. Billing account is one thing that could be easily checked there before the project create call is made. However in this case the precondition script should still be kept as checks like API checks are not possible in the provider. Whether or not a skip preconditions bool is added I will leave up to the maintainers, but it could be a good option to add with a default to false (i.e. run preconditions by default but allow user to explicitly bypass if they wish). |
I'm also of the belief that these It would make a lot more sense to add additional code into |
Agreed @kuwas. After discussing offline the current script is important in preventing irrecoverable states but hopefully we can resolve this in the provider. I have begun engaging the provider team on internal forums around this. cc @danawillow for posterity By gcloud I am guessing you mean the calls that get made to disable/delete the default compute engine service account? It is useful but I agree a non-gcloud solution would be ideal (perhaps along side the I do wish instead of being default to 'disable' that it defaulted to 'keep'. That would give users an option to use the functionality if needed while reducing the provisioners being run on the default path (even if 'disable' is a better default in principle). Not a severe issue though as it's easy enough to just set the field to whatever value you want. |
We'd definitely prefer a non-gcloud approach, but unfortunately the Terraform resource module is ill-suited to many of the things we want to accomplish. The main cases we have where
That being said, I don't think it's impossible. There is some precedent for adding resources which partially update/manage a cloud resource (ex. iam_member resources), though this is a bit different. @danawillow I'd be curious on your thoughts here. Unfortunately given the current state this is the best compromise we have. |
Both of those bullet points also have precedent in the provider. For deleting a resource created outside of Terraform, we have:
For updating/editing a resource which was created outside Terraform:
For GKE there are a few open issues that I know we're starting to take seriously and plan out how to make them happen safely: hashicorp/terraform-provider-google#1480 and hashicorp/terraform-provider-google#2373 are the ones that come to mind- I don't think there are any others open at the moment. Every single one of these examples is something we would prefer to just exist naturally in the API, but it's not always a priority for the product teams. Granted, we'd also have to prioritize them against our other open FRs, but I wouldn't discount them completely from getting in the provider. |
@danawillow Thanks for chiming in, glad to hear you're open to adding these to the provider directly. One question: do you think it's better to add these as parameters on an existing resource (ex. I'll try to take a stab at converting these over this quarter, if you're open to it. |
My concern with an approach like that is that it's hard to build the right dependency graph. It's natural to reference a |
That's true that it is slightly less intuitive but presumably any such resources will have a |
For longer term solutions is this something we can engage the core Terraform team in? I can't imagine multiple steps on creation being something unique to the google provider... The problematic behaviour seems to be the 'tainting' of the resource when an error is returned in creation after the id is set. So as long as we can signal to Terraform after the id is set to stop tainting the resource on any following errors we should be OK. |
If/when the core/sdk team allows providers to emit warnings to users instead of the only states being success and failure, then that would be another way of solving the problem. From my understanding, that's a long ways out though. We've engaged them in the past for other issues wrt taint (hashicorp/terraform#21652), but that specific issue ended up getting worked around in a different way. We can certainly bring this to their attention though. |
Just want to echo the removal of python and gcloud calls. We are ramping up on TF on GCP and would like to use Terraform Cloud, which has a limited runner where python and gcloud is not installed by default. Although this project handles gcloud to a certain extent, other projects in the terraform-google-modules org do not. Possibly Hashicorp including these as a default in Terraform cloud would help, or a better set of lower-level apis (or in the Google provider) to accomplish goals in this repo without the need for gcloud would help. And thanks for these patterns. It's very helpful for end users to see some form of best and recommended practices. |
Thanks for the data point. We definitely want to support Terraform Cloud, either by building native resources or (as you've seen) referencing the One ask: as you encounter other modules which don't work in Terraform Cloud please do file issues. At the very least, we can update them to use the gcloud module. |
@morgante thanks for being open to this I assume rather large refactor. As a data point, we're a company that would like to apply the cloud foundation toolkit to enable clients with a production grade GCP cloud quickly. This relies on a number of the modules in this github org, many of which again rely on this module. So it seems the project-factory is deeply nested in all of this. I ran into several issues already:
So while I really love the CFT initiative and would like to fully support it, this special behavior seems like a painful crack in the otherwise elegantly provided best-practice IaC toolkit. So solution oriented: Can you provide a plan on how to get the local exec code out of the module? I'm happy to help out on this but I guess it will require a decent amout of coordination from the google provider team, you guys etc. If you can provide an overview / plan and a set of tasks (maybe via a shared github issues tag), I'm happy to pick some of these up |
Since this issue was opened, I added a check for verifying billing account permission before a project is created in the Google provider itself. This was the main issue that I would run into that this module prevented. GCP also now has an org policy to deprivelege the default service account. Thus, I think the only remaining issue is a required API not being enabled. Given the above, I think we can safely remove a lot of the current checks and recommend the native solutions. For the API checks, further investigation is needed to implement a native solution. Perhaps it makes sense to move the preconditions check out of the core project factory module into its own module that is optionally called from the main module and can be bypassed by a flag? |
We could potentially move it into a separate module, but I think the priority should be on removing the need for it entirely. |
@morgante can you point me to some tasks that I could pick up? I don’t have the broad vision of all dependencies yet but I’m happy to pick up a piece or two if you guide me. |
I've opened #407 to track work needed to deprecate the script. |
The preconditions script tries to preventitively check for pre-requisites to avoid running into errors around permissions, unenabled apis, etc.
However, this is an anti-pattern in Terraform. First, if a user does not have access to a resource or an API is not enabled, then the apply command will fail and the GCP api will return a clear error indicating so. This is how every resource and vast majority of modules work (including CFT's other modules).
For APIs, the required APIs can just be enabled automatically if they are free and
disable_on_destroy
set to false to avoid them being disabled in the future if the module is removed/changed.The preconditions script also uses provisioners. Per Terraform docs the first heading on the provisioners page is "Provisioners are a Last Resort":
And none of the recommended use cases fit provisioners needs.
Preconditions script also uses Python, which adds extra complexity and dependencies in the underlying system. When something goes wrong it is not clear why without significant overhead to investigate (see #372).
Based on the arguments above, I recommend one of the following:
There is one genuinely very useful check done by the script currently, which is a check for access to the billing account being used. I have raised this issue with the provider team to see how this can be fixed in the provider directly.
The text was updated successfully, but these errors were encountered: