Null resources are an anti-pattern #373

umairidris · 2020-02-07T21:53:47Z

The preconditions script tries to preventitively check for pre-requisites to avoid running into errors around permissions, unenabled apis, etc.

However, this is an anti-pattern in Terraform. First, if a user does not have access to a resource or an API is not enabled, then the apply command will fail and the GCP api will return a clear error indicating so. This is how every resource and vast majority of modules work (including CFT's other modules).

For APIs, the required APIs can just be enabled automatically if they are free and disable_on_destroy set to false to avoid them being disabled in the future if the module is removed/changed.

The preconditions script also uses provisioners. Per Terraform docs the first heading on the provisioners page is "Provisioners are a Last Resort":

However, they also add a considerable amount of complexity and uncertainty to Terraform usage.

We do not recommend using provisioners for any of the use-cases described in the following sections.
Even if your specific use-case is not described in the following sections, we still recommend attempting to solve it using other techniques first, and use provisioners only if there is no other option.

And none of the recommended use cases fit provisioners needs.

Preconditions script also uses Python, which adds extra complexity and dependencies in the underlying system. When something goes wrong it is not clear why without significant overhead to investigate (see #372).

Based on the arguments above, I recommend one of the following:

deprecate and remove preconditions entirely
add a field to enable preconditions (preferably default to false)

There is one genuinely very useful check done by the script currently, which is a check for access to the billing account being used. I have raised this issue with the provider team to see how this can be fixed in the provider directly.

The text was updated successfully, but these errors were encountered:

morgante · 2020-02-07T21:58:12Z

I'm happy to have a discussion on this, but please do some background research on the many failures which have happened before.

See these issues to start: https://github.com/terraform-google-modules/terraform-google-project-factory/blob/master/docs/TROUBLESHOOTING.md#common-issues

Terraform will not fail cleanly in cases where preconditions are not met. You can easily be left in a catastrophic and unrecoverable state where the project is half-created but cannot be. (In fact, based on your email thread, I'm pretty sure you've already encountered this.)

Preconditions exist for a reason to prevent you from getting to that state. This is the only module we have it on because creating projects is the most likely to encounter these.

umairidris · 2020-02-07T23:54:53Z

Following up after offline discussion:
It would be ideal if the google_project resource was able to use partial state on creation to get around a project only partially being created. If that is possible then preconditions script would no longer be needed. Need to follow up with provider team on this.

If partial state on creation is not possible then the next best alternative is to move as many checks to the provider as possible. Billing account is one thing that could be easily checked there before the project create call is made. However in this case the precondition script should still be kept as checks like API checks are not possible in the provider.

Whether or not a skip preconditions bool is added I will leave up to the maintainers, but it could be a good option to add with a default to false (i.e. run preconditions by default but allow user to explicitly bypass if they wish).

kuwas · 2020-02-08T02:29:37Z

I'm also of the belief that these gcloud cli and python calls cause more problems than they solve. Almost every time I run into an issue with the project-factory or kubernetes-engine modules, during create or destroy, its because of these null_resource local-execs.

It would make a lot more sense to add additional code into terraform-google-provider to solve these gaps in functionality. Terraform just does not work well when interacting with things outside of the provider.

umairidris · 2020-02-08T03:34:30Z

Agreed @kuwas. After discussing offline the current script is important in preventing irrecoverable states but hopefully we can resolve this in the provider. I have begun engaging the provider team on internal forums around this.

cc @danawillow for posterity

By gcloud I am guessing you mean the calls that get made to disable/delete the default compute engine service account? It is useful but I agree a non-gcloud solution would be ideal (perhaps along side the auto_create_network field in google_project we could add a new default_service_account field?). Though I am sure we would want to solve the existing issues with google_project before adding even more stuff in there :)

I do wish instead of being default to 'disable' that it defaulted to 'keep'. That would give users an option to use the functionality if needed while reducing the provisioners being run on the default path (even if 'disable' is a better default in principle). Not a severe issue though as it's easy enough to just set the field to whatever value you want.

morgante · 2020-02-11T23:19:48Z

We'd definitely prefer a non-gcloud approach, but unfortunately the Terraform resource module is ill-suited to many of the things we want to accomplish. The main cases we have where gcloud is needed:

Deleting a resource which was automatically created outside Terraform (ex. deleting the default SA)
Updating/editing a resource which was created outside Terraform (ex. updating DNS masq on Kubernetes)

That being said, I don't think it's impossible. There is some precedent for adding resources which partially update/manage a cloud resource (ex. iam_member resources), though this is a bit different. @danawillow I'd be curious on your thoughts here.

Unfortunately given the current state this is the best compromise we have.

danawillow · 2020-02-12T00:33:50Z

Both of those bullet points also have precedent in the provider. For deleting a resource created outside of Terraform, we have:

google_project.auto_create_network
google_container_cluster.remove_default_node_pool
google_sql_database_instance deletes the default root user (not configurable)

For updating/editing a resource which was created outside Terraform:

google_access_context_manager_service_perimeter_resource
google_compute_attached_disk
google_compute_network_peering_routes_config
IAM binding/member resources

For GKE there are a few open issues that I know we're starting to take seriously and plan out how to make them happen safely: hashicorp/terraform-provider-google#1480 and hashicorp/terraform-provider-google#2373 are the ones that come to mind- I don't think there are any others open at the moment.

Every single one of these examples is something we would prefer to just exist naturally in the API, but it's not always a priority for the product teams. Granted, we'd also have to prioritize them against our other open FRs, but I wouldn't discount them completely from getting in the provider.

morgante · 2020-02-12T04:34:58Z

@danawillow Thanks for chiming in, glad to hear you're open to adding these to the provider directly.

One question: do you think it's better to add these as parameters on an existing resource (ex. google_project.auto_create_network) or to create a new resource? Thus far the pattern seems to be to include this behavior in the parent resource, but I fear that ends up with very complicated resources which have a higher potential for error. I wonder if isolating the behavior into dedicated resources could be better, even if it's not idiomatic Terraform.

I'll try to take a stab at converting these over this quarter, if you're open to it.

rileykarson · 2020-02-12T17:47:07Z

I wonder if isolating the behavior into dedicated resources could be better, even if it's not idiomatic Terraform.

My concern with an approach like that is that it's hard to build the right dependency graph. It's natural to reference a google_project directly, but with fine-grained resources for these kinds of changes users would really want to reference google_project_remove_default_service_account to ensure that unmanaged children were deleted.

morgante · 2020-02-12T21:05:02Z

My concern with an approach like that is that it's hard to build the right dependency graph. It's natural to reference a google_project directly, but with fine-grained resources for these kinds of changes users would really want to reference google_project_remove_default_service_account to ensure that unmanaged children were deleted.

That's true that it is slightly less intuitive but presumably any such resources will have a project_id attribute as well. In fact, I might argue it's actually clearer to depend on google_project_remove_default_service_account if you're doing something which can only work after that's happened.

umairidris · 2020-02-13T16:20:22Z

For longer term solutions is this something we can engage the core Terraform team in? I can't imagine multiple steps on creation being something unique to the google provider...

The problematic behaviour seems to be the 'tainting' of the resource when an error is returned in creation after the id is set. So as long as we can signal to Terraform after the id is set to stop tainting the resource on any following errors we should be OK.

danawillow · 2020-02-13T21:27:21Z

If/when the core/sdk team allows providers to emit warnings to users instead of the only states being success and failure, then that would be another way of solving the problem. From my understanding, that's a long ways out though.

We've engaged them in the past for other issues wrt taint (hashicorp/terraform#21652), but that specific issue ended up getting worked around in a different way. We can certainly bring this to their attention though.

mhamrah · 2020-02-24T01:24:33Z

Just want to echo the removal of python and gcloud calls. We are ramping up on TF on GCP and would like to use Terraform Cloud, which has a limited runner where python and gcloud is not installed by default. Although this project handles gcloud to a certain extent, other projects in the terraform-google-modules org do not.

Possibly Hashicorp including these as a default in Terraform cloud would help, or a better set of lower-level apis (or in the Google provider) to accomplish goals in this repo without the need for gcloud would help.

And thanks for these patterns. It's very helpful for end users to see some form of best and recommended practices.

morgante · 2020-02-25T02:08:18Z

Thanks for the data point. We definitely want to support Terraform Cloud, either by building native resources or (as you've seen) referencing the gcloud module.

One ask: as you encounter other modules which don't work in Terraform Cloud please do file issues. At the very least, we can update them to use the gcloud module.

pascalwhoop · 2020-04-26T15:28:52Z

@morgante thanks for being open to this I assume rather large refactor.

As a data point, we're a company that would like to apply the cloud foundation toolkit to enable clients with a production grade GCP cloud quickly. This relies on a number of the modules in this github org, many of which again rely on this module. So it seems the project-factory is deeply nested in all of this.

I ran into several issues already:

using github actions, we're hitting issues as my personal laptop's PATH is somehow creeping into the tfstate files. This crashes on github actions
gcloud wasn't available on the CI, which is not a terraform pattern (it usually pulls in all it needs through GO binaries)
errors are hard to decipher with the local exec activities.
a terraform plan honestly doesn't tell me anything of what's actually going to run on the machine. Just trusting the Googlers here to not do any funny business

So while I really love the CFT initiative and would like to fully support it, this special behavior seems like a painful crack in the otherwise elegantly provided best-practice IaC toolkit.

So solution oriented: Can you provide a plan on how to get the local exec code out of the module? I'm happy to help out on this but I guess it will require a decent amout of coordination from the google provider team, you guys etc. If you can provide an overview / plan and a set of tasks (maybe via a shared github issues tag), I'm happy to pick some of these up

umairidris · 2020-04-26T16:38:20Z

Since this issue was opened, I added a check for verifying billing account permission before a project is created in the Google provider itself. This was the main issue that I would run into that this module prevented.

GCP also now has an org policy to deprivelege the default service account.

Thus, I think the only remaining issue is a required API not being enabled.

Given the above, I think we can safely remove a lot of the current checks and recommend the native solutions. For the API checks, further investigation is needed to implement a native solution. Perhaps it makes sense to move the preconditions check out of the core project factory module into its own module that is optionally called from the main module and can be bypassed by a flag?

morgante · 2020-04-27T17:13:19Z

We could potentially move it into a separate module, but I think the priority should be on removing the need for it entirely.

pascalwhoop · 2020-04-27T17:53:47Z

@morgante can you point me to some tasks that I could pick up? I don’t have the broad vision of all dependencies yet but I’m happy to pick up a piece or two if you guide me.

morgante · 2020-04-27T20:19:10Z

I've opened #407 to track work needed to deprecate the script.

umairidris mentioned this issue Feb 7, 2020

Add doc explaining preconditions.py #374

Merged

umairidris mentioned this issue Feb 13, 2020

project: check for billing account permissions as pre-requisite hashicorp/terraform-provider-google#5671

Closed

chrisst mentioned this issue Feb 18, 2020

Google project precheck GoogleCloudPlatform/magic-modules#3145

Merged

This was referenced Feb 19, 2020

Google project precheck hashicorp/terraform-provider-google#5719

Merged

Google project precheck hashicorp/terraform-provider-google-beta#1777

Merged

danawillow mentioned this issue Feb 19, 2020

Ability to prevent taints on create errors hashicorp/terraform-plugin-sdk#330

Open

morgante changed the title ~~Preconditions script is an anti-pattern~~ Null resources are an anti-pattern Mar 30, 2020

morgante mentioned this issue Apr 27, 2020

Remove preconditions script #407

Closed

11 tasks

morgante closed this as completed Apr 27, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Null resources are an anti-pattern #373

Null resources are an anti-pattern #373

umairidris commented Feb 7, 2020 •

edited

Loading

morgante commented Feb 7, 2020

umairidris commented Feb 7, 2020

kuwas commented Feb 8, 2020 •

edited

Loading

umairidris commented Feb 8, 2020 •

edited

Loading

morgante commented Feb 11, 2020

danawillow commented Feb 12, 2020 •

edited

Loading

morgante commented Feb 12, 2020 •

edited

Loading

rileykarson commented Feb 12, 2020

morgante commented Feb 12, 2020

umairidris commented Feb 13, 2020

danawillow commented Feb 13, 2020

mhamrah commented Feb 24, 2020

morgante commented Feb 25, 2020

pascalwhoop commented Apr 26, 2020

umairidris commented Apr 26, 2020

morgante commented Apr 27, 2020

pascalwhoop commented Apr 27, 2020

morgante commented Apr 27, 2020 •

edited

Loading

Null resources are an anti-pattern #373

Null resources are an anti-pattern #373

Comments

umairidris commented Feb 7, 2020 • edited Loading

morgante commented Feb 7, 2020

umairidris commented Feb 7, 2020

kuwas commented Feb 8, 2020 • edited Loading

umairidris commented Feb 8, 2020 • edited Loading

morgante commented Feb 11, 2020

danawillow commented Feb 12, 2020 • edited Loading

morgante commented Feb 12, 2020 • edited Loading

rileykarson commented Feb 12, 2020

morgante commented Feb 12, 2020

umairidris commented Feb 13, 2020

danawillow commented Feb 13, 2020

mhamrah commented Feb 24, 2020

morgante commented Feb 25, 2020

pascalwhoop commented Apr 26, 2020

umairidris commented Apr 26, 2020

morgante commented Apr 27, 2020

pascalwhoop commented Apr 27, 2020

morgante commented Apr 27, 2020 • edited Loading

umairidris commented Feb 7, 2020 •

edited

Loading

kuwas commented Feb 8, 2020 •

edited

Loading

umairidris commented Feb 8, 2020 •

edited

Loading

danawillow commented Feb 12, 2020 •

edited

Loading

morgante commented Feb 12, 2020 •

edited

Loading

morgante commented Apr 27, 2020 •

edited

Loading