
Terraform passes incorrect provider through when provider depends on resources which need replacing #31520

Closed
gtmtech opened this issue Jul 26, 2022 · 9 comments
Labels: bug, v1.2, waiting for reproduction, waiting-response

gtmtech commented Jul 26, 2022

Terraform Version

Terraform 1.2.5
AWS Provider 4.23.0

Description

I found an obscure but critical case where Terraform does not use the correct provider when refreshing a resource. It is quite convoluted, but since Terraform should never mix providers up, I feel it's definitely worth raising.

The problem happens with so-called dynamic providers, that is, providers that are not instantiated immediately but midway through a Terraform run.

Recall that a provider without an alias is instantiated at the beginning of the Terraform run, but a provider with an alias can be made to instantiate partway through the run if its definition depends on resources that do not yet exist.

This is a very useful feature: it lets us create a role, and then a provider based off that role, to use in creating further resources as that role. It's a handy way of provisioning multiple accounts, for example, all in one Terraform run.

However, I experienced a critical issue where, all of a sudden, Terraform switched back to using the initial provider to try to refresh the later resources that are supposed to be provisioned using the midway dynamic provider.

Here is some code to illustrate.

Terraform Configuration Files

provider "aws" {
    alias = "foo"
    assume_role {
        role_arn     = "arn:aws:iam::111111111111:role/rolefoo"
        session_name = "terraform"
    }   
}

# create a role using the foo provider
resource "aws_iam_role" "bar" {
    provider = aws.foo
    name = "rolebar"
    managed_policy_arns = [ "arn:aws:iam::aws:policy/IAMFullAccess" ]
}

# now wait for role to be provisioned (because of aws eventual consistency)
resource "time_sleep" "wait_for_rolebar" {
    triggers = {
        name = aws_iam_role.bar.name
    }
    create_duration = "10s"
    depends_on = [ aws_iam_role.bar ]
}

# now dynamically create a bar provider; the no-op conditional forces
# a dependency on the time_sleep resource
provider "aws" {
    alias = "bar"
    assume_role {
        role_arn     = time_sleep.wait_for_rolebar.id != "" ? aws_iam_role.bar.arn : aws_iam_role.bar.arn
        session_name = "terraform"
    }
}

# now use the bar provider midrun to create something else
resource "aws_iam_policy" "policy1" {
    provider = aws.bar
    ...
}

All this works the first time you use it: provider foo creates rolebar, and provider bar, based off the new rolebar, creates policy1. Great!

But then I realised my triggers were wrong, so I changed the time_sleep resource like this:

resource "time_sleep" "wait_for_rolebar" {
    triggers = {
        id = aws_iam_role.bar.id        # changes from name to id
    }
    create_duration = "10s"
    depends_on = [ aws_iam_role.bar ]
}

Now when I run terraform, I get very strange behaviour like this:

╷
│ Error: reading IAM Policy (arn:aws:iam::111111111111:policy/policy1): AccessDenied: User: arn:aws:sts::111111111111:assumed-role/rolefoo/[email protected] is not authorized to access this resource
│ 	status code: 403

That's odd, I thought: the code hasn't changed at all, and the IAM policy policy1 is clearly set to use the aws.bar provider, not the aws.foo provider, so why is Terraform trying to use the foo provider to refresh it?

When I turned on TF_LOG=debug, I saw no requests at all to the STS endpoint trying to assume the bar role.

What is happening here is that provider bar depends on the time_sleep resource, and in this case the time_sleep resource needs to be recreated, so the aws.bar provider, which depends on that time_sleep resource, cannot be evaluated until that happens.

But when refreshing the whole plan, Terraform has to refresh the IAM policy policy1. With no aws.bar provider available at that point (the time_sleep must be applied before policy1 can be refreshed), Terraform unilaterally fell back to the original foo provider.

This warrants a fix of some sort. It took ages to track down, and I would much rather have seen a message that Terraform couldn't refresh the policy1 resource because the underlying provider couldn't be created; that would have been far more helpful than assigning the wrong provider to the refresh task.

When I finally figured out what was going wrong, I was able to run a targeted apply against the time_sleep resource to fix the new triggers, and this "freed up" Terraform's internal logic to create the aws.bar provider and refresh policy1 with the correct provider again.
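
For reference, the targeted workaround amounts to roughly this (a sketch using the resource address from the configuration above):

# apply the replaced dependency first, so the aws.bar provider
# configuration becomes fully known
terraform apply -target=time_sleep.wait_for_rolebar

# then run the full apply with the correct provider available
terraform apply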

gtmtech added the labels bug and new (new issue not yet triaged) on Jul 26, 2022
jbardin (Member) commented Jul 27, 2022

Hi @gtmtech,

I'm not sure exactly what's going on here from the description alone. Would it be possible to supply a trace log of the failed plan? If you use TF_LOG_CORE=trace it will skip all the provider output, which keeps the AWS API calls out of the logs.

(Probably unrelated, but depends_on = [ aws_iam_role.bar ] here is not doing anything, since time_sleep.wait_for_rolebar already refers to the aws_iam_role.bar resource, in case you intended it to perform some other function.)
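
In other words, the following would behave identically (a sketch of the same resource with the redundant depends_on dropped):

resource "time_sleep" "wait_for_rolebar" {
    triggers = {
        name = aws_iam_role.bar.name   # this reference already creates the dependency
    }
    create_duration = "10s"
}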

Thanks!

jbardin added the labels waiting-response and waiting for reproduction, and removed the label new, on Jul 27, 2022
apparentlymart added the label v1.2 on Sep 16, 2022
gtmtech (Author) commented Nov 4, 2022

@jbardin, sorry, just coming back to this.

So when you have the following in a dependency chain:

  1. Provider X
  2. An IAM role that uses provider X
  3. A 10s time_sleep dependent on the IAM role
  4. A provider Y which references 2 as the role to assume, and 3 via some function call (to create a dependency on 3)
  5. A resource that uses provider Y

.. and you have successfully terraformed the lot, but then a change to 2 requires 3 to be recreated (e.g. via triggers{} on 3 referencing an attribute of 2)...

If 3 needs to be recreated, Terraform can't create provider Y for the refresh phase (it depends on 3, but 3 now needs recreating). During the refresh phase, Terraform incorrectly assigns provider X to try to refresh 5, rather than provider Y, as Y is no longer available. It should error or do something else, not assign the wrong provider.
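
Condensed into a skeleton (a sketch only; the names are placeholders for the numbered items above, and the no-op conditional stands in for the function-call dependency):

provider "aws" { alias = "x" }                           # 1: provider X
resource "aws_iam_role" "r" { provider = aws.x ... }     # 2: role created via X
resource "time_sleep" "s" {                              # 3: sleep on the role
    triggers        = { id = aws_iam_role.r.id }
    create_duration = "10s"
}
provider "aws" {                                         # 4: provider Y, depends on 2 and 3
    alias = "y"
    assume_role {
        role_arn = time_sleep.s.id != "" ? aws_iam_role.r.arn : aws_iam_role.r.arn
    }
}
resource "aws_iam_policy" "p" { provider = aws.y ... }   # 5: resource created via Y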

jbardin (Member) commented Nov 4, 2022

Hi @gtmtech,

I don't think the problem here is the use of an incorrect provider (at least there has been no evidence yet that that could happen, though the logs would verify it); rather, the configuration for the bar provider is going to contain unknown values during the next plan. While it may work in some cases, referencing a managed resource from a provider configuration is not supported. The applicable excerpt from the docs:

You can use expressions in the values of these configuration arguments, but can only reference values that are known before the configuration is applied. This means you can safely reference input variables, but not attributes exported by resources.

If the provider requires the full configuration in order to complete the plan operation, you won't be able to plan and apply a configuration like this without the use of `-target`. When given a partially known configuration, most providers opt to attempt whatever operations they need during plan, but generally act as if the unknown value were unset, so they may fail outright or produce unexpected results.

gtmtech (Author) commented Nov 22, 2022

Thanks @jbardin. I certainly did not realise that basing a provider on attributes of a managed resource was unsupported behaviour, since in most cases it works, but I appreciate the docs you mention.

However, if it's not supported behaviour, I would argue an error needs to be thrown. Assigning the wrong provider to a resource is, in my view, of critical importance and should not be tolerated under any circumstances; for example, it may lead to API requests being made against the wrong AWS account, with all sorts of implications.

jbardin (Member) commented Dec 7, 2022

I don't see any evidence here (yet) of the incorrect provider being assigned within Terraform, but an incomplete configuration could make a provider act like another provider configuration with the same defaults. If that's the case, there's not much we can do within the current protocol, because that is also a perfectly valid mode of operation for some providers.

gtmtech (Author) commented Dec 20, 2022

I will try to record a video sometime to show you this error, @jbardin 👍

wrschneider commented

I have seen exactly the same problem with Databricks workspaces.

  • azurerm_databricks_workspace resource
  • databricks provider depends on azurerm_databricks_workspace
  • databricks_cluster resource in databricks provider

If you make a change that causes the workspace resource to change, Terraform isn't able to refresh the databricks_cluster resource because the provider doesn't exist yet. Often the error looks authentication-related, which causes an enormous amount of confusion.

Kubernetes providers tend to have the same problem.

I will file a feature request for a warning or a more meaningful error: the docs say not to do this, but the initial Terraform apply does work, and the error on a subsequent apply is completely non-obvious.

It would be better to report a warning during the parse / validation phase.
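
For illustration, that Databricks shape is roughly the following (a sketch; the resource names and attribute values here are illustrative, not taken from a real configuration):

resource "azurerm_databricks_workspace" "example" {
    name                = "example-workspace"
    resource_group_name = "example-rg"
    location            = "westeurope"
    sku                 = "standard"
}

# the provider is configured from an attribute of a managed resource,
# the same unsupported pattern as the AWS example above
provider "databricks" {
    host = azurerm_databricks_workspace.example.workspace_url
}

resource "databricks_cluster" "example" {
    cluster_name  = "example-cluster"
    spark_version = "11.3.x-scala2.12"
    node_type_id  = "Standard_DS3_v2"
    num_workers   = 1
}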

jbardin (Member) commented Jan 27, 2023

Since we don't have any evidence of an incorrect provider instance being used, I'm going to close this under the assumption that the behaviour was the result of an incomplete provider configuration.

Thanks!

jbardin closed this as completed on Jan 27, 2023
github-actions bot commented
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Feb 27, 2023