Vault provider lookup-self on 127.0.0.1 instead of provided vault address in plan phase #829

Open
t3hami opened this issue Jul 24, 2020 · 8 comments
@t3hami

t3hami commented Jul 24, 2020

Hi there,

I'm using Terraform to create a GKE cluster, deploy the Vault Helm charts into the cluster, initialise Vault, and then create policies, auth, secrets, etc. I'm passing the Vault address (data.kubernetes_service.vault.load_balancer_ingress.0.ip), which comes from a Kubernetes service data source, to the Vault provider. The problem is that during terraform plan, Terraform hits a local URL, https://127.0.0.1:8200/v1/auth/token/lookup-self, instead of the address that will be supplied by the Kubernetes service data source (which it can't resolve yet, because the GKE cluster isn't deployed and the Kubernetes service data source depends on it). When I set VAULT_ADDR to my local Vault, the plan gets past the error, and terraform apply then works fine. The Terraform documentation says it automatically handles the depends_on graph when you use data from one resource in another, since it knows what to create first. I need a way to skip the Vault lookup-self call during terraform plan.

Vault provider

provider "vault" {
  address = "http://${data.kubernetes_service.vault.load_balancer_ingress.0.ip}"
}

Note: I'm using depends_on = [null_resource.vault_init] in all vault resources.
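For illustration, a minimal sketch of that depends_on pattern; the vault_policy resource and the policy file path are hypothetical, only null_resource.vault_init comes from the configuration described above:

resource "vault_policy" "admin" {
  name   = "admin"
  policy = file("policies/admin.hcl")

  # Explicit dependency so this resource is not touched before Vault is initialised.
  depends_on = [null_resource.vault_init]
}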

@catsby catsby added the bug label Jul 24, 2020
@matttrach

matttrach commented Oct 1, 2020

I have found that if you want to provision Vault and configure it in the same Terraform file/directory, the dependencies are not handled properly, as @t3hami described.

I would like to use modules that provision servers and modules that configure Vault after those servers are ready, but this provider doesn't seem to respect module dependencies.

I am using Terraform 0.13 and module dependencies as described here:
https://github.com/hashicorp/terraform/tree/guide-v0.13-beta/module-depends
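For illustration, a minimal sketch of that module-level depends_on pattern (the module names and sources here are hypothetical):

module "vault_servers" {
  source = "./modules/vault-servers"
}

module "vault_config" {
  source = "./modules/vault-config"

  # Module-level depends_on, available since Terraform 0.13.
  depends_on = [module.vault_servers]
}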

@GJKrupa

GJKrupa commented Oct 14, 2020

I'm seeing the same thing if I run an import and pass a hard-coded variable for the address into my submodule, so I don't think this is a dependency issue.

provider "vault" {
  address      = var.vault_url
  token        = var.vault_token
  ca_cert_file = "certs/my-ca.pem"
}

What it does use is the VAULT_ADDR environment variable, if that's set.

@mcanevet

mcanevet commented Oct 24, 2020

Same problem here. I looked at the code but I could not figure out exactly what happens.
I can see that the provider configuration function calls the API's DefaultConfig function, which configures the client to use https://127.0.0.1:8200 as the default address.
I guess that at the time the provider is configured, address is empty, and hence it does not override the default with the proper server address.
I'm not sure on which side this should be fixed: the Terraform provider or the API.

What I'm wondering is when the provider initialization is supposed to happen. The pattern of configuring a provider with the outputs of a resource clearly works for some providers (the Kubernetes provider, for example), but it clearly does not work with the Vault provider (even with proper dependencies set on every vault resource).

Maybe some advice from a Terraform ninja would be welcome here. /cc @apparentlymart

@mcanevet

I have a minimal working example to reproduce this: https://gist.github.com/mcanevet/f698b53a32ac28a03b729c40d9d07b9f
With the vault_* resources removed it works, but when I try to create the vault_* resources I get Error: Get "https://127.0.0.1:8200/v1/auth/token/lookup-self": dial tcp 127.0.0.1:8200: connect: connection refused.
If I add the lines back after Vault is up, everything works fine.

@mcanevet

mcanevet commented Nov 3, 2020

I think this "feature" is not officially supported yet (hashicorp/terraform#4149), but somehow works for some providers.

@apparentlymart
Contributor

I think the root cause here is that the current Terraform SDK (which has no real name of its own, but we often call it helper/schema) doesn't handle the case when provider arguments are unknown, and instead treats them as if they aren't set at all. A provider that then tries to make use of these values in its configuration step can run into trouble, because it can mistakenly apply a default value as seems to be happening here with the vault provider assuming 127.0.0.1.

A way that other providers manage to avoid this situation is by deferring their connection until later on, when they are ready to perform an operation. For example, the hashicorp/mysql provider doesn't connect to the server until it's performing a real action, such as creating an object. Because most operations in that provider don't happen until the apply step, it rarely encounters the situation where its configuration is incomplete.

The vault provider could potentially take a similar strategy, but I don't think it would work out so well for this provider, because it has a lot of data sources that are typically read during planning, which requires the provider configuration to be complete just to produce a plan.

I'm not familiar enough with the SDK implementation details to know if there's some way for the vault provider to actually detect when its address argument is unknown and treat that as different than it being unset. If so, it could potentially return an error explaining that the address argument must be known during planning, similar to what Terraform itself generates for unknown values in count and for_each, so it would at least fail explicitly rather than just doing something confusing and unexpected.


As @mcanevet noted, hashicorp/terraform#4149 is one way this might be addressed in the long run, by deferring certain operations entirely until a first round of changes has been applied. There is no plan to implement that in the short term because it's a significant change to Terraform's typical workflow (it might be necessary to run terraform apply multiple times to fully apply the plan, which is unprecedented), and once the Terraform Core team has more time to research that area further, we're hoping to find other technical designs that don't have that disadvantage, though it remains to be seen what other designs are possible.

As I proposed it, hashicorp/terraform#4149 is basically the same as running Terraform with the -target option except that Terraform would calculate the necessary targets automatically and print out information about what it excluded and why. Given that, you can get the same effect today by explicitly adding the -target option. That is inconvenient when you're running Terraform in automation, but in practice I've seen that workaround work for most folks because typically it's only needed once when initially bootstrapping a configuration, unless they end up later on recreating some foundational object like a Kubernetes cluster or MySQL server.

Splitting the configuration into two parts that can be applied separately in sequence is the most robust, repeatable answer with today's Terraform.
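For illustration, a minimal sketch of that two-stage split, assuming the first stage exports the Vault address as an output and uses a local state file (the paths, backend, and output name here are hypothetical):

# Stage 2 (vault/), applied only after the cluster stage has already been applied.
data "terraform_remote_state" "cluster" {
  backend = "local"
  config = {
    path = "../cluster/terraform.tfstate"
  }
}

provider "vault" {
  # Known at plan time for this stage, because the cluster stage's state already exists.
  address = data.terraform_remote_state.cluster.outputs.vault_address
}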

@seanamos

seanamos commented Sep 2, 2022

For those who are running into this, there is a workaround.
You can set skip_child_token = true.

Be aware of the potential security implications when using this workaround:
https://registry.terraform.io/providers/hashicorp/vault/latest/docs#skip_child_token
https://registry.terraform.io/providers/hashicorp/vault/latest/docs#using-vault-credentials-in-terraform-configuration
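For illustration, a minimal sketch of the workaround, reusing the provider arguments from an earlier comment (the variable names are illustrative):

provider "vault" {
  address = var.vault_url
  token   = var.vault_token

  # Skips creating a child token, which avoids the token lookup that fails here;
  # see the security notes linked above before enabling this.
  skip_child_token = true
}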

It seems the Vault provider wants to do a token capabilities lookup, probably to check whether it can create child tokens, but this happens regardless of resource dependencies. It does this lookup out of order and can end up using empty/default values for the address and token.

I know provider dependencies aren't really something Terraform fully supports at the moment.
However, it is something that can be supported, or worked around, at the individual provider level. The Vault provider is very close to having this working already; it just needs some changes around how it handles child tokens.

The Consul/Nomad providers don't have the same issue and do work well already with Terraform's existing resource dependencies.

@4FunAndProfit

works with opentofu 😇
