Cannot resolve role ARN when regional STS endpoint is configured #7418

Closed
hamishforbes opened this issue Sep 4, 2019 · 15 comments · Fixed by #7922
Labels
bug Used to indicate a potential bug ecosystem secret/aws

Comments

@hamishforbes

Describe the bug
If the AWS Auth client has a regional STS endpoint configured then resolving IAM role ARNs fails with a region scoping error.

I think this must be related to this code: https://github.com/hashicorp/vault/blob/v1.2.2/builtin/credential/aws/backend.go#L252

As the error reports a different invalid region if you repeat the command, until eventually by luck it matches and the role can be created.

Note if you disable resolve_aws_unique_ids the role will create fine, but any attempt to login against that role shows the same behaviour, repeated region scoping failures with different region names until by luck there's a match and the client logs in.

To Reproduce
With default configuration

~> vault read auth/aws/config/client
Key                           Value
---                           -----
access_key                    n/a
endpoint                      n/a
iam_endpoint                  n/a
iam_server_id_header_value    n/a
max_retries                   -1
sts_endpoint                  n/a

~> vault write auth/aws/role/test auth_type=iam bound_iam_principal_arn=arn:aws:iam::123456789:role/test_role resolve_aws_unique_ids=true
Success! Data written to: auth/aws/role/test

With custom STS endpoint

~> vault delete auth/aws/role/test
Success! Data deleted (if it existed) at: auth/aws/role/test

~> vault write auth/aws/config/client sts_endpoint=https://sts.eu-west-1.amazonaws.com
Success! Data written to: auth/aws/config/client

~> vault read auth/aws/config/client
Key                           Value
---                           -----
access_key                    n/a
endpoint                      n/a
iam_endpoint                  n/a
iam_server_id_header_value    n/a
max_retries                   -1
sts_endpoint                  https://sts.eu-west-1.amazonaws.com

~> vault write auth/aws/role/test auth_type=iam bound_iam_principal_arn=arn:aws:iam::123456789:role/test_role resolve_aws_unique_ids=true
Error writing data to auth/aws/role/test: Error making API request.

URL: PUT https://123.123.123.123:8200/v1/auth/aws/role/test
Code: 400. Errors:

* unable to resolve ARN "arn:aws:iam::123456789:role/test_role" to internal ID: unable to fetch current caller: SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'us-east-2'.
	status code: 403, request id: bb26e15e-cf0e-11e9-aaa3-adf801a6bb29

~> vault write auth/aws/role/test auth_type=iam bound_iam_principal_arn=arn:aws:iam::123456789:role/test_role resolve_aws_unique_ids=true
Error writing data to auth/aws/role/test: Error making API request.

URL: PUT https://123.123.123.123:8200/v1/auth/aws/role/test
Code: 400. Errors:

* unable to resolve ARN "arn:aws:iam::123456789:role/test_role" to internal ID: unable to fetch current caller: SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'us-west-2'.
	status code: 403, request id: bd893bac-cf0e-11e9-bd06-a3cc45a73eac


~> vault write auth/aws/role/test auth_type=iam bound_iam_principal_arn=arn:aws:iam::123456789:role/test_role resolve_aws_unique_ids=true
Success! Data written to: auth/aws/role/test

Expected behavior
Should use the region from the configured endpoint (or us-east-1 if not configured?)

Environment:

  • Vault Server Version: 1.2.2
  • Vault CLI Version: 1.2.2
  • Server Operating System/Architecture: Ubuntu 18.04

Workaround
For me I've reset the STS endpoint to default and am having to specify region=us-east-1 in all AWS login commands.
This works but is a bit annoying because my Vault server and 99% of clients are in eu-west-1 and it would be preferable to not have to set the region parameter!

@tyrannosaurus-becks
Contributor

Hi @hamishforbes , thanks for opening this issue.

I'm curious if setting export AWS_REGION=eu-west-1 on your Vault servers, then restarting them would solve the issue for you. In looking here, it suggests that may work. If not, I suspect we'll need to incorporate this code into the AWS auth method.

Could you take a look and maybe give it a try and see if it solves things for you?

@hamishforbes
Author

Hi, Is that doco just talking about picking up the correct creds and region for the client to authenticate itself to AWS? Which is working, my Vault servers correctly pick up creds via the EC2 instance profile.

Setting the AWS_REGION variable hasn't made a difference to the ARN resolving problem.

vault.staging:~# grep AWS /etc/systemd/system/vault.service
Environment=AWS_REGION=eu-west-1

vault.staging:~# ps auxf | grep vault
vault    26579  0.6  9.3 193056 188504 ?       SLsl 08:34   0:01 /usr/local/bin/vault server -config=/etc/vault/vault.json

vault.staging:~# cat /proc/26579/environ
LANG=C.UTF-8PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/binHOME=/etc/vaultLOGNAME=vaultUSER=vaultINVOCATION_ID=0282af6ed1114108a3877e4b9564781dJOURNAL_STREAM=9:1123624AWS_REGION=eu-west-1

~> vault read auth/aws/config/client
Key                           Value
---                           -----
access_key                    n/a
endpoint                      n/a
iam_endpoint                  n/a
iam_server_id_header_value    n/a
max_retries                   -1
sts_endpoint                  https://sts.eu-west-1.amazonaws.com

~> vault write auth/aws/role/test auth_type=iam bound_iam_principal_arn=arn:aws:iam::123456789:role/test resolve_aws_unique_ids=true
Error writing data to auth/aws/role/test: Error making API request.

URL: PUT https://123.123.123.123:8200/v1/auth/aws/role/test
Code: 400. Errors:

* unable to resolve ARN "arn:aws:iam::123456789:role/test" to internal ID: unable to fetch current caller: SignatureDoesNotMatch: Credential should be scoped to a valid region, not 'eu-central-1'.
	status code: 403, request id: 6a37057f-d081-11e9-98d9-bf5270b6ccbf

I'm not sure if using GetOrDefaultRegion here is the right approach either.
For example if my Vault server is in eu-west-2 but I configure the STS endpoint as eu-west-1 we would sign requests as eu-west-2 and send them to eu-west-1, which wouldn't work either?

Should it not just try and parse the region out of the sts_endpoint if configured, else use us-east-1?

Currently the request is signed using a random region from the current AWS partition (normal, gov, china)
If this is sent to the global sts.amazonaws.com endpoint then it works because that endpoint accepts any region?
If it sends this to a regional endpoint (sts.eu-west-1.amazonaws.com) it fails (unless the random region happens to match) because that endpoint only accepts that region.
This seems like a straightforward bug, if the STS endpoint is configured with a region you shouldn't pick a random region to sign requests being sent to it.
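The intermittent failures described above can be reproduced in miniature. This is only a sketch under assumptions: it supposes the signing region is taken as the first key yielded by iterating a Go map of the partition's regions (iteration order is unspecified in Go), and the names `partitionRegions`, `anyRegion` and `regionalEndpointAccepts` are hypothetical, not Vault's.

```go
package main

import "fmt"

// partitionRegions is an abbreviated, illustrative set of regions in one
// AWS partition. Because it is a map, iteration order is unspecified.
var partitionRegions = map[string]struct{}{
	"us-east-1": {}, "us-east-2": {}, "us-west-2": {},
	"eu-west-1": {}, "eu-central-1": {},
}

// anyRegion returns whichever region the map iteration yields first, so the
// result can differ from call to call. (Assumed behaviour, for illustration.)
func anyRegion() string {
	for r := range partitionRegions {
		return r
	}
	return ""
}

// regionalEndpointAccepts models a regional STS endpoint, which rejects
// requests whose signature is scoped to any other region.
func regionalEndpointAccepts(endpointRegion, signingRegion string) bool {
	return endpointRegion == signingRegion
}

func main() {
	const endpointRegion = "eu-west-1" // region of sts.eu-west-1.amazonaws.com
	for attempt := 1; attempt <= 5; attempt++ {
		signingRegion := anyRegion()
		if regionalEndpointAccepts(endpointRegion, signingRegion) {
			fmt.Printf("attempt %d: signed for %s: accepted\n", attempt, signingRegion)
		} else {
			fmt.Printf("attempt %d: signed for %s: rejected\n", attempt, signingRegion)
		}
	}
}
```

Repeated runs print different regions per attempt, which matches the pattern of a different invalid region in each error until one happens to match.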

There's also a related issue with Vault CLI and Vault server default behaviours.
Assuming the random region bug is resolved the Vault server will, with no additional configuration, use the global STS endpoint and sign everything using us-east-1.

The Vault CLI, with no additional configuration, looks up your local region from environment/metadata and signs using that.
Given IAM objects are global in AWS as a user my expectation is that I shouldn't need to configure regional stuff at all.
Obviously the reality is that STS stuff is regional, but as a user it would be nice not to have to worry about that!

This is less of a bug and more of a usability / doco thing I think?

IMO the client should default to using us-east-1 when logging in unless explicitly set to use another region (by passing region into the command, not just from metadata).

I think this makes more sense as a user, if you don't configure anything then it doesn't matter which region the server and client are in, they both sign using us-east-1 and everything works via the global STS endpoint.

Or the server should also look up its own region via metadata and use the local region endpoint + signing.
So clients and servers in the same region will work with no configuration, but clients in a different region specify the server region.

Currently if you don't configure anything any client outside of us-east-1 has to specify us-east-1 as the region, which is super confusing given I have zero resources in us-east-1!

@bernardd

bernardd commented Sep 9, 2019

Hey - I'm hitting this (random region selection) too as a result of adding sts_endpoint configuration to deal with the change described in #7397 and https://groups.google.com/forum/#!msg/vault-tool/ki0GUEu7FFo/QBulNTDtBwAJ.

Currently if you don't configure anything any client outside of us-east-1 has to specify us-east-1 as the region, which is super confusing given I have zero resources in us-east-1!

Same here - it makes no sense to me, but looks like the only way to work around the change in behaviour as things stand.

@joelthompson
Contributor

This particular issue only comes into play when Vault is trying to determine the AWS account ID corresponding to the credentials it is using. It needs to do this to ensure that the ARNs being bound are actually resolvable. (For example, if you pass a bound_iam_principal_arn of arn:aws:iam::123456789012:role/MyRole and Vault actually has credentials for account 987654321098, then you wouldn't want Vault looking up the unique ID for MyRole in account 987654321098 -- that could potentially be a security vulnerability.) Once the account ID is looked up, Vault caches it for future use.
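As a rough illustration of the account check described above, the sketch below compares the account ID embedded in a bound ARN with the account ID of the caller's credentials. The helper names `accountIDFromARN` and `checkBoundARNAccount` are hypothetical, and what Vault actually does on a mismatch is not stated here; the sketch simply returns an error.

```go
package main

import (
	"fmt"
	"strings"
)

// accountIDFromARN extracts the account ID field from an ARN of the form
// arn:partition:service:region:account-id:resource.
func accountIDFromARN(arn string) (string, error) {
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" {
		return "", fmt.Errorf("malformed ARN %q", arn)
	}
	return parts[4], nil
}

// checkBoundARNAccount is a hypothetical helper: it refuses to proceed when
// the ARN names a different account than the one the credentials belong to.
func checkBoundARNAccount(boundARN, callerAccountID string) error {
	arnAccount, err := accountIDFromARN(boundARN)
	if err != nil {
		return err
	}
	if arnAccount != callerAccountID {
		return fmt.Errorf("ARN belongs to account %s but credentials are for account %s",
			arnAccount, callerAccountID)
	}
	return nil
}

func main() {
	arn := "arn:aws:iam::123456789012:role/MyRole"
	fmt.Println(checkBoundARNAccount(arn, "123456789012")) // same account
	fmt.Println(checkBoundARNAccount(arn, "987654321098")) // different account
}
```

The caller's account ID is the value that has to come from STS, which is why a signing failure on that call blocks role creation.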

I think the best way is what @tyrannosaurus-becks suggests -- namely, ask Vault operators to specify the AWS region and then use awsutil.GetOrDefaultRegion. It's not ideal, but it's not terrible either. In theory this could be a breaking change, but as you already noted, the behavior is already largely broken so it should make things strictly better.

I can see a couple other options, but I don't think they're as good:

  1. Try to parse the region out of the STS endpoint URL. The problem with this is that it's not guaranteed to work. It needs to support the AWS standard partition (easy), the AWS GovCloud partition (a little harder), the AWS China partition (a bit harder), VPC Endpoint URLs for all of these partitions (harder still since I can't test in AWS China or GovCloud), and ideally would even work for the AWS Secret partition (very hard -- see Vault - AWS IAM/EC2 Auth- GovCloud #6631 (comment)). And, beyond all that, it should also support "custom" STS endpoints, and in this case, it's impossible to divine the region from a custom endpoint. Perhaps we could combine this with the awsutil.GetOrDefaultRegion method, i.e., have this be an attempted fallback if no region is specified explicitly, but I'm worried this starts to get into too much complexity and magic.
  2. Ignore this error on the code path and just verify later: if an IAM principal is returned when looking up the unique ID, ensure that the account ID in the principal's ARN matches the account ID passed in. This is potentially very confusing to users if Vault is using credentials for account 987654321098, a user passes in a bound_iam_principal_arn of arn:aws:iam::123456789012:role/MyRole, and MyRole doesn't exist in account 987654321098. In that case, Vault would return an error saying that the role doesn't exist, when in fact it actually does.
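To make the limits of option 1 concrete, here is a minimal sketch of hostname-based region parsing. `regionFromSTSEndpoint` is a hypothetical helper, not part of Vault; it assumes hostnames shaped like sts.<region>.amazonaws.com (optionally with a trailing .cn) and gives up on anything else, including the custom endpoint URL used below, which is invented for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// regionFromSTSEndpoint is a hypothetical helper. It only understands
// hostnames of the form sts.<region>.amazonaws.com[.cn] and reports
// ok=false for everything else.
func regionFromSTSEndpoint(endpoint string) (region string, ok bool) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", false
	}
	labels := strings.Split(u.Hostname(), ".")
	// Expect at least: sts, <region>, amazonaws, com
	if len(labels) < 4 || labels[0] != "sts" || labels[2] != "amazonaws" {
		return "", false
	}
	return labels[1], true
}

func main() {
	for _, endpoint := range []string{
		"https://sts.eu-west-1.amazonaws.com", // regional endpoint
		"https://sts.amazonaws.com",           // global endpoint, no region label
		"https://sts.internal.example.com",    // made-up custom endpoint
	} {
		region, ok := regionFromSTSEndpoint(endpoint)
		fmt.Printf("%s -> region=%q ok=%v\n", endpoint, region, ok)
	}
}
```

Anything that returns ok=false would still need an explicitly configured region, which is the fallback combination the option describes.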

A few other notes:

If this is sent to the global sts.amazonaws.com endpoint then it works because that endpoint accepts any region?

Actually, the global sts.amazonaws.com endpoint accepts only requests signed for the us-east-1 region. It happens to work because the AWS golang SDK ignores the region parameter when creating an STS client unless you also explicitly specify an endpoint. So with no sts_endpoint parameter set, the underlying golang SDK will just always sign the request for the correct us-east-1 region.

I'm not sure if using GetOrDefaultRegion here is the right approach either.
For example if my Vault server is in eu-west-2 but I configure the STS endpoint as eu-west-1 we would sign requests as eu-west-2 and send them to eu-west-1, which wouldn't work either?

In theory, yes, you're right. But, more practically speaking, why would you or anyone else ever do that? Is there a valid use case for this? By using a closer region, you can take advantage of lower latency and VPC endpoints. And, even if you wanted to do that, in my proposal, you could just override the region parameter in the client config to be eu-west-1 and it would still work.

There's also a related issue with Vault CLI and Vault server default behaviours.
Assuming the random region bug is resolved the Vault server will, with no additional configuration, use the global STS endpoint and sign everything using us-east-1.

Can we move this discussion to a separate GitHub issue? I'm not trying to shut down debate here; I genuinely think you're raising real usability concerns and I'd love to continue the discussion about how to improve the usability and user experience. But the discussion around AWS STS endpoints is confusing enough and I think it'll be easier to keep these two discussions separate.

@hamishforbes
Author

hamishforbes commented Sep 10, 2019

I think the best way is what @tyrannosaurus-becks suggests -- namely, ask Vault operators to specify the AWS region and then use awsutil.GetOrDefaultRegion. It's not ideal, but it's not terrible either. In theory this could be a breaking change, but as you already noted, the behavior is already largely broken so it should make things strictly better.

So having an sts_signing_region parameter (or similar) on the auth client to explicitly configure which region is used? Otherwise falling back to environment/metadata?
That sounds fine to me.

If you configure a signing region and an appropriate endpoint then clients in the same region will 'just work'.

But servers outside of us-east-1 would have to configure a matching sts endpoint to work? As long as it's clearly documented in the setup instructions I think that would be fine though.

In theory, yes, you're right. But, more practically speaking, why would you or anyone else ever do that? Is there a valid use case for this?

No, it was just a contrived scenario that would break, I think I was assuming it would just use GetOrDefaultRegion without an explicit configuration parameter, which is obviously daft!

Can we move this discussion to a separate GitHub issue?

For sure, although I think if there's a configurable signing region then the disconnect between the 2 is less pronounced, so maybe not needed.

@joelthompson
Contributor

So having an sts_signing_region parameter (or similar) on the auth client to explicitly configure which region is used? Otherwise falling back to environment/metadata?

No. The client already has this parameter. The proposal would be to add this to the server.

But servers outside of us-east-1 would have to configure a matching sts endpoint to work? As long as it's clearly documented in the setup instructions I think that would be fine though.

No. Servers configured with a custom sts_endpoint would need a custom sts_signing_region (or somesuch parameter) to work properly. Clients would still need to know what that parameter is.
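The lookup order being proposed can be sketched roughly as follows: an explicitly configured signing region wins, then the environment, then us-east-1. This is a simplified assumption about the shape of the logic, not the behaviour of awsutil.GetOrDefaultRegion itself; `resolveSigningRegion` is a hypothetical name, and other sources of a region (such as instance metadata) are left out.

```go
package main

import (
	"fmt"
	"os"
)

// defaultRegion is the fallback used when nothing else is configured.
const defaultRegion = "us-east-1"

// resolveSigningRegion is a hypothetical, simplified stand-in for the lookup
// discussed above: explicit config first, then environment, then a default.
// getenv is injected so the function can be exercised without touching the
// real process environment.
func resolveSigningRegion(configured string, getenv func(string) string) string {
	if configured != "" {
		return configured
	}
	for _, key := range []string{"AWS_REGION", "AWS_DEFAULT_REGION"} {
		if v := getenv(key); v != "" {
			return v
		}
	}
	return defaultRegion
}

func main() {
	// An operator-supplied region always takes precedence.
	fmt.Println(resolveSigningRegion("eu-west-1", os.Getenv))
	// With nothing configured, the result depends on the environment.
	fmt.Println(resolveSigningRegion("", os.Getenv))
}
```

Whatever value the server ends up with, clients still have to sign for the same region.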

Can we move this discussion to a separate GitHub issue?

For sure, although I think if there's a configurable signing region then the disconnect between the 2 is less pronounced, so maybe not needed.

Maybe? The client signing region would still need to match the server signing region, and the disconnect between those two isn't great. (One could imagine a helper to read the server-side STS signing region first before doing the client-side signing, but that configuration would need to be anonymously readable, because you need to sign a request to authenticate to Vault in the first place, and I'm not sure that's a great idea.) Which is why I think a separate issue to discuss how to improve the user experience would be great to make this better for everyone.

@tyrannosaurus-becks
Contributor

I think we're in decent accord here, and would support a PR to that effect. @joelthompson you're welcome to do it if you'd like. If not, I can take it on at my next opportunity, but I need to deliver some other unrelated items first so it will be at least a month or two before I can start.

@ivan85viv

ivan85viv commented Sep 23, 2019

I have the same issue and it's really frustrating because in my case I have Vault and Vault clients in us-west-2 and I have the STS endpoint pointing to us-west-2 ... this worked fine, but when I upgraded to the latest Vault 1.2.3 I started to have issues with the Terraform Vault provider complaining about the same error. I tried export AWS_REGION=us-west-2 in my shell before running terraform but the issue still persists. I see there is no way of configuring the region in the Vault provider itself. In my AWS profile I have the region configured also, but this does not make any difference.

The workaround for me to make sure clients and Terraform work okay is to remove the us-west-2 STS endpoint (leave the default) and force clients to use the us-east-1 endpoint (this works for Terraform and all my clients but it's really confusing).

@kalafut
Contributor

kalafut commented Oct 15, 2019

Fixed in #7622 and #7632

@kalafut kalafut closed this as completed Oct 15, 2019
@kalafut
Contributor

kalafut commented Oct 16, 2019

Reopening since the linked issues only improve the login issues, not role creation.

@kalafut kalafut reopened this Oct 16, 2019
@kalafut kalafut removed this from the 1.2.4 milestone Oct 16, 2019
@michelvocks michelvocks added the bug and secret/aws labels Nov 5, 2019
@martinssipenko
Contributor

@tyrannosaurus-becks any updates on this? This is a blocker as it's impossible to use custom STS endpoint (creating a role fails because signatures don't match).

@tyrannosaurus-becks
Contributor

@martinssipenko my apologies, I have been tied up with a separate project and it looks like I will remain so for the upcoming period. We are a small engineering team in relation to the number of issues that come in across the repositories we maintain. Vault PRs are welcome!

@tyrannosaurus-becks
Contributor

tyrannosaurus-becks commented Nov 21, 2019

@hamishforbes and/or @martinssipenko, I had the opportunity to circle back around and work on this. Would either of you be able to confirm this fixes it for you by pulling the branch and running this test? You'll just need to comment out the skip and paste a real ARN in here.

I know you both write Go because I have seen your PRs before! 👀 😄

@martinssipenko
Contributor

I created a user with admin policy, exported its keys and ran the below.

make testacc TESTARGS='-run=TestRoleResolutionWithSTSEndpointConfigured' TEST=./builtin/credential/aws
==> Checking that build is using go version >= 1.12.7...
==> Using go version 1.13.4...
VAULT_ACC=1 go test -tags='' ./builtin/credential/aws -v -run=TestRoleResolutionWithSTSEndpointConfigured -timeout=60m
=== RUN   TestRoleResolutionWithSTSEndpointConfigured
--- PASS: TestRoleResolutionWithSTSEndpointConfigured (0.98s)
PASS
ok  	github.com/hashicorp/vault/builtin/credential/aws	1.561s

@mdshoaib707

According to the Vault docs, if sts_endpoint is set then sts_region should also be set.
Previously it was failing for me when I had set only sts_endpoint, but once sts_region was also set the error was gone.

Commands for setting the sts region and endpoint

vault write auth/aws/config/client sts_endpoint=https://sts.eu-west-1.amazonaws.com
vault write auth/aws/config/client sts_region=eu-west-1
