Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance AWS node attestor server plugin to validate EC2 instances across multiple AWS account. #2218

Closed
sushil-prasad opened this issue Apr 14, 2021 · 26 comments

Comments

@sushil-prasad
Copy link
Contributor

We are trying to set up the spire infrastructure in our production environment. The production environment has multiple AWS accounts. We want to deploy a spire server in one of the accounts and attest the EC2 hosts in all other AWS accounts using AWS aws_iid node attestor.

When both the spire server and the spire agent runs in the same AWS account, the spire server is able to attest to the EC2 host. However when spire server and spire agent are in different AWS account, the node attestation fails with this message:
ERRO[0000] agent crashed error=“failed to get SVID: error getting attestation response from SPIRE server: rpc error: code = Internal desc = failed to attest: rpc error: code = Unknown desc = aws-iid: attempted attestation but an error occurred querying AWS via describe-instances: InvalidInstanceID.NotFound:

@sushil-prasad
Copy link
Contributor Author

One potential solution for this problem could be to allow providing a role name as part of AWS aws_iid node attestor plugin configuration. The spire server will perform AssumeRole in the AWS account presented by the AWS aws_iid document. It is assumed that same role name is present in all AWS accounts. If no role name is provided as part of plugin configuration, the spire server will continue with the current implementation.

@walmav
Copy link

walmav commented Apr 22, 2021

If we limit the selectors to be based on the fields of the IID and since we are able to verify the IID document through AWS DSA public certificate, wonder if we can get away from the need for AWS specific credentials on the SPIRE server.

@georgi-lozev
Copy link
Contributor

I like @walmav's idea. It will make the workflow similar to the one implemented in the gcp_iit plugin. The plugin could then get a list of allowed accounts and accept verified IIDs only from those.

Looking into the docs I see there are 3 types of signatures available.
Do you know what is the difference between them and are there any security implications from using one or the other?

@sushil-prasad
Copy link
Contributor Author

sushil-prasad commented Apr 28, 2021

I was digging into the AWS IID. When I run curl http://169.254.169.254/latest/dynamic/instance-identity/document on the EC2 instance, I see following output
{
“accountId” : “XXXXXXXXX",
“architecture” : “x86_64",
“availabilityZone” : “us-west-2b”,
“billingProducts” : null,
“devpayProductCodes” : null,
“marketplaceProductCodes” : null,
“imageId” : “ami-0e81828bfafcdddbd”,
“instanceId” : “i-06309d51c78628654”,
“instanceType” : “r5.large”,
“kernelId” : null,
“pendingTime” : “2021-04-22T18:21:27Z”,
“privateIp” : “172.22.78.250",
“ramdiskId” : null,
“region” : “us-west-2”,
“version” : “2017-09-30”
}

I do not see any suitable fields in this that can use be used to form selectors. That could be the reason that the current implementation of AWS node attestor is making call to AWS API to acquire additional information about the EC2 instance and then converting it to a set of node selectors.

It may be that going back to AWS API is unavoidable.

@evan2645
Copy link
Member

One potential solution for this problem could be to allow providing a role name as part of AWS aws_iid node attestor plugin configuration. The spire server will perform AssumeRole in the AWS account presented by the AWS aws_iid document.

I love this. It is much better than having to provide creds for multiple accounts :)

Is role assumption required to do this? Or is it also possible for other AWS accounts to grant access to us directly based on the ARN of our role in this account?

I also wonder if anything funny can happen here... someone with an IID from a totally random account can make SPIRE Server try to assume a role in it. I don't know if that's bad or not... it's probably worth thinking about for a while

If no role name is provided as part of plugin configuration, the spire server will continue with the current implementation.

Hmm I think this could work OK from a compat standpoint, but this plugin has a lot of other work that needs to be done, and I'd really love to get the configuration experience refactored as a whole.

If we limit the selectors to be based on the fields of the IID and since we are able to verify the IID document through AWS DSA public certificate, wonder if we can get away from the need for AWS specific credentials on the SPIRE server.

As @sushil-prasad has pointed out, the information contained in the IID itself is of limited value. I don't know if there are use cases for AWS validation that involve the AWS APIs being unreachable, but I think an option to disable AWS API calls is reasonable, so long as the user understands that they'll get only limited information back as a result.

The plugin could then get a list of allowed accounts and accept verified IIDs only from those.

Yes, we might ned an account allowed list no matter what direction we take here.

Looking into the docs I see there are 3 types of signatures available.
Do you know what is the difference between them and are there any security implications from using one or the other?

Hmm good question, I don't know... I think this has changed slightly since the last time I looked (e.g. the key to verify the signature is no longer publicly available...)

@caleygoff-invitae
Copy link

Just chiming in here

+1 on having an assume_role_arn configurable on the aws_iid plugin for spire-server. I think I observed on 12.1 the aws_iid actually AssumeRole and allowing me to have pod specific permissions via our OIDC connector back to AWS IAM.

I am seeing that in 12.2 however it is forcing me to use an InstanceProfile in which for my shop the way we configure our EKS clusters, we have a single InstanceProfile for each node.group. We actually do not use the InstanceProfiles, but for some reason their attached to our instances.

So actually allowing a cluster InstanceProfile to run the queries needed to get aws_iid to attest seems problematic because we really dont want the entire cluster the ability to describe and get metadata.

@evan2645
Copy link
Member

evan2645 commented May 6, 2021

Hey @caleygoff-invitae thanks for adding this, it is good to know.

I thought that the ability to assume a role was based on some role you already have (i.e. instance profile role)? Or there is some other way to specify who is authorized to assume?

@caleygoff-invitae
Copy link

caleygoff-invitae commented May 6, 2021

Hey @caleygoff-invitae thanks for adding this, it is good to know.

I thought that the ability to assume a role was based on some role you already have (i.e. instance profile role)? Or there is some other way to specify who is authorized to assume?

My understanding is we can do both, either by specifying a Role ARN to assume via the STS Assume Role API by passing in the ARN https://docs.aws.amazon.com/sdk-for-go/api/aws/credentials/stscreds/

Or use the Ec2 instance profile that we're already doing https://docs.aws.amazon.com/sdk-for-go/api/aws/credentials/

But yes in both cases the role already exists and is setup we are just passing in the IAM role arn for the specific workload to use in one use case and applying an the IAM role arn to an instanceprofile and associating that to an ec2 for all workloads in another case.

It should also be noted that on the IAM Role thats passed into the workload there is a trust relationship and sts:AssumeRole is allowed on the ARN in which the workload is running (our cluster node.group arn) .

I just realized that the aws_pca plugin uses assume_role_arn also to reach out to AWS for PCA things, this works for us because the role is applied to the environment and this plugin picks it up. Its seems like an easy stretch to do the same for aws_iid and allow it to be optional

@sushil-prasad
Copy link
Contributor Author

sushil-prasad commented May 7, 2021

I just realized that the aws_pca plugin uses assume_role_arn also to reach out to AWS for PCA things, this works for us because the role is applied to the environment and this plugin picks it up. Its seems like an easy stretch to do the same for aws_iid and allow it to be optional

👍
Should the "assume_role_arn" be templatized based on the account is of the Spire node attestor agent . A given spire server should be able to attest spire agents across multiple accounts.

@mlakewood
Copy link
Contributor

mlakewood commented May 9, 2021

I think the singular assume_role_arn works in the case of AWS PCA because you are only going to assume a single role. I wonder if a more flexible solution is some mapping of aws_account_id to role_arn in the configuration. That way the plugin can just grab the account id from the instance metadata returned, and assume the role_arn. This would make the implementation relatively straightforward, but pretty flexible. It also means you dont have to do any string parsing on the role_arn etc. It could also double as an allowlist if thats what we want, although I'm hesitant to lump two purposes into one set of data, and has the risk of complecting things.

@evan2645
Copy link
Member

Yes, ok .. this is all helpful, thanks everone.

IMO the right thing to do here is probably user configurable role name. We try to assume the role you specify in the account presented by the IID. My relatively strong preference is to avoid an account/role map if at all possible. While it is certainly more flexible, it's also more of a pain to configure, you need to know these ARNs, and if the name is the same you can't just set it once and go.

It could also double as an allowlist if thats what we want, although I'm hesitant to lump two purposes into one set of data, and has the risk of complecting things.

Yes... I agree, and FWIW, I do think we also need an allow list. Just to double-check, nobody here is concerned about the following?

I also wonder if anything funny can happen here... someone with an IID from a totally random account can make SPIRE Server try to assume a role in it. I don't know if that's bad or not... it's probably worth thinking about for a while

@caleygoff-invitae
Copy link

caleygoff-invitae commented May 10, 2021

I also wonder if anything funny can happen here... someone with an IID from a totally random account can make SPIRE Server try to assume a role in it. I don't know if that's bad or not... it's probably worth thinking about for a while

I'm like 90% sure this only an issue if the accounts in question have allowed the other to be assumed. Otherwise AWS just denies the request and would never allow an arbitrary account ARN to assume a role belonging to some other arbitrary account ARN. At least in my experience I have had to explicitly setup policies and trust entities to allow for cross account assumerole access on a given role.

@evan2645
Copy link
Member

I also wonder if anything funny can happen here... someone with an IID from a totally random account can make SPIRE Server try to assume a role in it. I don't know if that's bad or not... it's probably worth thinking about for a while

When considering this, my mind goes to places like:

  • If I obtain a (possibly no longer relevant) IID from a "target" AWS account, can I force SPIRE Server to exhaust rate limit or call counters on the target account?
  • If I send SPIRE Server a large number of IIDs from arbitrary account(s), can I cause AWS to flag SPIRE Server as abusive and block any further (legitimate) requests?

@evan2645
Copy link
Member

Hey @sushil-prasad, curious to hear if you have any thoughts on the above? Also, is this functionality something you'd be willing to contribute?

@sushil-prasad
Copy link
Contributor Author

I would love to contribute towards this. Please assign this ticket to me. Let me review the code and test all the assumptions and then I will propose the potential design and draft PR.

As long as the rate limit is concerned, I think doing this per IP might be simpler than per AWS account. I have seen similar issues in other scenarios happening with the same AWS account itself. Finding the right volume per account might be tricker based on account size. Finalizing the upper bound on the rate limit per IP is much easier. Any suggestion on what the rate limit per IP should be (10 requests/per minute???). Should this limit be configurable?

Should we allow a configurable DENY account list?

I agree with @caleygoff-invitae. This will need explicit policies and trust entities. I will also document the exact AWS policy needed to make this work. This will also help us review if this change accidentally introduces any potential security vulnerabilities.

@georgi-lozev
Copy link
Contributor

Maybe a silly question from my side, but is there a way to get rid of the need for pre-shared aws_access_key_id and secret_access_key in the default case of aws_iid?

Similar to what is the configuration experience of the gcp_iit. Just require secrets when you actually need more sophisticated selectors than the one available in the iid/iit.

Of course this should not be at the cost of any major security implications.

P.S. I'm commenting here, because I think those two are very much related, but feel free to ask me to follow-up in another issue if you don't see it fit here

@sushil-prasad
Copy link
Contributor Author

@georgi-lozev I think access_key_id and secret_access_key are optional configuration. If you do not specify them, spire server falls back to using EC2 security group. In our deployment, we have granted required permissions to the security group that is attached to the spire server and we do not configure access_key_id/secret_access_key

@georgi-lozev
Copy link
Contributor

Hmm, that's interesting @sushil-prasad
Do you know where in the code this fall back happens ?
Are you using the same AWS account for hosting the SPIRE server and also attesting workloads from it?

Based on my experiments where I've used different accounts for the SPIRE server and the workloads I didn't managed to make this call

instancesDesc, err := awsClient.DescribeInstancesWithContext(ctx, &ec2.DescribeInstancesInput{
to pass without providing credentials.

It could be that I have misconfigured something or was somehow mislead from the docs, so I didn't spent too much time trying to make it working without credentials.

@sushil-prasad
Copy link
Contributor Author

@georgi-lozev For my case, both spire-server and spire agents are running in the same AWS account. The current code will not work across multiple account without explicit AWS credentials. Going across accounts without explicit AWS credentials with need Assume Role which is this enhancement is trying to address

@sushil-prasad
Copy link
Contributor Author

sushil-prasad commented Jun 15, 2021

I also wonder if anything funny can happen here... someone with an IID from a totally random account can make SPIRE Server try to assume a role in it. I don't know if that's bad or not... it's probably worth thinking about for a while

Here is the doc describing delegate access across AWS accounts using IAM roles
https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html,

The spire agent account MUST be recognized by the spire server account to make the spire server issue the certificate. So sending the AWS IID from a random account to the spire server will fail at DescribeInstance step in the spire server and hence no certificate will be issued.

Here is the high level setup to make this work:

  • Name of Spire Server Account: spire-server-account
  • Name of Spire Agent Account: spire-agent-account
  • Name of Spire Delegate Role: spire-delegate-role. This role has permission to describe EC2 instances in the respective spire agent accounts. The Spire server will assume this role in the spire agent account to perform describe EC2 instance
  • Name of Spire Server EC2 Role: spire-server-role. This role is attached to the EC2 instances running the spire server.

Setup

  • Spire agent account setup
    - Create spire-delegate-role
    - Setup spire-server-account as the trusted entity

  • Spire server account setup
    - Grant spire-server-role to assume spire-delegate-role@spire-agent-account
    This step needs to be done for each account where the Spire agent is running and wants to get attested with the spire server running in the spire-server-account.

From a security perspective, a Spire server account has no control over a spire-agent account. But the spire server account has control over this step. So sending an AWS IID from a random account will fail because the spire server will not be able to assume spire-delegate-role@spire-agent-account in the random spire-agent-account. Hence no certificate will be issued.

@sushil-prasad
Copy link
Contributor Author

sushil-prasad commented Jun 15, 2021

On the rate limits:
Both the spire server and spire agent AWS needs to do policy setup so the spire server will not be able deliver certificates for a random AWS account.

I am assuming that the spire server is not exposed on the public network. So we are talking about spire servers and agents taking over private networking. This will require explicit network configurations between AWS accounts like VPC peering. This means that only configured AWS spire agent accounts are able to talk to Spire server.

That leaves us to the case, where an EC2 host is compromised and it is being used to abuse the spire server. This possibility exists even when spire-server is serving just one or few pre configured AWS accounts.

So my opinion is that we should externalize the hardening of the spire server deployment and keep the spire code simple. The rate limit could be implemented in the load balancer layer or using iptables (if this has to be done on the spire server hosts). https://making.pusher.com/per-ip-rate-limiting-with-iptables

My goal here is to get to a point where just adding Assume Role in the AWS Node Attestor plugin is sufficient without introducing additional security vulnerabilities (or have an known option to address them externally). I will start working on this issue accordingly.

@sushil-prasad
Copy link
Contributor Author

PR: #2387

evan2645 pushed a commit that referenced this issue Jul 22, 2021
…2387)

If assume_role_arn_template parameter is set in the AWS server node Attestor plugin, the spire server will assume the role as specified by the assume_role_arn_template after expanding the template. Currently only template variable available is AccountID from the AWS IID document sent by the spire agent to the spire server.

* added assume_role_arn_template parameter to AWS server node attestation plugin
* return grpc status on error
* changed parameter assume_role_arn_template to assume_role in AWS server node attestor plugin configuration

Signed-off-by: Sushil Prasad <[email protected]>
@evan2645
Copy link
Member

#2387 is now merged, @sushil-prasad is this issue fixed?

@sushil-prasad
Copy link
Contributor Author

Thanks a lot for merging this. I will test this and update this PR.

@sushil-prasad
Copy link
Contributor Author

sushil-prasad commented Jul 23, 2021

@evan2645 The change on the main worked. We can close this issue.

@evan2645
Copy link
Member

Awesome, thanks for the confirmation @sushil-prasad

rturner3 pushed a commit to rturner3/spire that referenced this issue Aug 4, 2021
…2218) (spiffe#2387)

If assume_role_arn_template parameter is set in the AWS server node Attestor plugin, the spire server will assume the role as specified by the assume_role_arn_template after expanding the template. Currently only template variable available is AccountID from the AWS IID document sent by the spire agent to the spire server.

* added assume_role_arn_template parameter to AWS server node attestation plugin
* return grpc status on error
* changed parameter assume_role_arn_template to assume_role in AWS server node attestor plugin configuration

Signed-off-by: Sushil Prasad <[email protected]>
rturner3 pushed a commit that referenced this issue Aug 5, 2021
…2387)

If assume_role_arn_template parameter is set in the AWS server node Attestor plugin, the spire server will assume the role as specified by the assume_role_arn_template after expanding the template. Currently only template variable available is AccountID from the AWS IID document sent by the spire agent to the spire server.

* added assume_role_arn_template parameter to AWS server node attestation plugin
* return grpc status on error
* changed parameter assume_role_arn_template to assume_role in AWS server node attestor plugin configuration

Signed-off-by: Sushil Prasad <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants