Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bootstrap] Consolidated bootstrap buckets use duplicate object hashes #10710

Closed
Brian969 opened this issue Oct 6, 2020 · 3 comments
Closed
Assignees
Labels
bug This issue is a bug. effort/medium Medium work item – several days of effort p2 package/tools Related to AWS CDK Tools or CLI

Comments

@Brian969
Copy link

Brian969 commented Oct 6, 2020

Problem Summary

We are using CDK to deploy AWS features and guardrails in all AWS accounts in an AWS Organization

  • We do this for administrative accounts and more importantly customer workload accounts
  • We do this in every region around the world
  • This means we need to bootstrap CDKToolkit in every region, in every account (which is fine)

What's the original problem we were trying to solve?

  • Every AWS accounts S3 console is now a mess, with a minimum of 16 CDK buckets
  • Customers are complaining that this is confusing for their users and makes it hard to find their own buckets

Why don't you just uninstall the CDK stack/bootstrap bucket?

  • On every code execution, we would need to redeploy it, in every account and every region around the world
  • Meaning, on every state machine execution, we would need to also uninstall it, in every account and every region around the world
  • Resulting in large waste of time / very inefficient
  • Additionally, this does not solve the problem, the state machine could be executed at any time
  • Customers would see 16+ buckets appear and disappear from their accounts while using them, causing even more confusion, leaving the buckets at least gives customers consistency (we need to validate/reapply/change guardrails in accounts)

Other notes:

  • Customers deleting CDK staging files does NOT cause issues
  • Customers deleting CDK staging bucket BREAKS the CDK and our automation, so we protect these bucket with SCP's
  • Deleted CDK staging buckets requires removing CDK stack and redeploying CDK stack, no other fix known
  • Need to be able to scale to 1000's of accounts per AWS Organization
  • Currently 1000 accounts * 16 regions (minimum) = 16,000 CDK staging buckets in an organization (just for CDK)

Options Explored:

  • Can we drop the need for a CDK staging bucket?
    • No we are using Assets
  • Can we create one bucket per account in a single region and have all regions in that account access the bucket from a single central region? (i.e. assuming 1000 aws accounts, 1000 cdk staging buckets, one per account, in a single designated region)
    • No, a bucket is required in the local region.
  • Can we add a 'secret hidden bucket' flag to hide the display of the cdktoolkit buckets in S3?
    • No, No such S3 feature exists today.
  • Can we create one bucket per region in a central account and have all accounts in the org use a single regional central CDK staging bucket? (i.e. assuming 16 regions, 16 CDK buckets total no matter the number of accounts in the org)

BUG

  • we updated our codebase to use a single centralized bucket per region for use across all AWS accounts
  • our codebase is highly parallel
  • Unfortunately, CDK hashes are only generated from the filename and the asset content (which is identical across accounts/parallel executions)
  • When executing in parallel across multiple accounts using the single shared bucket, Access Denied Error messages everywhere
  • Deploy to a single account with no parallel executions results in success
  • All the parallel executions are attempting to concurrently:
    a) upload the identical asset objects to S3 at the same time and getting write collisions;
    b) attempting to read s3 objects and getting consistency/read issues/failures.
  • CDK object hash needs to include additional attributes (i.e. account #) to prevent collisions

Reproduction Steps

  • bootstrap multiple AWS accounts to the same CDK central bucket (per region)
  • execute the same CDK assets across multiple accounts in parallel
    (i.e. do the same thing in multiple AWS accounts at the same time)

What did you expect to happen?

  • expected the cdk to successfully deploy stacks in every account in parallel

What actually happened?

  • CDK failed to deploy on asset deployment due to hash conflicts

Environment

  • CDK Version : 1.46.0, we have upgraded to v1.66
  • Language (Version): TypeScript, Node.js v12

This is 🐛 Bug Report

@Brian969 Brian969 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 6, 2020
@Brian969 Brian969 changed the title [bootstrap] Consolidated bootstrap buckets use overlapping object hashes [bootstrap] Consolidated bootstrap buckets use duplicate object hashes Oct 6, 2020
@SomayaB SomayaB added the package/tools Related to AWS CDK Tools or CLI label Oct 7, 2020
@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 13, 2020

That's a big report. I'm going to need more details.

Unfortunately, CDK hashes are only generated from the filename and the asset content (which is identical across accounts/parallel executions)

If this is true, the content should be the same and it shouldn't matter who writes the file.

When executing in parallel across multiple accounts using the single shared bucket, Access Denied Error messages everywhere

Have you been able to figure out why this is happening? Feels like this should be unrelated to what the hashes are based on. Also, multiple PutObjects in parallel to the same key from the same account should work, so what's different for multiple accounts? What do your bucket policies look like?

All the parallel executions are attempting to concurrently:

a) upload the identical asset objects to S3 at the same time and getting write collisions;

I don't know what write collisions mean in S3. Just saying you're seeing "access denied" doesn't really tell me why that's happening. In response to what kind of call?

b) attempting to read s3 objects and getting consistency/read issues/failures.

Given that an object key should be unique for every asset content, I don't see how you can have consistency issues. Maybe you got the upload performed by account A, maybe you got the upload performed by account B, but since the content should be the same it shouldn't matter. Please enlighten me?

CDK object hash needs to include additional attributes (i.e. account #) to prevent collisions

No, I don't think it should be in the hash, because the hash can't really be trusted (which is the problem with this entire scheme).

The correct solution is probably to design an object scheme like this:

s3://central-bucket-name/<ACCOUNT ID>/<ASSET HASH>.zip

Put the ACCOUNT ID in the object key, and put a bucket policy on with a Condition that makes it so only users from that same account id have access to files with the prefix /<ACCOUNT ID>/.

That needs an additional prefix property in the DefaultSynthesizer: https://github.com/aws/aws-cdk/blob/master/packages/@aws-cdk/core/lib/stack-synthesizers/default-synthesizer.ts#L27 which could be set to ${AWS::AccountId}/ and that would be it.

I will accept a PR that adds this.

@rix0rrr rix0rrr added effort/medium Medium work item – several days of effort p2 labels Oct 19, 2020
@SomayaB SomayaB removed the needs-triage This issue or PR still needs to be triaged. label Oct 19, 2020
naveenkoppula added a commit to naveenkoppula/aws-cdk that referenced this issue Nov 6, 2020
@Brian969
Copy link
Author

Brian969 commented Nov 6, 2020

@rix0rrr - please review the provided PR and let Naveen know what you need to merge.
FYI: This enhancement/fix will be used by this CDK Open Source project:
https://github.com/aws-samples/aws-secure-environment-accelerator

@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

mergify bot pushed a commit that referenced this issue Jan 6, 2021
… assets (#11855)

Related to #10710, #11327 @rix0rrr 

Adding bucketPrefix to "stackTemplateAssetObjectUrl"

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. effort/medium Medium work item – several days of effort p2 package/tools Related to AWS CDK Tools or CLI
Projects
None yet
Development

No branches or pull requests

4 participants