-
Notifications
You must be signed in to change notification settings - Fork 85
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
RFC 513: Application Specific Staging Resources (#515)
Retrospective RFC #513 Not the best write-up, but it'll give us a location to start discussing. [Rendered version](https://github.com/aws/aws-cdk-rfcs/blob/huijbers/app-specific-bootstrapping/text/0513-app-specific-staging.md) --- _By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license_
- Loading branch information
Showing
1 changed file
with
305 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,305 @@ | ||
# Application Specific Staging Resources | ||
|
||
* **Original Author(s):**: @rix0rrr | ||
* **Tracking Issue**: #513 | ||
* **API Bar Raiser**: - | ||
|
||
Currently, to deploy any interesting applications the CDK requires an account to be bootstrapped: it requires the | ||
provisioning of roles and staging resources to hold "assets" (files and Docker images) before any application can | ||
be deployed. | ||
|
||
If those staging resources could be created as part of a normal application deployment, the requirement to precreate | ||
those resources is dropped. Users can choose to provision roles if they want to enable CI/CD or cross-account | ||
deployments, or they can choose not to bootstrap at all if they want to use existing credentials. | ||
|
||
## A brief history of synthesizers and bootstrapping | ||
|
||
The AWS CDK needs some infrastructure to deploy applications into an account and region. What supporting resources exist | ||
and what their names are is a contract between the CDK application and the AWS account. "Synthesizers" are the part of | ||
a CDK application that encode this contract: users prepare their account a certain way, and then pick a synthesizer | ||
that matches the resources they have provisioned (optionally configuring it with non-default parameters). Synthesizers | ||
were introduced in CDKv2; before that, there was only "the" default assumptions that the CDK would make about "the" | ||
account, and none of it was configurable. | ||
|
||
The process of preparing an AWS account to be used with a synthesizer is called "bootstrapping". | ||
|
||
### V1 | ||
|
||
In the original bootstrapping stack, we create an S3 bucket to hold files: large CloudFormation templates and assets | ||
such as Lambda code. ECR repositories are created on-demand by the CLI, if Docker images needed to be uploaded. | ||
Originally, we added in a Custom Resource to the template that would clean up the ECR repository when the Stack gets | ||
cleaned up. In 1.21.0, we removed this, and now leave cleanup of dynamically created ECR repositories to users. Asset | ||
locations are completely controlled by the CLI via parameters. | ||
|
||
All deployments are being done with the credentials of the user that runs the CLI. | ||
|
||
DOWNSIDES | ||
|
||
* Assets take up template parameters, of which there is a limited amount (~50 when we built this system) | ||
* The dynamism and arbitrary ECR repo creation does not work well in CI/CD systems. | ||
* The user must have CLI credentials for each account they want to deploy to, and if a single app deployment should | ||
go into multiple accounts they must selectively deploy stacks into different accounts using different sets of | ||
credentials. | ||
|
||
### V2 | ||
|
||
The bootstrap resources were redesigned as part of the development of CDK Pipelines, an opinionated construct that | ||
allows trivial deployment of any number of CDK stacks to any number of accounts and regions. The design was designed to | ||
work for the CLI, a CodePipeline-based solution, as well as other CI/CD solutions in general. It also allows | ||
cross-region deployments. | ||
|
||
To that end, the bootstrap stack now creates (for each account and region combination): | ||
|
||
* A single S3 bucket and single ECR repository with well-known names (that need to be reflected in the CDK app if they are non-standard). | ||
* An encryption key for the S3 bucket | ||
* An Execution Role for the CloudFormation deployment | ||
* A role to trigger the deployment, a role to write to the S3 bucket, a role to write to the ECR repository | ||
* A role to look up context in the account | ||
* An SSM parameter with a version number of the bootstrap stack | ||
|
||
This solution solves for the CI/CD and cross-environment deployments by pre-provisioned roles, and removes | ||
the need for parameters by rendering the location of each asset directly into the template. | ||
|
||
DOWNSIDES | ||
|
||
* Some users don’t like the pre-provisioned roles and prefer the v1 situation where their existing credentials were used | ||
for permissions. | ||
* A common complaint about the bootstrap stack is that the resources we create by default do not comply with a given | ||
corporate policy, followed by an endless stream of feature requests to add this-and-that feature to the bootstrap | ||
stack (block public access, block SSL, tag immutability, image scanning, etc. etc). We solve this by telling customers | ||
to take the bootstrap template and customize it themselves, but CloudFormation templates can’t be patched simply and | ||
this requires users to effectively “fork” our bootstrap stack and manually keep it up-to-date with incoming changes. | ||
* Because all staging resources need to be provisioned a priori and need to serve all types of applications, we can't | ||
depend on application knowledge. Specifically, we won't know how many Docker images will be used in the application, | ||
so we create a single ECR repository to hold all images. This has a number of downsides: | ||
* Docker caching relies on pulling the “latest” image from a repository and skipping layers that were already built. | ||
This doesn’t work if images built off of various different Dockerfiles are in the same repository. | ||
* Lifecycle policies cannot be used because different images from potentially different applications with very | ||
different life cycles are all in the same repository. The same was already true for S3, but the problem is | ||
less severe because S3 is pretty cheap while ECR is not. | ||
* Some people were using the V1 Docker image publishing mechanism not as a vehicle for uploading Docker images to be used | ||
by the CDK’s CloudFormation deployment, but simply as a mechanism for building and publishing Docker images, to be | ||
used by a completely different deployment later. The lack of control over the target ECR repository breaks this | ||
use case (required the development of an `aws-ecr-deployments` construct module, which does give the necessary | ||
control but racks up costs by doubling ECR storage requirements, and still does not allow staging resource cleanup). | ||
* We always create an empty ECR repository because we cannot know whether apps deployed into the account will need | ||
it or not, so the ECR repository may go unused. AWS Security Hub will throw warnings about empty ECR repositories, | ||
which makes customers uneasy. | ||
* Bootstrap stacks are expected to be account-wide, and mix assets from all applications. Some customers that deploy | ||
multiple applications into the same account are very sensitive to this mixing, and would rather keep these resources | ||
separate. They can do multiple bootstrap stacks in the same account, but this is all a bit onerous. | ||
|
||
## A new proposal: application specific staging resources | ||
|
||
The bootstrap stack contains two classes of resources: staging resources, which hold assets (bucket and ECR repo), and | ||
roles, which allow for unattended (CI/CD) and cross-account access. In the new proposal, we will separate out the | ||
staging resources from the roles. Roles will still be bootstrapped (if used), but staging resources will not. | ||
|
||
* Staging resources will be created on a per-CDK app basis. We will create one S3 bucket with different object prefixes | ||
for different types of assets (see Appendix A: two types of assets), and an ECR repository per Docker image. Resource | ||
access roles can also be created on an as-needed basis. This solves the problems of asset resources of different | ||
applications mixing together, and it would also remove the need for garbage collection by allowing use of life cycle | ||
rules. | ||
* Since the roles are now the only things that need to be bootstrapped, that will have a number of advantages: | ||
* Bootstrapping will be faster since the heavy resource of a KMS key is no longer involved. | ||
* Because roles are a global resource, every account now only needs to be bootstrapped once. First of all the lack | ||
of necessary control of regions will work a lot better with Control Tower+automatic Stack Sets (which does not | ||
allow region control). | ||
|
||
If we can make the bootstrapping resources part of the CDK application, then users now have a familiar way to customize | ||
them to their heart’s content, so the treadmill of bootstrap stack customization requests is going to disappear, and | ||
customers will also not need to customize the bootstrap template anymore (assuming their customizations have to do with | ||
the resources instead of the roles). | ||
|
||
A downside is potentially that we lose the ability to have a version number on the bootstrapped resources (because SSM | ||
is not global), but we might say that’s not necessary anymore since the Roles are unlikely to change often. | ||
|
||
> If we wanted to maintain versioning on the Roles, we could say that the stack always must be deployed in `us-east-1` | ||
> and that’s where we look for the version; however, this may require cross-internet traffic and therefore be considered | ||
> dodgy from a reliability perspective, and we could only do the versioning check using the CLI, not from the | ||
> CloudFormation template. Of course we’ll have to pick the correct leader region per partition, `aws-cn`, `aws-iso`, etc. | ||
### How it will work in practice | ||
|
||
Bootstrapping resources are currently designed the way they are because the CLI relies on the assumption that the | ||
bootstrap resources are present with a well-known name, before the first CloudFormation deployment starts. In other | ||
words, this is purely a limitation of the orchestration, that we can take away. | ||
|
||
Here’s what we’re going to do: | ||
|
||
* We will introduce a new Stack Synthesizer, called `AppStagingSynthesizer`. | ||
* This synthesizer will create a support stack with the bucket, and an ECR repository per Docker image. | ||
* Assets will have a dependency on the support stack. This is a new concept that doesn’t currently exist because assets | ||
are an orchestration artifact that looks independent like stacks are, but they aren't really: in practice the orchestration | ||
ignores everything except stacks, and treats assets as being part of a stack. | ||
* Docker assets may still be built before the first deployment (although for proper caching we need the repository | ||
to exist first), but will only be uploaded when it’s their time in the orchestration workflow. | ||
* For a minimal diff these resources could have fixed names, but we could add support for Stack Outputs and assets could | ||
have support for Parameters, so that we can thread generated bucket and repository names through the system. For now, | ||
we will do fixed names for the staging resources. | ||
|
||
### What the API looks like | ||
|
||
To use the new synthesizer: | ||
|
||
```ts | ||
import { AppStagingSynthesizer } from '@aws-cdk/app-staging-synthesizer'; | ||
|
||
const app = new App({ | ||
defaultStackSynthesizer: AppStagingSynthesizer.defaultResources({ | ||
appId: 'my-app-id', // put a unique id here | ||
deploymentIdentities: DeploymentIdentities.defaultBootstrapRoles({ bootstrapRegion: 'us-east-1' }), | ||
|
||
// How long to keep File and Docker assets around for rollbacks (without requiring resynth) | ||
deployTimeFileAssetLifetime: Duration.days(100), | ||
imageAssetVersionCount: 10, | ||
}), | ||
}); | ||
``` | ||
|
||
For any additional customization (such as using custom buckets or ECR repositories), `DefaultStagingStack` | ||
can be subclasses or a full reimplementation of `IStagingResources` can be provided: | ||
|
||
```ts | ||
class MyStagingStack extends DefaultStaginStack { | ||
private bucket?: s3.Bucket; | ||
|
||
public addFile(asset: FileAssetSource): FileStagingLocation { | ||
this.getCreateBucket(); | ||
|
||
return { | ||
bucketName: 'my-asset-bucket',, | ||
dependencyStack: this, | ||
}; | ||
} | ||
|
||
private createOrGetBucket() { | ||
if (!this.bucket) { | ||
this.bucket = new s3.Bucket(this, 'Bucket', { | ||
bucketName: 'my-asset-bucket', | ||
}); | ||
} | ||
return this.bucket; | ||
} | ||
} | ||
|
||
const app = new App({ | ||
defaultStackSynthesizer: AppStagingSynthesizer.customFactory({ | ||
factory: { | ||
obtainStagingResources(stack, context) { | ||
const myApp = App.of(stack); | ||
return new MyStagingStack(myApp, `CustomStagingStack-${context.environmentString}`, {}); | ||
}, | ||
}, | ||
}), | ||
}); | ||
``` | ||
|
||
--- | ||
|
||
Ticking the box below indicates that the public API of this RFC has been | ||
signed-off by the API bar raiser (the `api-approved` label was applied to the | ||
RFC pull request): | ||
|
||
``` | ||
[ ] Signed-off by API Bar Raiser @xxxxx | ||
``` | ||
|
||
## Public FAQ | ||
|
||
### What are we launching today? | ||
|
||
We are launching a new synthesizer that has fewer demands on the AWS account that CDK apps are deployed into. It only | ||
needs preprovisioned Roles, and those are only necessary for CI/CD deployments or for cross-account deployments. For | ||
same-account, CLI deployments no bootstrapping is necessary anymore. If you are using bootstrapped roles anyway, | ||
they only need to be provisioned in one region, making it easier to use with StackSets. | ||
|
||
The new staging resources are specific to an application and can be cleaned up alongside the application. In addition, | ||
the way the staging resources are structured, they now allow the use of lifecycle rules, keeping costs down for | ||
running CDK applications over a long period of time. | ||
|
||
### Why should I use this feature? | ||
|
||
You should use this feature if you: | ||
|
||
- Want to take advantage of lifecycle rules on asset staging resources; | ||
- Do not use ECR and don't want to see the SecurityHub warning that tells you you have an empty ECR repository; | ||
- Need to deploy to multiple regions in a set of accounts and want to use StackSets to bootstrap the accounts; | ||
- Want to deploy an application and remove it and be sure that the assets have been cleaned up as well; | ||
|
||
## Internal FAQ | ||
|
||
### Why should we _not_ do this? | ||
|
||
Users generally don't appreciate change, especially if it saddles them with busywork. While the migration path will be | ||
purely optional, and there are definite benefits to be had, synthesis+bootstrapping is already a sore spot for users | ||
(it’s hard to explain and therefore a bit under-documented) and introducing more churn may lead to backlash. | ||
|
||
### What is the high-level project plan? | ||
|
||
- We will release the new synthesizer as an optional feature, first initially only for the CLI. | ||
- CDK Pipelines support can be added later. When Pipelines support is added, it should be taken into | ||
account that the time interval between stage deployments may be significant, especially if it involves manual | ||
approval steps. We must take care that the docker images published to the Testing stage are not rebuilt for | ||
the Production stage, but are replicated. | ||
- We have to clearly explain the concept of Synthesizers, the account contract, and Bootstrapping, along with the choices | ||
users have and how they should navigate them in the Developer Guide. | ||
- Customization by subclassing is possible, but we will probably have to selectively expose some protected helper | ||
functions to make it more convenient. We will do that when feature requests start coming in. | ||
- After a tryout period, we will move the synthesizer into the core library and document it as a possible alternative | ||
in the developer guide, and we will probably vend a bootstrap template specifically for this synthesizer. | ||
|
||
### New bootstrap template | ||
|
||
By introducing a new template, we technically have an opportunity to rename roles and get rid of the `hnb659fds` | ||
identifier that customers hate. However, to make the migration from the current bootstrap stack as smooth as possible, | ||
we probably should NOT be taking this opportunity and just keep the same role names. | ||
|
||
The new bootstrap template will contain exactly the **CloudFormation Execution Role**, **Deployment Role**, and **Lookup Role** | ||
from the current template, and nothing else. | ||
|
||
We can put a version on it for informational purposes, but that version will not be checkable by CloudFormation deployments; | ||
perhaps it could be make checkable by the CLI during `cdk deploy` time. At least `cdk bootstrap` will be able to look at the | ||
version to prevent downgrading. | ||
|
||
The bootstrap template will be selected by either running `cdk bootstrap` in an app directory that uses the `AppStagingSynthesizer`, | ||
or passing a command-line flag to CDK bootstrap: `cdk bootstrap --synthesizer=[legacy|default|appstaging]`. If `cdk bootstrap` detects | ||
it is changing the "type" of bootstrap stack, it will throw up a confirmation prompt with an explanation of the consequences: | ||
|
||
``` | ||
$ cdk bootstrap --synthesizer=appstaging | ||
This operation will change the style of bootstrap stack from "default" version 18 to "appstaging" version 1. | ||
This bootstrap stack style has been designed for the AppStagingSynthesizer; make sure that you are using that synthesizer | ||
in the CDK apps you plan to deploy to this environment. For more information, see http://amzn.to/5vjQYrtejA. | ||
Continue (y/N)? | ||
``` | ||
|
||
### Are there any open issues that need to be addressed later? | ||
|
||
- The template for the staging resources stack must be small enough to fit into a CloudFormation API call, which means | ||
it may not exceed 50kB. Since every ECR repository will add to this size, we have to limit the count. We may need | ||
to create multiple stacks using an overflow strategy to lift this limit. | ||
|
||
## Appendix A: two types of assets | ||
|
||
There are two types of assets: | ||
|
||
* “Handoff” assets: these are temporarily put somewhere, so that in the course of a service call we can point to them. | ||
The service will make their own copy of these assets. For example, large CloudFormation templates and Lambda Code | ||
bundles are an example of this: the CloudFormation template will only read the template once during the deployment, | ||
and the Lambda service will make a private copy of the S3 file. | ||
* Rollbacks by means of a pure-CloudFormation deployment (so not fresh deployment that involves a CLI call) may | ||
require presence of the old handoff asset for a while, so it shouldn’t be deleted right away, but it is reasonable | ||
to put a lifecycle policy on handoff assets, equal to the longest period of time a user should still reasonably | ||
expect to want to do a rollback in (see the BONES sev2 and damage control campaign from a couple of years ago when | ||
the BONES team decided a month was a reasonable period and some service team wanted to roll back to a version of 2 | ||
months old). | ||
* “Live” assets: these get continuously accessed in their staged location by the running application. Examples are ALL | ||
Docker images (ECS will constantly pull from the user’s ECR container, and never make their own copy), and some | ||
asset-assisted conveniences like CodeBuild shellables or CFN-init scripts. | ||
* These can in principle only be garbage collected by mark-and-sweep: we must know they are not needed by any | ||
current CDK stacks, nor by any CDK stack revisions the user might want to roll back to. | ||
* However, for ECR images we can do slightly better: since we have an ECR repository per docker image per | ||
application, we can use a lifecycle policy of the form “keep only the most recent 5 images.” | ||
* That leaves only certain eccentric types of file assets which are not collectible (until the entire application | ||
gets deleted). This might be a “good enough” position to be in. |