fix(core): regression: source directory is fingerprinted even if bundling is skipped #11440
If the asset uses OUTPUT or BUNDLE, it's generally because its source is a very large directory. This is the case for the `NodejsFunction`, which mounts the project root (along with the `node_modules` folder!). Use a custom hash in this case when skipping bundling. Otherwise, running `cdk ls` can result in heavy fingerprinting operations (again, this is the case for the `NodejsFunction`) and can be much slower than running `cdk synth` or `cdk diff`, making it pointless to skip bundling. Regression introduced in #11008 (https://github.com/aws/aws-cdk/pull/11008/files#diff-62eef996be8abeb157518522c3cbf84a33dd4751c103304df87b04eb6d7bbab6L160).
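To make "heavy fingerprinting" concrete, here is a minimal sketch of a SOURCE-style directory fingerprint. It is not CDK's actual fingerprinting code: `fingerprintDirectory` is a hypothetical helper that hashes the relative path and contents of every file under a directory. It only illustrates that such a hash has to read every byte of every file, so its cost grows with the whole tree, `node_modules` included.

```typescript
import * as crypto from "crypto";
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helper (not CDK's fingerprint function): hashes the relative path
// and contents of every file under `dir`, and counts how many files it had to read.
function fingerprintDirectory(dir: string): { hash: string; filesRead: number } {
  const hash = crypto.createHash("sha256");
  let filesRead = 0;

  const walk = (current: string): void => {
    // Sort entries so the digest does not depend on readdir order.
    const entries = fs
      .readdirSync(current, { withFileTypes: true })
      .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
    for (const entry of entries) {
      const full = path.join(current, entry.name);
      const rel = path.relative(dir, full);
      if (entry.isDirectory()) {
        hash.update(`dir:${rel}\n`);
        walk(full); // recurse: a node_modules tree is walked in full as well
      } else if (entry.isFile()) {
        hash.update(`file:${rel}\n`);
        hash.update(fs.readFileSync(full)); // every file's contents are read
        filesRead++;
      }
    }
  };

  walk(dir);
  return { hash: hash.digest("hex"), filesRead };
}

// Usage: a tiny "project" with one source file and one dependency.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "fingerprint-demo-"));
fs.mkdirSync(path.join(demoDir, "node_modules", "left-pad"), { recursive: true });
fs.writeFileSync(path.join(demoDir, "index.ts"), "export const x = 1;\n");
fs.writeFileSync(path.join(demoDir, "node_modules", "left-pad", "index.js"), "module.exports = 1;\n");

const result = fingerprintDirectory(demoDir);
console.log(result.filesRead); // 2
```

In a real project the dependency folder typically holds far more files than the handler code, which is why skipping bundling only to then fingerprint the project root saves little.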
@rix0rrr can you take a look?
     return {
    -  assetHash: this.calculateHash(AssetHashType.SOURCE),
    +  assetHash: this.calculateHash(hashType),
Maybe we can always use a custom hash in this case?
This looks good! Using a unique id seems appropriate. I was considering hashing the … I don't think it would fix #11459 though, since the … I was thinking maybe something like this? It mimics the hash calculation done before bundling (just after the skip block):

    if (skip) {
      // We should have bundled, but didn't to save time. Still pretend to have a hash.
      // If the asset uses OUTPUT (or BUNDLE), we use a CUSTOM hash to avoid fingerprinting
      // a potentially very large source directory. Other hash types are kept the same.
      let hashType = this.hashType;
      if (hashType === AssetHashType.OUTPUT || hashType === AssetHashType.BUNDLE) {
        this.customSourceFingerprint = Names.uniqueId(this);
        hashType = AssetHashType.CUSTOM;
      }
      return {
        assetHash: this.calculateHash(hashType, bundling),
        stagedPath: this.sourcePath,
      };
    }
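The hash-type selection in the proposal above can be isolated into a pure function, which makes the rule easy to test. This is a simplified sketch: the local enum reuses the four `AssetHashType` member names from `@aws-cdk/core` (the string values are an assumption here), and `resolveSkippedHashType` is a hypothetical helper that does not exist in the library.

```typescript
// Local stand-in for AssetHashType (member names as in @aws-cdk/core;
// the string values are assumed for this sketch).
enum AssetHashType {
  SOURCE = "source",
  BUNDLE = "bundle",
  OUTPUT = "output",
  CUSTOM = "custom",
}

// Hypothetical helper: which hash type to use when bundling was skipped.
// OUTPUT and BUNDLE would otherwise need either a bundle or a fingerprint of the
// (possibly huge) source directory, so they fall back to CUSTOM.
function resolveSkippedHashType(requested: AssetHashType): AssetHashType {
  if (requested === AssetHashType.OUTPUT || requested === AssetHashType.BUNDLE) {
    return AssetHashType.CUSTOM;
  }
  return requested; // SOURCE and CUSTOM are kept as-is
}

const allTypes = [AssetHashType.SOURCE, AssetHashType.BUNDLE, AssetHashType.OUTPUT, AssetHashType.CUSTOM];
for (const t of allTypes) {
  console.log(`${t} -> ${resolveSkippedHashType(t)}`);
}
// source -> source
// bundle -> custom
// output -> custom
// custom -> custom
```

In the proposal, the CUSTOM hash is then seeded with `Names.uniqueId(this)`, which is derived from the construct path, so no file has to be read to produce it.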
@rix0rrr ping... it would be nice if this could be merged before the next release with the new …
    // If the asset uses OUTPUT or BUNDLE, we use a CUSTOM hash to avoid fingerprinting
    // a potentially very large source directory. Other hash types are kept the same.
    let hashType = this.hashType;
    if (hashType === AssetHashType.OUTPUT || AssetHashType.BUNDLE) {
That's not what that means :)
What do you mean here exactly? This is not a Docker performance issue. The problem is that we are fingerprinting the wrong directory.
If someone asks for a hash based on the bundle/output, you currently change it to SOURCE when bundling is skipped. You introduced this in #11008.
Before #11008, when bundling was skipped, the hash type was changed to CUSTOM.
It's a regression because we are now arbitrarily fingerprinting the source, which could be a very large directory.
I agree that there should be a discussion around deferring bundling, but how can we fix this regression first?
A note on why the `NodejsFunction` uses `OUTPUT` as hash type and not `SOURCE`: it bundles not only the user's code but also all referenced node modules, so it would be impractical to fingerprint the whole `node_modules` folder. Moreover, we don't want to fingerprint the .ts/.js files that represent the CDK infra code, because this would incorrectly impact the hash of an asset representing runtime code.
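A toy model makes the infra-code point concrete. This is a sketch, not the `NodejsFunction` implementation: the project is an in-memory map from file path to contents, `sourceHash` hashes the whole project root (as a SOURCE hash would), and `outputHash` hashes only a pretend bundle built from the handler and its dependency. All function names and file paths are hypothetical.

```typescript
import * as crypto from "crypto";

type Project = Record<string, string>; // file path -> contents

// SOURCE-style hash: every file in the project root, including CDK infra code.
function sourceHash(project: Project): string {
  const h = crypto.createHash("sha256");
  for (const file of Object.keys(project).sort()) {
    h.update(file).update("\0").update(project[file]).update("\0");
  }
  return h.digest("hex");
}

// Stand-in for bundling: only the entry file and the modules it uses end up in the output.
function fakeBundle(project: Project, entry: string, deps: string[]): string {
  return [entry, ...deps].map((f) => project[f]).join("\n");
}

// OUTPUT-style hash: only the bundled artifact.
function outputHash(bundle: string): string {
  return crypto.createHash("sha256").update(bundle).digest("hex");
}

const projectBefore: Project = {
  "lib/my-stack.ts": "new NodejsFunction(this, 'Fn');", // CDK infra code
  "lambda/handler.ts": "export const handler = async () => 'ok';", // runtime code
  "node_modules/left-pad/index.js": "module.exports = () => '';",
};
// Only the infra file changes (e.g. another construct is added to the stack).
const projectAfter: Project = {
  ...projectBefore,
  "lib/my-stack.ts": projectBefore["lib/my-stack.ts"] + "\nnew Bucket(this, 'B');",
};

const deps = ["node_modules/left-pad/index.js"];
const sourceChanged = sourceHash(projectBefore) !== sourceHash(projectAfter);
const outputChanged =
  outputHash(fakeBundle(projectBefore, "lambda/handler.ts", deps)) !==
  outputHash(fakeBundle(projectAfter, "lambda/handler.ts", deps));
console.log(sourceChanged); // true: an infra-only edit changes the source hash
console.log(outputChanged); // false: the bundle, and so the output hash, is unchanged
```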
No, I mean:
    if (hashType === AssetHashType.OUTPUT || AssetHashType.BUNDLE) {
was probably intended to read:
    if (hashType === AssetHashType.OUTPUT || hashType === AssetHashType.BUNDLE) {
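Why the first form misbehaves: `||` does not distribute over `===`. The condition parses as `(hashType === AssetHashType.OUTPUT) || AssetHashType.BUNDLE`, and a non-empty string enum member is always truthy, so the branch runs for every hash type. The sketch below reproduces this with a local enum whose string values are assumed; `buggyCheck` and `fixedCheck` are illustrative names only.

```typescript
// Local enum with assumed string values, mirroring AssetHashType's members.
enum AssetHashType {
  SOURCE = "source",
  BUNDLE = "bundle",
  OUTPUT = "output",
  CUSTOM = "custom",
}

// Buggy: evaluates (hashType === OUTPUT) || "bundle", and "bundle" is truthy.
function buggyCheck(hashType: AssetHashType): boolean {
  return Boolean(hashType === AssetHashType.OUTPUT || AssetHashType.BUNDLE);
}

// Fixed: compare hashType against each member explicitly.
function fixedCheck(hashType: AssetHashType): boolean {
  return hashType === AssetHashType.OUTPUT || hashType === AssetHashType.BUNDLE;
}

console.log(buggyCheck(AssetHashType.SOURCE)); // true: SOURCE would wrongly be switched to CUSTOM
console.log(fixedCheck(AssetHashType.SOURCE)); // false
```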
Feels like running the Docker container during … Is there really no way to defer this work to asset publishing time? Is the whole reason we do this during … And if that's true, is that only because a source hash is too expensive? I'm starting to think we need an alternative approach here (though I don't know what it should be yet).
I think you're right, @rix0rrr, about the need for a more generic solution. I have also noticed that Lambda Functions (maybe CodeAssets?) cause the asset to be staged every time a CDK command is run, whereas most other assets are cached and won't be processed again if they already exist (or maybe it has to do with the hash being different?). In any case, I believe we have two regressions at the moment, which I tried to capture in #11459 and #11646. These come down to the current implementation for bundling: …
In my opinion, these are regressions that should be addressed ASAP, before looking into a future solution.
Generally speaking, I'm not sure we should make an assumption here for the user. If they decide to hash based on the output of a bundle, that's their decision. We could, however, provide a better interface that allows different hashing at various stages, i.e. leave it up to them to pick between OUTPUT (default), SOURCE, CUSTOM, or even a function.
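One possible shape for such an interface, purely hypothetical (neither `HashStrategy`, `HashContext` nor `resolveAssetHash` exists in `@aws-cdk/core`): the user picks OUTPUT (the default), SOURCE, CUSTOM, or supplies a function. The expensive inputs are passed as thunks, so a strategy that never asks for the source fingerprint never pays for it.

```typescript
import * as crypto from "crypto";

// Hypothetical inputs a strategy might need; evaluated lazily because computing them is costly.
interface HashContext {
  sourceFingerprint: () => string;
  outputFingerprint: () => string;
}

// Hypothetical: the hashing choices discussed above, plus a user-supplied function.
type HashStrategy =
  | { kind: "output" }
  | { kind: "source" }
  | { kind: "custom"; value: string }
  | { kind: "function"; fn: (ctx: HashContext) => string };

function sha256(s: string): string {
  return crypto.createHash("sha256").update(s).digest("hex");
}

function resolveAssetHash(ctx: HashContext, strategy: HashStrategy = { kind: "output" }): string {
  switch (strategy.kind) {
    case "output":
      return ctx.outputFingerprint();
    case "source":
      return ctx.sourceFingerprint();
    case "custom":
      return sha256(strategy.value);
    case "function":
      return strategy.fn(ctx);
  }
}

// Usage: count how often the (pretend) source fingerprint is computed.
let sourceReads = 0;
const ctx: HashContext = {
  sourceFingerprint: () => {
    sourceReads++; // stands in for walking the whole project root
    return sha256("pretend: every file in the project root");
  },
  outputFingerprint: () => sha256("pretend: bundled index.js"),
};

const defaultHash = resolveAssetHash(ctx); // OUTPUT by default
const userHash = resolveAssetHash(ctx, {
  kind: "function",
  fn: (c) => sha256("v2:" + c.outputFingerprint()),
});
console.log(sourceReads); // 0: neither strategy touched the source
```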
The problem is also present when doing …
Merging master because this branch is too far behind it and reports incorrect API breaking changes.
Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).
Before #11008: `aws-cdk/packages/@aws-cdk/core/lib/asset-staging.ts`, lines 159 to 160 in c145314.
Closes #11459
Closes #11460
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license