Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(core): regression: source directory is fingerprinted even if bundling is skipped #11440

Merged
merged 7 commits into from
Dec 1, 2020

Conversation

jogold
Copy link
Contributor

@jogold jogold commented Nov 12, 2020

If the asset uses OUTPUT or BUNDLE it's generally because its source is
a very large directory. This is the case for the NodejsFunction which
mounts the project root (along with the node_modules folder!).

Use a custom hash in this case when skipping bundling. Otherwise running
cdk ls can result in heavy fingerprinting operations (again this is
the case for the NodejsFunction) and can be much slower than running
cdk synth or cdk diff, making it pointless to skip bundling.

Regression introduced in #11008
(https://github.com/aws/aws-cdk/pull/11008/files#diff-62eef996be8abeb157518522c3cbf84a33dd4751c103304df87b04eb6d7bbab6L160)

Before #11008:

return props.assetHashType === AssetHashType.BUNDLE || props.assetHashType === AssetHashType.OUTPUT
? this.calculateHash(AssetHashType.CUSTOM, this.node.path) // Use node path as dummy hash because we're not bundling

Closes #11459
Closes #11460


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

If the asset uses OUTPUT or BUNDLE it's generally because its source is
a very large directory. This is the case for the `NodejsFunction` which
mounts the project root (along with the `node_modules` folder!).

Use a custom hash in this case when skipping bundling. Otherwise running
`cdk ls` can result in heavy fingerprinting operations (again this is
the case for the `NodejsFunction`).

Regression introduced in aws#11008
(https://github.com/aws/aws-cdk/pull/11008/files#diff-62eef996be8abeb157518522c3cbf84a33dd4751c103304df87b04eb6d7bbab6L160)
@gitpod-io
Copy link

gitpod-io bot commented Nov 12, 2020

@jogold
Copy link
Contributor Author

jogold commented Nov 12, 2020

@rix0rrr can you take a look?

return {
assetHash: this.calculateHash(AssetHashType.SOURCE),
assetHash: this.calculateHash(hashType),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can always use a custom hash in this case?

@mrgrain
Copy link
Contributor

mrgrain commented Nov 13, 2020

This looks good! Using a unique id seems appropriate. I was considering hashing the BundlingOptions object.

I don't think it would fix #11459 though, since the AssetHashType is set to source unless it's one of the output/bundle ones.

I was thinking maybe something like this? It mimics the hash calculation before bundling (just after the skip block).

    if (skip) {
      // We should have bundled, but didn't to save time. Still pretend to have a hash.
      // If the asset uses OUTPUT (or BUNDLE), we use a CUSTOM hash to avoid fingerprinting
      // a potentially very large source directory. Other hash types are kept the same.
      let hashType = this.hashType;
      if (hashType === AssetHashType.OUTPUT || hashType === AssetHashType.BUNDLE) {
        this.customSourceFingerprint = Names.uniqueId(this);
        hashType = AssetHashType.CUSTOM;
      }

      return {
        assetHash: this.calculateHash(hashType, bundling),
        stagedPath: this.sourcePath,
      };
    }

@jogold
Copy link
Contributor Author

jogold commented Nov 19, 2020

@rix0rrr ping... would be nice if this could be merged before the next release with the new NodejsFunction. Without this doing cdk ls on an app with NodejsFunction almost never completes.

// If the asset uses OUTPUT or BUNDLE, we use a CUSTOM hash to avoid fingerprinting
// a potentially very large source directory. Other hash types are kept the same.
let hashType = this.hashType;
if (hashType === AssetHashType.OUTPUT || AssetHashType.BUNDLE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what that means :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean here exactly? This is not a Docker performance issue here. It's the fact that we are fingerprinting the wrong directory.

If someone asks for a hash based on the bundle/output you currently change it to source if bundling is skipped. You introduced this in #11008.

Before #11008, when bundling was skipped hash calculation was changed to CUSTOM.

It's a regression because we are now fingerprinting the source arbitrally and it could be a very large directory.

I agree that there should be a discussion around deferring bundling but how can we first fix this regression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note on why the NodejsFunction uses OUTPUT as hash type and not SOURCE: it bundles not only the user's code but also all referenced node modules so it would be impractical to fingerprint the whole node modules folder. Moreover, we don't want to fingerprint .ts/.js files that represent the CDK infra code because this would incorrectly impact the hash of an asset representing runtime code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I mean,

if (hashType === AssetHashType.OUTPUT || AssetHashType.BUNDLE) {

was probably intended to read:

if (hashType === AssetHashType.OUTPUT || hashType === AssetHashType.BUNDLE) {

@rix0rrr
Copy link
Contributor

rix0rrr commented Nov 23, 2020

Feels like running the Docker container during synth is just asking for all kinds of performance issues, which then need a big slew of workarounds.

Is there really now way to defer this work to asset publishing time? Is the whole reason we do this during synth simply to get an artifact hash?

And if that's true, is that only because a source hash is too expensive?

I'm starting to think we need an alternative approach here (though I don't know what it should be yet).

@mrgrain
Copy link
Contributor

mrgrain commented Nov 24, 2020

I think you're right @rix0rrr about the need for a more generic solution to this. I have also noticed that Lambda Functions (maybe CodeAssets?) cause the asset to be staged every time a CDK command is run, whereas most other assets are cached and won't be run again if they exist already (or maybe it's got to do with the hash being different?).

In any case, I believe we have two regressions at the moment, which I tried to put into #11459 and #11646

These come down to the current implementation for bundling:

  • It ignores a custom asset hash.
  • It prefers the source hash in an attempt to "speed" things up. However this is a very biased assumption and the best way to really speed things up would be using a static or options based hash. Alternatively I think no assumption should be made and the code should be bundled and use the output hash as one would expect.

In my opinion these are regressions that should be addresses asap before looking into a future solution.

@mrgrain
Copy link
Contributor

mrgrain commented Nov 24, 2020

I'm starting to think we need an alternative approach here (though I don't know what it should be yet).
For anything using Node.js we can get around this mostly by using a source hash and ignore the node_modules directory. However the current version has deprecated the exclude option on some higher level asset related classes.

Generally speaking, I'm not sure we should make an assumption here for the user. If they decide to hash based on the output of a bundle, that's their decision. We could however provide a better interface to allow different hashing at various stages. I.e. leave it up to them to pick between OUTPUT (default), SOURCE, CUSTOM or even a function?

@jogold
Copy link
Contributor Author

jogold commented Nov 25, 2020

The problem is also present when doing cdk synth STACK1 -e.

@mergify mergify bot dismissed rix0rrr’s stale review November 30, 2020 15:14

Pull request has been modified.

rix0rrr
rix0rrr previously approved these changes Nov 30, 2020
@rix0rrr rix0rrr changed the title fix(core): heavy source fingerprinting when bundling is skipped fix(core): regression: source directory is fingerprinted even if bundling is skipped Nov 30, 2020
@jogold
Copy link
Contributor Author

jogold commented Nov 30, 2020

Merging master because this branch is too much behind it and gives incorrect API breaking changes

Updated assembly:  @aws-cdk/[email protected]
API elements with incompatible changes:
err  - METHOD @aws-cdk/core.Lazy.list: has been removed [removed:@aws-cdk/core.Lazy.list]
err  - IFACE @aws-cdk/core.IStableAnyProducer: has been removed [removed:@aws-cdk/core.IStableAnyProducer]
err  - IFACE @aws-cdk/core.IStableNumberProducer: has been removed [removed:@aws-cdk/core.IStableNumberProducer]
err  - IFACE @aws-cdk/core.IStableStringProducer: has been removed [removed:@aws-cdk/core.IStableStringProducer]
err  - PROP @aws-cdk/core.BootstraplessSynthesizer.DEFAULT_FILE_ASSET_PREFIX: has been removed [removed:@aws-cdk/core.BootstraplessSynthesizer.DEFAULT_FILE_ASSET_PREFIX]
err  - PROP @aws-cdk/core.DefaultStackSynthesizer.DEFAULT_FILE_ASSET_PREFIX: has been removed [removed:@aws-cdk/core.DefaultStackSynthesizer.DEFAULT_FILE_ASSET_PREFIX]
err  - METHOD @aws-cdk/core.Lazy.any: has been removed [removed:@aws-cdk/core.Lazy.any]
err  - IFACE @aws-cdk/core.IStableListProducer: has been removed [removed:@aws-cdk/core.IStableListProducer]
err  - METHOD @aws-cdk/core.Lazy.number: has been removed [removed:@aws-cdk/core.Lazy.number]
err  - METHOD @aws-cdk/core.Lazy.string: has been removed [removed:@aws-cdk/core.Lazy.string]
err  - METHOD @aws-cdk/core.Lazy.uncachedAny: has been removed [removed:@aws-cdk/core.Lazy.uncachedAny]
err  - METHOD @aws-cdk/core.Lazy.uncachedList: has been removed [removed:@aws-cdk/core.Lazy.uncachedList]
err  - METHOD @aws-cdk/core.Lazy.uncachedNumber: has been removed [removed:@aws-cdk/core.Lazy.uncachedNumber]
err  - METHOD @aws-cdk/core.Lazy.uncachedString: has been removed [removed:@aws-cdk/core.Lazy.uncachedString]
err  - PROP @aws-cdk/core.DefaultStackSynthesizerProps.bucketPrefix: has been removed [removed:@aws-cdk/core.DefaultStackSynthesizerProps.bucketPrefix]

@mergify mergify bot dismissed rix0rrr’s stale review November 30, 2020 16:44

Pull request has been modified.

@jogold jogold requested a review from rix0rrr November 30, 2020 16:44
@mergify
Copy link
Contributor

mergify bot commented Dec 1, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: b3cae9a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify
Copy link
Contributor

mergify bot commented Dec 1, 2020

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 3cbc7fa into aws:master Dec 1, 2020
@jogold jogold deleted the custom-hash-skip-bundling branch December 1, 2020 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants