Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(kernel): package cache fails under parallelism #4215

Merged
merged 5 commits into from
Aug 10, 2023

Conversation

rix0rrr
Copy link
Contributor

@rix0rrr rix0rrr commented Aug 9, 2023

The package cache mechanism that was turned on by default in #4181 works in theory under parallelism, but not in practice.

Typically the CDK CLI will prevent CDK apps from running in parallel, but Python testing frameworks like tox use subprocess parallelism to speed up test runs, leading to the jsii imports being executed at the same time.

Since jsii is sync, the locking needs to be sync. The sync locking feature of the lockfile library doesn't have wait support (for good reason), and so when a lock is already acquired by another process it quickly burns through its 12 retries in a hot loop, and then exits with an error.

Two changes to address this:

  • (Ab)use Atomics.wait() to get a synchronous sleeping primitive; since lockSync doesn't support synchronous sleep, we build our own retry loop with synchronous sleep around lockSync.
  • Since the extracted directory is immutable: if the marker file in the extracted directory exists, we can treat it as evidence that the directory has been completely written and we can skip trying to vy for exclusive access to write it. This avoids all lock contention after the very first CDK app execution.

Fixes #4207.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

The package cache mechanism that was turned on by default in #4181
works in theory under parallelism, but not in practice.

Typically the CDK CLI will prevent CDK apps from running in parallel,
but Python testing frameworks like `tox` use subprocess parallelism
to speed up test runs, leading to the jsii imports being executed
at the same time.

Since jsii is sync, the locking needs to be sync. The sync locking
feature of the `lockfile` library doesn't have wait support (for good
reason), and so when a lock is already acquired by another process
it quickly burns through its 12 retries in a hot loop, and then exits
with an error.

Two changes to address this:

- (Ab)use `Atomics.wait()` to get a synchronous sleeping primitive;
  since `lockSync` doesn't support synchronous sleep, we build our
  own retry loop with synchronous sleep around `lockSync`.
- Since the extracted directory is immutable: if the marker file in the
  extracted directory exists, we can treat it as evidence that the
  directory has been completely written and we can skip trying to vy
  for exclusive access to write it. This avoids all lock contention
  after the very first CDK app execution.

Fixes #4207.
@rix0rrr rix0rrr requested a review from a team August 9, 2023 14:18
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Aug 9, 2023
@mergify
Copy link
Contributor

mergify bot commented Aug 9, 2023

Thank you for contributing! ❤️ I will now look into making sure the PR is up-to-date, then proceed to try and merge it!

@mergify mergify bot added the pr/ready-to-merge This PR is ready to be merged. label Aug 9, 2023
@mergify
Copy link
Contributor

mergify bot commented Aug 9, 2023

Merging (with squash)...

@mergify
Copy link
Contributor

mergify bot commented Aug 10, 2023

Merging (with squash)...

@mergify mergify bot merged commit b739ef6 into main Aug 10, 2023
@mergify mergify bot deleted the huijbers/parallelism-safety branch August 10, 2023 09:47
@mergify mergify bot removed the pr/ready-to-merge This PR is ready to be merged. label Aug 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution/core This is a PR that came from AWS.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

jsii release 1.86.0 is breaking our tox tests
2 participants