Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build_runtime_package_dependencies might not be deterministic #14843

Closed
stuhood opened this issue Mar 17, 2022 · 5 comments · Fixed by #15292
Closed

build_runtime_package_dependencies might not be deterministic #14843

stuhood opened this issue Mar 17, 2022 · 5 comments · Fixed by #15292
Assignees
Labels

Comments

@stuhood
Copy link
Member

stuhood commented Mar 17, 2022

I haven't investigated deeply yet, but it looks like pants.core.goals.test.build_runtime_package_dependencies might not be deterministic.

While inspecting traces of Pants' own CI, I noticed that pep561_integration_test.py and its inputs tend to (always?) miss the remote cache, and are the long pole in CI:
pep561-integration

It's not clear how the non-determinism gets kicked off, but it's unlikely to be a process (at least initially), because hitting the cache for a non-deterministic process is still deterministic. So most likely either a @rule or uncacheable process.

@stuhood stuhood added the bug label Mar 17, 2022
@stuhood
Copy link
Member Author

stuhood commented Mar 18, 2022

It's possible that this would be a good testbed for #14195.

@stuhood
Copy link
Member Author

stuhood commented Apr 29, 2022

Noticed this again tonight (without it, pantsbuild/pants CI would complete about 1 minute faster on a warm cache).

It reproduces locally if you restart pantsd between attempts to run:

./pants test tests/python/pants_test/integration/pep561_integration_test.py

I think that rather than build_runtime_package_dependencies, this is likely to be the code below it: either src/python/pants/backend/python/util_rules/dists.py or src/python/pants/backend/python/goals/setup_py.py. In the absence of #14195, the best bet is probably to run in -ldebug and compare the input/output digests from the relevant processes.

@stuhood
Copy link
Member Author

stuhood commented Apr 29, 2022

Running with -ldebug, and grepping out and diffing the spawning local process lines, it looks like the culprit is that:

../build_backend.pex_pex_shim.sh backend_shim.py

... has a different input digest in each run. That input was probably the output of some other process (or @rule) which was non-deterministic (we don't log process results currently: maybe we should).

@jsirois
Copy link
Contributor

jsirois commented Apr 29, 2022

I can't make sense of inputs to that process being the culprit, but the outputs are almost certainly non-deterministic without extra effort. The sdists and wheels built by that process are being built by known and unknown 3rd parties (setuptools+wheel, poetry, flit, ...). They may respect SOURCE_DATE_EPOCH, but we should probably be setting it regardless just in case here:

if python_setup.macos_big_sur_compatibility and is_macos_big_sur():
extra_env = {"MACOSX_DEPLOYMENT_TARGET": "10.16"}
else:
extra_env = {}

@stuhood
Copy link
Member Author

stuhood commented Apr 29, 2022

Yea, let's definitely do SOURCE_DATE_EPOCH: thanks.

The reason inputs are relevant here though is that even if the backend_shim.py process is non-deterministic, if we hit the cache for it like we should in this case, the non-determinism would be papered over, and it wouldn't need to re-run. So we have multiple culprits.

@stuhood stuhood self-assigned this May 1, 2022
stuhood added a commit that referenced this issue May 1, 2022
…15292)

As described in #14843, (some) tests which consumed `runtime_package_dependencies` would always miss the cache.

This was because the dists generated for each run were slightly different, which came down to the `package_data` generated for `resources` being in a non-deterministic order due to iterating over a `set`. See #14195 (comment) for some thoughts on how to avoid this kind of issue in the future.

Fixes #14843: warm CI times should drop by about 1 minute from ~3m to ~2m.
stuhood added a commit to stuhood/pants that referenced this issue May 2, 2022
…antsbuild#15292)

As described in pantsbuild#14843, (some) tests which consumed `runtime_package_dependencies` would always miss the cache.

This was because the dists generated for each run were slightly different, which came down to the `package_data` generated for `resources` being in a non-deterministic order due to iterating over a `set`. See pantsbuild#14195 (comment) for some thoughts on how to avoid this kind of issue in the future.

Fixes pantsbuild#14843: warm CI times should drop by about 1 minute from ~3m to ~2m.
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]
stuhood added a commit to stuhood/pants that referenced this issue May 2, 2022
…antsbuild#15292)

As described in pantsbuild#14843, (some) tests which consumed `runtime_package_dependencies` would always miss the cache.

This was because the dists generated for each run were slightly different, which came down to the `package_data` generated for `resources` being in a non-deterministic order due to iterating over a `set`. See pantsbuild#14195 (comment) for some thoughts on how to avoid this kind of issue in the future.

Fixes pantsbuild#14843: warm CI times should drop by about 1 minute from ~3m to ~2m.
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]
stuhood added a commit that referenced this issue May 2, 2022
…Cherry-pick of #15292) (#15299)

As described in #14843, (some) tests which consumed `runtime_package_dependencies` would always miss the cache.

This was because the dists generated for each run were slightly different, which came down to the `package_data` generated for `resources` being in a non-deterministic order due to iterating over a `set`. See #14195 (comment) for some thoughts on how to avoid this kind of issue in the future.

Fixes #14843: warm CI times should drop by about 1 minute from ~3m to ~2m.

[ci skip-rust]
stuhood added a commit that referenced this issue May 2, 2022
…Cherry-pick of #15292) (#15300)

As described in #14843, (some) tests which consumed `runtime_package_dependencies` would always miss the cache.

This was because the dists generated for each run were slightly different, which came down to the `package_data` generated for `resources` being in a non-deterministic order due to iterating over a `set`. See #14195 (comment) for some thoughts on how to avoid this kind of issue in the future.

Fixes #14843: warm CI times should drop by about 1 minute from ~3m to ~2m.


[ci skip-rust]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants