
Remote-apis testing: pants is failing since 2.16 #20731

Open · jjardon opened this issue Mar 29, 2024 · 6 comments

Labels: backend: Python (Python backend-related issues), bug, remote

Comments

jjardon commented Mar 29, 2024

Describe the bug
Since 2.16 (commit 27fc9ee7761e61f3c5c9b502d612df5f1f13e29b in https://github.com/pantsbuild/example-python), pants appears to behave incorrectly when used for remote execution.

See https://gitlab.com/remote-apis-testing/remote-apis-testing/-/merge_requests/362

Pants version
The issue is present from 2.16 through the current release, 2.19 (commit f37c500e4f4e0c67e29aa9434b1b414f333bdd79 in https://github.com/pantsbuild/example-python).

OS
Linux

Additional info
We have upgraded to the latest release (2.19) at https://remote-apis-testing.gitlab.io/remote-apis-testing/ to show the current status for now. Please let us know if this is a known issue, and we will upgrade as soon as a new tag is available.

@jjardon jjardon added the bug label Mar 29, 2024
@huonw huonw added the remote label Apr 16, 2024
huonw (Contributor) commented Apr 16, 2024

Thanks for flagging this.

Is there a summary of how we can run the remote-apis-testing tests locally, to reproduce this? I see that https://gitlab.com/remote-apis-testing/remote-apis-testing/-/blob/master/CONTRIBUTING.md seems to be focused on augmenting the CI pipeline, although peeking at that pipeline does suggest that maybe docker-compose/run.sh is what we should be starting with?

sdclarke commented, quoting huonw's question above:

To run a test with pants locally, cd docker-compose and run ./run.sh -g -c pants.yml -s <server>.yml (the server options being buildbarn, buildfarm, and buildgrid). After the first run, the -g option is no longer necessary; it is only used to generate the Docker Compose YAML files. You can edit the generated files if needed and re-run the same command without that flag to test your changes.

I hope this helps!

seifertm commented May 3, 2024

The logs of one of the test runs report that the RE server fails to find the Python 3.9 executable in /root/.cache/nce/60b51…8fda/….

I see a similar error when testing locally with NativeLink as the underlying REAPI server. It looks like Pants submits an absolute path into the client's home directory to the remote execution server. The server cannot find that path locally, because it runs on a different machine. The log below shows the server output from my test: Pants is run as user michael, but there is no user michael in the executor container.

nativelink_executor-1   |   2024-05-03T04:41:00.194254Z ERROR nativelink_worker::local_worker: Error executing action, err: Error { code: NotFound, messages: ["No such file or directory (os error 2)", "Could not execute command [\"/home/michael/.cache/nce/fa6ec1ff473e58cf7dff9577ae94c2bde6bf1c7a837c75b928b414c0195eb80e/bindings/venvs/2.20.0/bin/python3.9\", \"./pex\", \"--tmpdir\", \".tmp\", \"--no-emit-warnings\", \"--pip-version\", \"23.1.2\", \"--python-path\", \"\", \"--output-file\", \"local_dists.pex\", \"--intransitive\", \"--interpreter-constraint\", \"CPython==3.11.*\", \"--sources-directory=source_files\", \"--no-pypi\", \"--index=https://pypi.org/simple/\", \"--manylinux\", \"manylinux2014\", \"--resolver-version\", \"pip-2020-resolver\", \"--layout\", \"zipapp\"]"] }

I assume something similar is happening in the remote-apis-testing repository, except that the mismatch in the contents of /root/.cache isn't apparent there, because both the local and remote environments presumably run as root.
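The core problem can be illustrated with a small, self-contained Python sketch (the helper function and command arguments below are hypothetical, not Pants code): any client that serializes sys.executable into a remotely executed command embeds its own local interpreter path, which generally does not exist on the executor machine.

```python
import sys

def build_remote_command(args):
    # Hypothetical helper: naively embed the client's interpreter path.
    # sys.executable is resolved on the CLIENT, e.g. something like
    # /home/michael/.cache/nce/.../bin/python3.9 -- a path that need
    # not exist on the remote executor.
    return [sys.executable, *args]

cmd = build_remote_command(["./pex", "--tmpdir", ".tmp", "--no-emit-warnings"])
print(cmd[0])  # the client-local interpreter path, not a remote one
```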

huonw (Contributor) commented May 5, 2024

Ah okay, that's a handy smoking gun. Thank you!

It looks like the process invocation is referencing the absolute path to the ~/.cache/nce/.../python3.9 binary that scie-pants provides (the "bootstrap python"). I thus suspect this might've been caused by #18433 (cherry-picked back to 2.16 in #18495), which switched us to running PEX with that Python interpreter rather than one managed more "normally".

@thejcannon (as author) @stuhood (as reviewer): do you have any insight into how we might fix this remote execution issue?

@huonw huonw added the backend: Python Python backend-related issues label May 5, 2024
thejcannon (Member) commented:

The get_python_for_scripts rule should be computing the path of the unpacked digest in the remote environment (and not the local path). Either that's wrong, or the remote execution code path is incorrectly returning the local path.

seifertm commented May 9, 2024

Thanks for the pointer! The rule you mentioned clearly distinguishes between remote and local environments:

@rule
async def get_python_for_scripts(env_tgt: EnvironmentTarget) -> PythonBuildStandaloneBinary:
    if env_tgt.val is None or isinstance(env_tgt.val, LocalEnvironmentTarget):
        return PythonBuildStandaloneBinary(sys.executable)

    result = await Get(_PythonBuildStandaloneBinary, _DownloadPythonBuildStandaloneBinaryRequest())
    return PythonBuildStandaloneBinary(result.path)

When get_python_for_scripts requests _PythonBuildStandaloneBinary in line 55, I expect the engine to run the download_python_binary rule:

@rule(desc="Downloading Python for scripts", level=LogLevel.TRACE)
async def download_python_binary(
    _: _DownloadPythonBuildStandaloneBinaryRequest,
    platform: Platform,
    tar_binary: TarBinary,
    bash_binary: BashBinary,
    python_bootstrap: PythonBootstrapSubsystem,
    system_binaries_environment: SystemBinariesSubsystem.EnvironmentAware,
) -> _PythonBuildStandaloneBinary:

The Python binary download should emit a log message, but running pants with -ltrace does not show it. Consequently, it looks as though the EnvironmentTarget passed to get_python_for_scripts, or the corresponding if clause, has an issue.
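A simplified model of that branch (a sketch using a stand-in dataclass, not Pants' actual types; SANDBOX_PYTHON is a hypothetical path) shows how an unset environment target funnels execution into the "local" branch, so the client's own sys.executable leaks into the process definition even when the process will run remotely:

```python
import sys
from dataclasses import dataclass
from typing import Optional

@dataclass
class EnvironmentTarget:
    # Stand-in for Pants' EnvironmentTarget: None means no explicit
    # environment target is configured (the default).
    val: Optional[str]

SANDBOX_PYTHON = "./python_build_standalone/bin/python3"  # hypothetical

def python_for_scripts(env_tgt: EnvironmentTarget) -> str:
    # Mirrors the if clause above: None (or a local environment)
    # returns the client's own interpreter path.
    if env_tgt.val is None or env_tgt.val == "local":
        return sys.executable
    # Otherwise a relocatable, downloaded interpreter path is used.
    return SANDBOX_PYTHON

# With remote_execution = true but no environment targets defined,
# env_tgt.val is still None, so the local path is returned:
print(python_for_scripts(EnvironmentTarget(val=None)) == sys.executable)  # True
```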

Here's a reproducer. Start an REAPI server in one terminal:

git clone https://github.com/TraceMachina/nativelink.git
cd nativelink/deployment-examples/docker-compose
docker compose up -d --build
docker compose logs -f

Start Pants in another terminal:

cat << EOF >pants.toml
[GLOBAL]
pants_version = "2.20.0"
backend_packages = [
    "pants.backend.python",
]
remote_execution = true
remote_store_address = "grpc://127.0.0.1:50051"
remote_execution_address = "grpc://127.0.0.1:50052"
remote_instance_name = "main"
process_execution_remote_parallelism = 1

[python]
interpreter_constraints = ["==3.11.*"]
EOF

cat << EOF >app.py
print("hello")
EOF

cat << EOF >BUILD
python_sources()
EOF

pants -ltrace --no-pantsd --no-local-cache run app.py

No branches or pull requests · 5 participants