Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move architecture and operating system probing to Python #2381

Merged
merged 24 commits into from
Mar 13, 2024

Conversation

konstin
Copy link
Member

@konstin konstin commented Mar 12, 2024

The architecture of uv does not necessarily match that of the python interpreter (#2326). In cross compiling/testing scenarios the operating system can also mismatch. To solve this, we move arch and os detection to python, vendoring the relevant pypa/packaging code, preventing mismatches between what the python interpreter was compiled for and what uv was compiled for.

To make the scripts more manageable, they are now a directory in a tempdir and we run them with python -m . I've simplified the pypa/packaging code since we're still building the tags in rust. A Platform is now instantiated by querying the python interpreter for its platform. The pypa/packaging files are copied verbatim for easier updates except a lru_cache() python 3.7 backport.

Error handling is done by a "result": "success|error" field that allow passing error details to rust:

$ uv venv --no-cache
  × Can't use Python at `/home/konsti/projects/uv/.venv/bin/python3`
  ╰─▶ Unknown operation system `linux`

I've used the maturin sysconfig collection as reference. I'm unsure how to test these changes across the wide variety of platforms.

Fixes #2326

@konstin konstin marked this pull request as ready for review March 12, 2024 13:40
@charliermarsh charliermarsh self-requested a review March 12, 2024 14:18
interpreter: interpreter.to_path_buf(),
err,
})?;
let tempdir = tempdir()?;
Copy link
Member

@zanieb zanieb Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to be annoying here, but are we going to say we're okay with temporary directories outside of scopes we control? I understand the system cleanup argument but perhaps we should be garbage collecting them ourselves from a dedicated cache directory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there is a specific problem with the system directories, i believe we should follow system conventions and user's settings over rolling our own. For our own garbage collection, we'd also have to implement our own synchronization when there are multiple uv instances running in parallel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say let's just put it in the cache for consistency. It shouldn't be too hard here, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What synchronization would we need? I would just delete old temporary directories (i.e. 24 hours) similar to how /tmp can be cleaned at arbitrary times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that. What does it give us over the system temp?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't follow the motivation yet.

Copy link
Member

@zanieb zanieb Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've heard of problems being caused by the system cleaning up temporary directories at arbitrary times which can break your program. Notably though we've already had problems with the system temporary directory (as discussed in the copy-on-write issue) and it seems safer to just use a directory we control. I'm totally fine with us deciding we want to use the system temporary directory, but I want us to be consistent if feasible or be clear about when we should use one vs another.

@charliermarsh previously @konstin said that using the system temporary directory was preferred for automated cleanup if we fail to clean up the directory i.e. if we crash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've heard of problems being caused by the system cleaning up temporary directories at arbitrary times which can break your program.

If we can get a reproduction for that i'm all for switching to never using system temp again.

Notably though we've already had problems with the system temporary directory (as discussed in the copy-on-write issue) and it seems safer to just use a directory we control.

That is a very specific problem: Operating system don't offer atomic write operations, so e.g. two installer threads writing the same file could produce something broken/crash. Renames are (assumed to be) atomic and replace the existing file. Renames don't work across device boundaries (it would need a copy), so we need to create a named tempfile/tempdir in the same location as the final file, write to it and persist it. Some people e.g. put a ramdisk as their /tmp or the target fs is part of an overlayfs while tmp isn't, or the project lies on a different partition or drive than the system.

fs_err::write(
packaging_dir.join("_musllinux.py"),
include_str!("../python/packaging/_musllinux.py"),
)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm 100% serious, should we consider concatenating these into a single Python file for simplicity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my first idea, i like the current one for manageable file sizes and because we can easily update from pypa/packaging upstream

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a processing script that merges them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is PyInstaller?

Copy link
Member Author

@konstin konstin Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can script it i'm in favor. Maybe https://docs.python.org/3/library/zipimport.html is for that case? I haven't used it myself but i know people use it for bundling use cases.

@konstin konstin force-pushed the konsti/get-arch-and-os-from-python branch from 5c61823 to 47bd417 Compare March 12, 2024 16:31
@@ -120,7 +119,10 @@ async fn resolve(
let client = RegistryClientBuilder::new(Cache::temp()?).build();
let flat_index = FlatIndex::default();
let index = InMemoryIndex::default();
let interpreter = Interpreter::artificial(Platform::current()?, markers.clone());
// TODO(konstin): Should we also use the bootstrapped pythons here?
let real_interpreter =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fake interpreter needs to know it's platform to compute the tags we need for resolution. Previously, we could ask in rust, now we need an actual python interpreter. Assuming the dev has any python 3 with the right os + arch installed seemed like an ok assumption here, but i can pull in our bootstrapped python finding for a correct works-independent-of-outside-project-environment solution.

Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Nice work. I just made a few small tweaks and fixed what I think is one bug in the BSD path.

@charliermarsh charliermarsh changed the title Move arch and os probing to python Move architecture and operating system probing to Python Mar 13, 2024
@charliermarsh charliermarsh added the bug Something isn't working label Mar 13, 2024
@charliermarsh charliermarsh enabled auto-merge (squash) March 13, 2024 04:21
@charliermarsh
Copy link
Member

@konstin - I tried to merge this but windows_shims was failing after merging in main, I have no idea why.

@konstin
Copy link
Member Author

konstin commented Mar 13, 2024

It was the usual suspect again: bat files don't support UNC paths so the shim was skipped.

@charliermarsh charliermarsh merged commit 7964bfb into main Mar 13, 2024
26 checks passed
@charliermarsh charliermarsh deleted the konsti/get-arch-and-os-from-python branch March 13, 2024 11:51
konstin added a commit that referenced this pull request Mar 18, 2024
## Summary

Closes #1958

This adds linehaul metadata to uv's user-agent when pep 508 markers are
provided to the RegistryClientBuilder. Thanks to #2381, we were able to
leverage most information from markers and avoid inconsistency.

Linehaul is meant to be accompanying metadata pip sends in it's user
agent when talking to registries. You can see this output by running
something like `python -c 'from pip._internal.network.session import
user_agent; print(user_agent())'`.
In PyPI, this metadata processed by the
[linehaul-cloud-function](https://github.com/pypi/linehaul-cloud-function).
More info about linehaul can be found in #1958.

Below are some examples from pip:

* Linux GHA: `pip/24.0
{"ci":true,"cpu":"x86_64","distro":{"id":"jammy","libc":{"lib":"glibc","version":"2.35"},"name":"Ubuntu","version":"22.04"},"implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL
3.0.2 15 Mar
2022","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Linux","release":"6.5.0-1016-azure"}}`
* Windows GHA: `pip/24.0
{"ci":true,"cpu":"AMD64","implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL
3.0.13 30 Jan
2024","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Windows","release":"2022Server"}}`
* OSX GHA: `pip/24.0
{"ci":true,"cpu":"arm64","distro":{"name":"macOS","version":"14.2.1"},"implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL
3.0.13 30 Jan
2024","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Darwin","release":"23.2.0"}}`



Here's how uv results look like (sorry for the keys not having the same
order):

* Linux GHA: `uv/0.1.21
{"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":{"name":"Ubuntu","version":"22.04","id":"jammy","libc":null},"system":{"name":"Linux","release":"6.5.0-1016-azure"},"cpu":"x86_64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}`
* Windows GHA: `uv/0.1.21
{"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":null,"system":{"name":"Windows","release":"2022Server"},"cpu":"AMD64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}`
* OSX GHA: `uv/0.1.21
{"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":{"name":"macOS","version":"14.2.1","id":null,"libc":null},"system":{"name":"Darwin","release":"23.2.0"},"cpu":"arm64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}`

Distro information (such as the one pip uses `from pip._vendor import
distro` to retrieve instead of `platform` module) was not retrieved from
markers. Instead, the linux release codename/name/version uses
`sys-info` crate, adding about 50us of extra overhead on linux. The
distro osx version re-used the [mac_os version
implementation](https://github.com/astral-sh/uv/blob/99c992e38b220fbcda09b0b43602b3db2321480b/crates/platform-host/src/mac_os.rs)
from #2381 which adds about 20us of overhead on osx. I tried to use
other crates to avoid re-introducing `mac_os.rs` but most of them didn't
yield satisfactory performance (40ms-60ms~) or had the wrong values
needed (e.g. darwin version vs osx version).

I also didn't add libc retrieval or rustc retrieval as those seem to add
substantial overhead due to querying `ldd` or `rustc`. PyPy version
detection was also not added to avoid adding extra overhead to [support
PyPy for
linehaul](https://github.com/pypa/pip/blob/24.0/src/pip/_internal/network/session.py#L123).
All other behavior was kept 1-1 to match what pip's linehaul
implementation does (as of 24.0). This also aligns with what was
discussed in #1958.

## Test Plan

Added new integration test to uv-client.

---------

Co-authored-by: konstin <[email protected]>
konstin added a commit that referenced this pull request Aug 26, 2024
As described in #4242, we're currently incorrectly downloading glibc python-build-standalone on musl target, but we also can't fix this by using musl python-build-standalone on musl targets since the musl builds are effectively broken.

We reintroduce the libc detection previously removed in #2381, using it to detect which libc is the current one.

For simplicity, i've decided to just filter out the musl python-build-standalone archives from the list of available archive, given this is temporary. This means we show the same error message as if we don't have a build for the platform. We could also add a dedicated error message for musl.

Fixes #4242

## Test strategy

On my ubuntu host, python downloads continue to pass:
```
target/x86_64-unknown-linux-musl/debug/uv python install
```

On alpine, we fail:
```
$ docker run -it --rm -v .:/io alpine /io/target/x86_64-unknown-linux-musl/debug/uv python install
  Searching for Python installations
  error: No download found for request: cpython-any-linux-x86_64-musl
```
charliermarsh pushed a commit that referenced this pull request Aug 27, 2024
As described in #4242, we're currently incorrectly downloading glibc
python-build-standalone on musl target, but we also can't fix this by
using musl python-build-standalone on musl targets since the musl builds
are effectively broken.

We reintroduce the libc detection previously removed in #2381, using it
to detect which libc is the current one before we have a python
interpreter. I changed the strategy a big to support an empty `PATH`
which we use in the tests.

For simplicity, i've decided to just filter out the musl
python-build-standalone archives from the list of available archive,
given this is temporary. This means we show the same error message as if
we don't have a build for the platform. We could also add a dedicated
error message for musl.

Fixes #4242

## Test Plan

Tested manually.

On my ubuntu host, python downloads continue to pass:
```
target/x86_64-unknown-linux-musl/debug/uv python install
```

On alpine, we fail:
```
$ docker run -it --rm -v .:/io alpine /io/target/x86_64-unknown-linux-musl/debug/uv python install
  Searching for Python installations
  error: No download found for request: cpython-any-linux-x86_64-musl
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

uv should use arch of python, not of uv
3 participants