-
Notifications
You must be signed in to change notification settings - Fork 874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move architecture and operating system probing to Python #2381
Conversation
interpreter: interpreter.to_path_buf(), | ||
err, | ||
})?; | ||
let tempdir = tempdir()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to be annoying here, but are we going to say we're okay with temporary directories outside of scopes we control? I understand the system cleanup argument but perhaps we should be garbage collecting them ourselves from a dedicated cache directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there is a specific problem with the system directories, i believe we should follow system conventions and user's settings over rolling our own. For our own garbage collection, we'd also have to implement our own synchronization when there are multiple uv instances running in parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say let's just put it in the cache for consistency. It shouldn't be too hard here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What synchronization would we need? I would just delete old temporary directories (i.e. 24 hours) similar to how /tmp
can be cleaned at arbitrary times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do that. What does it give us over the system temp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't follow the motivation yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've heard of problems being caused by the system cleaning up temporary directories at arbitrary times which can break your program. Notably though we've already had problems with the system temporary directory (as discussed in the copy-on-write issue) and it seems safer to just use a directory we control. I'm totally fine with us deciding we want to use the system temporary directory, but I want us to be consistent if feasible or be clear about when we should use one vs another.
@charliermarsh previously @konstin said that using the system temporary directory was preferred for automated cleanup if we fail to clean up the directory i.e. if we crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've heard of problems being caused by the system cleaning up temporary directories at arbitrary times which can break your program.
If we can get a reproduction for that i'm all for switching to never using system temp again.
Notably though we've already had problems with the system temporary directory (as discussed in the copy-on-write issue) and it seems safer to just use a directory we control.
That is a very specific problem: Operating system don't offer atomic write operations, so e.g. two installer threads writing the same file could produce something broken/crash. Renames are (assumed to be) atomic and replace the existing file. Renames don't work across device boundaries (it would need a copy), so we need to create a named tempfile/tempdir in the same location as the final file, write to it and persist it. Some people e.g. put a ramdisk as their /tmp
or the target fs is part of an overlayfs while tmp isn't, or the project lies on a different partition or drive than the system.
fs_err::write( | ||
packaging_dir.join("_musllinux.py"), | ||
include_str!("../python/packaging/_musllinux.py"), | ||
)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm 100% serious, should we consider concatenating these into a single Python file for simplicity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was my first idea, i like the current one for manageable file sizes and because we can easily update from pypa/packaging upstream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a processing script that merges them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is PyInstaller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can script it i'm in favor. Maybe https://docs.python.org/3/library/zipimport.html is for that case? I haven't used it myself but i know people use it for bundling use cases.
5c61823
to
47bd417
Compare
@@ -120,7 +119,10 @@ async fn resolve( | |||
let client = RegistryClientBuilder::new(Cache::temp()?).build(); | |||
let flat_index = FlatIndex::default(); | |||
let index = InMemoryIndex::default(); | |||
let interpreter = Interpreter::artificial(Platform::current()?, markers.clone()); | |||
// TODO(konstin): Should we also use the bootstrapped pythons here? | |||
let real_interpreter = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fake interpreter needs to know it's platform to compute the tags we need for resolution. Previously, we could ask in rust, now we need an actual python interpreter. Assuming the dev has any python 3 with the right os + arch installed seemed like an ok assumption here, but i can pull in our bootstrapped python finding for a correct works-independent-of-outside-project-environment solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Nice work. I just made a few small tweaks and fixed what I think is one bug in the BSD path.
@konstin - I tried to merge this but |
It was the usual suspect again: bat files don't support UNC paths so the shim was skipped. |
## Summary Closes #1958 This adds linehaul metadata to uv's user-agent when pep 508 markers are provided to the RegistryClientBuilder. Thanks to #2381, we were able to leverage most information from markers and avoid inconsistency. Linehaul is meant to be accompanying metadata pip sends in it's user agent when talking to registries. You can see this output by running something like `python -c 'from pip._internal.network.session import user_agent; print(user_agent())'`. In PyPI, this metadata processed by the [linehaul-cloud-function](https://github.com/pypi/linehaul-cloud-function). More info about linehaul can be found in #1958. Below are some examples from pip: * Linux GHA: `pip/24.0 {"ci":true,"cpu":"x86_64","distro":{"id":"jammy","libc":{"lib":"glibc","version":"2.35"},"name":"Ubuntu","version":"22.04"},"implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL 3.0.2 15 Mar 2022","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Linux","release":"6.5.0-1016-azure"}}` * Windows GHA: `pip/24.0 {"ci":true,"cpu":"AMD64","implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL 3.0.13 30 Jan 2024","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Windows","release":"2022Server"}}` * OSX GHA: `pip/24.0 {"ci":true,"cpu":"arm64","distro":{"name":"macOS","version":"14.2.1"},"implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL 3.0.13 30 Jan 2024","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Darwin","release":"23.2.0"}}` Here's how uv results look like (sorry for the keys not having the same order): * Linux GHA: `uv/0.1.21 {"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":{"name":"Ubuntu","version":"22.04","id":"jammy","libc":null},"system":{"name":"Linux","release":"6.5.0-1016-azure"},"cpu":"x86_64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}` * Windows GHA: `uv/0.1.21 {"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":null,"system":{"name":"Windows","release":"2022Server"},"cpu":"AMD64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}` * OSX GHA: `uv/0.1.21 {"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":{"name":"macOS","version":"14.2.1","id":null,"libc":null},"system":{"name":"Darwin","release":"23.2.0"},"cpu":"arm64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}` Distro information (such as the one pip uses `from pip._vendor import distro` to retrieve instead of `platform` module) was not retrieved from markers. Instead, the linux release codename/name/version uses `sys-info` crate, adding about 50us of extra overhead on linux. The distro osx version re-used the [mac_os version implementation](https://github.com/astral-sh/uv/blob/99c992e38b220fbcda09b0b43602b3db2321480b/crates/platform-host/src/mac_os.rs) from #2381 which adds about 20us of overhead on osx. I tried to use other crates to avoid re-introducing `mac_os.rs` but most of them didn't yield satisfactory performance (40ms-60ms~) or had the wrong values needed (e.g. darwin version vs osx version). I also didn't add libc retrieval or rustc retrieval as those seem to add substantial overhead due to querying `ldd` or `rustc`. PyPy version detection was also not added to avoid adding extra overhead to [support PyPy for linehaul](https://github.com/pypa/pip/blob/24.0/src/pip/_internal/network/session.py#L123). All other behavior was kept 1-1 to match what pip's linehaul implementation does (as of 24.0). This also aligns with what was discussed in #1958. ## Test Plan Added new integration test to uv-client. --------- Co-authored-by: konstin <[email protected]>
As described in #4242, we're currently incorrectly downloading glibc python-build-standalone on musl target, but we also can't fix this by using musl python-build-standalone on musl targets since the musl builds are effectively broken. We reintroduce the libc detection previously removed in #2381, using it to detect which libc is the current one. For simplicity, i've decided to just filter out the musl python-build-standalone archives from the list of available archive, given this is temporary. This means we show the same error message as if we don't have a build for the platform. We could also add a dedicated error message for musl. Fixes #4242 ## Test strategy On my ubuntu host, python downloads continue to pass: ``` target/x86_64-unknown-linux-musl/debug/uv python install ``` On alpine, we fail: ``` $ docker run -it --rm -v .:/io alpine /io/target/x86_64-unknown-linux-musl/debug/uv python install Searching for Python installations error: No download found for request: cpython-any-linux-x86_64-musl ```
As described in #4242, we're currently incorrectly downloading glibc python-build-standalone on musl target, but we also can't fix this by using musl python-build-standalone on musl targets since the musl builds are effectively broken. We reintroduce the libc detection previously removed in #2381, using it to detect which libc is the current one before we have a python interpreter. I changed the strategy a big to support an empty `PATH` which we use in the tests. For simplicity, i've decided to just filter out the musl python-build-standalone archives from the list of available archive, given this is temporary. This means we show the same error message as if we don't have a build for the platform. We could also add a dedicated error message for musl. Fixes #4242 ## Test Plan Tested manually. On my ubuntu host, python downloads continue to pass: ``` target/x86_64-unknown-linux-musl/debug/uv python install ``` On alpine, we fail: ``` $ docker run -it --rm -v .:/io alpine /io/target/x86_64-unknown-linux-musl/debug/uv python install Searching for Python installations error: No download found for request: cpython-any-linux-x86_64-musl ```
The architecture of uv does not necessarily match that of the python interpreter (#2326). In cross compiling/testing scenarios the operating system can also mismatch. To solve this, we move arch and os detection to python, vendoring the relevant pypa/packaging code, preventing mismatches between what the python interpreter was compiled for and what uv was compiled for.
To make the scripts more manageable, they are now a directory in a tempdir and we run them with
python -m
. I've simplified the pypa/packaging code since we're still building the tags in rust. APlatform
is now instantiated by querying the python interpreter for its platform. The pypa/packaging files are copied verbatim for easier updates except alru_cache()
python 3.7 backport.Error handling is done by a
"result": "success|error"
field that allow passing error details to rust:I've used the maturin sysconfig collection as reference. I'm unsure how to test these changes across the wide variety of platforms.
Fixes #2326