Include linehaul information in user agent #1958
Comments
Perfect, thanks @pradyunsg!
Everything here looks pretty straightforward (and we should already have it all available), perhaps with the exception of
Yea, skipping values you don't have readily available seems fine.
Last I benchmarked, construction of user agent was surprisingly expensive in pip, took like 200ms.
Hm, looks like it might cost up to 600ms in a representative environment:
If pip were to skip getting the setuptools version, pip's user agent construction would cost 190ms for me. The next most expensive thing is the rustc version, which wouldn't be horrible to cache. I guess this is off topic for this tracker, but Pradyun, let me know if pip is interested in PRs here.
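Timings like the ones quoted above can be reproduced with a small helper. The following is a minimal sketch, not the benchmark used in this thread: `time_call` is a hypothetical name, and it measures repeated calls inside one process, so one-time import and start-up costs are not captured.

```python
import time
from statistics import median


def time_call(func, repeats=5):
    """Return the median wall-clock duration of func() in milliseconds.

    Hypothetical helper for illustration; it calls func() `repeats` times
    in the current process and reports the median sample.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)
```

To time pip's user agent construction, pass it the `user_agent` function imported from `pip._internal.network.session` (a private pip module, so the import path may change between releases).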
I've asked upstream about
The entrypoint is a shim, so afaik you can't cache it reliably (I'm happy to query the default rustc from a different place though). Is the setuptools information from uv relevant given that we don't install it by default, and always use the latest (compatible) version in build envs? Except for the
In general I would say that the setuptools information is much less useful than it used to be: projects that need a newer setuptools can now reliably depend on it with What do you mean a page with rust version stats? Are you asking where you can see the stats from PyPI? They're all available in BigQuery. Here's an example of the type of analysis I do with it:
Let’s just stick to what we have access to (I’d like to omit rustc and setuptools for now).
Yes, something like all but including the rust version. Getting the data from BigQuery is quite the overhead if someone just wants a simple caniuse.com-style check.
Ah. I'm not aware of any website that displays rust versions for all of PyPI.
## Summary

Closes #1977

This allows us to send uv's version in the `uv-client` User Agent header. Here's what request headers look like to a server now:

```
...
Accept: application/vnd.pypi.simple.v1+json, application/vnd.pypi.simple.v1+html;q=0.2, text/html;q=0.01
User-Agent: uv/0.1.13
...
```

~~I went for a mix of Option 1 and 2 from #1977.~~ Open to alternative naming as well, not tied too strongly here to the names picked.

~~Another possibility for this new crate is that we can use it to consolidate metadata that exists across crates to ultimately be able to create linehaul information described in #1958, but I haven't looked into what those changes might look like.~~

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

<!-- How was it tested? -->

Added initial tests in the new crate to exercise its public API and added a new test to uv-client to validate the headers using a 1-time disposable server.
## Summary

Closes #1958

This adds linehaul metadata to uv's user agent when PEP 508 markers are provided to the RegistryClientBuilder. Thanks to #2381, we were able to leverage most information from markers and avoid inconsistency.

Linehaul data is the accompanying metadata that pip sends in its user agent when talking to registries. You can see this output by running something like `python -c 'from pip._internal.network.session import user_agent; print(user_agent())'`. In PyPI, this metadata is processed by the [linehaul-cloud-function](https://github.com/pypi/linehaul-cloud-function). More info about linehaul can be found in #1958.

Below are some examples from pip:

* Linux GHA: `pip/24.0 {"ci":true,"cpu":"x86_64","distro":{"id":"jammy","libc":{"lib":"glibc","version":"2.35"},"name":"Ubuntu","version":"22.04"},"implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL 3.0.2 15 Mar 2022","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Linux","release":"6.5.0-1016-azure"}}`
* Windows GHA: `pip/24.0 {"ci":true,"cpu":"AMD64","implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL 3.0.13 30 Jan 2024","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Windows","release":"2022Server"}}`
* OSX GHA: `pip/24.0 {"ci":true,"cpu":"arm64","distro":{"name":"macOS","version":"14.2.1"},"implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL 3.0.13 30 Jan 2024","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Darwin","release":"23.2.0"}}`

Here's what uv's results look like (sorry for the keys not having the same order):

* Linux GHA: `uv/0.1.21 {"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":{"name":"Ubuntu","version":"22.04","id":"jammy","libc":null},"system":{"name":"Linux","release":"6.5.0-1016-azure"},"cpu":"x86_64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}`
* Windows GHA: `uv/0.1.21 {"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":null,"system":{"name":"Windows","release":"2022Server"},"cpu":"AMD64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}`
* OSX GHA: `uv/0.1.21 {"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":{"name":"macOS","version":"14.2.1","id":null,"libc":null},"system":{"name":"Darwin","release":"23.2.0"},"cpu":"arm64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}`

Distro information (which pip retrieves with `from pip._vendor import distro` rather than the `platform` module) was not retrieved from markers. Instead, the Linux release codename/name/version comes from the `sys-info` crate, adding about 50us of extra overhead on Linux. The distro OSX version re-uses the [mac_os version implementation](https://github.com/astral-sh/uv/blob/99c992e38b220fbcda09b0b43602b3db2321480b/crates/platform-host/src/mac_os.rs) from #2381, which adds about 20us of overhead on OSX. I tried to use other crates to avoid re-introducing `mac_os.rs`, but most of them either didn't yield satisfactory performance (~40ms-60ms) or returned the wrong values (e.g. darwin version vs OSX version).

I also didn't add libc retrieval or rustc retrieval, as those seem to add substantial overhead due to querying `ldd` or `rustc`. PyPy version detection was also not added, to avoid the extra overhead needed to [support PyPy for linehaul](https://github.com/pypa/pip/blob/24.0/src/pip/_internal/network/session.py#L123).

All other behavior was kept 1-1 to match what pip's linehaul implementation does (as of 24.0). This also aligns with what was discussed in #1958.

## Test Plan

Added new integration test to uv-client.

---------

Co-authored-by: konstin <[email protected]>
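The header shape shown in the examples above (an `installer/version` prefix followed by a compact JSON object) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not pip's or uv's actual implementation: `build_user_agent` is a hypothetical name, only a subset of the fields is collected, CI detection is reduced to checking a single `CI` environment variable, and unavailable values are omitted (as in the pip examples) rather than emitted as `null` (as in the uv examples).

```python
import json
import os
import platform


def build_user_agent(installer_name, installer_version, environ=None):
    """Build a linehaul-style user agent string (hypothetical sketch)."""
    environ = os.environ if environ is None else environ
    data = {
        "installer": {"name": installer_name, "version": installer_version},
        "python": platform.python_version(),
        "implementation": {
            "name": platform.python_implementation(),
            "version": platform.python_version(),
        },
        "system": {"name": platform.system(), "release": platform.release()},
        "cpu": platform.machine(),
        # Assumption: a real installer may check more variables than CI.
        "ci": True if environ.get("CI") else None,
    }
    # Omit values that are not available instead of sending null.
    data = {key: value for key, value in data.items() if value is not None}
    payload = json.dumps(data, separators=(",", ":"), sort_keys=True)
    return f"{installer_name}/{installer_version} {payload}"
```

Everything collected here comes from the standard library without spawning subprocesses, which is the cheap part of the metadata; fields such as the rustc or libc version are left out because, as noted above, retrieving them requires querying external tools.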
This is effectively a request to include download-related information in requests to the index server, so that PyPI can record it and the information can be used to make ecosystem-wide decisions (by querying it via https://warehouse.pypa.io/api-reference/bigquery-datasets.html#download-statistics-table).
https://github.com/pypi/linehaul-cloud-function is the PyPI side implementation. https://github.com/pypa/pip/blob/24.0/src/pip/_internal/network/session.py#L109 is the pip side implementation.
This data powers decision-making tools such as https://pypistats.org/packages/__all__ (and similar sites) and https://mayeut.github.io/manylinux-timeline/, as well as a few ad-hoc queries to determine usage patterns across the ecosystem.
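To make the request concrete, here is a rough sketch of how a server could split such a user agent back into its parts. It is not the linehaul-cloud-function parser: `parse_user_agent` is a hypothetical name, and the sketch only assumes the `name/version {json}` shape that pip sends.

```python
import json


def parse_user_agent(user_agent):
    """Split a 'name/version {json}' user agent into its parts (sketch)."""
    prefix, _, payload = user_agent.partition(" ")
    name, _, version = prefix.partition("/")
    details = {}
    if payload.startswith("{"):
        try:
            details = json.loads(payload)
        except json.JSONDecodeError:
            # Treat a malformed payload as if no metadata had been sent.
            details = {}
    return {"installer": name, "version": version, "details": details}


# Trimmed sample in the shape discussed in this issue.
sample = 'pip/24.0 {"ci":true,"cpu":"x86_64","python":"3.12.2"}'
parsed = parse_user_agent(sample)
```

A client that sends only `name/version` with no JSON payload parses to an empty `details` mapping, so the installer can still be counted even when no further metadata is available.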