rustc --version is slow even without the rustup wrapper #121631

Open
konstin opened this issue Feb 26, 2024 · 7 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@konstin

konstin commented Feb 26, 2024

Problem Description

Running rustc --version without the rustup wrapper takes 11ms on my Linux machine (see rust-lang/rustup#2626 for the rustup side of this).

This is an issue for uv, as we've been asked to include the output of rustc --version in our user agent when making requests to the Python Package Index so the Python ecosystem gets usage stats. A minimal resolution with a network request (a revalidation request) takes ~100ms on my machine, so 20ms extra before the first network request is noticeable. I'd also be happy to read the default rustc version from another location, provided that this works with alternative ways of installation.
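One way to keep the lookup off the critical path is to start the process early and read its output only when the user agent is actually built. The following is a minimal sketch in Python, not something uv is confirmed to do. The helper names `start_version_probe` and `finish_version_probe` are hypothetical.

```python
import subprocess


def start_version_probe(cmd):
    """Spawn the version command without waiting; return None if it cannot start."""
    try:
        return subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            text=True,
        )
    except OSError:
        # Binary missing or not executable: treat the version as unknown.
        return None


def finish_version_probe(proc, timeout=1.0):
    """Collect the output of a probe started earlier; None on failure or timeout."""
    if proc is None:
        return None
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the killed child
        return None
    if proc.returncode != 0:
        return None
    return out.strip() or None


if __name__ == "__main__":
    probe = start_version_probe(["rustc", "--version"])
    # ... cache lookups, argument parsing, etc. would run here (assumed) ...
    print(finish_version_probe(probe))
```

This hides the startup cost behind other work rather than removing it, so it only helps if there is enough unrelated work before the first request.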

Benchmarks

The benchmark runs from my user home on Ubuntu, and I've included rustc with and without rustup, python without a shim, and node with and without the volta shim for comparison. Tested with rustc 1.76.0 (07dca48 2024-02-04).

$ hyperfine --warmup 10 --shell=none "rustc --version" ".rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version" "python --version" "node --version" ".volta/tools/image/node/18.18.2/bin/node --version"
Benchmark 1: rustc --version
  Time (mean ± σ):      19.9 ms ±   1.5 ms    [User: 14.8 ms, System: 5.0 ms]
  Range (min … max):    17.5 ms …  26.3 ms    157 runs
 
Benchmark 2: .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version
  Time (mean ± σ):      10.6 ms ±   3.0 ms    [User: 4.8 ms, System: 5.6 ms]
  Range (min … max):     4.4 ms …  17.6 ms    240 runs
 
Benchmark 3: python --version
  Time (mean ± σ):       1.4 ms ±   0.5 ms    [User: 0.9 ms, System: 0.4 ms]
  Range (min … max):     0.3 ms …   2.3 ms    1635 runs
 
Benchmark 4: node --version
  Time (mean ± σ):       9.7 ms ±   3.1 ms    [User: 3.9 ms, System: 5.8 ms]
  Range (min … max):     2.8 ms …  14.3 ms    229 runs
 
Benchmark 5: .volta/tools/image/node/18.18.2/bin/node --version
  Time (mean ± σ):       7.2 ms ±   2.2 ms    [User: 2.4 ms, System: 4.6 ms]
  Range (min … max):     1.7 ms …  12.0 ms    796 runs
 
Summary
  python --version ran
    5.28 ± 2.54 times faster than .volta/tools/image/node/18.18.2/bin/node --version
    7.14 ± 3.50 times faster than node --version
    7.77 ± 3.64 times faster than .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version
   14.62 ± 5.57 times faster than rustc --version
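
For readers without hyperfine, a rough equivalent of the measurement above can be sketched in Python. This is an approximation, not a replacement: the function name `time_command` is hypothetical, and the numbers include Python's own process-spawning overhead, so they will not match hyperfine's figures.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=20, warmup=3):
    """Return (mean_ms, stdev_ms) of the wall-clock time to spawn and wait for cmd."""
    samples = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        # No intermediate shell, comparable in spirit to hyperfine --shell=none.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:  # discard warmup iterations
            samples.append(elapsed_ms)
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    mean_ms, stdev_ms = time_command([sys.executable, "--version"])
    print(f"{mean_ms:.1f} ms ± {stdev_ms:.1f} ms")
```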

On a low-end server and a shared server the contrast to python becomes even more stark:

$ hyperfine --warmup 10 --shell=none ".rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version" "python3.11 --version"
Benchmark 1: .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version
  Time (mean ± σ):      20.8 ms ±   2.3 ms    [User: 7.8 ms, System: 12.7 ms]
  Range (min … max):    18.3 ms …  35.9 ms    136 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: python3.11 --version
  Time (mean ± σ):       1.8 ms ±   0.3 ms    [User: 1.1 ms, System: 0.6 ms]
  Range (min … max):     1.3 ms …   5.9 ms    1882 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  python3.11 --version ran
   11.36 ± 2.12 times faster than .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version

$ hyperfine --warmup 10 --shell=none ".rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version" "/usr/bin/python3.11 --version"
Benchmark 1: .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version
  Time (mean ± σ):      34.3 ms ±   7.1 ms    [User: 12.1 ms, System: 21.2 ms]
  Range (min … max):    26.2 ms …  64.8 ms    80 runs
 
Benchmark 2: /usr/bin/python3.11 --version
  Time (mean ± σ):       6.6 ms ±   3.1 ms    [User: 1.5 ms, System: 4.7 ms]
  Range (min … max):     4.1 ms …  64.7 ms    476 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  /usr/bin/python3.11 --version ran
    5.20 ± 2.65 times faster than .rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/rustc --version

rustbot added the needs-triage label Feb 26, 2024
@bjorn3
Member

bjorn3 commented Feb 26, 2024

For me without strace it takes about 6ms, with strace it takes about 10ms. From the execve call up to prlimit64(0, RLIMIT_STACK, ...) (which is still before the main function executes) takes 9ms. After that is a tiny bit of time initializing jemalloc. The time between the rust main function being called and the process exiting is less than 1ms total.

Python is only a 6.6MB executable with basically no dylib dependencies. Rustc on the other hand has 263MB worth of dynamic libraries which it needs to load outside of libc. Even just calling mprotect on the mapped dynamic libraries takes 5ms already.
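
To check the library footprint on another machine, one option is to sum the on-disk size of everything `ldd` reports. This is a Linux-only sketch with hypothetical function names. It measures file sizes rather than what the loader actually maps, so treat the result as an estimate.

```python
import os
import subprocess


def parse_ldd_paths(ldd_output):
    """Extract resolved library paths from the text printed by `ldd`."""
    paths = []
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. linux-vdso or the dynamic loader itself
        target = line.split("=>", 1)[1].strip()
        if target.startswith("not found") or target.startswith("("):
            continue  # unresolved library, or an entry with no path
        path = target.rsplit(" (", 1)[0].strip()  # drop the load address
        if path:
            paths.append(path)
    return paths


def total_library_size_mb(binary):
    """Sum the file sizes (in MB) of the shared libraries `binary` links against."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    sizes = (
        os.path.getsize(p)
        for p in parse_ldd_paths(result.stdout)
        if os.path.exists(p)
    )
    return sum(sizes) / 1e6
```

Calling `total_library_size_mb` on the toolchain's rustc binary and on a python binary should show the same kind of gap described above, though the exact numbers depend on the installation.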

jieyouxu added the T-compiler and C-enhancement labels and removed the needs-triage label Feb 26, 2024
@jyn514
Member

jyn514 commented Feb 26, 2024

i wonder if it would be possible to dlopen LLVM at runtime so it can be delayed until codegen starts. then only the rustc_driver shared object has to be opened unconditionally (and maybe even that can be dlopen-ed if argument parsing moves to the rustc-main binary?)
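
The general idea of deferring a dlopen until first use can be illustrated outside rustc. The sketch below uses Python's ctypes, whose `CDLL` calls dlopen on POSIX systems. The `LazyLibrary` class and the library name in the usage note are hypothetical and say nothing about how rustc would structure this.

```python
import ctypes


class LazyLibrary:
    """Defer loading a shared library until the first time it is needed."""

    def __init__(self, name, loader=ctypes.CDLL):
        self._name = name
        self._loader = loader  # ctypes.CDLL performs the dlopen on POSIX
        self._handle = None

    @property
    def loaded(self):
        return self._handle is not None

    def get(self):
        # The load cost is paid here, once, instead of at process startup.
        if self._handle is None:
            self._handle = self._loader(self._name)
        return self._handle
```

With something like `backend = LazyLibrary("libbackend.so")`, a code path such as `--version` that never calls `backend.get()` never pays for loading the library.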

@bjorn3
Member

bjorn3 commented Feb 26, 2024

We used to dlopen librustc_codegen_llvm.so (to support separate LLVM versions for emscripten and for regular use, no longer necessary as emscripten now uses the upstream wasm backend rather than the asm.js fastcomp backend), but it was merged into librustc_driver.so for perf reasons.

@joshtriplett
Member

The performance wins that #97154 would provide (if we could do that without breaking codegen backends) seem likely to help substantially with this. That might be worth revisiting.

@Doineann

Maybe related to rustc --version doing more than it is supposed to do? #127649

@Kobzol
Contributor

Kobzol commented Jul 12, 2024

> Maybe related to rustc --version doing more than it is supposed to do? #127649

That is performed by the rustup wrapper, not by rustc directly, so that is not related to this issue.

@Doineann

Yeah, I wasn't even really aware of the rustup wrapper behaving as a proxy in the first place.
