improve rustc wrapper startup time? #2626
@matthiaskrgr this is about the rustup wrapper, not the x.py rustc shim, right? If so I think this should be moved to https://github.com/rust-lang/rustup/. |
Ah, ~/.cargo/bin/rustc is rustup? Yes, the issue should be moved then. :) |
I can confirm that the difference in performance is around 6x:
|
Looking locally, ca. 15ms are before main() is entered (presumably due to dynamic link time for all the libraries rustup uses by dint of being linked to curl). It looks like about 25ms is taken parsing manifest files, sadly this is also done twice, and thus accounts for the vast majority of the extra time. The toml parser must be quite inefficient. It might be possible to knock one of those parse steps on the head since in theory we only need to parse once, but it'll involve quite a bit of refactoring. If someone wants to begin to look at this, then the place to start is the config.rs file, around line 620 which is |
To help triage this, how much does this contribute to typical build times? like, - sure we can put a chunk of time into making this snappy, and that would be good, but if a typical build is 1500ms ... ? |
@rbtcollins I wouldn't notice so much on builds, but this hurts things like
|
To add to the list:
|
Comparing with C compilers is disingenuous, since the number of rustc invocations for a similarly sized codebase will be many fewer. For example, rustup is around 130 .rs files, but including its dependencies the total number of rustc invocations is only 320 or so. Assume a wasted 35ms of CPU time (the repeated manifest parse) on my computer per rustc invocation, and generously assume that all of that waste is reflected in the wallclock compile time at the same parallelism the build runs at. My computer took 83s to build rustup just now at 6.1x parallel, so that's 11375ms wasted, which at 6.1x parallel is 1.864s of wallclock time, or 2.2% of all build time in a debug build. The same maths on a release build, 11375ms wasted for a total build time of 195s, is only 0.95% of the total runtime of the build.

Having said all that, I do think it's worth correcting the double-load of the manifest, since it's wasteful to do that anyway, and that shouldn't be too complex to do in a grungy way, or only medium hard to do nicely. What would be of even more use, though, is working out how it takes 35ms on a 4GHz CPU core to parse the manifest; that's utterly abysmal. |
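The arithmetic in the comment above can be re-run as a small Rust sketch. The function names are illustrative only, and the inputs are the figures quoted in the comment (11375ms of wasted CPU time, 6.1x parallelism, 83s debug and 195s release builds); results may differ from the quoted ones in the last digit because of rounding.

```rust
/// Wall-clock seconds lost, given total wasted CPU milliseconds and the
/// average number of jobs running in parallel during the build.
fn wasted_wallclock_secs(wasted_cpu_ms: f64, parallelism: f64) -> f64 {
    wasted_cpu_ms / parallelism / 1000.0
}

/// The same loss expressed as a percentage of the total build time.
fn wasted_percent(wasted_secs: f64, build_secs: f64) -> f64 {
    100.0 * wasted_secs / build_secs
}

fn main() {
    // Figures quoted in the comment: 35 ms per invocation, 11375 ms in total.
    let wasted_cpu_ms = 11_375.0;
    let parallelism = 6.1;

    let wasted = wasted_wallclock_secs(wasted_cpu_ms, parallelism);
    println!("wall-clock waste: {:.3} s", wasted);
    println!("share of 83 s debug build: {:.2} %", wasted_percent(wasted, 83.0));
    println!("share of 195 s release build: {:.2} %", wasted_percent(wasted, 195.0));
}
```

Changing `parallelism` or the per-build totals shows how sensitive the percentage is to the "all waste lands on the critical path" assumption made in the comment.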
Further investigation yields: the majority of the time spent parsing the manifest is in the toml load itself, i.e. before we apply any semantic understanding to the manifest. The channel is around 700k and will only get larger over time. Another possible improvement would be to trim out anything not relevant to the toolchain being installed when fetching the manifest. That ought not to be too complex to do and would speed up future operations, since the manifest would be significantly smaller. |
I wrote a simple trimmer for the manifests to reduce the toml quantity in installed toolchains. Doing that results in the following (stable is untrimmed, beta is trimmed):
(For reference, directly invoking rustc (stable or beta) without rustup in the way is ca. 11ms.) |
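A trimmer along the lines described above might look roughly like the following. This is a line-based sketch and not the code from #2627. It assumes per-target tables have headers of the form `[pkg.<name>.target.<triple>]` (optionally with further sub-keys after the triple), that a table runs until the next header line, and that keys are unquoted; the function name and the sample manifest are made up for illustration.

```rust
/// Drop every per-target table that does not belong to `host`.
///
/// ASSUMPTION: per-target headers contain `.target.<triple>`; any other
/// table or top-level key is target-independent and is kept unchanged.
fn trim_manifest(manifest: &str, host: &str) -> String {
    let sub_table_prefix = format!("{host}.");
    let mut out = String::new();
    let mut keep = true;
    for line in manifest.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with('[') && trimmed.ends_with(']') {
            // Handles both `[table]` and `[[array.of.tables]]` headers.
            let header = trimmed.trim_matches(|c: char| c == '[' || c == ']');
            keep = match header.split_once(".target.") {
                Some((_, rest)) => rest == host || rest.starts_with(&sub_table_prefix),
                None => true,
            };
        }
        if keep {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}

fn main() {
    // Made-up miniature manifest, only to exercise the function.
    let sample = "\
manifest-version = \"2\"
[pkg.rustc.target.x86_64-unknown-linux-gnu]
available = true
[pkg.rustc.target.aarch64-apple-darwin]
available = true
";
    print!("{}", trim_manifest(sample, "x86_64-unknown-linux-gnu"));
}
```

A real trimmer would more likely operate on the parsed document than on raw lines, so that quoted keys and multi-line values are handled correctly.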
@matthiaskrgr Assuming #2627 goes CI-green, you might want to try the binary from there and see how it affects performance for you. |
Well, I somewhat disagree: https://danluu.com/keyboard-latency/#humans-can-t-notice-100ms-or-200ms-latency. But the numbers after your change are a lot better :) |
Well, I made a small ICE-finder that runs rustc on all the files inside the rustc repo. Using |
Running it serially, individually? Not terribly surprising. I'm totally fine with rustup being faster, but development of rust itself isn't our primary use case: shipping rust toolchains to our users is, and running rustc in such a non-idiomatic fashion is very much a specialised, developer-of-rust need. To me that says, if someone wants to make this faster, great, but we're unlikely to steer folk to this as a priority issue, unlike e.g. the corruption issues, locking issues, etc. |
If folk are running error explains in their game update loops - the context of that blog post - we have a whole new set of requirements to feed into rustup design. Fast is good, and it is a feature. But language servers and dev environments can avoid making the user wait for the invocation of the compiler in various ways; and should simply because remote dev environments pay latency in many ways, so having latency hiding techniques in play is just good design. |
When I was playing with my dodgy PR above, I ran into the fact that |
Manifest parsing aside, would it potentially make sense for the top-level rustup-enabled cargo invocation to determine the toolchain and then set the |
I just tested this approach building I think that'd be a worthwhile optimization. |
Is setting One concern with this, rather than simply improving the startup performance of rustup, is how this might interact with how rustup sets up fallbacks for incomplete linked toolchains (i.e. as used in cargo/rustc development flows). If we can be sure that the fallback approach won't be broken by this, or that we can somehow detect when we'd invoke fallback case and not set the variables in that context, then it ought to be plausible. |
I think it'd be reasonable to always set
Right. See https://doc.rust-lang.org/cargo/reference/environment-variables.html ; cargo will invoke
I think we can handle that case by just not setting |
Is it worth reconsidering rustup as a chimera binary, given just how much stuff gets linked that isn't relevant to Alternatively, if a chimera is still desirable, splitting most of rustup into a dynamic library that's only loaded after we've checked if we're a proxy binary. |
Yes, worth considering. I have a project (#2441) to make proxies safe on windows as well, which is related; that said, how much warm startup-time impact does linking those libraries have today? Cold time is irrelevant, since a single process invocation is trivially amortised over a full build. There is some complexity to consider in having a separate binary, though, such as distribution and updating. I certainly don't have time to engage with it at the moment, though if someone else wanted to take it on I do have a few ideas related to #2441 that would be worth considering. |
IME having profiled it a bit, the majority of the startup time of the proxies, by a LONG margin, is the parsing of the channel toml. Hence I played with #2627 though it's the wrong solution, it's an approach to consider. |
@kinnison it may be different on Windows |
@rbtcollins good point, I suppose I'd best test there too. |
I was poking around recently when I felt that I'd expect that ideally when a toolchain is already installed and such the Cargo itself has already landed a change to avoid rustup wrappers entirely because of the large performance boost to the test suite, and I suspect that users in general can tell as well with 100+ ms added to each command invocation. |
Switching away from toml to bincode basically made the manifest decode drop to be in the noise. sub 1ms. Not loading the manifest at all would mean we'd need to introduce lazy manifest loading because we do manifest/component reconciliation during proxy startup in case a |
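For illustration, lazy manifest loading of the kind mentioned above could be sketched with `std::sync::OnceLock` (Rust 1.70+). This is not rustup's code: `Toolchain`, `Manifest`, and the line-splitting "decode" are hypothetical stand-ins, and the counter exists only to show that the decode happens at most once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a decoded channel manifest.
struct Manifest {
    components: Vec<String>,
}

/// Hypothetical toolchain handle that decodes its manifest on first use only.
struct Toolchain {
    raw_manifest: String,
    manifest: OnceLock<Manifest>,
    decodes: AtomicUsize, // counts decodes, for demonstration only
}

impl Toolchain {
    fn new(raw_manifest: &str) -> Self {
        Toolchain {
            raw_manifest: raw_manifest.to_string(),
            manifest: OnceLock::new(),
            decodes: AtomicUsize::new(0),
        }
    }

    /// Decode on the first call, then reuse the cached result.
    fn manifest(&self) -> &Manifest {
        self.manifest.get_or_init(|| {
            self.decodes.fetch_add(1, Ordering::Relaxed);
            // Placeholder for the expensive decode: one component per line.
            Manifest {
                components: self.raw_manifest.lines().map(str::to_string).collect(),
            }
        })
    }
}

fn main() {
    let tc = Toolchain::new("rustc\ncargo\nrust-std");
    // A code path that never asks for the manifest pays nothing for it.
    println!("decodes before use: {}", tc.decodes.load(Ordering::Relaxed));
    let n = tc.manifest().components.len();
    let _ = tc.manifest(); // second call hits the cache
    println!("{} components, decodes: {}", n, tc.decodes.load(Ordering::Relaxed));
}
```

The same shape would also address the double parse mentioned earlier in the thread, since every caller shares one cached result.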
I still think it would be valuable to export the appropriate environment variables so that invocations of rustc underneath a rustup'd cargo don't need to re-do the parsing and selection logic, even if we can substantially speed up that logic by switching to bincode. |
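As a rough sketch of that idea: resolve the toolchain once, then hand the real compiler path to everything underneath via the environment. The variable used here is `RUSTC`, which Cargo documents as an override for the compiler it invokes; whether that is exactly the variable intended in this thread is an assumption. The directory layout follows the `~/.rustup/toolchains/<name>/bin/rustc` path from the timings in this issue, and both function names are hypothetical. On Windows the binary would be `rustc.exe`.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Path of the real rustc inside a toolchain directory.
/// ASSUMPTION: layout is `<rustup_home>/toolchains/<toolchain>/bin/rustc`.
fn toolchain_rustc(rustup_home: &Path, toolchain: &str) -> PathBuf {
    rustup_home
        .join("toolchains")
        .join(toolchain)
        .join("bin")
        .join("rustc")
}

/// Build (but do not spawn) a cargo command whose rustc invocations would go
/// straight to the resolved binary instead of back through the proxy.
fn cargo_with_resolved_rustc(cargo: &Path, rustc: &Path) -> Command {
    let mut cmd = Command::new(cargo);
    cmd.env("RUSTC", rustc);
    cmd
}

fn main() {
    // Hypothetical rustup home, for illustration only.
    let rustup_home = PathBuf::from("/home/user/.rustup");
    let rustc = toolchain_rustc(&rustup_home, "stable");
    let cmd = cargo_with_resolved_rustc(Path::new("cargo"), &rustc);
    println!("{:?}", cmd);
}
```

This sketch ignores the fallback case for incomplete linked toolchains raised earlier in the thread; a real implementation would need to skip setting the variable there.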
This is an issue for uv, too. We've been asked to include the output of The benchmark runs from my user home, and I've included python as well as node, with and without the volta shim, for comparison.
8ms without the rustup shim still seems slow; should I file a separate ticket for that in rust-lang/rust? |
@konstin Yes, please do file a ticket for that in rust-lang/rust. And please give the details about bypassing the wrapper and invoking rustc directly, so that anyone trying to reproduce it can easily do so without getting sidetracked by the wrapper. |
rust-lang/rust issue: rust-lang/rust#121631 |
time ~/.rustup/toolchains/master/bin/rustc >& /dev/null
0,040 total
time ~/.cargo/bin/rustc >& /dev/null
0,272 total
That's almost 7x slower.
I wonder if there is something we can do to get these numbers a bit closer?