Benchmark performance doesn't only depend on rustc #239
Comments
In theory, though, optimizations in this area (proc macro crates) are orthogonal to rustc optimizations -- that is, we don't actually have to change anything to measure rustc wins. Having said that, it may be worth talking to people who maintain proc macro crates (@dtolnay and @alexcrichton are two who I know of) who may have ideas of how to better integrate those benchmarks into rustc-perf. It's possible this is a case where we're ultimately measuring "runtime" performance instead of compiler performance, something that's not really supported by perf.r-l.o today.

Long-term, though, I could see us utilizing `derive(Serialize)`'s underlying calls to `quote` as a benchmark for `quote` that we'd then track over time. Once again, that should stay steady unless we make changes to MIR optimizations or updates to LLVM/codegen.

I guess the summary here is that this is true -- but it also seems somewhat "irrelevant" to the current suite: optimizations in this area can probably be made without knowledge of, or touching, the compiler itself.
AIUI, rustc-perf is measuring the performance of (a) whatever version of rustc you run it with, combined with (b) whatever procedural macro crate versions are specified by the benchmarks' lockfiles.

Perhaps I'm misunderstanding something, but it doesn't seem "irrelevant" to me. Users care about compile times; non-rustc components contribute to those times; rustc-perf has shortcomings in its measurement of those components.
Well, perhaps "irrelevant" was the wrong wording: what I mean is that we have no good way of measuring changes to `quote` and other derive infrastructure, and maybe we shouldn't anyway: it seems like `quote` could add benchmarks to itself and then optimize/iterate on those. Similarly, none of these "optimizations" are going to be easily representable on our existing graphs, as they are likely rustc-version agnostic.
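For what it's worth, here is a minimal sketch of what such a self-contained `quote` benchmark might look like (assuming the `criterion`, `syn`, and `quote` crates; the `Describe` trait and the benchmark names are made up purely for illustration):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use quote::quote;
use syn::{parse_quote, DeriveInput};

fn bench_derive_style_expansion(c: &mut Criterion) {
    // Parse a small struct once, the way a derive macro would receive it.
    let input: DeriveInput = parse_quote! {
        struct Point { x: f64, y: f64, label: String }
    };
    let name = &input.ident;

    c.bench_function("quote_derive_style_expansion", |b| {
        b.iter(|| {
            // What gets timed here is quote!'s token interpolation -- rustc is
            // not involved, so the numbers track `quote` itself over time.
            quote! {
                impl Describe for #name {
                    fn describe() -> &'static str { stringify!(#name) }
                }
            }
        })
    });
}

criterion_group!(benches, bench_derive_style_expansion);
criterion_main!(benches);
```

Something like this could live in `quote`'s own repository and be iterated on there, independent of rustc releases.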
I see perf.r-l.o as basically exclusively for the compiler, in the sense that the performance of crates.io crates, while important, isn't necessarily important for our own perf tracking. We can't land changes to rustc to speed up `quote`.

To be clear, I think it's very important to actually optimize things like `quote` and `proc-macro2`.
I'm worried about the potential for wasting developers' time. Imagine somebody new comes along and starts profiling rustc via the rustc-perf benchmarks. They might do what I did, and discover that a sizable chunk of the time is going into the proc macro crates, where the fixes can't land in rustc itself.

(Similarly, every time I look at a profile for
We could periodically run `cargo update` across benchmarks, but it seems to me that keeping benchmarks unchanging and focused solely on rustc performance is better for regression tracking, to an extent. That is, currently I can look at a graph of rustc performance over time and "trust" it, whereas if we updated crate versions I'd have to rely somewhat on intuition -- is that change because the ecosystem updated, or because rustc is faster?

I think the distinction here is that perf.r-l.o today is largely focused on tracking compiler performance over time, not ecosystem performance. However, maybe that's the wrong approach to take. I'm not sure which is more valuable. As a corollary, we have regex pinned at an older version.

I think the question here is whether you'd find it more helpful if we tracked the ecosystem or just the compiler. AIUI, you're suggesting that tracking the ecosystem would be better: is that an accurate assessment of your beliefs?
I want to track whatever best matches the experience of users. The tricky thing with ecosystem performance is that there's a small number of crates (proc macro ones, basically) that have a significant effect. They're probably the only ones that matter.

Doing a full `cargo update` feels wrong... but maybe it's an accurate reflection of a user's experience? If a user freshly compiles a revision of Cargo that is two years old, they will get current versions of external crates, rather than two-year-old crates, right? (Assuming crate versions haven't been pinned.) But for the benchmarks that are live projects, if we do a `cargo update`, the benchmark is effectively changing, and so it seems like we might as well update the benchmark code itself to the latest version too.

Hmm. It's a weird situation: for each benchmark that uses proc macros, part of the benchmark itself effectively constitutes part of the compiler; freezing the benchmark effectively freezes part of the compiler.
We could just update proc macro dependencies... that might be the best of both worlds, and shouldn't be too hard. What do you think about that approach?
If there is a clear definition of "proc macro dependencies", then that seems reasonable? But consider this comment:
Sounds like there is a lot of ad hoc customization going on, which might make it hard to judge which dependencies should be updated?
Well, we'd decide by way of looking at the crates whose type is `proc-macro` and then recursively examining their dependencies. This shouldn't be too hard, and the overlap, I'd presume, would be fairly minimal (and not in the crate we're measuring, anyway).
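For reference, a rough sketch of that discovery step, assuming the `cargo_metadata` crate (field types vary slightly across its versions) and leaving out the recursive walk over the resolve graph:

```rust
use cargo_metadata::MetadataCommand;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read `cargo metadata` for the benchmark crate under examination.
    let metadata = MetadataCommand::new().exec()?;

    for package in &metadata.packages {
        // A package counts as a proc macro crate if any of its targets has
        // the "proc-macro" kind.
        let is_proc_macro = package
            .targets
            .iter()
            .any(|t| t.kind.iter().any(|k| k.to_string() == "proc-macro"));

        if is_proc_macro {
            println!("proc-macro crate: {} {}", package.name, package.version);
            // Its direct dependencies seed the recursive examination
            // mentioned above.
            for dep in &package.dependencies {
                println!("  depends on: {}", dep.name);
            }
        }
    }
    Ok(())
}
```

From that seed set, the package list (or the resolve graph) can be walked transitively to collect everything that exists only to support the proc macro crates.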
Ok, that sounds reasonable.
I'm less concerned about this now that we have a policy to regularly update benchmarks.
I've been looking at the compile times for `cargo`, where procedural macro handling takes a significant fraction of the time in some of the runs. For example, see dtolnay/proc-macro2#94, which has info on two changes that speed things up.

Those changes are not to rustc itself, but to external crates (`quote` and `proc-macro2`) involved in procedural macro generation. If/when these changes land, rustc-perf's results won't change unless we update the versions of those crates used within rustc-perf (and possibly the crates that depend on them; the dependencies can be a couple of levels deep).

I'm not sure how to handle this in general. We might be able to muddle through with significant crate version updates when we know about them. But there is a gap in our coverage. For example, if a crate used by procedural macros (such as `quote`) introduced a change that caused a large compile-time regression, users would experience that regression but rustc-perf would not.