Collect binary size statistics #145
We need a proposal as to what would be good to collect. Ideally, I think we'd have this statistic track the individual ".so" files we see in the unarchived directory. This would be a somewhat unique statistic since it wouldn't depend on the benchmarks directory, instead only looking at the sizes of the artifacts Rust ships. I'd appreciate advice on whether we can get more fine-grained details for the size (e.g., debuginfo KB, code size KB, etc.) easily on Linux.
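(For illustration: on Linux this kind of breakdown can be read from the ELF section headers. The sketch below is hypothetical -- it assumes the `object` crate as a dependency and is not existing rustc-perf code -- and splits a .so into rough "code" and "debuginfo" buckets by section name.)

```rust
// Hypothetical sketch: sum ELF section sizes to split a shared library
// into rough "code" and "debuginfo" buckets. Assumes the `object` crate.
use object::{Object, ObjectSection};

fn size_breakdown(path: &std::path::Path) -> Result<(u64, u64, u64), Box<dyn std::error::Error>> {
    let data = std::fs::read(path)?;
    let file = object::File::parse(&*data)?;
    let (mut code, mut debug) = (0u64, 0u64);
    for section in file.sections() {
        let name = section.name().unwrap_or("");
        if name.starts_with(".text") {
            code += section.size();
        } else if name.starts_with(".debug") {
            debug += section.size();
        }
    }
    // (total file size, code bytes, debuginfo bytes)
    Ok((std::fs::metadata(path)?.len(), code, debug))
}
```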
Yeah, I was thinking we wouldn't necessarily do this for everything if we couldn't clearly map to an artifact, and we should only do it for "final artifacts" for sure (tracking rlibs wouldn't be too useful). I'd imagine something like:
I started on a collection of rust+wasm code size micro benchmarks: https://github.com/fitzgen/code-size-benchmarks Would be awesome to get this integrated here -- is this the kind of thing you were imagining, @alexcrichton?
@fitzgen looks great! I'd probably throw in non-LTO builds as well for comparison, but otherwise that seems like a great starting point to me. It may even be possible to just set up a daily cron job running on Travis pushing the results somewhere?
Should be feasible to integrate on a surface basis; I'll look into it sometime over the next couple weeks.
Okay, so integration is primarily blocked on being able to install targets (i.e., the rustup target add equivalent) for benchmarks. This is something we need to do anyway for winapi/windows benchmarks, so I'll take a look at doing so soon.
Can I help somehow? I'd love to see binary size statistics get tracked (and ultimately, optimized 🙂) |
There are a couple things that need doing here:

- teaching the sysroot module to install targets other than the default std
- getting Cargo to report a representative output artifact for each benchmark crate
Order here doesn't matter too much, I think. For the targets, the first step would be to get the sysroot module capable of installing targets other than the default std. We can't rely on the target existing (due to try builds), so these would have to be optional installations. I don't think this is terribly hard, but it does require some legwork. For Cargo, we'll probably want to try to get a representative artifact for each crate -- likely this'll be a heuristic, but I suspect that's the best we can do (so/wasm, rlib, binary). I believe Cargo will dump the output artifacts from rustc if run with --message-format=json.
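(For illustration, a rough sketch of that approach -- hypothetical code, not the collector's, and assuming serde_json as a dependency: cargo build --message-format=json emits "compiler-artifact" messages listing the produced files, which can then be filtered heuristically and sized.)

```rust
// Hypothetical sketch: run cargo with JSON message output, collect the
// produced artifact paths, and record their on-disk sizes.
use std::process::Command;

fn artifact_sizes(manifest_dir: &std::path::Path) -> Result<Vec<(String, u64)>, Box<dyn std::error::Error>> {
    let output = Command::new("cargo")
        .args(["build", "--release", "--message-format=json"])
        .current_dir(manifest_dir)
        .output()?;
    let mut sizes = Vec::new();
    for line in String::from_utf8(output.stdout)?.lines() {
        let msg: serde_json::Value = serde_json::from_str(line)?;
        // "compiler-artifact" messages list the files rustc produced.
        if msg["reason"] == "compiler-artifact" {
            for file in msg["filenames"].as_array().into_iter().flatten() {
                if let Some(path) = file.as_str() {
                    // Heuristic from this thread: keep "final artifacts"
                    // (binaries, cdylibs, wasm), skip intermediate rlibs.
                    if !path.ends_with(".rlib") {
                        sizes.push((path.to_string(), std::fs::metadata(path)?.len()));
                    }
                }
            }
        }
    }
    Ok(sizes)
}
```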
As mentioned in rust-lang/rust#82834 (comment), collecting the sizes of the output binaries of the benchmarks - at least those which produce one - could be useful too, as a cheaper proxy measure for runtime benchmarks. If more time is spent in LLVM, that would hopefully result in smaller binary sizes. That way we could make better decisions around whether compile time regressions can be justified with runtime (or footprint) improvements.
I created #1348 as a first step toward collecting binary (artifact) sizes for the existing benchmarks; I would like to eventually generalize it. Going further, I think that it would be nice to record the binary size of a few special benchmarks that are interesting specifically for binary size. For starters, it could be just a single small benchmark. We will probably want to record its size under several different compiler configurations.

For this to work, we need a few additional pieces of infrastructure.
Categories are currently mutually exclusive. Would that still be true with this suggestion? I agree that these measurements could be useful, and that it's hard to decide what to collect and how to collect them.
Perhaps add some new optional scenarios: FullPanicAbort, FullStripTrue, FullOptLevelZ, FullLtoTrue, FullCodegenUnits1, FullNoStd, FullAll? And then use the existing scenario selection machinery to opt into them?
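(Purely to illustrate the shape of this suggestion -- the new variant names come from the list above, and the "existing" variants are paraphrased rather than copied from the actual codebase:)

```rust
// Hypothetical sketch only; not rustc-perf's actual definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Scenario {
    // Existing scenarios (paraphrased).
    Full,
    IncrFull,
    IncrUnchanged,
    IncrPatched,
    // New, optional, binary-size oriented scenarios suggested above.
    FullPanicAbort,
    FullStripTrue,
    FullOptLevelZ,
    FullLtoTrue,
    FullCodegenUnits1,
}
```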
I also think scenarios are a good fit here, pretty much along the same lines that @nnethercote suggests. I'm not sure I buy the argument that enums make things harder here -- I think it's good that we need to think about each variant if we're matching on the enum; if we can handle it opaquely, then I would expect us to still be able to do that (e.g., by just passing it along, or rendering it to a String for the UI, etc.).
Scenarios currently have some interactions with each other: e.g. Full has to be completed before the Incr ones can be started, and IncrPatched performs a patching operation during the benchmark. So they don't serve only to change the compilation flags; they also affect how the benchmark is performed. It seems to me that the mentioned compile options are slightly orthogonal to that -- I could want to benchmark how PanicAbort works with incremental compilation, for example. We need to do several things with the compilation variants: define them, store them in the DB, and display them in the UI.
We could use explicit enum variants for these. But the more I'm thinking about this, the more it seems to me that compile-time flags like LTO, panic=abort, CGU=1, etc., which do not affect the benchmark process itself (unlike incr. and patching), should be orthogonal to the other options. What about creating a new DB column and a corresponding enum for these flags? Of course debug/opt is also a compile-time flag in a way, but it's special (it isn't normally passed through extra rustc flags).
Incremental compilation artifact sizes are probably also quite interesting.
I'm having trouble following why LTO is any different from incremental in terms of affecting compilation - both of these are going to increase compile times (or not), have a different memory profile, produce a different binary output... I agree in some sense that if we end up with a complete cross matrix of the current scenarios and the new options, storing them separately could make sense. But it also seems to me that for ease of implementation, we can do so purely at the code level (perhaps only in the collector) and serialize into the database as just a scenario. I'm not convinced we have good enough semantics for the extra-flags column to make it distinct from scenarios.
The difference is that incr. + patching require changes to the code that runs the benchmark, while LTO/PanicAbort/etc. don't; we just use different flags. But it's not that important.
I agree! Storing compilation flags in the DB/UI as "just another scenario string", while treating them as a different axis that can be set at the collector level, sounds like a good compromise.
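(A rough, hypothetical sketch of that compromise -- names and label format are made up for illustration, not the actual collector code: the extra flags live on their own struct and are only flattened into a plain scenario-like string when written to the database/UI.)

```rust
// Hypothetical sketch of the "separate axis, flattened to a string" compromise.
#[derive(Clone, Copy, Debug, Default)]
struct CompileFlags {
    lto: bool,
    panic_abort: bool,
    codegen_units_1: bool,
}

impl CompileFlags {
    /// Flatten the extra axis into a plain label, e.g. "full-lto-panic-abort",
    /// so the database and UI can keep treating it as "just another scenario".
    fn scenario_label(&self, base_scenario: &str) -> String {
        let mut label = base_scenario.to_string();
        if self.lto {
            label.push_str("-lto");
        }
        if self.panic_abort {
            label.push_str("-panic-abort");
        }
        if self.codegen_units_1 {
            label.push_str("-cgu1");
        }
        label
    }
}

fn main() {
    let flags = CompileFlags { lto: true, panic_abort: true, ..Default::default() };
    assert_eq!(flags.scenario_label("full"), "full-lto-panic-abort");
}
```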
This has now been implemented in #1348, as a first step.
We now record both the size of generated artifacts and the binary size of the compiler (and its associated libraries) itself. We can also distinguish between library and binary artifacts, to get more context about the sizes. So I think this can be closed (more specialized issues can be created for concrete use cases).
That would perhaps be an interesting metric!