Metadata is too big for its own good #21482
Comments
For your information, I had written a custom metadata decoder in Python for debugging #15309 (the table is a bit out of date, but still works), and out of curiosity I've also made some basic efforts to reduce the metadata overhead. The
Wow, those are some nice wins! I had no idea that existed! If we could implement some of those optimizations today that would be awesome.
@alexcrichton Does any breaking modification to metadata need a new snapshot? I'm afraid such modifications cannot easily be done incrementally.
Thankfully you shouldn't need a snapshot: the stage N compiler conveniently only ever reads metadata generated by itself, so there are no bootstrapping issues.
Is this true? I was under the impression that:
@michaelwoerister Rustc uses LLVM's memory map abstraction to mmap the executable. LLVM itself does not use the metadata.
@lifthrasiir Oh, that refers to this: https://github.com/rust-lang/rust/blob/master/src/librustc/metadata/loader.rs#L270. All clear now.
I'm currently working on two temporary but public branches:
Any suggestions or patches would be appreciated.
This is a series of individual but correlated changes to the metadata format. The changes are significant enough that it (finally) bumps the metadata encoding version. In brief, they altogether reduce the total size of stage1 binaries by 27% (!!!!). Almost every low-hanging fruit has been considered and fixed; see the individual commits for details.

Detailed library (not just metadata) size changes for x86_64-unknown-linux-gnu stage1 binaries (baseline being 3a96d6a):

````
   before     after  delta path
--------- --------- ------ --------------------------------
  1706146   1050412  38.4% liballoc-4e7c5e5c.rlib
   398576    152454  61.8% libarena-4e7c5e5c.rlib
    71441     56892  20.4% libarena-4e7c5e5c.so
 14424754   5084102  64.8% libcollections-4e7c5e5c.rlib
 39143186  14743118  62.3% libcore-4e7c5e5c.rlib
   195574    188150   3.8% libflate-4e7c5e5c.rlib
   153123    152603   0.3% libflate-4e7c5e5c.so
   477152    215262  54.9% libfmt_macros-4e7c5e5c.rlib
    77728     66601  14.3% libfmt_macros-4e7c5e5c.so
  1216936    684104  43.8% libgetopts-4e7c5e5c.rlib
   207846    181116  12.9% libgetopts-4e7c5e5c.so
   349722    147530  57.8% libgraphviz-4e7c5e5c.rlib
    60196     49197  18.3% libgraphviz-4e7c5e5c.so
   729842    259906  64.4% liblibc-4e7c5e5c.rlib
   349358    247014  29.3% liblog-4e7c5e5c.rlib
    88878     83163   6.4% liblog-4e7c5e5c.so
  1968508    732840  62.8% librand-4e7c5e5c.rlib
  1968204    696326  64.6% librbml-4e7c5e5c.rlib
   283207    206589  27.1% librbml-4e7c5e5c.so
 72369394  46401230  35.9% librustc-4e7c5e5c.rlib
 11941372  10498483  12.1% librustc-4e7c5e5c.so
  2717894   1983272  27.0% librustc_back-4e7c5e5c.rlib
   501900    464176   7.5% librustc_back-4e7c5e5c.so
    15058     12588  16.4% librustc_bitflags-4e7c5e5c.rlib
  4008268   2961912  26.1% librustc_borrowck-4e7c5e5c.rlib
   837550    785633   6.2% librustc_borrowck-4e7c5e5c.so
  6473348   6095470   5.8% librustc_driver-4e7c5e5c.rlib
  1448785   1433945   1.0% librustc_driver-4e7c5e5c.so
 95483688  94779704   0.7% librustc_llvm-4e7c5e5c.rlib
 43516815  43487809   0.1% librustc_llvm-4e7c5e5c.so
   938140    817236  12.9% librustc_privacy-4e7c5e5c.rlib
   182653    176563   3.3% librustc_privacy-4e7c5e5c.so
  4390288   3543284  19.3% librustc_resolve-4e7c5e5c.rlib
   872981    831824   4.7% librustc_resolve-4e7c5e5c.so
  1817642  14795426  18.6% librustc_trans-4e7c5e5c.rlib
  3657354   3480026   4.8% librustc_trans-4e7c5e5c.so
 16815076  13868862  17.5% librustc_typeck-4e7c5e5c.rlib
  3274439   3123898   4.6% librustc_typeck-4e7c5e5c.so
 21372308  14890582  30.3% librustdoc-4e7c5e5c.rlib
  4501971   4172202   7.3% librustdoc-4e7c5e5c.so
  8055028   2951044  63.4% libserialize-4e7c5e5c.rlib
   958101    710016  25.9% libserialize-4e7c5e5c.so
 30810208  15160648  50.8% libstd-4e7c5e5c.rlib
  6819003   5967485  12.5% libstd-4e7c5e5c.so
 58850950  31949594  45.7% libsyntax-4e7c5e5c.rlib
  9060154   7882423  13.0% libsyntax-4e7c5e5c.so
  1474310   1062102  28.0% libterm-4e7c5e5c.rlib
   345577    323952   6.3% libterm-4e7c5e5c.so
  2827854   1643056  41.9% libtest-4e7c5e5c.rlib
   517811    452519  12.6% libtest-4e7c5e5c.so
  2274106   1761240  22.6% libunicode-4e7c5e5c.rlib
--------- --------- ------ --------------------------------
499359187 363465583  27.2% total
````

Some notes:

* Uncompressed metadata compacts very well. It is less visible for compressed metadata but still it achieves about 5~10% reduction.
* *Every* commit is designed to reduce the metadata in one way. There is absolutely no negative impact associated to changes (that's why the table above doesn't contain a minus delta).
* I've confirmed that this compiles through `make all`, making it almost correct. Other platforms have to be tested though.
* Oh, I'll rebase this as soon as I have spare time, but I guess this needs an extensive review anyway.
* I haven't rigorously checked the encoder and decoder performance. I tried to minimize the impact (some encodings are actually simpler than the original), but I'm not sure.

Fixes #2743, #9303 (partially) and #21482.
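For readers checking the numbers: the delta column appears to be the relative size reduction, `1 - after / before`, shown as a percentage. The following is a minimal Python sketch under that assumption; the helper name `size_reduction` is hypothetical and is not part of the patch or of rustc. It recomputes a few rows copied from the table.

```python
def size_reduction(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before` (assumed formula)."""
    return (1 - after / before) * 100


# (before, after) byte counts copied from the table in the PR description.
rows = {
    "liballoc-4e7c5e5c.rlib": (1706146, 1050412),
    "libcore-4e7c5e5c.rlib": (39143186, 14743118),
    "total": (499359187, 363465583),
}

for path, (before, after) in rows.items():
    delta = size_reduction(before, after)
    print(f"{before:>9} {after:>9} {delta:5.1f}% {path}")
```

Rounded to one decimal place, these reproduce the 38.4%, 62.3% and 27.2% figures quoted for those rows.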
@steveklabnik #2743 is fully fixed. I think I've said #9303 is fixed partially because it does not really fix the naming issue ("Rename it from |
Some updates pertaining to this issue:
#35764 has significantly reduced the size of metadata since
Before, it looked like this:
So, at what point is metadata small enough that this bug can be considered fixed?
1 byte, tops
This is a super old and much less relevant issue now, so closing.
Couldn't find a previous issue on this, so I'd like to open a tracking issue for this. We've known this for a long time, but the metadata format for the compiler is far too large and there are surely methods to shrink its size and impact. Today when I compile `librustc`, I get the following numbers:
- librustc.rlib - 64MB
- rustc.o - 12MB
- rust.metadata.bin - 32MB
- rustc.0.bytecode.deflate - 21MB

This means that the metadata is three times as large as the code we're generating. Another statistic is that 36% of the binary data of the nightly is metadata.
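As a quick sanity check on those figures, here is a small Python sketch that derives the ratios from the sizes quoted above. The sizes are the rounded megabyte values from the list, so the results are approximate; the variable names are illustrative only.

```python
# Component sizes in MB, as quoted in the issue (rounded values).
sizes_mb = {
    "librustc.rlib": 64,
    "rustc.o": 12,
    "rust.metadata.bin": 32,
    "rustc.0.bytecode.deflate": 21,
}

# Metadata size relative to the generated object code.
metadata_to_code = sizes_mb["rust.metadata.bin"] / sizes_mb["rustc.o"]

# Fraction of the rlib taken up by metadata.
metadata_share = sizes_mb["rust.metadata.bin"] / sizes_mb["librustc.rlib"]

print(f"metadata is {metadata_to_code:.2f}x the object code")
print(f"metadata is {metadata_share:.0%} of the rlib")
```

With these rounded inputs the metadata comes out at roughly 2.7 times the object code, which the post rounds to three, and at half of the rlib.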
There are, however, a number of competing concerns around metadata:
There are a few open issues on this already, but none of them are necessarily a silver bullet. Here's a smattering of wishlist ideas or various strategies.
More will likely be added to this over time as it's a metabug.