Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Enable ThinLTO #2548

Merged
merged 2 commits into from
Jun 6, 2023
Merged

chore: Enable ThinLTO #2548

merged 2 commits into from
Jun 6, 2023

Conversation

osiewicz
Copy link
Contributor

@osiewicz osiewicz commented Jun 1, 2023

Per conversation with Antonio, I've suggested enabling full LTO; right now we use a crate-local ThinLTO, which does not inline function calls across crates.

Configuration Current main (788f97e) Thin LTO Full LTO
Size in bytes 158806721 155868753 111115553
% of main size 100% 98.14% 69.96%
Size in bytes (no debug info) 129186657 127942929 108281345

@SomeoneToIgnore
Copy link
Contributor

I would really love to see compilation time differences (or cargo --timings files, ideally)

@SomeoneToIgnore
Copy link
Contributor

Another idea for the size reduction, that might not increase the build time a lot: cut off dependencies' debug symbols

[profile.release.package."*"]
debug = false

since we don't use big executor runtimes, we won't loose important the backtrace info this way?

@osiewicz
Copy link
Contributor Author

osiewicz commented Jun 1, 2023

Yup, I can provide those (though that'll take a while). Full LTO took about 20 minutes for me (vs 3 minutes for ThinLTO), which is far from ideal; however, we could add a custom dist profile that'd run full LTO and leave release profile as is.

@osiewicz
Copy link
Contributor Author

osiewicz commented Jun 1, 2023

Tbh I'm more interested in potential perf improvements, though size reduction is visible at glance

@SomeoneToIgnore
Copy link
Contributor

SomeoneToIgnore commented Jun 1, 2023

AFAIK, we neither have any benchmark runs nor any historical perf data to notice any perf benefits LTO brings, hence I'm discussing the sizes only.
I would not expect LTO to help anyhow notably with the perf, but would be glad to be proven wrong 🙂

@maxbrunsfeld
Copy link
Collaborator

You could just do an ad-hoc before-and-after comparison of some known expensive operation, like editing a multi-buffer with 1000 cursors, or something like that.

@maxbrunsfeld
Copy link
Collaborator

maxbrunsfeld commented Jun 1, 2023

A 20-minute LTO time, even if we only did it on CI for our releases, would definitely slow down our release process a bit. Do you have a sense of whether LTO time can be reduced by running on a machine with lots of CPU cores?

We could upgrade the machine we use to build the releases. MacStadium (the service we use for cloud-hosted macs) offers a Mac Studio (which has 20 CPU cores instead of 8).

@osiewicz
Copy link
Contributor Author

osiewicz commented Jun 1, 2023

I will upload timings that @SomeoneToIgnore requested tomorrow. In private convo we also came up with comparing sizes of various LTO flavours + debug=false (to see how much of the size reduction comes from removing references to functions -> removing debug info).
I don't believe fat LTO benefits that much from more cores. :/ We'd have to first see what kind of gain we could expect.
The intent of this PR is more so to spark a discussion than to get it merged ASAP (setting our CI machines on fire in the meantime).

@osiewicz
Copy link
Contributor Author

osiewicz commented Jun 1, 2023

Here are the timings.zip
Tl;dr:
Full LTO takes over 20 minutes,
Both NoLTO and ThinLTO take 4m20s.

Commands:
cargo clean followed by cargo build --timings --release

At the very least it may be worth a shot to try ThinLTO at some point in the future, as it aids cross-crate inlining and bunch of other optimisations. At the end of the day, features are more important right now so I guess we don't want to spend too much time on this?

@maxbrunsfeld
Copy link
Collaborator

maxbrunsfeld commented Jun 1, 2023

Thin LTO is the default, right?

Nevermind, I guess "thin local LTO" is the default, which is different from "thin LTO". I'm open to just switching to "thin LTO" if the compile-time cost is low.

Change Fat LTO to ThinLTO
@osiewicz
Copy link
Contributor Author

osiewicz commented Jun 6, 2023

I believe that most of Full LTO's size reduction comes from stripping unused code -> stripping debug info (as the difference is size is not nearly as big when Zed is built with debug=false); another factor that might come into play is that as far as I know, Full LTO greatly aids devirtualization of virtual calls.

Anyways, we can come back to this later. As it stands, I'll merge ThinLTO variant.

Edit: we can also use multitargets to speed up bundle script.

@osiewicz osiewicz merged commit 572c59e into main Jun 6, 2023
@osiewicz osiewicz deleted the enable_lto branch June 6, 2023 19:50
@osiewicz osiewicz changed the title chore: Enable full LTO chore: Enable ThinLTO Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants