Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Android NDK, no Link-Time Optimization? #13287

Open
cpsauer opened this issue Apr 1, 2021 · 20 comments
Open

Android NDK, no Link-Time Optimization? #13287

cpsauer opened this issue Apr 1, 2021 · 20 comments
Labels
area-EngProd Bazel CI, infrastructure, bootstrapping, release, and distribution tooling P2 We'll consider working on this in future. (Assignee optional) team-Android Issues for Android team type: bug

Comments

@cpsauer
Copy link
Contributor

cpsauer commented Apr 1, 2021

Hi awesome Bazel folks,

I was debugging an NDK build and looking at the link commands being run. I happened to notice that dbg and opt builds are getting the exact same set of linker flags--no -flto (or -O#) flags in either case. That made me wonder if Bazel is accidentally omitting link-time optimization (LTO) in optimized builds.

[I'm assume Bazel (a) intends to have optimized (opt) builds run with LTO, and (b) that those flags would be show up in the link command dumped in --verbose_failures. Apologies if I've missed something somewhere and this is a false alarm, but then maybe we should clarify the docs and have consistency across platforms.]

Easiest way to repro should be the Bazel NDK example. (I'm using NDK21, macOS 11, Bazel 4, if that might matter.) I was looking at it by building with an invalid linker flag and --verbose_failures.

Thanks for taking a peek,
Chris

@aiuto aiuto added area-EngProd Bazel CI, infrastructure, bootstrapping, release, and distribution tooling team-Android Issues for Android team untriaged labels Apr 1, 2021
@ahumesky ahumesky added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels May 12, 2021
@cpsauer
Copy link
Contributor Author

cpsauer commented Feb 18, 2022

Hey, so to clarify: This really is not a support question--I know how to enable LTO/stripping and had already done so.

Instead it's either a defaults bug or a doc request depending on Bazel's intent for --compilation_mode=opt.

opt meant to be shorthand for (fully) optimized binaries? -> Missing LTO flags when building with the Android NDK (among several other important missing flags, see #10837)

opt indeed meant to be per-translation-unit-optimization only? -> Docs should probably say that, since other platforms have interpreted it the other way.

@jgavris
Copy link
Contributor

jgavris commented Mar 7, 2022

ThinLTO seems like a reasonable default, not much build time overhead on top of -O2 unless you are also including debug symbols. And in many benchmarks (but not on average), it performs better at runtime than regular LTO.

@cpsauer
Copy link
Contributor Author

cpsauer commented Mar 9, 2022

Good call on ThinLTO :)

@jgavris
Copy link
Contributor

jgavris commented Mar 9, 2022

@cpsauer Thanks. I think it was intended to be opt-in using --features=thin_lto. As far as I can tell on Bazel 4.2.2 using NDK 19 (and macOS desktop builds), that doesn't pass -flto=thin when building (ex: passing --features=thin_lto doesn't affect previously cached builds without it, also looking at the compile commands with -s). So support for it in general may have broader scope / require another issue. Right now we're just tossing in this into a .bazelrc.release config file.

build --copt -flto=thin
build --linkopt -flto=thin

@keith
Copy link
Member

keith commented Mar 9, 2022

The android crosstool doesn't actually have the thin_lto feature, and therefore doesn't support it. I think this is something we should add, similar to how it works in the unix toolchain (assuming the NDK's tools support the same flags)

@jgavris
Copy link
Contributor

jgavris commented Mar 10, 2022

@keith @cpsauer should we add it / can we as part of the defaults for the opt config for the NDK? Same for iOS and macOS I believe, last time I peeked into the compile commands. ThinLTO on Clang has been around for years.

@keith
Copy link
Member

keith commented Mar 10, 2022

We were looking at this yesterday and found that if you're trying to use -Os you couldn't actually use LTO as well on Android until NDK 22 android/ndk#721 so I guess we're blocked on #12889 for that part.

As far as opt-in vs opt-out, I imagine folks would want us to mirror the current behavior where it is supported, but I'm not sure, either way I think adding it to make it as easy as flipping a feature for all platforms would be great.

@cpsauer
Copy link
Contributor Author

cpsauer commented Mar 11, 2022

Keith with a blazingly fast, good crossref, as usual :) There's an interaction I didn't know about.

@keith, could I ask what breakage you saw? Same fatal linker error as in the issue you linked android/ndk#721?

Asking bc we'd been running -flto and now -flto=thin for a while now on opt builds, but hadn't seen any problems.
(Switched to -flto=thin as soon as @jgavris tipped me off that NDK r21's (non-LLVM) linker supported it.)

@cpsauer
Copy link
Contributor Author

cpsauer commented Mar 11, 2022

IMO, we should really turn on thin LTO by default for opt, if we can.

Lots of users will never get deeper than requesting an optimized build. And I think they'd be surprised that opt didn't turn on link time optimization by default, since it's a key optimization lever. And with ThinLTO, it seems like there are even fewer downsides.

@jgavris
Copy link
Contributor

jgavris commented Mar 13, 2022

It's probably more likely to get through review and testing to just stick with implementing --features=thin_lto (opt-in)? opt hasn't had fat LTO by default even though it's been around for a long time because I'm guessing because using fat LTO at Google scale was not feasible (ThinLTO was primarily motivated by developing LLVM / Clang itself!).

@jgavris
Copy link
Contributor

jgavris commented Mar 13, 2022

@cpsauer you are also passing -fuse-ld=lld to linkopts on NDK 19? It is not quite yet the default linker.

https://developer.android.com/ndk/downloads/revision_history?authuser=1

Developers should begin testing their apps with LLD. AOSP has switched to using LLD by default and the NDK will use it by default in the next release. BFD and Gold will be removed once LLD has been through a release cycle with no major unresolved issues (estimated r21). Test LLD in your app by passing -fuse-ld=lld when linking. Note: lld does not currently support compressed symbols on Windows. Issue 888. Clang also cannot generate compressed symbols on Windows, but this can be a problem when using artifacts built from Darwin or Linux.

@keith
Copy link
Member

keith commented Mar 14, 2022

@keith, could I ask what breakage you saw? Same fatal linker error as in the issue you linked android/ndk#721?

Yes this one.

Asking bc we'd been running -flto and now -flto=thin for a while now on opt builds, but hadn't seen any problems.

What optimization level are you using? we're explicitly passing -Osize in this case, which results in a smaller dylib in our case than if you use a compatible -O and LTO in my testing.

@keith
Copy link
Member

keith commented Mar 14, 2022

Here's some thin_lto setup #15039 for Android, still opt-in

@cpsauer
Copy link
Contributor Author

cpsauer commented Mar 30, 2022

Woo! Thanks for doing, Keith.

Keith, re -Os & linker errors:

Aha! I realized how we'd been talking past each other. Heretofore, I'd just been specifying --compilation_mode=opt alongside the LTO options, which (from aquery) was creating a mix of -Os and -O2 when compiling. After some of monkeying around trying to induce that error by specifying --copt=-Os, I realized that you must have instead been referring to -linkopt=-Os. (And -Osizeis a typo, right? Since that didn't seem to be recognized by either.) -linkopt=-Os did indeed induce that same error, and additionally passing -fuse-ld=lld just changed things over to an equivalent error from lld.

FWIW, I'm getting significantly smaller .so's with LTO (thin or full) than I do linking with -Os. Like 25% smaller or so. (Differs, of course by the characteristics of the call graph!)

@keith, do you understand how (or if) the linker picks up optimization flags that were passed during compilation?
(Asking because it seems like the output binary is the same with or without the link line containing -O2, the optimization level used to compile. But it differs if other optimizations levels are passed when linking.)

Jason,

I've been using NDK 21 (as the latest supported by Bazel), which seems to use LLVM-plugin-enabled gold. No -fuse-ld=lld originally, but I just tested it and it does work. I think I'll leave it on, since that's the default in newer versions.

Is there something backpinning you to NDK 19? (I'm noticing references to NDK 19 in lots of your comments.)

[Sorry guys for being a little slow one the reply this time around.]

@keith
Copy link
Member

keith commented Mar 31, 2022

do you understand how (or if) the linker picks up optimization flags that were passed during compilation?

Interesting find! I would assume that bazel wouldn't treat them special at all, so if you didn't explicitly pass --linkopt=-Os you'd avoid it which I assume would be fine 🤔

@jgavris
Copy link
Contributor

jgavris commented Apr 5, 2022

@cpsauer at the time I wrote this, we were still on Bazel 4.2.2 (I don't think 5.0 had officially released yet). 4.2.2 only supported up to r19c.

@cpsauer
Copy link
Contributor Author

cpsauer commented Apr 5, 2022

@keith: Pretty wild, right! I feel like I read something about this long ago, but can't find doc on it for the life of me. If you do ever run into it, I'd love it if you'd say something!

@jgavris: Makes sense. We've been using Bazelisk to stay at or near HEAD with rolling releases. It's been pretty sweet. Huge thanks to Keith for (among many things) helping the rolling story get working with mobile.

@cpsauer
Copy link
Contributor Author

cpsauer commented Apr 5, 2022

@gregestren, sorry to add one more thing to your plate, but could we remove the support tag? This really isn't.

@cpsauer
Copy link
Contributor Author

cpsauer commented Sep 4, 2022

Update: The thin_lto feature is supported by the Google NDK toolchains, which Alex released as a replacement for the old, built-in ones that had ben falling behind! https://github.com/bazelbuild/rules_android_ndk

I'm leaving this issue open because I really do want to propose that optimized builds ought to have (thin) link time optimization on by default.

@keith, I think the new toolchain should also solve the -O# linking issue you mentioned above, by virtue of being NDK >=r22! Do you know offhand if thin_lto is already supported by the Apple toolchains? I didn't see it on a quick pass, but I could have missed it.

@keith
Copy link
Member

keith commented Sep 6, 2022

Do you know offhand if thin_lto is already supported by the Apple toolchains? I didn't see it on a quick pass, but I could have missed it.

don't think so, although just passing --copt=-flto=thin does mostly work from my testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-EngProd Bazel CI, infrastructure, bootstrapping, release, and distribution tooling P2 We'll consider working on this in future. (Assignee optional) team-Android Issues for Android team type: bug
Projects
None yet
Development

No branches or pull requests

6 participants