Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable LTO/remove unused symbols when compiling BIOS/firmware #682

Open
DurandA opened this issue Oct 21, 2020 · 32 comments
Open

Enable LTO/remove unused symbols when compiling BIOS/firmware #682

DurandA opened this issue Oct 21, 2020 · 32 comments
Assignees

Comments

@DurandA
Copy link
Contributor

DurandA commented Oct 21, 2020

The current GCC build (I did not try Clang) for BIOS/firmware does not strip away unused symbols, resulting in large binaries (especially when adding dependencies).

I tried adding various flags in common.mak according to answers on Stack Overflow but without much success. It either resulted in no effect or the build failed. With -flto I was not able to fix a plugin needed to handle lto object error.

@mithro
Copy link
Collaborator

mithro commented Oct 21, 2020

@DurandA -- There was some discussion about enabling link time option.

@mithro
Copy link
Collaborator

mithro commented Oct 21, 2020

See #401

@mithro
Copy link
Collaborator

mithro commented Oct 21, 2020

#253

@DurandA
Copy link
Contributor Author

DurandA commented Oct 21, 2020

@mithro Thanks for pointing it and sorry for the redundancy.

LTO support was originally merged but 979f98e reverted it. At least we have an example on how to enable it.

@DurandA DurandA closed this as completed Oct 21, 2020
@mithro
Copy link
Collaborator

mithro commented Oct 22, 2020

@DurandA - We most certainly want to get LTO enabled again. Maybe you could work with @mateusz-holenko to figure out what is needed to done to do that?

@DurandA
Copy link
Contributor Author

DurandA commented Oct 22, 2020

I am now into the realms of a GCC bug (riscvarchive/riscv-gcc#201) so that might not happen anytime soon. Note that precompiled GCC from SiFive (both version gcc-8.3.0-2019.08.0-x86_64 and gcc-8.3.0-2020.04.1) triggered another internal compiler error from the lto-wrapper.

@mithro
Copy link
Collaborator

mithro commented Oct 22, 2020

I'm going to reopen this issue so that @mateusz-holenko can give us an updated on the status around LTO and how to move forward here.

@mithro mithro reopened this Oct 22, 2020
@mateusz-holenko
Copy link
Collaborator

I guess that linked tasks/PRs provide all the status - we didn't work on the LTO issue lately.
As described in #253, using it with optimisations caused problems back in the past. We were able to find a combination of switches that worked*, but it turned out not to be a generic solution (* as it broke some other platforms). Maybe a newer toolchain might help here?

Another problem was that LiteX lacked good testing suite (is it better now?). Changes like globally enabling LTO influence a lot of targets and it was quite hard to make sure that they don't break anything :( Perhaps we should think about extending LiteX CI and add more test cases first?

@DurandA
Copy link
Contributor Author

DurandA commented Oct 26, 2020

@mateusz-holenko Did you ever encounter the internal error of the lto-wrapper described in riscvarchive/riscv-gcc#201?

@mateusz-holenko
Copy link
Collaborator

@DurandA No, I don't recall that error.

Our problem was that the linker couldn't find one of the functions that was a part of the code - it looked like it was wronly optimized-out by the LTO mechanism (event though it was actually used).

@DurandA
Copy link
Contributor Author

DurandA commented Oct 28, 2020

@mateusz-holenko Did you ever encounter the internal error of the lto-wrapper described in riscv/riscv-gcc#201?

I forgot to add $(CPUFLAGS) to the linker and that caused the issue.

For reference, here is the current issue when compiling the bios with LTO:

riscv64-unknown-elf-gcc -std=gnu99 -nostdlib -nodefaultlibs -Os -mno-save-restore -march=rv32im     -mabi=ilp32 -D__picorv32__  -L/home/duranda/devel/build/software/include -T /home/duranda/devel/litex/litex/soc/software/bios/linker.ld -N -o bios.elf \
	../libbase/crt0.o \
	isr.o boot-helper.o boot.o helpers.o cmd_bios.o cmd_mem.o cmd_boot.o cmd_i2c.o cmd_spiflash.o cmd_litedram.o cmd_liteeth.o cmd_litesdcard.o main.o complete.o readline.o \
	-L../libcompiler_rt \
	-L../libbase \
	-L../liblitedram \
	-L../libliteeth \
	-L../liblitespi \
	-L../liblitesdcard \
	-llitedram -lliteeth -llitespi -llitesdcard -lbase-nofloat -lcompiler_rt
/home/duranda/riscv/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: /tmp/bios.elf.OQpKqZ.ltrans0.ltrans.o: in function `.L0 ':
/home/duranda/devel/litex/litex/soc/software/bios/cmds/cmd_mem.c:197: undefined reference to `__udivdi3'
/home/duranda/riscv/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: /tmp/bios.elf.OQpKqZ.ltrans0.ltrans.o:/home/duranda/devel/litex/litex/soc/software/bios/cmds/cmd_mem.c:209: undefined reference to `__udivdi3'
collect2: error: ld returned 1 exit status
make: *** [/home/duranda/devel/litex/litex/soc/software/bios/Makefile:64: bios.elf] Error 1

It seems that the linker is dropping some libgcc functions when using LTO.

@mithro
Copy link
Collaborator

mithro commented Oct 28, 2020

@DurandA From what I understand, the issue is either;

  • a call to __udivdi3 is being added during the LTO process (probably to calculate an offset or similar) after the LTO has checked if symbols are in use. Thus the LTO doesn't see the fact __udivdi3 is actually used and eliminates it from the binary?
  • a call to __udivdi3 is being added during the LTO process (probably to calculate an offset or similar) and our standard C library doesn't provide that symbol?

@DurandA DurandA changed the title Remove unused symbols when compiling BIOS/firmware Enable LTO/remove unused symbols when compiling BIOS/firmware Nov 2, 2020
@DurandA
Copy link
Contributor Author

DurandA commented Nov 3, 2020

Selectively disabling the LTO for lib runtime and libc (DurandA@f15bcce) seems to solve all issues. Apparently, GCC optimizes out functions that are not in source without caring that it inserted calls to these functions (1) for arithmetic operations on architecture that do not support it natively and (2) during optimization process. See riscvarchive/riscv-gcc#207 and riscv-collab/riscv-gnu-toolchain#758.

Before doing any PR and proceeding with extensive testing, further validation is necessary (according to 979f98e):

@mithro
Copy link
Collaborator

mithro commented Nov 3, 2020

FYI - @mateusz-holenko

@mithro
Copy link
Collaborator

mithro commented Nov 3, 2020

@DurandA -- We may be able to mark specific functions in lib runtime / libc as used (even though not referenced)?

@DurandA
Copy link
Contributor Author

DurandA commented Nov 3, 2020

I was not able to install a proper LM32 toolchain to check for potential issues. I compiled gcc-9.3.0 with ./configure --target=lm32-elf --enable-languages=c --disable-libssp --disable-libgcc --disable-libquadmath but these flags are not recognized:

flags = "-mbarrel-shift-enabled "
flags += "-mmultiply-enabled "
flags += "-mdivide-enabled "
flags += "-msign-extend-enabled "

If I remove them, I get a bunch of no such instruction errors. What is the recommanded way to install a working LM32 toolchain?

@DurandA
Copy link
Contributor Author

DurandA commented Nov 3, 2020

@DurandA -- We may be able to mark specific functions in lib runtime / libc as used (even though not referenced)?

I am not sure if this is possible. My current approach would be to split libc into several files (i.e. at least memset.c and libc.c for others). I didn't investigate yet if GCC can generate calls to something else than memset during optimization. For lib runtime, I am not sure if we can do something clean as what is used depends on the architecture.

@DurandA
Copy link
Contributor Author

DurandA commented Nov 3, 2020

What is the recommanded way to install a working LM32 toolchain?

@enjoy-digital Can you please point to the LM32 toolchain you are using? 🙂

@mithro
Copy link
Collaborator

mithro commented Nov 3, 2020

We (@timvideos / litex-buildenv, @antmicro and @SymbiFlow) use the toolchain from https://github.com/litex-hub/litex-conda-compilers/tree/master

@ewenmcneill
Copy link
Contributor

ewenmcneill commented Nov 5, 2020

@DurandA -- We may be able to mark specific functions in lib runtime / libc as used (even though not referenced)?

I am not sure if this is possible.

Some (old, because that's what the search engine found first) GCC documentation implies that the externally_visible symbol attribute will prevent it being discarded during LTO. And this seems to be a fairly common work around for symbols not properly detected as "required", eg, https://stackoverflow.com/questions/46304742/how-to-combine-lto-with-symbol-versioning/54045851#54045851 and arduino/Arduino#660. Some people have specified the "used" attribute instead, which I guess has a similar effect.

The LLVM design document for LTO impiles it also looks for "externally visible" symbols, but I'm not sure if it respects the same attribute.

Hopefully there'd be relatively few such symbols that need marking like that. (The LM32 issues I had I think were more to do with Makefile complications that sort of enabled LTO but sort of didn't, for LM32, which then caused other things not to build due to confusion over compile/link flags; that part seems easier to resolve, even with just a "no LTO" off switch to try when there are compile/link problems.)

Ewen

@DurandA
Copy link
Contributor Author

DurandA commented Nov 5, 2020

Some (old, because that's what the search engine found first) GCC documentation implies that the externally_visible symbol attribute will prevent it being discarded during LTO.

@ewenmcneill Thanks for mentioning it. Unfortunately, if I remove the -fno-lto flag for libc.c and add __attribute__((externally_visible)) to memset, then some undefined reference to `__mulsi3' are thrown, undepending on if I add __attribute__((externally_visible)) to __mulsi3 or if mulsi3.c is compiled with -fno-lto.

@DurandA
Copy link
Contributor Author

DurandA commented Nov 5, 2020

I created a minimal repo with LTO to experiment with LTO behaviour and to be able to discuss about this with external people.

@enjoy-digital Why did you add mulsi3.c to the libcompiler_rt dir in 17f6cb1 and not to the pythondata-software-compiler_rt package? I am also looking for a minimal code snippet that will force GCC to use it on a rv32i target.

@enjoy-digital
Copy link
Owner

@DurandA: thanks for looking at this. Before using pythondata-software-compiler_rt we were using an external compiler_rt as a submodule (so were to avoid forking this local modification were made to libcompiler_rt). Avoiding this local mulsi3.c would indeed be better. I'll try to have a closer look at this in the next days.

@ewenmcneill
Copy link
Contributor

then some undefined reference to `__mulsi3' are thrown

:-( It sounds like the "compiler replacements for missing instructions" (which that looks like -- signed multiply for integers), are at the biggest risk of LTO not realising they're needed until it's too late. Your approach of creating a minimal example to experiment with that particular bit seems like a good approach. Thanks for tackling it. Good luck!

Ewen

@mithro
Copy link
Collaborator

mithro commented Nov 8, 2020

@ewenmcneill We really should be filing bugs against gcc / clang for this failure mode.

@ewenmcneill
Copy link
Contributor

We really should be filing bugs against gcc / clang for this failure mode.

If someone comes up with a minimal reproducible example that shows that the compiler-replacements of mul/div/etc are not surviving LTO optimisation (ideally with glibc), on a particular supported CPU architecture (it seems like RV32I might be the best modern option without MUL/DIV) then I agree it'd be worth reporting upstream.

I did wonder if the GNU (GCC/Glibc) versions of MUL/DIV replacements were being marked in some different manner that made them survive LTO, but that doesn't seem to be the case (eg, https://github.com/gcc-mirror/gcc/blob/master/libgcc/config/microblaze/mulsi3.S, https://github.com/gcc-mirror/gcc/blob/master/libgcc/config/epiphany/mulsi3.c to pick random examples searching found). And maybe that's part of the problem....

Ewen

PS: For future reference, possible symbol attributes in Clang; it seems that "externally visible" isn't a supported Clang attribute, nor can I see "used". So I'm unclear how one tweaks functions to be retained over LTO in Clang :-(

@DurandA
Copy link
Contributor Author

DurandA commented Nov 9, 2020

Here is a minimal example that will trigger both undefined reference to `memset' and undefined reference to `__mulsi3'.

A few remarks about it:

  • Not only memset can be inserted by the compiler but also memcpy, memmove and possibly more.
  • Here __mulsi3 is inserted because I am performing an integer multiplication on RV32I but GCC exhibit the same behaviour for other operations (floating point, ...) which are not handled natively.
  • The occurrence of these errors largely depends on the optimization level. This is because these functions will typically be inlined instead of being called. I found that these errors are triggered more often with -Os which makes sense.
  • In this minimal example, using __attribute__((externally_visible)) on memset and __mulsi3 prevents throwing away these functions. This doesn't seems to work with the LiteX bios. This might have to do with LTO partitioning but I didn't investigate on it yet.

This example shows the behaviour of GCC pretty clearly: calls to libc or libgcc are not marked to be retained by the LTO. This is an issue when using -fno-builtin in conjunction with -flto. I think we should discuss about this on the GCC mailing list. Since RISC-V is not upstreamed yet, we might want to convert the example to a different platform (ARM?).

As to what we can do about it, I have a branch with LTO enabled that sets -fno-lto for (1) libc.c and (2) everything in libcompiler_rt. For (1) I can split libc.c in two files or more with only memset, memcpy, memmove, ... (should investigate on the full list) being compiled with -fno-lto. For (2) I think it would be a real mess to conditionally compile with -fno-lto according to the target so I would suggest to use -fno-lto for everything even if that makes larger binaries. We might want to measure the binary size increase due to this.

@mithro @enjoy-digital Are you interested in a PR enabling LTO with the above changes? If yes, please tell me how do you want to split libc and I will make a draft PR. I will need some help to properly test this. I tested these changes on a few platforms that caused problems in previous revisions but not with litex-buildenv. Also I did not try using LTO with LLVM/Clang yet.

@ewenmcneill
Copy link
Contributor

Since RISC-V is not upstreamed yet, we might want to convert the example to a different platform (ARM?).

As far as I can remember/find, the ARM instruction set almost always includes at least some MUL (eg, 32x32) (since ARMv2 AFAICT), so it might not be the best choice for an upstream demo. (it looks like even Thumb normally includes a shorter MUL). If we could induce an implicit DIV, and have the same LTO issues, then ARM might work (as DIV is much less commonly implemented in ARM instruction sets than MUL).

It looks like MUL is optional in LM32 (Lattice Mico 32), and there seems to be a gcc flag for LM32 to indicate whether multiply instructions can be generated. As far as I can tell LM32 is still upstream in gcc. So maybe LM32 is still a better alternative? I'd still state that it was an issue found with RISC V first though, so they don't dismiss it as "just LM32"; RV32I is probably going to be a common "tiny CPU' target for the next 10 years. (FYI, from memory there's a gcc LM32 cross compiler in the same litex-hub compiler builds as the RISC V one; LM32 was the small soft CPU we were using for the RISC V soft CPU implementations became "production ready".)

Well done on all the other discoveries. It makes sense that the functions being inlined mostly hiding this LTO issue from the compiler/linker developers (and especially that they're mostly testing on CPU architectures which don't need MUL/DIV emulated :-) ).

Ewen

@DurandA
Copy link
Contributor Author

DurandA commented Feb 7, 2021

Sorry for the lack of updates. I still plan to create a PR once we decided what is the best strategy to solve this issue.

@gregdavill
Copy link
Contributor

Has this now been fixed with the introduction of picolibc? I see that -flto is a default enabled option with the current upstream code.

@enjoy-digital
Copy link
Owner

@gregdavill: Indeed, LTO has been enabled by default when integrating picolibc. We can now close this issue.

@enjoy-digital
Copy link
Owner

LTO still seems to cause random issue (ex #1259) so has been disabled by default with 3ab7eaa. This will have to be properly tested/investigated before being re-enabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants