STM32 embedded debug binaries much larger with 0.14.0-dev.2851+b074fb7dd #22603
Comments
@marnix can you tell me a target triple that replicates the behavior? |
Just taking a guess that the right target/cpu is |
I will say that a 16K |
@dweiller Sorry I didn't answer this directly.
Almost, |
I know nothing about compiler-rt nor about LLVM optimizations, but with the same Zig 0.14.0-dev.2851+b074fb7dd, here is the size of
|
For comparison, it would be nice to see the same table for 0.13.0. |
Now with Zig 0.13.0, here is the size of
Just for fun, here are those last 54 bytes:
|
@dweiller Yes, I think that my minimalistic project only uses Ah, you're saying that your 18fcb3b for #18912 has improved the size for I will test once build 2882 or later lands on the Download page. Thanks! |
This is an easy fix. If the generic implementation under -OReleaseSmall does not generate the optimal machine code, then put a conditional compilation branch that checks the target and uses inline assembly. Just to make sure I understand, however, is it the case that LLVM is lowering this
for (0..len) |i| {
    dest.?[i] = src.?[i];
}
to a 16K implementation? Sounds like the stm32 backend of LLVM is not great.
These symbols should be deleted by the linker if they are unused. Does the project use C++ (I sure hope not)?
Looks to me like -OReleaseSmall memcpy in this target is indeed very small: https://zig.godbolt.org/z/337xxa9ze
Here you can see the rules for selecting compiler_rt optimization mode: Lines 6794 to 6798 in 015a5e1
Sounds like the project should be using -OReleaseSmall, not -OReleaseFast. |
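The conditional-compilation idea from the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: `copyBytes` is a hypothetical helper, not the actual compiler_rt code, and the plain byte loop stands in for whatever inline assembly a real fix might use. Whether LLVM keeps the loop small depends on the target and optimization mode.

```zig
const builtin = @import("builtin");

/// Hypothetical helper (not part of compiler_rt): picks a copy strategy at
/// compile time based on the optimization mode and the target architecture.
fn copyBytes(noalias dest: [*]u8, noalias src: [*]const u8, len: usize) void {
    // Comptime-known condition, so only one branch ends up in the binary.
    const prefer_small = builtin.mode == .ReleaseSmall or switch (builtin.cpu.arch) {
        .thumb, .thumbeb => true,
        else => false,
    };

    if (prefer_small) {
        // Size-first path: a plain byte loop. A real implementation might
        // place target-specific inline assembly here instead (assumption).
        var i: usize = 0;
        while (i < len) : (i += 1) dest[i] = src[i];
    } else {
        // Speed-first path: defer to the builtin.
        @memcpy(dest[0..len], src[0..len]);
    }
}
```

The point of branching on `builtin.mode` and `builtin.cpu.arch` is that the choice costs nothing at run time; the trade-off is that each special case has to be maintained by hand.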
A minimal reproduction with output:
export fn _start(noalias dest: ?[*]u8, noalias src: ?[*]const u8, len: usize) callconv(.C) void {
    @memcpy(dest.?[0..len], src.?[0..len]);
}
compiled with
[felix@xqwork tmp]$ ./zig-linux-x86_64-0.14.0-dev.2851+b074fb7dd/zig build-exe -OReleaseSmall -target thumb-freestanding-eabihf -mcpu cortex_m4+vfp4d16sp memcpy.zig -fno-strip
[felix@xqwork tmp]$ llvm-objdump memcoy -hpt
memcoy: file format elf32-littlearm
Program Header:
PHDR off 0x00000034 vaddr 0x00010034 paddr 0x00010034 align 2**2
filesz 0x000000a0 memsz 0x000000a0 flags r--
LOAD off 0x00000000 vaddr 0x00010000 paddr 0x00010000 align 2**16
filesz 0x000000f4 memsz 0x000000f4 flags r--
LOAD off 0x000000f4 vaddr 0x000200f4 paddr 0x000200f4 align 2**16
filesz 0x00001c34 memsz 0x00001c34 flags r-x
STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**64
filesz 0x00000000 memsz 0x01000000 flags rw-
UNKNOWN off 0x000000d4 vaddr 0x000100d4 paddr 0x000100d4 align 2**2
filesz 0x00000020 memsz 0x00000020 flags r--
Dynamic Section:
Sections:
Idx Name Size VMA Type
0 00000000 00000000
1 .ARM.exidx 00000020 000100d4
2 .text 00001c34 000200f4 TEXT
3 .debug_abbrev 000004b4 00000000 DEBUG
4 .debug_info 0001b326 00000000 DEBUG
5 .debug_str 00007d45 00000000 DEBUG
6 .debug_pubnames 000045fd 00000000 DEBUG
7 .debug_pubtypes 00000cdc 00000000 DEBUG
8 .ARM.attributes 00000045 00000000
9 .debug_frame 00003fb4 00000000 DEBUG
10 .debug_line 0000f26e 00000000 DEBUG
11 .debug_loc 0002ae70 00000000 DEBUG
12 .debug_ranges 00003c28 00000000 DEBUG
13 .comment 00000067 00000000
14 .symtab 00000150 00000000
15 .shstrtab 000000bc 00000000
16 .strtab 000000d0 00000000
SYMBOL TABLE:
00000000 l df *ABS* 00000000 memcoy
000200f4 l .text 00000000 $t
00000000 l df *ABS* 00000000 compiler_rt
00021cf0 l F .text 00000006 OUTLINED_FUNCTION_94
000200f8 l .text 00000000 $t
000200fa l .text 00000000 $t
000200fe l .text 00000000 $t
00021cf6 l F .text 00000012 OUTLINED_FUNCTION_145
00021d1a l F .text 0000000e OUTLINED_FUNCTION_222
00021cde l F .text 00000012 OUTLINED_FUNCTION_13
00021d08 l F .text 00000012 OUTLINED_FUNCTION_155
00021cde l .text 00000000 $t
00021cf0 l .text 00000000 $t
00021cf6 l .text 00000000 $t
00021d08 l .text 00000000 $t
00021d1a l .text 00000000 $t
000200f4 g F .text 00000004 _start
000200fa w F .text 00000004 __aeabi_memcpy
000200f8 w F .text 00000002 __aeabi_unwind_cpp_pr0
000200fe w F .text 00001be0 memmove
[felix@xqwork tmp]$ llvm-size memcpy
text data bss dec hex filename
7252 0 0 7252 1c54 memcpy
[felix@xqwork tmp]$
which yields a 7136-byte memmove implementation in a 7252-byte executable, versus
export fn _start(noalias dest: ?[*]u8, noalias src: ?[*]const u8, len: usize) callconv(.C) void {
    for (0..len) |i| {
        dest.?[i] = src.?[i];
    }
}
compiled with
[felix@xqwork tmp]$ ./zig-linux-x86_64-0.14.0-dev.2851+b074fb7dd/zig build-exe -OReleaseSmall -target thumb-freestanding-eabihf -mcpu cortex_m4+vfp4d16sp -fno-strip memcpy.zig
[felix@xqwork tmp]$ llvm-objdump memcpy -hpt
memcoy: file format elf32-littlearm
Program Header:
PHDR off 0x00000034 vaddr 0x00010034 paddr 0x00010034 align 2**2
filesz 0x000000a0 memsz 0x000000a0 flags r--
LOAD off 0x00000000 vaddr 0x00010000 paddr 0x00010000 align 2**16
filesz 0x000000e4 memsz 0x000000e4 flags r--
LOAD off 0x000000e4 vaddr 0x000200e4 paddr 0x000200e4 align 2**16
filesz 0x00000012 memsz 0x00000012 flags r-x
STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**64
filesz 0x00000000 memsz 0x01000000 flags rw-
UNKNOWN off 0x000000d4 vaddr 0x000100d4 paddr 0x000100d4 align 2**2
filesz 0x00000010 memsz 0x00000010 flags r--
Dynamic Section:
Sections:
Idx Name Size VMA Type
0 00000000 00000000
1 .ARM.exidx 00000010 000100d4
2 .text 00000012 000200e4 TEXT
3 .debug_loc 0002aeb2 00000000 DEBUG
4 .debug_abbrev 000004b4 00000000 DEBUG
5 .debug_info 0001b32c 00000000 DEBUG
6 .debug_str 00007d45 00000000 DEBUG
7 .debug_pubnames 000045fd 00000000 DEBUG
8 .debug_pubtypes 00000cdc 00000000 DEBUG
9 .ARM.attributes 00000045 00000000
10 .debug_frame 00003fb4 00000000 DEBUG
11 .debug_line 0000f288 00000000 DEBUG
12 .debug_ranges 00003c28 00000000 DEBUG
13 .comment 00000067 00000000
14 .symtab 00000070 00000000
15 .shstrtab 000000bc 00000000
16 .strtab 00000038 00000000
SYMBOL TABLE:
00000000 l df *ABS* 00000000 memcoy
000200e4 l .text 00000000 $t
00000000 l df *ABS* 00000000 compiler_rt
000200f4 l .text 00000000 $t
000200e4 g F .text 00000010 _start
000200f4 w F .text 00000002 __aeabi_unwind_cpp_pr0
[felix@xqwork tmp]$ llvm-size memcpy
text data bss dec hex filename
34 0 0 34 22 memcpy
[felix@xqwork tmp]$
which does not force memmove or memcpy to exist, and is only 34 bytes large. |
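As a cross-check of the numbers quoted in the two listings above, the small sketch below adds up the section sizes from the llvm-objdump tables. It assumes that the "text" column of llvm-size here is the sum of the .text and .ARM.exidx sections, which is consistent with both outputs. The names `Sizes` and `reportedText` are made up for this illustration.

```zig
const std = @import("std");

/// Sizes in bytes, copied from the llvm-objdump section tables above.
const Sizes = struct { text: u32, arm_exidx: u32 };

const with_memcpy_builtin = Sizes{ .text = 0x1c34, .arm_exidx = 0x20 };
const with_plain_loop = Sizes{ .text = 0x12, .arm_exidx = 0x10 };

/// Size of the memmove symbol in the @memcpy variant (symbol table entry).
const memmove_size: u32 = 0x1be0;

/// Assumed meaning of the llvm-size "text" column for these two files.
fn reportedText(s: Sizes) u32 {
    return s.text + s.arm_exidx;
}

pub fn main() void {
    std.debug.print("@memcpy variant: {d} bytes, of which memmove is {d}\n", .{
        reportedText(with_memcpy_builtin),
        memmove_size,
    });
    std.debug.print("loop variant:    {d} bytes\n", .{reportedText(with_plain_loop)});
}
```

In other words, memmove alone accounts for 7136 of the 7252 bytes in the @memcpy variant.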
This issue should be a release blocker btw as it would make Zig completely unusable on embedded devices as soon as one uses
My guess is that it's the auto-devectorization that fails considering all these byte loads and stores. A quick test confirms this:
export fn _start(noalias dest: ?[*]align(64) u8, noalias src: ?[*]const align(64) u8, len: usize) callconv(.C) void {
    const vec_dst: *@Vector(32, u8) = @ptrCast(dest);
    const vec_src: *const @Vector(32, u8) = @ptrCast(src);
    vec_dst.* = vec_src.*;
    _ = len;
}
compiles down to
but wrapping it in a function explodes:
export fn _start(noalias dest: ?[*]align(64) u8, noalias src: ?[*]const align(64) u8, len: usize) callconv(.C) void {
    copyVector(@ptrCast(dest), @as(*const @Vector(32, u8), @ptrCast(src)).*);
    _ = len;
}
fn copyVector(dst: *volatile @Vector(32, u8), src: @Vector(32, u8)) void {
    dst.* = src;
}
compiles into this bloat:
which is creating a byte-by-byte copy instead of using whatever smart instruction is possible. I'd even say this is a problematic miscompilation which might affect a lot of other non-vectorized targets as well. |
Agreed, it's a release blocker. Easy to fix, we simply need to drop a good implementation in there. |
Please check |
Just now I tested with Zig 0.14.0-dev.2989+bf6ee7cb3, so here again is the size of
So that is roughly back to the Zig 0.13.0 sizes (~110 bytes or ~45% larger, except small, which is 10 bytes smaller). |
Looks good then - any problems remaining? |
Zig Version
0.14.0-dev.2851+b074fb7dd
Steps to Reproduce and Observed Behavior
Building an STM32 binary in Debug mode (for the STM32F3DISCOVERY board in this case) with 0.14.0-dev.2851+b074fb7dd results in a binary size of 0x868a, an increase of almost 16K over Zig 0.13.0 (and too large to fit in this chip's flash...).
It looks like there is a __aeabi_unwind_cpp_pr0 + __aeabi_unwind_cpp_pr1 + a ~16K memmove that all weren't there with 0.13.0...
We suspect that this may have been introduced with #22513.
Reproduction scenario: With https://github.com/marnix/microzig/tree/zig-issue-22603-0.14.0-dev.2851%2Bb074fb7dd (and Zig 0.14.0-dev.2851+b074fb7dd), in examples/stmicro/stm32, run
zig build
and inspect the resulting zig-out/firmware/stm32f3discovery.elf file, e.g. using
readelf -S
which shows a 0x868a bytes .text segment.
Expected Behavior
Building the same code (modulo trivial Zig 0.14.0 syntax changes, and leaving out the microzig 0.14.0 changes from marnix/microzig@cebfb1b) with Zig 0.13.0, the binary size was 0x49a2.
To show the original behavior, with https://github.com/marnix/microzig/tree/zig-issue-22603-0.13.0 (and Zig 0.13.0), do the same. The .text segment size was 0x49a2 bytes.
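To make the size regression concrete, here is a small sketch that works out the difference between the two reported .text sizes. The constants are the two hex values quoted in this issue; the names are chosen for this illustration only.

```zig
const std = @import("std");

// .text sizes in bytes, as reported in this issue.
const size_0_14_dev: u32 = 0x868a; // Zig 0.14.0-dev.2851+b074fb7dd
const size_0_13: u32 = 0x49a2; // Zig 0.13.0

/// Growth in bytes between the two builds.
fn growth() u32 {
    return size_0_14_dev - size_0_13;
}

pub fn main() void {
    std.debug.print("0.13.0: {d} bytes, 0.14.0-dev: {d} bytes, growth: {d} bytes (0x{x})\n", .{
        size_0_13,
        size_0_14_dev,
        growth(),
        growth(),
    });
}
```

The growth works out to 15592 bytes (0x3ce8), about 15.2 KiB, which matches the "almost 16K" estimate above.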