Pass to add llvm annotations to avoid inlining #1142

Draft
wants to merge 1 commit into
base: main

newling
Contributor

@newling newling commented Feb 26, 2025

It doesn't make much sense to outline 64x64x64 matmuls to reduce program memory (PM), only to have LLVM inline them again and run out of PM! Let's see whether this causes bad regressions.

Included here: put function names in alphabetical order in Passes.*

UPDATE

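This causes a serious slowdown in O3 outlined matmuls (mean time goes from 1897 us to 2909 us in the benchmark below). Nice memory saving though (largest PM goes from 8464 bytes to 3680 bytes). I/we should understand how inlining is helping; it looks like the compiler backend is succeeding by blind luck.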

With PR:

matmul_4096_512_512_bf16_f32_O3_npu1_4col_outline_benchmark
--------------------------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------
BM_matmul/process_time/real_time_mean         2909 us          123 us            5 items_per_second=343.923/s
BM_matmul/process_time/real_time_median       2893 us          120 us            5 items_per_second=345.623/s
BM_matmul/process_time/real_time_stddev       64.2 us         12.6 us            5 items_per_second=7.49528/s
--------------------------------------------------------------------------------------------------
The largest program memory size (read from byte 72 of elf files) is 3680 bytes

Before:

matmul_4096_512_512_bf16_f32_O3_npu1_4col_outline_benchmark
--------------------------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------
BM_matmul/process_time/real_time_mean         1897 us          108 us            5 items_per_second=527.121/s
BM_matmul/process_time/real_time_median       1896 us          107 us            5 items_per_second=527.447/s
BM_matmul/process_time/real_time_stddev       11.0 us         5.43 us            5 items_per_second=3.0455/s
--------------------------------------------------------------------------------------------------
The largest program memory size (read from byte 72 of elf files) is 8464 bytes
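
To put the two runs side by side, here is a minimal sketch that computes the relative change in mean time and in largest program memory. The numbers are copied from the benchmark output above; the dictionary keys and the `relative_change` helper are hypothetical names used only for this illustration and are not part of the PR.

```python
# Figures copied from the two benchmark runs above
# (matmul_4096_512_512_bf16_f32_O3_npu1_4col_outline_benchmark).
with_pr = {"mean_us": 2909, "pm_bytes": 3680}
before = {"mean_us": 1897, "pm_bytes": 8464}


def relative_change(new, old):
    """Fractional change of `new` relative to `old` (hypothetical helper)."""
    return (new - old) / old


# Positive means the PR is slower / larger, negative means faster / smaller.
slowdown = relative_change(with_pr["mean_us"], before["mean_us"])
pm_change = relative_change(with_pr["pm_bytes"], before["pm_bytes"])

print(f"mean time:  {slowdown:+.1%}")
print(f"largest PM: {pm_change:+.1%}")
```

On these numbers, the PR is roughly 53% slower in mean time while the largest program memory shrinks by roughly 57%.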
