Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM 17 breaks inlining callees with compatible target attributes #65205

Closed
kalcutter opened this issue Sep 2, 2023 · 18 comments · Fixed by llvm/llvm-project-release-prs#708
Closed

Comments

@kalcutter
Copy link

kalcutter commented Sep 2, 2023

Consider the following code:

#include <immintrin.h>

__attribute__((target("avx512bw")))
static __attribute__((always_inline)) __m512i MM512_MASK_ADD_EPI8(__m512i src,
                                                                  __mmask64 k,
                                                                  __m512i a,
                                                                  __m512i b) {
    __asm__("vpaddb\t{%3, %2, %0 %{%1%}" : "+v"(src) : "Yk"(k), "v"(a), "v"(b));
    return src;
}

__attribute__((target("avx512bw")))
__m512i F(__m512i src, __mmask64 k, __m512i a, __m512i b) {
    return MM512_MASK_ADD_EPI8(src, k, a, b);
}

__attribute__((target("avx512bw,avx512dq")))
__m512i G(__m512i src, __mmask64 k, __m512i a, __m512i b) {
    return MM512_MASK_ADD_EPI8(src, k, a, b);
}

__attribute__((target("avx512bw,avx512vl")))
__m512i H(__m512i src, __mmask64 k, __m512i a, __m512i b) {
    return MM512_MASK_ADD_EPI8(src, k, a, b);
}

When compiling with previous versions of clang (up to and including version 16), MM512_MASK_ADD_EPI8 is inlined into F, G, and H (as expected) . Testing with LLVM 17 RC yields a different result: only F allows inlining and horrible code is generated for G and H. I believe this regression is caused by d6f994a.

Please revert that change or apply the appropriate fix to X86TargetTransformInfo.cpp for the LLVM 17 release.

@tru
Copy link
Collaborator

tru commented Sep 2, 2023

@kazutakahirata

@llvmbot
Copy link
Member

llvmbot commented Sep 2, 2023

@llvm/issue-subscribers-backend-x86

@marcauberer
Copy link
Member

@kalcutter kalcutter changed the title LLVM 17 breaks inlining callees with other target attributes LLVM 17 breaks inlining callees with compatible target attributes Sep 4, 2023
@nikic
Copy link
Contributor

nikic commented Sep 14, 2023

I think we should revert this change. I've seen this cause issues on multiple instances already. always_inline overriding the target feature check was a feature, not a bug.

@kazutakahirata
Copy link
Contributor

Sorry, I somehow missed this. Let me take a look.

@kazutakahirata
Copy link
Contributor

kazutakahirata commented Sep 14, 2023

It looks like X86TTIImpl::areInlineCompatible is rejecting the inlining opportunity because calls in the callee may become ABI-incompatible as a result of inlining. Eventually, we get to:

        // We don't know the target features of the callee,
        // assume it is incompatible.
        return false;

Now, the only call in the callee in this case is the inline asm, which shouldn't pose a problem in terms of the ABI compatibility. Disregarding inline asm like so around X86TargetTransformInfo.cpp:6063 fixes the problem:

  for (const Instruction &I : instructions(Callee)) {
    if (const auto *CB = dyn_cast<CallBase>(&I)) {
      if (CB->isInlineAsm())
        continue;

I'll turn this into a pull request with a test case.

@nikic
Copy link
Contributor

nikic commented Sep 14, 2023

That's a reasonable change, but this is only a specific issue. Others I've seen are calls being compatible in ways that LLVM does not understand (different target features that do not affect ABI) and the TTI hook not being implemented by all targets (IIRC PowerPC was one of the targets that doesn't, resulting in altivec inlining issues).

We should fix such cases to make general inlining more powerful, but IMHO the change to always_inline behavior is still questionable, at least without a comprehensive review of TTI hooks for accuracy.

@kalcutter
Copy link
Author

I agree that the new behaviour is questionable. If I annotate a function as always_inline, I want the compiler to try, to the best of its ability, to inline the function. Also, I want it to be visible when inlining fails for whatever reason. That is basically what we had before. It would try and maybe fail later with an error if the backend couldn't handle it.

@kazutakahirata
Copy link
Contributor

@nikic Are "others you've seen" are publicly available? If so, could you post pointers to some of those? (I'm aware of #65152 for the 32-bit ARM.)

One problem with this always_inline vs target issue is lack of clear specification and test cases. When I implemented https://reviews.llvm.org/D150396, I did quite a bit of archaeology in getAttributeBasedInliningDecision. I wasn't able to tell if it was intentional to proceed to inline in the presence of mismatching target attributes. No test in llvm/test broke when I changed the behavior. Maybe we should check in some of those test cases so that the compiler behavior stays consistent.

Skipping safety checks is concerning to me. In the best case, the compiler crashes (as seen in #62664), and the user would immediately know something is wrong (but not necessarily why something is wrong). In this particular case, the caller ends up with LLVM IR instructions/intrinsics that cannot be lowered to the target hardware instructions available for the specified target attributes, and the backend fails to select instructions.

Other cases could quietly lead to miscompilation. Ignoring an ABI compatibility like the one checked in X86TTIImpl::areInlineCompatible might be one such example.

I don't think if we should specify always_inline (in the presence of mismatching target attributes) as "inline despite potential miscompilation and/or instruction selection problems". That would be a bit too far and very unfriendly to users.

@kalcutter You might find -Rpass=inline helpful in examining inline decisions.

@dzaima
Copy link

dzaima commented Sep 15, 2023

Currently, a "normal" reason for failed inlining of a __attribute__((always_inline)) function produces a compile-time error: https://godbolt.org/z/cn1rssbd8 - thus, I would assume an error would be produced in similar cases, but that doesn't happen for the original code here, which is IMO quite unexpected/confusing. (also, -Rpass-missed=inline is the more useful thing for the missed inlines here)

I don't know what would be the best option here, but an always_inline function not being always inlined is quite weird at least.

edit side-note: looks like recursive always_inline functions also result in missed inlining without error: https://godbolt.org/z/jebKshhKf; GCC errors for this

@kalcutter
Copy link
Author

IMO missed inlining of always_inline functions should be an error always. If there is no agreement on that, I think at the very least it should be a warning that can be caught with -Werror. Making people rely on -Rpass-missed=inline doesn't seem reasonable for always_inline. The foot gun is too big. I only noticed this regression by luck while hand inspecting the generated code.

@nikic
Copy link
Contributor

nikic commented Sep 19, 2023

@nikic Are "others you've seen" are publicly available? If so, could you post pointers to some of those? (I'm aware of #65152 for the 32-bit ARM.)

The other x86 issues is https://reviews.llvm.org/D157826. I wasn't able to find the powerpc issue in my mail archive quickly.

In any case, we have at least 4 different regressions caused by this change, and probably more we're not aware of (because "missing inlining" may easily go unnoticed), so we should err on the side of reverting.

@kazutakahirata
Copy link
Contributor

Currently, a "normal" reason for failed inlining of a __attribute__((always_inline)) function produces a compile-time error: https://godbolt.org/z/cn1rssbd8 - thus, I would assume an error would be produced in similar cases, but that doesn't happen for the original code here, which is IMO quite unexpected/confusing. (also, -Rpass-missed=inline is the more useful thing for the missed inlines here)

Thanks for the gotbolt link. I wasn't aware of the language frontend capable of issuing error messages in certain cases. As far as the original code here goes, the backend, namely X86TTIImpl::areInlineCompatible, rejects the inlining opportunity because the callee contains a "call", namely inline asm, and the backend is afraid that the "call" could create an ABI issue in the presence of mismatching target attributes. In this particular case, I could teach the backend to not worry about the inline asm. Even then, the fact remains that the language frontend and the backend independently check inlinability, which do not necessarily agree with each other.

edit side-note: looks like recursive always_inline functions also result in missed inlining without error: https://godbolt.org/z/jebKshhKf; GCC errors for this

Thanks for this also. If I switch to GCC, I see:

<source>:2:43: error: inlining failed in call to 'always_inline' 'f': recursive inlining
    2 | static __attribute__((always_inline)) int f(int x) {

@nikic nikic moved this from Needs Triage to Needs Fix in LLVM Release Status Sep 21, 2023
kazutakahirata added a commit that referenced this issue Sep 21, 2023
This reverts commit d6f994a.

Several people have reported breakage resulting from this patch:

- #65152
- #65205
@kazutakahirata
Copy link
Contributor

I've flied an offshoot issue #67054, where the callee is declared is static inline instead of static __attribute__((always_inline)).

alessandrod pushed a commit to aya-rs/llvm-project that referenced this issue Sep 26, 2023
This reverts commit d6f994a.

Several people have reported breakage resulting from this patch:

- llvm#65152
- llvm#65205
@nikic
Copy link
Contributor

nikic commented Sep 26, 2023

/cherry-pick b4301df

@llvmbot
Copy link
Member

llvmbot commented Sep 26, 2023

/branch llvm/llvm-project-release-prs/issue65205

llvmbot pushed a commit to llvm/llvm-project-release-prs that referenced this issue Sep 26, 2023
This reverts commit d6f994a.

Several people have reported breakage resulting from this patch:

- llvm/llvm-project#65152
- llvm/llvm-project#65205

(cherry picked from commit b4301df61fc77a9d54ac236bc88742a731285f1c)
@llvmbot
Copy link
Member

llvmbot commented Sep 26, 2023

/pull-request llvm/llvm-project-release-prs#708

@nikic nikic moved this from Needs Fix to Needs Review in LLVM Release Status Sep 26, 2023
tru pushed a commit to llvm/llvm-project-release-prs that referenced this issue Sep 27, 2023
This reverts commit d6f994a.

Several people have reported breakage resulting from this patch:

- llvm/llvm-project#65152
- llvm/llvm-project#65205

(cherry picked from commit b4301df61fc77a9d54ac236bc88742a731285f1c)
@tru tru moved this from Needs Review to Done in LLVM Release Status Sep 27, 2023
@RalfJung
Copy link
Contributor

This revert is keeping a soundness issue alive. LLVM will turn working code into broken code currently, and it's hard for frontends to work around that problem. I wonder if the approach outlined here could be a path forward towards fixing this inliner bug?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging a pull request may close this issue.

8 participants