Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86] Suboptimal lowering of short vectors equality check: could use scalar types instead #53419

Closed
xortator opened this issue Jan 26, 2022 · 9 comments
Assignees
Labels
confirmed Verified by a second party llvm:codegen

Comments

@xortator
Copy link
Contributor

xortator commented Jan 26, 2022

Motivating case: https://godbolt.org/z/rbE3TzqdP

The original test

define i1 @vector_version(i8* align 1 %arg, i8* align 1 %arg1, i32 %arg2) {
bb:
  %ptr1 = bitcast i8* %arg1 to <4 x i8>*
  %ptr2 = bitcast i8* %arg to <4 x i8>*
  %lhs = load <4 x i8>, <4 x i8>* %ptr1, align 1
  %rhs = load <4 x i8>, <4 x i8>* %ptr2, align 1
  %any_ne = icmp ne <4 x i8> %lhs, %rhs
  %any_ne_scalar = bitcast <4 x i1> %any_ne to i4
  %all_eq = icmp eq i4 %any_ne_scalar, 0
  ret i1 %all_eq
}

reads two short vector values and effectively checks that they are equal. Codegen generates vector code from it:

vector_version:                         # @vector_version
        vpmovzxbd       (%rsi), %xmm0           # xmm0 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
        vpmovzxbd       (%rdi), %xmm1           # xmm1 = mem[0],zero,zero,zero,mem[1],zero,zero,zero,mem[2],zero,zero,zero,mem[3],zero,zero,zero
        vpsubd  %xmm1, %xmm0, %xmm0
        vptest  %xmm0, %xmm0
        sete    %al
        retq

This code is semantically equivalent to its scalar counterpart

define i1 @scalar_version(i8* align 1 %arg, i8* align 1 %arg1, i32 %arg2) {
bb:
  %ptr1 = bitcast i8* %arg1 to i32*
  %ptr2 = bitcast i8* %arg to i32*
  %lhs = load i32, i32* %ptr1, align 1
  %rhs = load i32, i32* %ptr2, align 1
  %all_eq = icmp eq i32 %lhs, %rhs
  ret i1 %all_eq
}

which produces neater asm:

scalar_version:                         # @scalar_version
        movl    (%rsi), %eax
        cmpl    (%rdi), %eax
        sete    %al
        retq

Unfortunately we cannot use RM vector sub here as stated in #53416, but it looks like we could give up using vector registers at all.

Not sure what is the proper place for this - codegen or instcombine.

@xortator xortator changed the title [X86] Suboptimal lowering of short vectors: could use scalar types instead [X86] Suboptimal lowering of short vectors equality check: could use scalar types instead Jan 26, 2022
@xortator
Copy link
Contributor Author

Here is something in between: if we bitcast vector values to i32 after load, codegen can produce good code.
https://godbolt.org/z/bseaqYjjq

@RKSimon RKSimon self-assigned this Jan 26, 2022
@llvmbot
Copy link
Member

llvmbot commented Jan 26, 2022

@llvm/issue-subscribers-backend-x86

@xortator
Copy link
Contributor Author

xortator commented Jan 26, 2022

3271f43 tests.

@nunoplopes
Copy link
Member

Here is something in between: if we bitcast vector values to i32 after load, codegen can produce good code. https://godbolt.org/z/bseaqYjjq

You can't do that transformation at IR level; it's not sound because of poison values. It has to be delayed until the backend.

@nunoplopes
Copy link
Member

nunoplopes commented Jan 26, 2022

Ah, nervermind, you are doing an and of all the comparisons. That's fine, yes.
(sorry, I misread it as doing 4 individual comparisons)

@xortator
Copy link
Contributor Author

https://reviews.llvm.org/D118317 should address this pattern on IR level, but I still believe there should also be a codegen solution.

@RKSimon RKSimon added the confirmed Verified by a second party label Jan 30, 2022
@RKSimon
Copy link
Collaborator

RKSimon commented Jan 30, 2022

Several things need to be addressed:

MatchVectorAllZeroTest needs to be extended to handle the (icmp bitcast(iN (icmp vNiX V, 0) ), 0) style reduction pattern that we canonicalize to - this will catch the (legal) integer size patterns.

Extend MatchVectorAllZeroTest and LowerVectorAllZero so that they handle 'VectorAllEqual' patterns - PTEST lowering will need to perform a SUB(X,Y), but the MOVMSK can still use PCMPEQB.

The ExpandReductions pass should fold the allof(icmp(vector)) -> (icmp bitcast(iN (icmp (vector)) ), 0) canonicalization, we currently only perform this inside InstCombine so if anything else has generated this we might miss it.

@RKSimon RKSimon removed their assignment Jun 23, 2022
RKSimon added a commit that referenced this issue Feb 12, 2023
…mp_eq()) / any_of(icmp_ne()) to integers

Noticed while working on Issue #59867 and Issue #53419 - there's still more to do here, but for "all vector" comparisons, we should try to cast to a scalar integer for sub-128bit types
RKSimon added a commit that referenced this issue Mar 22, 2023
…fold to AVX512 targets

Extends 1bb95a3 to combine on AVX512 targets where the vXi1 type is legal

Continues work on addressing Issue #53419
RKSimon added a commit that referenced this issue Mar 23, 2023
…kortestw patterns

Another step toward #53419 - this is also another step towards expanding MatchVectorAllZeroTest to match any pair of vectors and merge EmitAVX512Test into it.
RKSimon added a commit that referenced this issue Mar 23, 2023
…kortestw patterns (REAPPLIED)

Another step toward #53419 - this is also another step towards expanding MatchVectorAllZeroTest to match any pair of vectors and merge EmitAVX512Test into it.
@RKSimon RKSimon self-assigned this Mar 29, 2023
@RKSimon
Copy link
Collaborator

RKSimon commented Mar 30, 2023

Several things need to be addressed:

MatchVectorAllZeroTest needs to be extended to handle the (icmp bitcast(iN (icmp vNiX V, 0) ), 0) style reduction pattern that we canonicalize to - this will catch the (legal) integer size patterns.

Candidate Patch: https://reviews.llvm.org/D147243

RKSimon added a commit that referenced this issue Mar 31, 2023
…,Y)),0) vector reduction patterns

Many allof/anyof/noneof reduction patterns are canonicalized by bitcasting a vXi1 vector comparison result to iN and compared against 0/-1.

This patch adds support for recognizing a icmp_ne vector comparison against 0, which matches an 'whole vectors are equal' comparison pattern.

There are a few more steps to follow in future patches - we need to add support to MatchVectorAllZeroTest for comparing against -1 (in some cases), and this initial refactoring of LowerVectorAllZero to LowerVectorAllEqual needs to be extended so we can fully merge with the similar combineVectorSizedSetCCEquality code (which deals with scalar integer memcmp patterns).

Another step towards Issue #53419

Differential Revision: https://reviews.llvm.org/D147243
@RKSimon
Copy link
Collaborator

RKSimon commented Apr 3, 2023

Final Candidate Patch: https://reviews.llvm.org/D147452

@RKSimon RKSimon closed this as completed in 00e3ae4 Apr 4, 2023
gysit pushed a commit to nextsilicon/llvm-project that referenced this issue Apr 27, 2023
… bitcast(<X x i1> V)) canonicalization

This already exists in InstCombine but was missing from the late stage ExpandReductions pass

Fixes llvm#53419
Fixes llvm#61923

Differential Revision: https://reviews.llvm.org/D147452
DianQK pushed a commit to DianQK/llvm-project that referenced this issue Oct 10, 2023
… bitcast(<X x i1> V)) canonicalization

This already exists in InstCombine but was missing from the late stage ExpandReductions pass

Fixes llvm#53419
Fixes llvm#61923

Differential Revision: https://reviews.llvm.org/D147452
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed Verified by a second party llvm:codegen
Projects
None yet
Development

No branches or pull requests

5 participants