[SelectionDAG] Flags are dropped when creating a new FMUL (#66701)
While simplifying some vector operators in DAG combine, we may need to
create new instructions for the simplified vectors. When that happens, the
new instruction must carry over (or appropriately adjust) all the flags of
the old instruction.

If "contract" is dropped from an instruction like FMUL, the backend may not
generate an FMA instruction, which would hurt performance.

Here's an example where the "contract" flag is dropped when a new FMUL is created.

Replacing.2 t42: v2f32 = fmul contract t41, t38
With: t48: v2f32 = fmul t38, t38

Co-authored-by: Sirish Pande <[email protected]>
srpande and srpande authored Sep 21, 2023
1 parent 6b4a1f2 commit e6f9483
Showing 2 changed files with 7 additions and 7 deletions.
5 changes: 3 additions & 2 deletions llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -2990,8 +2990,9 @@ bool TargetLowering::SimplifyDemandedVectorElts(
       SDValue NewOp1 = SimplifyMultipleUseDemandedVectorElts(Op1, DemandedElts,
                                                              TLO.DAG, Depth + 1);
       if (NewOp0 || NewOp1) {
-        SDValue NewOp = TLO.DAG.getNode(
-            Opcode, SDLoc(Op), VT, NewOp0 ? NewOp0 : Op0, NewOp1 ? NewOp1 : Op1);
+        SDValue NewOp =
+            TLO.DAG.getNode(Opcode, SDLoc(Op), VT, NewOp0 ? NewOp0 : Op0,
+                            NewOp1 ? NewOp1 : Op1, Op->getFlags());
         return TLO.CombineTo(Op, NewOp);
       }
       return false;
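
The effect of passing Op->getFlags() can be pictured with a small stand-alone model. This is only a sketch: ToyFlags, ToyNode, rebuildDroppingFlags, rebuildKeepingFlags and mayFuseIntoFMA are hypothetical names invented for illustration and are not LLVM classes or APIs. The fusion check is modeled loosely on the idea that a later combine only forms an FMA when the multiply carries "contract".

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT LLVM's node or flag types.
struct ToyFlags {
  bool AllowContract; // models the "contract" fast-math flag
};

struct ToyNode {
  std::string Opcode;
  std::vector<int> Operands; // operand ids, e.g. 41 stands for t41
  ToyFlags Flags;
};

// Models the behaviour before the fix: the rebuilt node starts with no flags.
ToyNode rebuildDroppingFlags(const ToyNode &Old, std::vector<int> NewOps) {
  assert(NewOps.size() == Old.Operands.size() && "operand count must match");
  return ToyNode{Old.Opcode, std::move(NewOps), ToyFlags{false}};
}

// Models the behaviour after the fix: flags are copied from the replaced node.
ToyNode rebuildKeepingFlags(const ToyNode &Old, std::vector<int> NewOps) {
  assert(NewOps.size() == Old.Operands.size() && "operand count must match");
  return ToyNode{Old.Opcode, std::move(NewOps), Old.Flags};
}

// Hypothetical later combine: fusing is only permitted when the multiply
// still carries the contract flag.
bool mayFuseIntoFMA(const ToyNode &Mul) {
  return Mul.Opcode == "fmul" && Mul.Flags.AllowContract;
}
```

Taking the node from the commit message (fmul contract t41, t38 rebuilt as fmul t38, t38), mayFuseIntoFMA returns false for the result of rebuildDroppingFlags and true for the result of rebuildKeepingFlags, which mirrors why the old code lost the FMA.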
9 changes: 4 additions & 5 deletions llvm/test/CodeGen/AMDGPU/fma.ll
@@ -159,15 +159,14 @@ define float @fold_fmul_distributive(float %x, float %y) {
 define amdgpu_kernel void @vec_mul_scalar_add_fma(<2 x float> %a, <2 x float> %b, float %c1, ptr addrspace(1) %inptr) {
 ; GFX906-LABEL: vec_mul_scalar_add_fma:
 ; GFX906: ; %bb.0:
+; GFX906-NEXT: s_load_dword s8, s[0:1], 0x34
 ; GFX906-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
-; GFX906-NEXT: s_waitcnt lgkmcnt(0)
-; GFX906-NEXT: s_load_dword s5, s[0:1], 0x34
 ; GFX906-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x3c
 ; GFX906-NEXT: v_mov_b32_e32 v0, 0
-; GFX906-NEXT: v_mov_b32_e32 v1, s6
-; GFX906-NEXT: v_mul_f32_e32 v1, s4, v1
 ; GFX906-NEXT: s_waitcnt lgkmcnt(0)
-; GFX906-NEXT: v_add_f32_e32 v1, s5, v1
+; GFX906-NEXT: v_mov_b32_e32 v1, s8
+; GFX906-NEXT: v_mov_b32_e32 v2, s6
+; GFX906-NEXT: v_fmac_f32_e32 v1, s4, v2
 ; GFX906-NEXT: global_store_dword v0, v1, s[2:3] offset:4
 ; GFX906-NEXT: s_endpgm
 %gep = getelementptr float, ptr addrspace(1) %inptr, i32 1
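
The test now expects a single v_fmac_f32 in place of the v_mul_f32 / v_add_f32 pair. A fused multiply-add rounds once rather than twice, so it can produce a different value from the separate operations, which is why the fusion has to be permitted by a flag such as "contract". The sketch below shows this in standard C++ with std::fma. The function names and the input values are my own choices for illustration and do not come from the commit.

```cpp
#include <cassert>
#include <cmath>

// Multiply, round to float, then add. The volatile store forces the product
// to be rounded to float and keeps the compiler from fusing the two steps.
float mulThenAdd(float A, float B, float C) {
  volatile float Product = A * B;
  return Product + C;
}

// Fused multiply-add: A * B + C is computed with a single rounding.
float fusedMulAdd(float A, float B, float C) {
  return std::fma(A, B, C);
}

// Illustrative inputs (chosen for this sketch): A * A is exactly
// 1 + 2^-11 + 2^-24, which rounds to 1 + 2^-11 in float, so the separate
// path cancels to zero while the fused path keeps the 2^-24 term.
void checkContractExample() {
  const float A = 1.0f + std::ldexp(1.0f, -12);
  const float C = -(1.0f + std::ldexp(1.0f, -11));
  assert(mulThenAdd(A, A, C) == 0.0f);
  assert(fusedMulAdd(A, A, C) == std::ldexp(1.0f, -24));
}
```

The fused form is usually both faster and more accurate, but because the results can differ, a compiler may only substitute it when the source flags allow it. Dropping "contract" on the rebuilt FMUL therefore blocked the fusion.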
