-
Notifications
You must be signed in to change notification settings - Fork 12.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SystemZ] Add support for half (fp16) #109164
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-clang Author: Jonas Paulsson (JonPsson1) ChangesMake sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions. Patch in progress... Notes:
Full diff: https://github.com/llvm/llvm-project/pull/109164.diff 3 Files Affected:
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index f05ea473017bec..6566b63d4587ee 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -91,11 +91,20 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
"-v128:64-a:8:16-n32:64");
}
MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128;
+
+ HasLegalHalfType = false; // Default=false
+ HalfArgsAndReturns = false; // Default=false
+ HasFloat16 = true; // Default=false
+
HasStrictFP = true;
}
unsigned getMinGlobalAlign(uint64_t Size, bool HasNonWeakDef) const override;
+ bool useFP16ConversionIntrinsics() const override {
+ return false;
+ }
+
void getTargetDefines(const LangOptions &Opts,
MacroBuilder &Builder) const override;
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 582a8c139b2937..fd3dcebba1eca7 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -704,6 +704,13 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::BITCAST, MVT::f32, Custom);
}
+ // Expand FP16 <=> FP32 conversions to libcalls and handle FP16 loads and
+ // stores in GPRs.
+ setOperationAction(ISD::FP16_TO_FP, MVT::f32, Expand);
+ setOperationAction(ISD::FP_TO_FP16, MVT::f32, Expand);
+ setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);
+ setTruncStoreAction(MVT::f32, MVT::f16, Expand);
+
// VASTART and VACOPY need to deal with the SystemZ-specific varargs
// structure, but VAEND is a no-op.
setOperationAction(ISD::VASTART, MVT::Other, Custom);
diff --git a/llvm/test/CodeGen/SystemZ/fp-half.ll b/llvm/test/CodeGen/SystemZ/fp-half.ll
new file mode 100644
index 00000000000000..393ba2f620ff6e
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/fp-half.ll
@@ -0,0 +1,100 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s
+;
+; Tests for FP16 (Half).
+
+; A function where everything is done in Half.
+define void @fun0(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun0:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT: .cfi_offset %r12, -64
+; CHECK-NEXT: .cfi_offset %r13, -56
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -168
+; CHECK-NEXT: .cfi_def_cfa_offset 328
+; CHECK-NEXT: std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT: .cfi_offset %f8, -168
+; CHECK-NEXT: llgh %r2, 0(%r2)
+; CHECK-NEXT: lgr %r13, %r4
+; CHECK-NEXT: lgr %r12, %r3
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: llgh %r2, 0(%r12)
+; CHECK-NEXT: ler %f8, %f0
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: aebr %f0, %f8
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: sth %r2, 0(%r13)
+; CHECK-NEXT: ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT: lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT: br %r14
+entry:
+ %0 = load half, ptr %Op0, align 2
+ %1 = load half, ptr %Op1, align 2
+ %add = fadd half %0, %1
+ store half %add, ptr %Dst, align 2
+ ret void
+}
+
+; A function where Half values are loaded and extended to float and then
+; operated on.
+define void @fun1(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun1:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT: .cfi_offset %r12, -64
+; CHECK-NEXT: .cfi_offset %r13, -56
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -168
+; CHECK-NEXT: .cfi_def_cfa_offset 328
+; CHECK-NEXT: std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT: .cfi_offset %f8, -168
+; CHECK-NEXT: llgh %r2, 0(%r2)
+; CHECK-NEXT: lgr %r13, %r4
+; CHECK-NEXT: lgr %r12, %r3
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: llgh %r2, 0(%r12)
+; CHECK-NEXT: ler %f8, %f0
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: aebr %f0, %f8
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: sth %r2, 0(%r13)
+; CHECK-NEXT: ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT: lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT: br %r14
+entry:
+ %0 = load half, ptr %Op0, align 2
+ %ext = fpext half %0 to float
+ %1 = load half, ptr %Op1, align 2
+ %ext1 = fpext half %1 to float
+ %add = fadd float %ext, %ext1
+ %res = fptrunc float %add to half
+ store half %res, ptr %Dst, align 2
+ ret void
+}
+
+; Test case with a Half incoming argument.
+define zeroext i1 @fun2(half noundef %f) {
+; CHECK-LABEL: fun2:
+; CHECK: # %bb.0: # %start
+; CHECK-NEXT: stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -160
+; CHECK-NEXT: .cfi_def_cfa_offset 320
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: larl %r1, .LCPI2_0
+; CHECK-NEXT: deb %f0, 0(%r1)
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: risbg %r2, %r2, 63, 191, 49
+; CHECK-NEXT: lmg %r14, %r15, 272(%r15)
+; CHECK-NEXT: br %r14
+start:
+ %self = fdiv half %f, 0xHC700
+ %_4 = bitcast half %self to i16
+ %_0 = icmp slt i16 %_4, 0
+ ret i1 %_0
+}
|
@llvm/pr-subscribers-backend-systemz Author: Jonas Paulsson (JonPsson1) ChangesMake sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions. Patch in progress... Notes:
Full diff: https://github.com/llvm/llvm-project/pull/109164.diff 3 Files Affected:
diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index f05ea473017bec..6566b63d4587ee 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -91,11 +91,20 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
"-v128:64-a:8:16-n32:64");
}
MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128;
+
+ HasLegalHalfType = false; // Default=false
+ HalfArgsAndReturns = false; // Default=false
+ HasFloat16 = true; // Default=false
+
HasStrictFP = true;
}
unsigned getMinGlobalAlign(uint64_t Size, bool HasNonWeakDef) const override;
+ bool useFP16ConversionIntrinsics() const override {
+ return false;
+ }
+
void getTargetDefines(const LangOptions &Opts,
MacroBuilder &Builder) const override;
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 582a8c139b2937..fd3dcebba1eca7 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -704,6 +704,13 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::BITCAST, MVT::f32, Custom);
}
+ // Expand FP16 <=> FP32 conversions to libcalls and handle FP16 loads and
+ // stores in GPRs.
+ setOperationAction(ISD::FP16_TO_FP, MVT::f32, Expand);
+ setOperationAction(ISD::FP_TO_FP16, MVT::f32, Expand);
+ setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);
+ setTruncStoreAction(MVT::f32, MVT::f16, Expand);
+
// VASTART and VACOPY need to deal with the SystemZ-specific varargs
// structure, but VAEND is a no-op.
setOperationAction(ISD::VASTART, MVT::Other, Custom);
diff --git a/llvm/test/CodeGen/SystemZ/fp-half.ll b/llvm/test/CodeGen/SystemZ/fp-half.ll
new file mode 100644
index 00000000000000..393ba2f620ff6e
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/fp-half.ll
@@ -0,0 +1,100 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s
+;
+; Tests for FP16 (Half).
+
+; A function where everything is done in Half.
+define void @fun0(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun0:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT: .cfi_offset %r12, -64
+; CHECK-NEXT: .cfi_offset %r13, -56
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -168
+; CHECK-NEXT: .cfi_def_cfa_offset 328
+; CHECK-NEXT: std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT: .cfi_offset %f8, -168
+; CHECK-NEXT: llgh %r2, 0(%r2)
+; CHECK-NEXT: lgr %r13, %r4
+; CHECK-NEXT: lgr %r12, %r3
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: llgh %r2, 0(%r12)
+; CHECK-NEXT: ler %f8, %f0
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: aebr %f0, %f8
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: sth %r2, 0(%r13)
+; CHECK-NEXT: ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT: lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT: br %r14
+entry:
+ %0 = load half, ptr %Op0, align 2
+ %1 = load half, ptr %Op1, align 2
+ %add = fadd half %0, %1
+ store half %add, ptr %Dst, align 2
+ ret void
+}
+
+; A function where Half values are loaded and extended to float and then
+; operated on.
+define void @fun1(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun1:
+; CHECK: # %bb.0: # %entry
+; CHECK-NEXT: stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT: .cfi_offset %r12, -64
+; CHECK-NEXT: .cfi_offset %r13, -56
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -168
+; CHECK-NEXT: .cfi_def_cfa_offset 328
+; CHECK-NEXT: std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT: .cfi_offset %f8, -168
+; CHECK-NEXT: llgh %r2, 0(%r2)
+; CHECK-NEXT: lgr %r13, %r4
+; CHECK-NEXT: lgr %r12, %r3
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: llgh %r2, 0(%r12)
+; CHECK-NEXT: ler %f8, %f0
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: aebr %f0, %f8
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: sth %r2, 0(%r13)
+; CHECK-NEXT: ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT: lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT: br %r14
+entry:
+ %0 = load half, ptr %Op0, align 2
+ %ext = fpext half %0 to float
+ %1 = load half, ptr %Op1, align 2
+ %ext1 = fpext half %1 to float
+ %add = fadd float %ext, %ext1
+ %res = fptrunc float %add to half
+ store half %res, ptr %Dst, align 2
+ ret void
+}
+
+; Test case with a Half incoming argument.
+define zeroext i1 @fun2(half noundef %f) {
+; CHECK-LABEL: fun2:
+; CHECK: # %bb.0: # %start
+; CHECK-NEXT: stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT: .cfi_offset %r14, -48
+; CHECK-NEXT: .cfi_offset %r15, -40
+; CHECK-NEXT: aghi %r15, -160
+; CHECK-NEXT: .cfi_def_cfa_offset 320
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT: larl %r1, .LCPI2_0
+; CHECK-NEXT: deb %f0, 0(%r1)
+; CHECK-NEXT: brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT: risbg %r2, %r2, 63, 191, 49
+; CHECK-NEXT: lmg %r14, %r15, 272(%r15)
+; CHECK-NEXT: br %r14
+start:
+ %self = fdiv half %f, 0xHC700
+ %_4 = bitcast half %self to i16
+ %_0 = icmp slt i16 %_4, 0
+ ret i1 %_0
+}
|
You can test this locally with the following command:git-clang-format --diff 3cb28522ba4c2b80fbaf0840377aab4fce985110 0504c821be9e5cd8bb2288b1690e0bd9a6999a69 --extensions h,cpp,c -- clang/test/CodeGen/SystemZ/Float16.c clang/test/CodeGen/SystemZ/fp16.c clang/lib/Basic/Targets/SystemZ.h clang/lib/CodeGen/Targets/SystemZ.cpp clang/test/CodeGen/SystemZ/systemz-abi.c llvm/lib/IR/RuntimeLibcalls.cpp llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.h llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.h llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp View the diff from clang-format here.diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index b4da2c9ce6..107eb6aafa 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -107,9 +107,7 @@ public:
unsigned getMinGlobalAlign(uint64_t Size, bool HasNonWeakDef) const override;
- bool useFP16ConversionIntrinsics() const override {
- return false;
- }
+ bool useFP16ConversionIntrinsics() const override { return false; }
void getTargetDefines(const LangOptions &Opts,
MacroBuilder &Builder) const override;
diff --git a/clang/lib/CodeGen/Targets/SystemZ.cpp b/clang/lib/CodeGen/Targets/SystemZ.cpp
index 021d764dbf..9830dd7e2a 100644
--- a/clang/lib/CodeGen/Targets/SystemZ.cpp
+++ b/clang/lib/CodeGen/Targets/SystemZ.cpp
@@ -185,7 +185,7 @@ bool SystemZABIInfo::isFPArgumentType(QualType Ty) const {
if (const BuiltinType *BT = Ty->getAs<BuiltinType>())
switch (BT->getKind()) {
- case BuiltinType::Float16: // _Float16
+ case BuiltinType::Float16: // _Float16
case BuiltinType::Float:
case BuiltinType::Double:
return true;
@@ -450,9 +450,9 @@ ABIArgInfo SystemZABIInfo::classifyArgumentType(QualType Ty) const {
if (isFPArgumentType(SingleElementTy)) {
assert(Size == 16 || Size == 32 || Size == 64);
return ABIArgInfo::getDirect(
- Size == 16 ? llvm::Type::getHalfTy(getVMContext())
- : Size == 32 ? llvm::Type::getFloatTy(getVMContext())
- : llvm::Type::getDoubleTy(getVMContext()));
+ Size == 16 ? llvm::Type::getHalfTy(getVMContext())
+ : Size == 32 ? llvm::Type::getFloatTy(getVMContext())
+ : llvm::Type::getDoubleTy(getVMContext()));
} else {
llvm::IntegerType *PassTy = llvm::IntegerType::get(getVMContext(), Size);
return Size <= 32 ? ABIArgInfo::getNoExtend(PassTy)
diff --git a/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp b/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
index 7f52891885..8555034399 100644
--- a/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
+++ b/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
@@ -894,11 +894,15 @@ ParseStatus SystemZAsmParser::parseRegister(OperandVector &Operands,
case GRH32Reg: Regs = SystemZMC::GRH32Regs; break;
case GR64Reg: Regs = SystemZMC::GR64Regs; break;
case GR128Reg: Regs = SystemZMC::GR128Regs; break;
- case FP16Reg: Regs = SystemZMC::FP16Regs; break;
+ case FP16Reg:
+ Regs = SystemZMC::FP16Regs;
+ break;
case FP32Reg: Regs = SystemZMC::FP32Regs; break;
case FP64Reg: Regs = SystemZMC::FP64Regs; break;
case FP128Reg: Regs = SystemZMC::FP128Regs; break;
- case VR16Reg: Regs = SystemZMC::VR16Regs; break;
+ case VR16Reg:
+ Regs = SystemZMC::VR16Regs;
+ break;
case VR32Reg: Regs = SystemZMC::VR32Regs; break;
case VR64Reg: Regs = SystemZMC::VR64Regs; break;
case VR128Reg: Regs = SystemZMC::VR128Regs; break;
diff --git a/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp b/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp
index 291b6789c7..ca3f417e9f 100644
--- a/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp
+++ b/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp
@@ -62,11 +62,10 @@ const unsigned SystemZMC::GR128Regs[16] = {
};
const unsigned SystemZMC::FP16Regs[16] = {
- SystemZ::F0H, SystemZ::F1H, SystemZ::F2H, SystemZ::F3H,
- SystemZ::F4H, SystemZ::F5H, SystemZ::F6H, SystemZ::F7H,
- SystemZ::F8H, SystemZ::F9H, SystemZ::F10H, SystemZ::F11H,
- SystemZ::F12H, SystemZ::F13H, SystemZ::F14H, SystemZ::F15H
-};
+ SystemZ::F0H, SystemZ::F1H, SystemZ::F2H, SystemZ::F3H,
+ SystemZ::F4H, SystemZ::F5H, SystemZ::F6H, SystemZ::F7H,
+ SystemZ::F8H, SystemZ::F9H, SystemZ::F10H, SystemZ::F11H,
+ SystemZ::F12H, SystemZ::F13H, SystemZ::F14H, SystemZ::F15H};
const unsigned SystemZMC::FP32Regs[16] = {
SystemZ::F0S, SystemZ::F1S, SystemZ::F2S, SystemZ::F3S,
@@ -90,15 +89,13 @@ const unsigned SystemZMC::FP128Regs[16] = {
};
const unsigned SystemZMC::VR16Regs[32] = {
- SystemZ::F0H, SystemZ::F1H, SystemZ::F2H, SystemZ::F3H,
- SystemZ::F4H, SystemZ::F5H, SystemZ::F6H, SystemZ::F7H,
- SystemZ::F8H, SystemZ::F9H, SystemZ::F10H, SystemZ::F11H,
- SystemZ::F12H, SystemZ::F13H, SystemZ::F14H, SystemZ::F15H,
- SystemZ::F16H, SystemZ::F17H, SystemZ::F18H, SystemZ::F19H,
- SystemZ::F20H, SystemZ::F21H, SystemZ::F22H, SystemZ::F23H,
- SystemZ::F24H, SystemZ::F25H, SystemZ::F26H, SystemZ::F27H,
- SystemZ::F28H, SystemZ::F29H, SystemZ::F30H, SystemZ::F31H
-};
+ SystemZ::F0H, SystemZ::F1H, SystemZ::F2H, SystemZ::F3H, SystemZ::F4H,
+ SystemZ::F5H, SystemZ::F6H, SystemZ::F7H, SystemZ::F8H, SystemZ::F9H,
+ SystemZ::F10H, SystemZ::F11H, SystemZ::F12H, SystemZ::F13H, SystemZ::F14H,
+ SystemZ::F15H, SystemZ::F16H, SystemZ::F17H, SystemZ::F18H, SystemZ::F19H,
+ SystemZ::F20H, SystemZ::F21H, SystemZ::F22H, SystemZ::F23H, SystemZ::F24H,
+ SystemZ::F25H, SystemZ::F26H, SystemZ::F27H, SystemZ::F28H, SystemZ::F29H,
+ SystemZ::F30H, SystemZ::F31H};
const unsigned SystemZMC::VR32Regs[32] = {
SystemZ::F0S, SystemZ::F1S, SystemZ::F2S, SystemZ::F3S,
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index fb159236ec..5c5b1c6db5 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -6159,7 +6159,7 @@ SDValue SystemZTargetLowering::lowerFP_EXTEND(SDValue Op,
SelectionDAG &DAG) const {
SDValue In = Op.getOperand(Op->isStrictFPOpcode() ? 1 : 0);
if (In.getSimpleValueType() != MVT::f16)
- return Op; // Legal
+ return Op; // Legal
return SDValue(); // Let legalizer emit the libcall.
}
@@ -6179,18 +6179,18 @@ SDValue SystemZTargetLowering::lowerLoadF16(SDValue Op,
} else {
LoadSDNode *Ld = cast<LoadSDNode>(Op.getNode());
assert(EVT(RegVT) == Ld->getMemoryVT() && "Unhandled f16 load");
- NewLd = DAG.getExtLoad(ISD::EXTLOAD, DL, MVT::i32, Ld->getChain(),
- Ld->getBasePtr(), Ld->getPointerInfo(),
- MVT::i16, Ld->getOriginalAlign(),
- Ld->getMemOperand()->getFlags());
+ NewLd =
+ DAG.getExtLoad(ISD::EXTLOAD, DL, MVT::i32, Ld->getChain(),
+ Ld->getBasePtr(), Ld->getPointerInfo(), MVT::i16,
+ Ld->getOriginalAlign(), Ld->getMemOperand()->getFlags());
}
// Load as integer, shift and then insert into upper 2 bytes of the FP
// register.
SDValue Shft = DAG.getNode(ISD::SHL, DL, MVT::i32, NewLd,
DAG.getConstant(16, DL, MVT::i32));
SDValue BCast = DAG.getNode(ISD::BITCAST, DL, MVT::f32, Shft);
- SDValue F16Val = DAG.getTargetExtractSubreg(SystemZ::subreg_h16,
- DL, MVT::f16, BCast);
+ SDValue F16Val =
+ DAG.getTargetExtractSubreg(SystemZ::subreg_h16, DL, MVT::f16, BCast);
return DAG.getMergeValues({F16Val, NewLd.getValue(1)}, DL);
}
@@ -6203,19 +6203,20 @@ SDValue SystemZTargetLowering::lowerStoreF16(SDValue Op,
// Move into a GPR, shift and store the 2 bytes.
SDLoc DL(Op);
SDNode *U32 = DAG.getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, MVT::f32);
- SDValue In32 = DAG.getTargetInsertSubreg(SystemZ::subreg_h16, DL,
- MVT::f32, SDValue(U32, 0), StoredVal);
+ SDValue In32 = DAG.getTargetInsertSubreg(SystemZ::subreg_h16, DL, MVT::f32,
+ SDValue(U32, 0), StoredVal);
SDValue BCast = DAG.getNode(ISD::BITCAST, DL, MVT::i32, In32);
SDValue Shft = DAG.getNode(ISD::SRL, DL, MVT::i32, BCast,
DAG.getConstant(16, DL, MVT::i32));
if (auto *AtomicSt = dyn_cast<AtomicSDNode>(Op.getNode()))
return DAG.getAtomic(ISD::ATOMIC_STORE, DL, MVT::i16, AtomicSt->getChain(),
- Shft, AtomicSt->getBasePtr(), AtomicSt->getMemOperand());
+ Shft, AtomicSt->getBasePtr(),
+ AtomicSt->getMemOperand());
StoreSDNode *St = cast<StoreSDNode>(Op.getNode());
- return DAG.getTruncStore(St->getChain(), DL, Shft, St->getBasePtr(),
- MVT::i16, St->getMemOperand());
+ return DAG.getTruncStore(St->getChain(), DL, Shft, St->getBasePtr(), MVT::i16,
+ St->getMemOperand());
}
SDValue SystemZTargetLowering::lowerIS_FPCLASS(SDValue Op,
diff --git a/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp b/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
index 470543824d..b0c0d76faa 100644
--- a/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
@@ -1007,16 +1007,16 @@ void SystemZInstrInfo::storeRegToStackSlot(
Register GR64Reg = MRI.createVirtualRegister(&SystemZ::GR64BitRegClass);
Register FP64Reg = MRI.createVirtualRegister(&SystemZ::FP64BitRegClass);
BuildMI(MBB, MBBI, DL, get(SystemZ::COPY))
- .addReg(FP64Reg, RegState::DefineNoRead, SystemZ::subreg_h16)
- .addReg(SrcReg, getKillRegState(isKill));
+ .addReg(FP64Reg, RegState::DefineNoRead, SystemZ::subreg_h16)
+ .addReg(SrcReg, getKillRegState(isKill));
BuildMI(MBB, MBBI, DL, get(SystemZ::LGDR), GR64Reg)
- .addReg(FP64Reg, RegState::Kill);
+ .addReg(FP64Reg, RegState::Kill);
BuildMI(MBB, MBBI, DL, get(SystemZ::SRLG), GR64Reg)
- .addReg(GR64Reg)
- .addReg(0)
- .addImm(48);
+ .addReg(GR64Reg)
+ .addReg(0)
+ .addImm(48);
addFrameReference(BuildMI(MBB, MBBI, DL, get(SystemZ::STH))
- .addReg(GR64Reg, RegState::Kill, SystemZ::subreg_l32),
+ .addReg(GR64Reg, RegState::Kill, SystemZ::subreg_l32),
FrameIdx);
return;
}
@@ -1046,18 +1046,18 @@ void SystemZInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
"Expected non-SSA form with virtual registers.");
Register GR64Reg = MRI.createVirtualRegister(&SystemZ::GR64BitRegClass);
Register FP64Reg = MRI.createVirtualRegister(&SystemZ::FP64BitRegClass);
- addFrameReference(BuildMI(MBB, MBBI, DL, get(SystemZ::LH))
- .addReg(GR64Reg, RegState::DefineNoRead,
- SystemZ::subreg_l32),
- FrameIdx);
+ addFrameReference(
+ BuildMI(MBB, MBBI, DL, get(SystemZ::LH))
+ .addReg(GR64Reg, RegState::DefineNoRead, SystemZ::subreg_l32),
+ FrameIdx);
BuildMI(MBB, MBBI, DL, get(SystemZ::SLLG), GR64Reg)
- .addReg(GR64Reg)
- .addReg(0)
- .addImm(48);
+ .addReg(GR64Reg)
+ .addReg(0)
+ .addImm(48);
BuildMI(MBB, MBBI, DL, get(SystemZ::LDGR), FP64Reg)
- .addReg(GR64Reg, RegState::Kill);
+ .addReg(GR64Reg, RegState::Kill);
BuildMI(MBB, MBBI, DL, get(SystemZ::COPY), DestReg)
- .addReg(FP64Reg, RegState::Kill, SystemZ::subreg_h16);
+ .addReg(FP64Reg, RegState::Kill, SystemZ::subreg_h16);
return;
}
|
Note that you need to also have softPromoteHalfType return true to get correct legalization for half operations. |
Thanks for pointing that out - patch updated. |
I think we should define and implement a proper ABI for the half type as well. |
Patch updated after some progress... With this version, the fp16 values are passed to conversion functions as integer, which seems to be the default. It is however a bit tricky to do this and at the same time pass half values in FP registers. At this point I wonder for one thing if it would be better to pass FP16 values to the conversion functions as _Float16 instead? It seems this may be possible to change in the configurations by looking at COMPILER_RT_HAS_FLOAT16 / compiler-rt/lib/builtins/extendhfsf2.c / fp_extend.h... Not really sure if those conversion functions are supposed to be built and only used for soft-promotion of fp16, or if there are any external implications, for instance gcc compatability. Any other comments also welcome... |
My understanding is that in GCC's From your first two sentences it sounds like @uweigand mentioned figuring out an ABI for A quick check seems to show that GCC 13 does not support Note that there are some common issues with these conversions, would probably be good to test against them if possible #97981 #97975. |
From what I can see in the libgcc sources, I never see
Yes, we're working on that. What we're planning to do is to have
Yes, we'd have to add those. I don't think we want
Thanks for pointing this out! |
I think this is accurate, libgcc just appears to (reasonably) not provide any f16-related symbols on platforms where GCC doesn't support For that reason we just always provide the symbols in rust's compiler-builtins (though we let LLVM figure out that
That is great news, especially considering how problematic the target-feature-dependent ABI on x86-32 has been. |
Patch reworked:
(twoaddr-kill.mir test updated as the hard-coded register class enum value for GRH32BitRegClass has changed.) Still some more points to go over, but it seems to be working fairly well at this point.
|
Patch improved further:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a full review, but some general comments inline.
setOperationAction(ISD::FCOS, VT, Expand); | ||
setOperationAction(ISD::FSINCOS, VT, Expand); | ||
setOperationAction(ISD::FREM, VT, Expand); | ||
setOperationAction(ISD::FPOW, VT, Expand); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these be Promote just like all the other f16 operations? Expand triggers a libcall, which doesn't match the excess-precision setting - also, we actually don't have f16 libcalls in libm ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, if there are no f16 libcalls it works to have them be promoted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just crosslinking that there is an effort to add f16 libcalls #95250 but I have no clue what the plan is as far as lowering to them.
Updated per review.
Note on compiler-rt: not sure how to build llvm conversion functions and link them (have not tried this yet), but added the mapping in RuntimeLibcalls.cpp. |
if (TT.isSystemZ()) { | ||
setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2"); | ||
setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do these names need to be set, aren't these the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you may be right - as I wrote above I have not really tried this before. Would you happen to know how to build and link these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I see they default to the __gnu_
functions in this file. Some targets (wasm, hexagon) manually set it to __extendhfsf2
and __truncsfhf2
in *SelfLowering.cpp
but why do targets like x86 correctly lower to these as well without an override either in this file or in selflowering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding how to build and link, they are in compiler-rt if that can be built
COMPILER_RT_ABI NOINLINE dst_t __truncsfhf2(float a) { |
__trunc
and __extend
are what you want to emit here, I'm just not sure what exactly this file needs to do because it seems like HasLegalHalfType
controls __extend
/__trunc
vs. __gnu_
lowering somehow #109164 (comment).
Improved handling to utilize vector instructions when present. New VR16 regclass, but v8f16 not legal. It might make sense to have it as a legal type and e.g. do VL;VST when moving vectors in memory, and also set all vector ops to "Expand". Not sure how trivial that change would be, given some special handlings of vector nodes, so not done as of now: only the scalar f16 is legal. Seems to work fine to add "16" versions for loads, stores and lzer/lcdfr in case of vector support. Without vector support, it might make sense to have load/store pseudos with an extra GPR def operand, so that these loads/stores can be expanded as a PostRA pseudo. Then it would only need handling in one place, but OTOH having a second explicit def operand is also undesired, maybe. f16 immediates handled like f32:
Should fp16 inline asm operands also be supported at this point? |
Thanks!
Agreed. I don't think we need vector f16 at this point.
I don't think we need to spend much effort optimizing for pre-z13 machines at this point.
The sign-operations are the only instructions we even could semantically use with f16, right? We certainly could do so, but I agree it's probably not important.
Good point. I think so, yes. Also, looks like the clang-format check is complaining a bit ... |
@@ -1883,6 +1931,10 @@ void SystemZInstrInfo::getLoadStoreOpcodes(const TargetRegisterClass *RC, | |||
} else if (RC == &SystemZ::FP128BitRegClass) { | |||
LoadOpcode = SystemZ::LX; | |||
StoreOpcode = SystemZ::STX; | |||
} else if (RC == &SystemZ::FP16BitRegClass || | |||
RC == &SystemZ::VR16BitRegClass) { | |||
LoadOpcode = SystemZ::VL16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Do these even work on FP16?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for instance in spill-half-01.mir.
Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.
Patch in progress...
Fixes #50374