[SystemZ] Add support for half (fp16) #109164

JonPsson1 · 2024-09-18T15:45:05Z

Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.

Patch in progress...

Fixes #50374

llvmbot · 2024-09-18T15:45:38Z

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Jonas Paulsson (JonPsson1)

Changes

Make sure that fp16<=>float conversions are expanded to libcalls and that 16-bit fp values can be loaded and stored properly via GPRs. With this patch the Half IR Type used in operations should be handled correctly with the help of pre-existing ISD node expansions.

Patch in progress...

Notes:


```
Clang FE:

TargetInfo {
  /// Determine whether the _Float16 type is supported on this target.
  bool HasFloat16;
  ; If false, Clang gives an error message on _Float16 in a C program.

  bool HasLegalHalfType; // True if the backend supports operations on the half
                         // LLVM IR type.
  ; If false, Half values are extended and ops are done in float; if true, ops
  ; are done in Half (by Clang).

  -ffloat16-excess-precision=[standard,fast,none]
  "Allows control over excess precision on targets where native support for the
   precision types is not available. By default, excess precision is used to
   calculate intermediate results following the rules specified in ISO C99."
  ; => Even though we need to deal with Half operations coming from other
       languages in the BE, we should still let Clang insert the required
       emulation (trunc/extend) instructions as required by C (_Float16). So
       HasLegalHalfType needs to be set to 'false'.
  ; => C code will have fpext/fptrunc inserted in many places to emulate
       _Float16, and operations are done in Float.
  ; => Other languages will emit Half operations, which have to be emulated by
       fpext/fptrunc in the BE and then done in Float.

  /// Check whether llvm intrinsics such as llvm.convert.to.fp16 should be used
  /// to convert to and from __fp16.
  /// FIXME: This function should be removed once all targets stop using the
  /// conversion intrinsics.
  virtual bool useFP16ConversionIntrinsics() const {
    return true;
  }
  ; Use either conversion intrinsics or fpext/fptrunc from Clang.
  ; => Given the comment and the fact that other languages emit 'half', it
  ;    seems best not to use these.

  bool HalfArgsAndReturns;
  ; Should be true if the ABI says that half values are passed / returned.
  ; - What does the SystemZ ABI require? Pass/return in float regs?
}

Middle End:
  ; The middle end does not seem to do much with Half values / conversion
    intrinsics (some constant folding).

  ; InstCombiner removed an fptrunc before a sub and converted the Float fsub
    to a Half fsub. => The middle end does not (at least currently) seem to
    care about the Clang HasLegalHalfType flag.

CodeGen:
  ; Common-code expansions available:
  ; The expansion of ISD::FP16_TO_FP / FP_TO_FP16 generates libcalls.
  ; The expansion of extloads/truncstores handles these as integer values
    in conjunction with the libcalls.

  ; Library calls:
    LLVM libcalls: llvm/include/llvm/IR/RuntimeLibcalls.def
    Got 'undefined reference' from the linker on the first try...

  Conversions:
   - Could the NNP instructions (z16) be used (vcfn / vcnf)?
     (clang/test/CodeGen/SystemZ/builtins-systemz-zvector4.c)

- There are also corresponding strict FP nodes that should probably be handled
  the same way.

- The exact semantics of _Float16 in C are hopefully handled by the Clang FE
  per the value of -ffloat16-excess-precision.
```
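The CodeGen notes describe half arithmetic being expanded into FP16_TO_FP / FP_TO_FP16 libcalls, and the Clang notes contrast per-operation rounding (IR `half` ops emulated in the backend) with C's excess precision. As a hedged illustration only, the C++ sketch below models both on raw bit patterns. The helper names (`h2f_bits`, `f2h_bits`, `half_add`, `sum3_per_op`, `sum3_excess`) are hypothetical: they mimic an integer-ABI conversion like the one described for `__gnu_h2f_ieee`/`__gnu_f2h_ieee`, but they are not the libgcc or compiler-rt sources, and they assume round-to-nearest-even.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers (not the libgcc/compiler-rt sources): IEEE binary16 <->
// binary32 conversion on raw bit patterns, i.e. an integer ABI.

// Widen half bits to float bits. Exact: every half value fits in a float.
uint32_t h2f_bits(uint16_t h) {
  uint32_t sign = uint32_t(h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t man = h & 0x3FFu;
  if (exp == 0x1F)                        // Inf or NaN.
    return sign | 0x7F800000u | (man << 13);
  if (exp == 0) {
    if (man == 0)                         // +/- zero.
      return sign;
    int shifts = 0;                       // Subnormal: normalize the mantissa.
    while ((man & 0x400u) == 0) {
      man <<= 1;
      ++shifts;
    }
    return sign | (uint32_t(113 - shifts) << 23) | ((man & 0x3FFu) << 13);
  }
  return sign | ((exp + 112) << 23) | (man << 13);  // Rebias 15 -> 127.
}

// Narrow float bits to half bits with round-to-nearest-even.
uint16_t f2h_bits(uint32_t f) {
  uint32_t sign = (f >> 16) & 0x8000u;
  uint32_t exp = (f >> 23) & 0xFFu;
  uint32_t man = f & 0x7FFFFFu;
  if (exp == 0xFF)                        // Inf, or NaN kept quiet and non-zero.
    return uint16_t(sign | 0x7C00u | (man ? 0x200u | (man >> 13) : 0u));
  int e = int(exp) - 127 + 15;            // Rebias 127 -> 15.
  if (e >= 0x1F)                          // Too large: overflow to Inf.
    return uint16_t(sign | 0x7C00u);
  if (e <= 0) {                           // Result is a half subnormal or zero.
    if (e < -10)                          // Below half the smallest subnormal.
      return uint16_t(sign);
    uint32_t m = man | 0x800000u;         // Make the implicit bit explicit.
    int shift = 14 - e;                   // 14..24
    uint32_t res = m >> shift;
    uint32_t rem = m & ((1u << shift) - 1);
    uint32_t halfway = 1u << (shift - 1);
    if (rem > halfway || (rem == halfway && (res & 1)))
      ++res;                              // May round up to the smallest normal.
    return uint16_t(sign | res);
  }
  uint32_t res = (uint32_t(e) << 10) | (man >> 13);
  uint32_t rem = man & 0x1FFFu;           // The 13 dropped mantissa bits.
  if (rem > 0x1000u || (rem == 0x1000u && (res & 1)))
    ++res;                                // A carry may bump the exponent (or give Inf).
  return uint16_t(sign | res);
}

float bits_to_float(uint32_t b) { float f; std::memcpy(&f, &b, sizeof f); return f; }
uint32_t float_to_bits(float f) { uint32_t b; std::memcpy(&b, &f, sizeof b); return b; }
float half_to_float(uint16_t h) { return bits_to_float(h2f_bits(h)); }
uint16_t float_to_half(float f) { return f2h_bits(float_to_bits(f)); }

// 'fadd half' emulated the way fun0 is lowered: extend both operands, add in
// float, then round back to half after the operation.
uint16_t half_add(uint16_t a, uint16_t b) {
  return float_to_half(half_to_float(a) + half_to_float(b));
}

// a + b + c with rounding to half after every operation (IR 'half' ops).
uint16_t sum3_per_op(uint16_t a, uint16_t b, uint16_t c) {
  return half_add(half_add(a, b), c);
}

// a + b + c keeping the intermediate in float and rounding once at the end,
// as excess precision allows for a C _Float16 expression.
uint16_t sum3_excess(uint16_t a, uint16_t b, uint16_t c) {
  return float_to_half(half_to_float(a) + half_to_float(b) + half_to_float(c));
}
```

With a = 2048.0 (0x6800) and b = c = 1.0 (0x3C00), `sum3_per_op` returns 0x6800 because the intermediate 2049 is an exact tie that rounds to even, whereas `sum3_excess` returns 0x6801 (2050). This is the kind of difference the reasoning about HasLegalHalfType and -ffloat16-excess-precision is concerned with.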

Full diff: https://github.com/llvm/llvm-project/pull/109164.diff

3 Files Affected:

(modified) clang/lib/Basic/Targets/SystemZ.h (+9)
(modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+7)
(added) llvm/test/CodeGen/SystemZ/fp-half.ll (+100)

diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index f05ea473017bec..6566b63d4587ee 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -91,11 +91,20 @@ class LLVM_LIBRARY_VISIBILITY SystemZTargetInfo : public TargetInfo {
                       "-v128:64-a:8:16-n32:64");
     }
     MaxAtomicPromoteWidth = MaxAtomicInlineWidth = 128;
+
+    HasLegalHalfType = false;    // Default=false
+    HalfArgsAndReturns = false;  // Default=false
+    HasFloat16 = true;           // Default=false
+
     HasStrictFP = true;
   }
 
   unsigned getMinGlobalAlign(uint64_t Size, bool HasNonWeakDef) const override;
 
+  bool useFP16ConversionIntrinsics() const override {
+    return false;
+  }
+
   void getTargetDefines(const LangOptions &Opts,
                         MacroBuilder &Builder) const override;
 
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index 582a8c139b2937..fd3dcebba1eca7 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -704,6 +704,13 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
     setOperationAction(ISD::BITCAST, MVT::f32, Custom);
   }
 
+  // Expand FP16 <=> FP32 conversions to libcalls and handle FP16 loads and
+  // stores in GPRs.
+  setOperationAction(ISD::FP16_TO_FP, MVT::f32, Expand);
+  setOperationAction(ISD::FP_TO_FP16, MVT::f32, Expand);
+  setLoadExtAction(ISD::EXTLOAD, MVT::f32, MVT::f16, Expand);
+  setTruncStoreAction(MVT::f32, MVT::f16, Expand);
+
   // VASTART and VACOPY need to deal with the SystemZ-specific varargs
   // structure, but VAEND is a no-op.
   setOperationAction(ISD::VASTART, MVT::Other, Custom);
diff --git a/llvm/test/CodeGen/SystemZ/fp-half.ll b/llvm/test/CodeGen/SystemZ/fp-half.ll
new file mode 100644
index 00000000000000..393ba2f620ff6e
--- /dev/null
+++ b/llvm/test/CodeGen/SystemZ/fp-half.ll
@@ -0,0 +1,100 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s
+;
+; Tests for FP16 (Half).
+
+; A function where everything is done in Half.
+define void @fun0(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun0:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT:    .cfi_offset %r12, -64
+; CHECK-NEXT:    .cfi_offset %r13, -56
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -168
+; CHECK-NEXT:    .cfi_def_cfa_offset 328
+; CHECK-NEXT:    std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset %f8, -168
+; CHECK-NEXT:    llgh %r2, 0(%r2)
+; CHECK-NEXT:    lgr %r13, %r4
+; CHECK-NEXT:    lgr %r12, %r3
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    llgh %r2, 0(%r12)
+; CHECK-NEXT:    ler %f8, %f0
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    aebr %f0, %f8
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    sth %r2, 0(%r13)
+; CHECK-NEXT:    ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT:    lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT:    br %r14
+entry:
+  %0 = load half, ptr %Op0, align 2
+  %1 = load half, ptr %Op1, align 2
+  %add = fadd half %0, %1
+  store half %add, ptr %Dst, align 2
+  ret void
+}
+
+; A function where Half values are loaded and extended to float and then
+; operated on.
+define void @fun1(ptr %Op0, ptr %Op1, ptr %Dst) {
+; CHECK-LABEL: fun1:
+; CHECK:       # %bb.0: # %entry
+; CHECK-NEXT:    stmg %r12, %r15, 96(%r15)
+; CHECK-NEXT:    .cfi_offset %r12, -64
+; CHECK-NEXT:    .cfi_offset %r13, -56
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -168
+; CHECK-NEXT:    .cfi_def_cfa_offset 328
+; CHECK-NEXT:    std %f8, 160(%r15) # 8-byte Folded Spill
+; CHECK-NEXT:    .cfi_offset %f8, -168
+; CHECK-NEXT:    llgh %r2, 0(%r2)
+; CHECK-NEXT:    lgr %r13, %r4
+; CHECK-NEXT:    lgr %r12, %r3
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    llgh %r2, 0(%r12)
+; CHECK-NEXT:    ler %f8, %f0
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    aebr %f0, %f8
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    sth %r2, 0(%r13)
+; CHECK-NEXT:    ld %f8, 160(%r15) # 8-byte Folded Reload
+; CHECK-NEXT:    lmg %r12, %r15, 264(%r15)
+; CHECK-NEXT:    br %r14
+entry:
+  %0 = load half, ptr %Op0, align 2
+  %ext = fpext half %0 to float
+  %1 = load half, ptr %Op1, align 2
+  %ext1 = fpext half %1 to float
+  %add = fadd float %ext, %ext1
+  %res = fptrunc float %add to half
+  store half %res, ptr %Dst, align 2
+  ret void
+}
+
+; Test case with a Half incoming argument.
+define zeroext i1 @fun2(half noundef %f) {
+; CHECK-LABEL: fun2:
+; CHECK:       # %bb.0: # %start
+; CHECK-NEXT:    stmg %r14, %r15, 112(%r15)
+; CHECK-NEXT:    .cfi_offset %r14, -48
+; CHECK-NEXT:    .cfi_offset %r15, -40
+; CHECK-NEXT:    aghi %r15, -160
+; CHECK-NEXT:    .cfi_def_cfa_offset 320
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    brasl %r14, __gnu_h2f_ieee@PLT
+; CHECK-NEXT:    larl %r1, .LCPI2_0
+; CHECK-NEXT:    deb %f0, 0(%r1)
+; CHECK-NEXT:    brasl %r14, __gnu_f2h_ieee@PLT
+; CHECK-NEXT:    risbg %r2, %r2, 63, 191, 49
+; CHECK-NEXT:    lmg %r14, %r15, 272(%r15)
+; CHECK-NEXT:    br %r14
+start:
+  %self = fdiv half %f, 0xHC700
+  %_4 = bitcast half %self to i16
+  %_0 = icmp slt i16 %_4, 0
+  ret i1 %_0
+}
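fun2 divides by the half constant 0xHC700 and then tests the sign of the result's bit pattern. As a worked illustration (a sketch, not generated code), the C++ below decodes 0xC700 = 1 10001 1100000000 into -(1 + 768/1024) * 2^(17-15) = -7.0, and models how the final `risbg` in the CHECK lines, read under the usual RISBG rotate-then-insert semantics, isolates bit 15 of the value returned by `__gnu_f2h_ieee`. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: value of a *normal* binary16 pattern (exponent field
// 1..30). Zero, subnormals, Inf and NaN are not handled here.
double half_normal_value(uint16_t h) {
  int sign = (h >> 15) & 1;
  int exp = (h >> 10) & 0x1F;             // Biased by 15.
  int man = h & 0x3FF;                    // 10 fraction bits.
  double v = (1.0 + man / 1024.0) * std::ldexp(1.0, exp - 15);
  return sign ? -v : v;
}

// What 'bitcast half ... to i16' followed by 'icmp slt i16 %_4, 0' tests:
// whether the sign bit (bit 15) of the half is set.
bool is_negative_i16(uint16_t bits) { return (bits & 0x8000u) != 0; }

// Model of 'risbg %r2, %r2, 63, 191, 49': rotate the 64-bit register left by
// 49, keep only IBM bit 63 (the least significant bit) and zero the rest
// (191 = 0x80 | 63, where 0x80 is the "zero remaining bits" flag).
uint64_t risbg_sign_bit(uint64_t r) {
  uint64_t rotated = (r << 49) | (r >> (64 - 49));
  return rotated & 1;
}
```

Rotating left by 49 moves bit 15 (counting from the least significant bit) to bit 0, so the result depends only on the sign bit of the halfword, regardless of what is in the upper bits of %r2.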

llvmbot · 2024-09-18T15:45:39Z

@llvm/pr-subscribers-backend-systemz


github-actions · 2024-09-18T15:48:46Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 3cb28522ba4c2b80fbaf0840377aab4fce985110 0504c821be9e5cd8bb2288b1690e0bd9a6999a69 --extensions h,cpp,c -- clang/test/CodeGen/SystemZ/Float16.c clang/test/CodeGen/SystemZ/fp16.c clang/lib/Basic/Targets/SystemZ.h clang/lib/CodeGen/Targets/SystemZ.cpp clang/test/CodeGen/SystemZ/systemz-abi.c llvm/lib/IR/RuntimeLibcalls.cpp llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.h llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.h llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp

diff --git a/clang/lib/Basic/Targets/SystemZ.h b/clang/lib/Basic/Targets/SystemZ.h
index b4da2c9ce6..107eb6aafa 100644
--- a/clang/lib/Basic/Targets/SystemZ.h
+++ b/clang/lib/Basic/Targets/SystemZ.h
@@ -107,9 +107,7 @@ public:
 
   unsigned getMinGlobalAlign(uint64_t Size, bool HasNonWeakDef) const override;
 
-  bool useFP16ConversionIntrinsics() const override {
-    return false;
-  }
+  bool useFP16ConversionIntrinsics() const override { return false; }
 
   void getTargetDefines(const LangOptions &Opts,
                         MacroBuilder &Builder) const override;
diff --git a/clang/lib/CodeGen/Targets/SystemZ.cpp b/clang/lib/CodeGen/Targets/SystemZ.cpp
index 021d764dbf..9830dd7e2a 100644
--- a/clang/lib/CodeGen/Targets/SystemZ.cpp
+++ b/clang/lib/CodeGen/Targets/SystemZ.cpp
@@ -185,7 +185,7 @@ bool SystemZABIInfo::isFPArgumentType(QualType Ty) const {
 
   if (const BuiltinType *BT = Ty->getAs<BuiltinType>())
     switch (BT->getKind()) {
-    case BuiltinType::Float16:  // _Float16
+    case BuiltinType::Float16: // _Float16
     case BuiltinType::Float:
     case BuiltinType::Double:
       return true;
@@ -450,9 +450,9 @@ ABIArgInfo SystemZABIInfo::classifyArgumentType(QualType Ty) const {
     if (isFPArgumentType(SingleElementTy)) {
       assert(Size == 16 || Size == 32 || Size == 64);
       return ABIArgInfo::getDirect(
-          Size == 16 ? llvm::Type::getHalfTy(getVMContext())
-                     : Size == 32 ? llvm::Type::getFloatTy(getVMContext())
-                                  : llvm::Type::getDoubleTy(getVMContext()));
+          Size == 16   ? llvm::Type::getHalfTy(getVMContext())
+          : Size == 32 ? llvm::Type::getFloatTy(getVMContext())
+                       : llvm::Type::getDoubleTy(getVMContext()));
     } else {
       llvm::IntegerType *PassTy = llvm::IntegerType::get(getVMContext(), Size);
       return Size <= 32 ? ABIArgInfo::getNoExtend(PassTy)
diff --git a/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp b/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
index 7f52891885..8555034399 100644
--- a/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
+++ b/llvm/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
@@ -894,11 +894,15 @@ ParseStatus SystemZAsmParser::parseRegister(OperandVector &Operands,
   case GRH32Reg: Regs = SystemZMC::GRH32Regs; break;
   case GR64Reg:  Regs = SystemZMC::GR64Regs;  break;
   case GR128Reg: Regs = SystemZMC::GR128Regs; break;
-  case FP16Reg:  Regs = SystemZMC::FP16Regs;  break;
+  case FP16Reg:
+    Regs = SystemZMC::FP16Regs;
+    break;
   case FP32Reg:  Regs = SystemZMC::FP32Regs;  break;
   case FP64Reg:  Regs = SystemZMC::FP64Regs;  break;
   case FP128Reg: Regs = SystemZMC::FP128Regs; break;
-  case VR16Reg:  Regs = SystemZMC::VR16Regs;  break;
+  case VR16Reg:
+    Regs = SystemZMC::VR16Regs;
+    break;
   case VR32Reg:  Regs = SystemZMC::VR32Regs;  break;
   case VR64Reg:  Regs = SystemZMC::VR64Regs;  break;
   case VR128Reg: Regs = SystemZMC::VR128Regs; break;
diff --git a/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp b/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp
index 291b6789c7..ca3f417e9f 100644
--- a/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp
+++ b/llvm/lib/Target/SystemZ/MCTargetDesc/SystemZMCTargetDesc.cpp
@@ -62,11 +62,10 @@ const unsigned SystemZMC::GR128Regs[16] = {
 };
 
 const unsigned SystemZMC::FP16Regs[16] = {
-  SystemZ::F0H, SystemZ::F1H, SystemZ::F2H, SystemZ::F3H,
-  SystemZ::F4H, SystemZ::F5H, SystemZ::F6H, SystemZ::F7H,
-  SystemZ::F8H, SystemZ::F9H, SystemZ::F10H, SystemZ::F11H,
-  SystemZ::F12H, SystemZ::F13H, SystemZ::F14H, SystemZ::F15H
-};
+    SystemZ::F0H,  SystemZ::F1H,  SystemZ::F2H,  SystemZ::F3H,
+    SystemZ::F4H,  SystemZ::F5H,  SystemZ::F6H,  SystemZ::F7H,
+    SystemZ::F8H,  SystemZ::F9H,  SystemZ::F10H, SystemZ::F11H,
+    SystemZ::F12H, SystemZ::F13H, SystemZ::F14H, SystemZ::F15H};
 
 const unsigned SystemZMC::FP32Regs[16] = {
   SystemZ::F0S, SystemZ::F1S, SystemZ::F2S, SystemZ::F3S,
@@ -90,15 +89,13 @@ const unsigned SystemZMC::FP128Regs[16] = {
 };
 
 const unsigned SystemZMC::VR16Regs[32] = {
-  SystemZ::F0H, SystemZ::F1H, SystemZ::F2H, SystemZ::F3H,
-  SystemZ::F4H, SystemZ::F5H, SystemZ::F6H, SystemZ::F7H,
-  SystemZ::F8H, SystemZ::F9H, SystemZ::F10H, SystemZ::F11H,
-  SystemZ::F12H, SystemZ::F13H, SystemZ::F14H, SystemZ::F15H,
-  SystemZ::F16H, SystemZ::F17H, SystemZ::F18H, SystemZ::F19H,
-  SystemZ::F20H, SystemZ::F21H, SystemZ::F22H, SystemZ::F23H,
-  SystemZ::F24H, SystemZ::F25H, SystemZ::F26H, SystemZ::F27H,
-  SystemZ::F28H, SystemZ::F29H, SystemZ::F30H, SystemZ::F31H
-};
+    SystemZ::F0H,  SystemZ::F1H,  SystemZ::F2H,  SystemZ::F3H,  SystemZ::F4H,
+    SystemZ::F5H,  SystemZ::F6H,  SystemZ::F7H,  SystemZ::F8H,  SystemZ::F9H,
+    SystemZ::F10H, SystemZ::F11H, SystemZ::F12H, SystemZ::F13H, SystemZ::F14H,
+    SystemZ::F15H, SystemZ::F16H, SystemZ::F17H, SystemZ::F18H, SystemZ::F19H,
+    SystemZ::F20H, SystemZ::F21H, SystemZ::F22H, SystemZ::F23H, SystemZ::F24H,
+    SystemZ::F25H, SystemZ::F26H, SystemZ::F27H, SystemZ::F28H, SystemZ::F29H,
+    SystemZ::F30H, SystemZ::F31H};
 
 const unsigned SystemZMC::VR32Regs[32] = {
   SystemZ::F0S, SystemZ::F1S, SystemZ::F2S, SystemZ::F3S,
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index fb159236ec..5c5b1c6db5 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -6159,7 +6159,7 @@ SDValue SystemZTargetLowering::lowerFP_EXTEND(SDValue Op,
                                               SelectionDAG &DAG) const {
   SDValue In = Op.getOperand(Op->isStrictFPOpcode() ? 1 : 0);
   if (In.getSimpleValueType() != MVT::f16)
-    return Op;  // Legal
+    return Op;      // Legal
   return SDValue(); // Let legalizer emit the libcall.
 }
 
@@ -6179,18 +6179,18 @@ SDValue SystemZTargetLowering::lowerLoadF16(SDValue Op,
   } else {
     LoadSDNode *Ld = cast<LoadSDNode>(Op.getNode());
     assert(EVT(RegVT) == Ld->getMemoryVT() && "Unhandled f16 load");
-    NewLd = DAG.getExtLoad(ISD::EXTLOAD, DL, MVT::i32, Ld->getChain(),
-                           Ld->getBasePtr(), Ld->getPointerInfo(),
-                           MVT::i16, Ld->getOriginalAlign(),
-                           Ld->getMemOperand()->getFlags());
+    NewLd =
+        DAG.getExtLoad(ISD::EXTLOAD, DL, MVT::i32, Ld->getChain(),
+                       Ld->getBasePtr(), Ld->getPointerInfo(), MVT::i16,
+                       Ld->getOriginalAlign(), Ld->getMemOperand()->getFlags());
   }
   // Load as integer, shift and then insert into upper 2 bytes of the FP
   // register.
   SDValue Shft = DAG.getNode(ISD::SHL, DL, MVT::i32, NewLd,
                              DAG.getConstant(16, DL, MVT::i32));
   SDValue BCast = DAG.getNode(ISD::BITCAST, DL, MVT::f32, Shft);
-  SDValue F16Val = DAG.getTargetExtractSubreg(SystemZ::subreg_h16,
-                                              DL, MVT::f16, BCast);
+  SDValue F16Val =
+      DAG.getTargetExtractSubreg(SystemZ::subreg_h16, DL, MVT::f16, BCast);
   return DAG.getMergeValues({F16Val, NewLd.getValue(1)}, DL);
 }
 
@@ -6203,19 +6203,20 @@ SDValue SystemZTargetLowering::lowerStoreF16(SDValue Op,
   // Move into a GPR, shift and store the 2 bytes.
   SDLoc DL(Op);
   SDNode *U32 = DAG.getMachineNode(TargetOpcode::IMPLICIT_DEF, DL, MVT::f32);
-  SDValue In32 = DAG.getTargetInsertSubreg(SystemZ::subreg_h16, DL,
-                                           MVT::f32, SDValue(U32, 0), StoredVal);
+  SDValue In32 = DAG.getTargetInsertSubreg(SystemZ::subreg_h16, DL, MVT::f32,
+                                           SDValue(U32, 0), StoredVal);
   SDValue BCast = DAG.getNode(ISD::BITCAST, DL, MVT::i32, In32);
   SDValue Shft = DAG.getNode(ISD::SRL, DL, MVT::i32, BCast,
                              DAG.getConstant(16, DL, MVT::i32));
 
   if (auto *AtomicSt = dyn_cast<AtomicSDNode>(Op.getNode()))
     return DAG.getAtomic(ISD::ATOMIC_STORE, DL, MVT::i16, AtomicSt->getChain(),
-                         Shft, AtomicSt->getBasePtr(), AtomicSt->getMemOperand());
+                         Shft, AtomicSt->getBasePtr(),
+                         AtomicSt->getMemOperand());
 
   StoreSDNode *St = cast<StoreSDNode>(Op.getNode());
-  return DAG.getTruncStore(St->getChain(), DL, Shft, St->getBasePtr(),
-                           MVT::i16, St->getMemOperand());
+  return DAG.getTruncStore(St->getChain(), DL, Shft, St->getBasePtr(), MVT::i16,
+                           St->getMemOperand());
 }
 
 SDValue SystemZTargetLowering::lowerIS_FPCLASS(SDValue Op,
diff --git a/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp b/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
index 470543824d..b0c0d76faa 100644
--- a/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
@@ -1007,16 +1007,16 @@ void SystemZInstrInfo::storeRegToStackSlot(
     Register GR64Reg = MRI.createVirtualRegister(&SystemZ::GR64BitRegClass);
     Register FP64Reg = MRI.createVirtualRegister(&SystemZ::FP64BitRegClass);
     BuildMI(MBB, MBBI, DL, get(SystemZ::COPY))
-      .addReg(FP64Reg, RegState::DefineNoRead, SystemZ::subreg_h16)
-      .addReg(SrcReg, getKillRegState(isKill));
+        .addReg(FP64Reg, RegState::DefineNoRead, SystemZ::subreg_h16)
+        .addReg(SrcReg, getKillRegState(isKill));
     BuildMI(MBB, MBBI, DL, get(SystemZ::LGDR), GR64Reg)
-      .addReg(FP64Reg, RegState::Kill);
+        .addReg(FP64Reg, RegState::Kill);
     BuildMI(MBB, MBBI, DL, get(SystemZ::SRLG), GR64Reg)
-      .addReg(GR64Reg)
-      .addReg(0)
-      .addImm(48);
+        .addReg(GR64Reg)
+        .addReg(0)
+        .addImm(48);
     addFrameReference(BuildMI(MBB, MBBI, DL, get(SystemZ::STH))
-                        .addReg(GR64Reg, RegState::Kill, SystemZ::subreg_l32),
+                          .addReg(GR64Reg, RegState::Kill, SystemZ::subreg_l32),
                       FrameIdx);
     return;
   }
@@ -1046,18 +1046,18 @@ void SystemZInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
            "Expected non-SSA form with virtual registers.");
     Register GR64Reg = MRI.createVirtualRegister(&SystemZ::GR64BitRegClass);
     Register FP64Reg = MRI.createVirtualRegister(&SystemZ::FP64BitRegClass);
-    addFrameReference(BuildMI(MBB, MBBI, DL, get(SystemZ::LH))
-                        .addReg(GR64Reg, RegState::DefineNoRead,
-                                SystemZ::subreg_l32),
-                      FrameIdx);
+    addFrameReference(
+        BuildMI(MBB, MBBI, DL, get(SystemZ::LH))
+            .addReg(GR64Reg, RegState::DefineNoRead, SystemZ::subreg_l32),
+        FrameIdx);
     BuildMI(MBB, MBBI, DL, get(SystemZ::SLLG), GR64Reg)
-      .addReg(GR64Reg)
-      .addReg(0)
-      .addImm(48);
+        .addReg(GR64Reg)
+        .addReg(0)
+        .addImm(48);
     BuildMI(MBB, MBBI, DL, get(SystemZ::LDGR), FP64Reg)
-      .addReg(GR64Reg, RegState::Kill);
+        .addReg(GR64Reg, RegState::Kill);
     BuildMI(MBB, MBBI, DL, get(SystemZ::COPY), DestReg)
-      .addReg(FP64Reg, RegState::Kill, SystemZ::subreg_h16);
+        .addReg(FP64Reg, RegState::Kill, SystemZ::subreg_h16);
     return;
   }
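Both the SelectionDAG lowering (lowerLoadF16/lowerStoreF16) and the spill/reload code in the clang-format diff above keep an f16 value in the leftmost 16 bits of an FP register (subreg_h16) and move it through a GPR with shifts of 16 (32-bit view) or 48 (64-bit view). A minimal sketch of just that bit movement on plain integers, assuming LGDR/LDGR are bit-exact copies between FPR and GPR; the function names are hypothetical and this is not SystemZ code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit-level model: a 64-bit FP register holds an f16 value in its
// leftmost 16 bits (subreg_h16). LGDR/LDGR are assumed to copy bits unchanged.

// storeRegToStackSlot: LGDR (FPR -> GPR), SRLG 48, STH of the low halfword.
uint16_t spill_f16(uint64_t fpr) {
  uint64_t gpr = fpr;      // LGDR
  gpr >>= 48;              // SRLG gpr, gpr, 48
  return uint16_t(gpr);    // STH stores the rightmost 16 bits
}

// loadRegFromStackSlot: LH, SLLG 48, LDGR (GPR -> FPR).
uint64_t reload_f16(uint16_t slot) {
  // LH sign-extends the halfword into the low 32 bits (modeled here as a full
  // 64-bit sign extension); everything above bit 15 is shifted out by SLLG.
  uint64_t gpr = slot;
  if (slot & 0x8000u)
    gpr |= ~uint64_t(0xFFFF);
  gpr <<= 48;              // SLLG gpr, gpr, 48
  return gpr;              // LDGR
}

// lowerLoadF16 (32-bit view): extload i16 -> i32, SHL 16, bitcast to f32, so
// the half ends up in the high 16 bits of the f32 register.
uint32_t load_f16_as_f32_bits(uint16_t mem) { return uint32_t(mem) << 16; }

// lowerStoreF16: bitcast f32 -> i32, SRL 16, truncating i16 store.
uint16_t store_f16_from_f32_bits(uint32_t f32_bits) {
  return uint16_t(f32_bits >> 16);
}
```

Whatever sits in the low 48 bits of the FP register is discarded on spill, and a reload leaves those bits zero; only the 16 bits of the half survive the round trip.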

nikic · 2024-09-19T10:39:42Z

Note that you need to also have softPromoteHalfType return true to get correct legalization for half operations.

JonPsson1 · 2024-09-19T11:43:26Z

> Note that you need to also have softPromoteHalfType return true to get correct legalization for half operations.

Thanks for pointing that out - patch updated.

uweigand · 2024-09-30T15:02:31Z

I think we should define and implement a proper ABI for the half type as well.

JonPsson1 · 2024-10-02T13:08:34Z

Patch updated after some progress...

With this version, the fp16 values are passed to the conversion functions as integers, which seems to be the default. It is, however, a bit tricky to do this while at the same time passing half values in FP registers.

At this point, one thing I wonder is whether it would be better to pass FP16 values to the conversion functions as _Float16 instead. It seems this may be possible to change in the configuration, judging by COMPILER_RT_HAS_FLOAT16 / compiler-rt/lib/builtins/extendhfsf2.c / fp_extend.h...

I'm not really sure whether those conversion functions are supposed to be built and only used for soft-promotion of fp16, or whether there are any external implications, for instance GCC compatibility.

Any other comments also welcome...

clang/test/CodeGen/SystemZ/fexcess-precision.c

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

llvm/test/CodeGen/SystemZ/fp-half.ll

tgross35 · 2024-10-23T09:16:42Z

> With this version, the fp16 values are passed to the conversion functions as integers, which seems to be the default. It is, however, a bit tricky to do this and at the same time pass half values in FP registers.
>
> One thing I wonder at this point is whether it would be better to pass FP16 values to the conversion functions as _Float16 instead. It seems this may be possible to change in the configuration by looking at COMPILER_RT_HAS_FLOAT16 / compiler-rt/lib/builtins/extendhfsf2.c / fp_extend.h...
>
> I am not really sure whether those conversion functions are supposed to be built and used only for soft promotion of fp16, or whether there are any external implications, for instance GCC compatibility.

My understanding is that in GCC, __gnu_h2f_ieee/__gnu_f2h_ieee are always i32<->i16 (integer ABI), while __extendhfsf2/__truncsfhf2 use either int16_t or _Float16 on a per-target basis, as controlled by __LIBGCC_HAS_HF_MODE__ (I don't know where this gets set). In LLVM compiler-rt, COMPILER_RT_HAS_FLOAT16 controls the same thing, but it affects extend/trunc as well as h2f/f2h. I think the discrepancy works out here because a target that has _Float16 will never call __gnu_h2f_ieee or __gnu_f2h_ieee.

From your first two sentences it sounds like f16 is getting passed in an FP register but going FP->GPR->__gnu_h2f_ieee->FP->some_math_op->FP->__gnu_f2h_ieee->GPR->FP? I think it makes sense to either always pass f16 as i16 and avoid the FP registers, or make _Float16 available so COMPILER_RT_HAS_FLOAT16 can be used.

@uweigand mentioned figuring out an ABI for _Float16, is this possible? That seems like the best option.

A quick check seems to show that GCC 13 does not support _Float16 on s390x, nor does the crossbuild libgcc.a provide __gnu_h2f_ieee, __gnu_f2h_ieee, __extendhfsf2, or __truncsfhf2. So I think LLVM will be the one to set the precedent here.

Note that there are some common issues with these conversions, would probably be good to test against them if possible #97981 #97975.

uweigand · 2024-10-23T18:44:58Z

> My understanding is that in GCC, __gnu_h2f_ieee/__gnu_f2h_ieee are always i32<->i16 (integer ABI), while __extendhfsf2/__truncsfhf2 use either int16_t or _Float16 on a per-target basis, as controlled by __LIBGCC_HAS_HF_MODE__ (I don't know where this gets set). In LLVM compiler-rt, COMPILER_RT_HAS_FLOAT16 controls the same thing, but it affects extend/trunc as well as h2f/f2h. I think the discrepancy works out here because a target that has _Float16 will never call __gnu_h2f_ieee or __gnu_f2h_ieee.

From what I can see in the libgcc sources, __gnu_h2f_ieee/__gnu_f2h_ieee is indeed always i32<->i16, but it is only present on 32-bit ARM, no other platforms. On AArch64, GCC will always use inline instructions to perform the conversion. On 32-bit and 64-bit Intel, the compiler will use inline instructions if AVX512-FP16 is available; if not, but SSE2 is available, the compiler will use __extendhfsf2/__truncsfhf2 with a HFmode argument (this corresponds to _Float16, i.e. it is passed in SSE2 registers, not like an integer); if not even SSE2 is available, using the type will result in an error.

I never see __extendhfsf2/__truncsfhf2 being used with int16_t, even in principle, on any platform in libgcc. There is indeed a setting __LIBGCC_HAS_HF_MODE__ (controlled indirectly by the GCC target back-end's TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P setting), but the only thing that appears to be controlled by this flag is whether routines for complex multiplication and division (__mulhc3 / __divhc3) are being built. Am I missing something here?

> From your first two sentences it sounds like f16 is getting passed in an FP register but going FP->GPR->__gnu_h2f_ieee->FP->some_math_op->FP->__gnu_f2h_ieee->GPR->FP? I think it makes sense to either always pass f16 as i16 and avoid the FP registers, or make _Float16 available so COMPILER_RT_HAS_FLOAT16 can be used.
>
> @uweigand mentioned figuring out an ABI for _Float16, is this possible? That seems like the best option.

Yes, we're working on that. What we're planning to do is to have _Float16 be passed and returned in the same way as float and double, i.e. using (part of) certain floating-point registers. These registers are available on every SystemZ architecture level, so we would not have to guard their use (like Intel does with the SSE2 registers).

> A quick check seems to show that GCC 13 does not support _Float16 on s390x, nor does the crossbuild libgcc.a provide __gnu_h2f_ieee, __gnu_f2h_ieee, __extendhfsf2, or __truncsfhf2. So I think LLVM will be the one to set the precedent here.

Yes, we'd have to add those. I don't think we want __gnu_h2f_ieee or __gnu_f2h_ieee as those are ARM-only. We'd be defining and using __extendhfsf2 and __truncsfhf2, which would be defined with _Float16 arguments passed in floating-point registers. Either way, we should define the same set of routines (with the same ABI) in libgcc and compiler-rt.

> Note that there are some common issues with these conversions, would probably be good to test against them if possible #97981 #97975.

Thanks for pointing this out!

tgross35 · 2024-10-23T21:46:55Z

> From what I can see in the libgcc sources, __gnu_h2f_ieee/__gnu_f2h_ieee is indeed always i32<->i16, but it is only present on 32-bit ARM, no other platforms. On AArch64, GCC will always use inline instructions to perform the conversion. On 32-bit and 64-bit Intel, the compiler will use inline instructions if AVX512-FP16 is available; if not, but SSE2 is available, the compiler will use __extendhfsf2/__truncsfhf2 with a HFmode argument (this corresponds to _Float16, i.e. it is passed in SSE2 registers, not like an integer); if not even SSE2 is available, using the type will result in an error.
>
> I never see __extendhfsf2/__truncsfhf2 being used with int16_t, even in principle, on any platform in libgcc. There is indeed a setting __LIBGCC_HAS_HF_MODE__ (controlled indirectly by the GCC target back-end's TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P setting), but the only thing that appears to be controlled by this flag is whether routines for complex multiplication and division (__mulhc3 / __divhc3) are being built. Am I missing something here?

I think this is accurate, libgcc just appears to (reasonably) not provide any f16-related symbols on platforms where GCC doesn't support _Float16. LLVM does seem to use __gnu_h2f_ieee and __gnu_f2h_ieee though, on targets where Clang doesn't have _Float16 (e.g. PowerPC, Wasm, x86-32 without SSE), which is why it shows up in the current state of this PR. Presumably this is HasLegalHalfType?

For that reason we just always provide the symbols in rust's compiler-builtins (though we let LLVM figure out that f16 is i16).

> > @uweigand mentioned figuring out an ABI for _Float16, is this possible? That seems like the best option.
>
> Yes, we're working on that. What we're planning to do is to have _Float16 be passed and returned in the same way as float and double, i.e. using (part of) certain floating-point registers. These registers are available on every SystemZ architecture level, so we would not have to guard their use (like Intel does with the SSE2 registers).

That is great news, especially considering how problematic the target-feature-dependent ABI on x86-32 has been.

clang/lib/Sema/SemaExpr.cpp

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

llvm/test/CodeGen/SystemZ/fp-half.ll

JonPsson1 · 2024-10-31T12:36:49Z

Patch reworked:

- Make f16 a legal type using FP16BitRegClass in order to properly model in/out args with physregs.
- Add an FP16 register class (not "VR16"), and have everything work correctly also without vector support.
- Conversion functions added as libcalls, taking args in FP registers.

(twoaddr-kill.mir test updated as the hard-coded register class enum value for GRH32BitRegClass has changed.)

Still some more points to go over, but it seems to be working fairly well at this point.

Todo:
- vector f16..?
- Support strict F16 as well?
- atomic memops?
- Maybe check SystemZTTI cost functions to make sure they do not give low costs for vector operations?
- F16 vector constants, loads and stores are not needed (at least currently).

JonPsson1 · 2024-11-06T20:23:03Z

Patch improved further:

- Atomic memops handled.
- Spill/reload: handled in loadRegFromStackSlot() and storeRegToStackSlot(). VRegs can be used here, which makes it straightforward, but special sequences are needed (without using VSTE/VLE).
- __fp16: HalfArgsAndReturns=true => __fp16 arguments allowed. Tests added.
- f16 vectors: tests added. All seems to work.
- strict fp: again the question of conversion functions. ISD::STRICT_FP_ROUND/STRICT_FP_EXTEND need to be lowered to something, but I am not sure if that requires special treatment, or if the same conversion functions can be used. Maybe wait with strict fp16?

uweigand

Not a full review, but some general comments inline.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

uweigand · 2024-11-19T13:36:34Z

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

+      setOperationAction(ISD::FCOS, VT, Expand);
+      setOperationAction(ISD::FSINCOS, VT, Expand);
+      setOperationAction(ISD::FREM, VT, Expand);
+      setOperationAction(ISD::FPOW, VT, Expand);


Shouldn't these be Promote just like all the other f16 operations? Expand triggers a libcall, which doesn't match the excess-precision setting - also, we actually don't have f16 libcalls in libm ...

OK, if there are no f16 libcalls, it works to have them promoted.

Just crosslinking that there is an effort to add f16 libcalls #95250 but I have no clue what the plan is as far as lowering to them.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

clang/lib/Basic/Targets/SystemZ.h

clang/lib/CodeGen/Targets/SystemZ.cpp

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

JonPsson1 · 2024-11-20T01:55:28Z

Updated per review.

- strict fp-round / fp-extend added, with tests.
- Math functions like fsin are promoted instead of expanded to a non-existing fsinh.
- Conversion functions from e.g. f16 -> f64 are used instead of separate steps.
- __fp16 argument/return values removed (and the corresponding tests in systemz-abi.c removed).
- docs/LanguageExtensions: SystemZ added as supporting _Float16.
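Widening f16 -> f32 -> f64 is exact, so doing that direction in two steps only costs an extra call. Narrowing is different, because f64 -> f32 -> f16 rounds twice and can disagree with a single correctly rounded f64 -> f16 conversion. The C++ sketch below shows such a case, assuming round-to-nearest-even. `round_to_half` and the two conversion functions are hypothetical value-level models, not the compiler-rt routines.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: round x to the nearest binary16 value, ties to
// even, returned as a double (default rounding mode assumed).
double round_to_half(double x) {
  if (std::isnan(x) || std::isinf(x) || x == 0.0) return x;
  int e = std::ilogb(x);                 // 2^e <= |x| < 2^(e+1)
  if (e < -14) e = -14;                  // subnormal halves share a 2^-24 step
  double ulp = std::ldexp(1.0, e - 10);  // spacing of half values near x
  double r = std::nearbyint(x / ulp) * ulp;
  if (std::fabs(r) > 65504.0)            // beyond the largest finite half
    return std::copysign(std::numeric_limits<double>::infinity(), x);
  return r;
}

// One correctly rounded narrowing step: f64 -> f16.
double f64_to_f16_direct(double x) { return round_to_half(x); }

// Two narrowing steps: f64 -> f32 (rounds), then f32 -> f16 (rounds again).
double f64_to_f16_via_f32(double x) {
  return round_to_half(static_cast<double>(static_cast<float>(x)));
}
```

Take x = 1 + 2^-11 + 2^-30, which lies just above the midpoint between the halves 1.0 and 1.0 + 2^-10. Rounding to f32 first lands exactly on that midpoint, and the second rounding then ties to even and produces 1.0. The direct conversion correctly rounds up instead.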

Note on compiler-rt: I am not sure how to build the LLVM conversion functions and link them (I have not tried this yet), but I added the mapping in RuntimeLibcalls.cpp.

tgross35 · 2024-11-20T02:08:39Z

llvm/lib/IR/RuntimeLibcalls.cpp

+  if (TT.isSystemZ()) {
+    setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2");
+    setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2");
+  }


Why do these names need to be set, aren't these the default?

You may be right; as I wrote above, I have not really tried this before. Would you happen to know how to build and link these?

Hm, I see they default to the __gnu_ functions in this file. Some targets (wasm, hexagon) manually set them to __extendhfsf2 and __truncsfhf2 in *ISelLowering.cpp, but why do targets like x86 correctly lower to these as well, without an override either in this file or in ISelLowering?

Regarding how to build and link: they are in compiler-rt, if that can be built. See llvm-project/compiler-rt/lib/builtins/truncsfhf2.c, line 15 in fa22100: `COMPILER_RT_ABI NOINLINE dst_t __truncsfhf2(float a) {`.

__trunc and __extend are what you want to emit here. I'm just not sure what exactly this file needs to do, because it seems like HasLegalHalfType controls __extend/__trunc vs. __gnu_ lowering somehow (#109164 (comment)).
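The names in the RuntimeLibcalls.cpp hunk follow the libgcc convention of encoding the source and destination machine modes in the routine name. The sketch below illustrates only that naming scheme. `float_mode` and `conversion_libcall` are hypothetical helpers, and the sketch makes no claim about which of these routines a given runtime actually provides.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: libgcc-style mode name for an IEEE float width
// (HFmode = binary16, SFmode = binary32, DFmode = binary64,
// TFmode = binary128, which is long double on SystemZ).
std::string float_mode(int bits) {
  switch (bits) {
  case 16: return "hf";
  case 32: return "sf";
  case 64: return "df";
  case 128: return "tf";
  default: throw std::invalid_argument("unsupported floating-point width");
  }
}

// Builds the name of a float-to-float conversion routine:
// "__extend<from><to>2" when widening, "__trunc<from><to>2" when narrowing.
std::string conversion_libcall(int from_bits, int to_bits) {
  if (from_bits == to_bits)
    throw std::invalid_argument("no conversion needed");
  const char *op = from_bits < to_bits ? "__extend" : "__trunc";
  return op + float_mode(from_bits) + float_mode(to_bits) + "2";
}
```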

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

JonPsson1 · 2024-11-26T19:11:21Z

Improved handling to utilize vector instructions when present. New VR16 regclass, but v8f16 not legal. It might make sense to have it as a legal type and e.g. do VL;VST when moving vectors in memory, and also set all vector ops to "Expand". Not sure how trivial that change would be, given some special handlings of vector nodes, so not done as of now: only the scalar f16 is legal.

Seems to work fine to add "16" versions for loads, stores and lzer/lcdfr in case of vector support.

Without vector support, it might make sense to have load/store pseudos with an extra GPR def operand, so that these loads/stores can be expanded as a PostRA pseudo. Then it would only need handling in one place, but OTOH having a second explicit def operand is also undesired, maybe.

f16 immediates handled like f32:

* Basic support added for fp 0.0/-0.0 and generation of vector constants (which should always work btw given their size with vrepih).
* Single-lane vector instructions like WFLCSB not used for fp16 (yet), even though it should be possible to add _16 variants. Doesn't seem important, so skipping.
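An f16 constant is exactly 16 bits wide, so any f16 bit pattern fits in the 16-bit immediate of VREPIH, which replicates it into all eight halfword elements. Element 0 then holds the scalar, since it is the leftmost 16 bits of both the vector register and the overlaid FPR. The C++ sketch below models that layout. `vrepih` and `fpr_view` are hypothetical models, not real APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of VREPIH: the 16-bit immediate is copied into all
// eight halfword elements of a 128-bit vector register (element 0 leftmost).
std::array<uint16_t, 8> vrepih(uint16_t imm) {
  std::array<uint16_t, 8> v;
  v.fill(imm);
  return v;
}

// The 64-bit FPR overlays the leftmost 64 bits, i.e. elements 0..3.
uint64_t fpr_view(const std::array<uint16_t, 8> &v) {
  uint64_t r = 0;
  for (int i = 0; i < 4; ++i) r = (r << 16) | v[i];
  return r;
}
```

The other elements also hold the immediate, but only the leftmost 16 bits are read as the f16 value.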

Should fp16 inline asm operands also be supported at this point?

uweigand · 2024-11-27T10:59:03Z

> Improved handling to utilize vector instructions when present.

Thanks!

> New VR16 regclass, but v8f16 not legal. It might make sense to have it as a legal type and e.g. do VL;VST when moving vectors in memory, and also set all vector ops to "Expand". Not sure how trivial that change would be, given some special handlings of vector nodes, so not done as of now: only the scalar f16 is legal.

Agreed. I don't think we need vector f16 at this point.

> Without vector support, it might make sense to have load/store pseudos with an extra GPR def operand, so that these loads/stores can be expanded as a PostRA pseudo. Then it would only need handling in one place, but OTOH having a second explicit def operand is also undesired, maybe.

I don't think we need to spend much effort optimizing for pre-z13 machines at this point.

> f16 immediates handled like f32:
>
> * Basic support added for fp 0.0/-0.0 and generation of vector constants (which should always work btw given their size with vrepih).
>
> * Single-lane vector instructions like WFLCSB not used for fp16 (yet), even though it should be possible to add _16 variants. Doesn't seem important, so skipping.

The sign-operations are the only instructions we even could semantically use with f16, right? We certainly could do so, but I agree it's probably not important.
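Instructions such as LCDFR, LPDFR and LNDFR only complement, clear or set the leftmost bit of the register. When the f16 value is held in the leftmost 16 bits, that bit is also the f16 sign bit, which is why these operations are semantically valid for f16. Below is a bit-level C++ sketch on a 64-bit FPR image. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kSignBit = 1ULL << 63;  // leftmost bit of a 64-bit FPR

// LCDFR-like: complement the sign bit (negation).
uint64_t negate(uint64_t fpr) { return fpr ^ kSignBit; }
// LPDFR-like: clear the sign bit (absolute value).
uint64_t abs_val(uint64_t fpr) { return fpr & ~kSignBit; }
// LNDFR-like: set the sign bit (negative absolute value).
uint64_t neg_abs(uint64_t fpr) { return fpr | kSignBit; }

// Place an f16 bit pattern in the leftmost 16 bits, as for subreg_h16,
// and read it back.
uint64_t from_half(uint16_t h) { return static_cast<uint64_t>(h) << 48; }
uint16_t to_half(uint64_t fpr) { return static_cast<uint16_t>(fpr >> 48); }
```

Since these are pure bit operations, they also give the results that fneg/fabs require for NaNs and signed zeros.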

> Should fp16 inline asm operands also be supported at this point?

Good point. I think so, yes.

Also, looks like the clang-format check is complaining a bit ...

uweigand · 2024-11-27T10:59:50Z

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp

@@ -1883,6 +1931,10 @@ void SystemZInstrInfo::getLoadStoreOpcodes(const TargetRegisterClass *RC,
  } else if (RC == &SystemZ::FP128BitRegClass) {
    LoadOpcode = SystemZ::LX;
    StoreOpcode = SystemZ::STX;
+  } else if (RC == &SystemZ::FP16BitRegClass ||
+             RC == &SystemZ::VR16BitRegClass) {
+    LoadOpcode = SystemZ::VL16;


Hmm. Do these even work on FP16?

Yes, for instance in spill-half-01.mir.

llvm/lib/Target/SystemZ/SystemZInstrFP.td

JonPsson1 requested a review from uweigand September 18, 2024 15:45

llvmbot added the clang, backend:SystemZ and clang:frontend labels Sep 18, 2024

JonPsson1 force-pushed the FP16 branch from e0bb2ad to c70d421 September 19, 2024 11:42

JonPsson1 force-pushed the FP16 branch from c70d421 to ae1e35c October 2, 2024 12:56

llvmbot added the clang:codegen label Oct 2, 2024

JonPsson1 mentioned this pull request Oct 7, 2024

SystemZ Backend: Add support for operations such as FP16_TO_FP and FP_TO_FP16 #50374

Open

JonPsson1 requested a review from arsenm October 7, 2024 06:58

arsenm reviewed Oct 7, 2024

uweigand reviewed Oct 25, 2024

JonPsson1 force-pushed the FP16 branch from ae1e35c to 44dffa4 October 31, 2024 12:30

llvmbot added the compiler-rt, compiler-rt:builtins and llvm:ir labels Oct 31, 2024

JonPsson1 force-pushed the FP16 branch from 44dffa4 to 26660a6 November 6, 2024 20:15

uweigand mentioned this pull request Nov 18, 2024

Regression: native builds broken on s390x rust-lang/rust#133177

Closed

uweigand reviewed Nov 19, 2024

JonPsson1 force-pushed the FP16 branch from 26660a6 to a128da7 November 20, 2024 01:51

tgross35 reviewed Nov 20, 2024

uweigand reviewed Nov 20, 2024

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (outdated)

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (outdated)

JonPsson1 added 7 commits November 25, 2024 12:15

- 68bbb0e: Initial experiments (with integer regs for fp16).
- 60f6dd2: Experiment with soft-promotion in FP regs (not working).
- 6f2b4bb: Try to make f16 legal instead
- 204da86: Atomic loads/stores, spill/reload, tests for __fp16 and half vectors.
- c700b33: strict f16 with tests.
- 18e1dee: Review
- 0504c82: Make use of vector facility if present.

JonPsson1 force-pushed the FP16 branch from a128da7 to 0504c82 November 26, 2024 19:08

uweigand reviewed Nov 27, 2024