Float16 broken on Julia 1.9 (LLVM14) on Intel Sapphire Rapids #51686

Closed
BioTurboNick opened this issue Oct 12, 2023 · 5 comments

@BioTurboNick (Contributor) commented Oct 12, 2023

While tracking down the cause of a segmentation fault on Sapphire Rapids (#51482), I may have found the root issue: Float16 is broken on this platform, possibly due to bad LLVM output.

The issue does not occur with Julia 1.8 (LLVM 13) or Julia 1.10-beta3 (LLVM 15).

# Julia 1.9.3 on Sapphire Rapids:
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Xeon(R) Platinum 8488C
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, sapphirerapids)
  Threads: 1 on 8 virtual cores
Environment:
  JULIA_PKG_USE_CLI_GIT = true

julia> Float16(BigInt(4))
Float16(0.0)
julia> Float32(BigInt(4))
4.0f0
julia> Inf16
Float16(0.0)
julia> @info Inf16 # or any Float16
ERROR: BoundsError: attempt to access 23-element Vector{UInt8} at index [24]
Stacktrace:
 [1] setindex!
   @ ./array.jl:969 [inlined]
 [2] writeshortest(buf::Vector{UInt8}, pos::Int64, x::Float16, plus::Bool, space::Bool, hash::Bool, precision::Int64, expchar::UInt8, padexp::Bool, decchar::UInt8, typed::Bool, compact::Bool)
   @ Base.Ryu ./ryu/shortest.jl:267
 [3] string(x::Float16)
   @ Base.Ryu ./ryu/Ryu.jl:123
 [4] handle_message(logger::Logging.ConsoleLogger, level::Base.CoreLogging.LogLevel, message::Any, _module::Any, group::Any, id::Any, filepath::Any, line::Any; kwargs::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}})
   @ Logging ~/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/share/julia/stdlib/v1.9/Logging/src/ConsoleLogger.jl:119
 [5] handle_message(logger::Logging.ConsoleLogger, level::Base.CoreLogging.LogLevel, message::Any, _module::Any, group::Any, id::Any, filepath::Any, line::Any)
   @ Logging ~/.julia/juliaup/julia-1.9.3+0.x64.linux.gnu/share/julia/stdlib/v1.9/Logging/src/ConsoleLogger.jl:106
 [6] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Base ./essentials.jl:819
 [7] invokelatest(::Any, ::Any, ::Vararg{Any})
   @ Base ./essentials.jl:816
 [8] top-level scope
   @ logging.jl:353

# Julia 1.9.3 on Raptor Lake:
Julia Version 1.9.3
Commit bed2cd540a (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 32 × 13th Gen Intel(R) Core(TM) i9-13900KF
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, goldmont)
  Threads: 1 on 32 virtual cores

julia> Float16(BigInt(4))
Float16(4.0)
julia> Float32(BigInt(4))
4.0f0
julia> Inf16
Inf16
julia> @info Inf16
[ Info: Inf
@giordano (Contributor) commented Oct 13, 2023

Much simpler reproducer on 1.9.3:

julia> Float16(Float32(4.0))
Float16(0.0)

julia> @code_llvm Float16(Float32(4.0))
;  @ float.jl:256 within `Float16`
define half @julia_Float16_138(float %0) #0 {
top:
  %1 = fptrunc float %0 to half
  ret half %1
}
julia> @code_native Float16(Float32(4.0))
      .text
        .file        "Float16"
        .globl       julia_Float16_171               # -- Begin function julia_Float16_171
        .p2align     4, 0x90
        .type        julia_Float16_171,@function
julia_Float16_171:                      # @julia_Float16_171
; ┌ @ float.jl:256 within `Float16`
        .cfi_startproc
# %bb.0:                                # %top
        pushq    %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq     %rsp, %rbp
        .cfi_def_cfa_register %rbp
        vcvtss2sh        %xmm0, %xmm0, %xmm0
        popq     %rbp
        .cfi_def_cfa %rsp, 8
        retq
.Lfunc_end0:
        .size        julia_Float16_171, .Lfunc_end0-julia_Float16_171
        .cfi_endproc
; └
                                        # -- End function
        .section     ".note.GNU-stack","",@progbits

But the LLVM IR looks legitimate.

On a not-too-old nightly build (f066500) this works correctly:

julia> Float16(Float32(4.0))
Float16(4.0)

julia> @code_llvm debuginfo=:none Float16(Float32(4.0))
; Function Signature: (::Type{Float16})(Float32)
define half @julia_Float16_1174(float %"x::Float32") #0 {
top:
  %0 = fptrunc float %"x::Float32" to half
  ret half %0
}
julia> @code_native debuginfo=:none Float16(Float32(4.0))
      .text
        .file        "Float16"
        .globl       julia_Float16_1306              # -- Begin function julia_Float16_1306
        .p2align     4, 0x90
        .type        julia_Float16_1306,@function
julia_Float16_1306:                     # @julia_Float16_1306
; Function Signature: (::Type{Float16})(Float32)
# %bb.0:                                # %top
        #DEBUG_VALUE: Float16:x <- $xmm0
        push     rbp
        mov      rbp, rsp
        vcvtss2sh        xmm0, xmm0, xmm0
        pop      rbp
        ret
.Lfunc_end0:
        .size        julia_Float16_1306, .Lfunc_end0-julia_Float16_1306
                                        # -- End function
        .type        ".L+Core.Float16#1308",@object  # @"+Core.Float16#1308"
        .section     .rodata,"a",@progbits
        .p2align     3
".L+Core.Float16#1308":
        .quad        ".L+Core.Float16#1308.jit"
        .size        ".L+Core.Float16#1308", 8

.set ".L+Core.Float16#1308.jit", 140549906661152
        .size        ".L+Core.Float16#1308.jit", 0
        .section     ".note.GNU-stack","",@progbits

@gbaraldi (Member) commented:

It's an ABI issue ;)

@giordano (Contributor) commented:

> It's an ABI issue ;)

Is there something concrete that can be done to fix this on v1.9 with LLVM 14, or do we have to live with the fact that the 1.9 series will always be broken while 1.10+ works fine?
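
A hedged aside on what an ABI mismatch could mean in practice: the sketch below is plain Julia, and nothing in it comes from the thread. It contrasts a conversion whose Float16 result stays inside a single compiled function with one that returns a boxed Float16 to the caller; the interpretation that only the boxed path would be affected is an assumption, not something confirmed above.

# Illustrative probe only; where the half-vs-i16 calling-convention mismatch
# would bite is an assumption, not an observed result.
stays_unboxed(x::Float32) = Float32(Float16(x))  # the Float16 is an intermediate value inside one compiled function
returns_boxed(x::Float32) = Float16(x)           # the Float16 result is boxed and handed back to the runtime

stays_unboxed(4.0f0)   # converts down and back without leaving compiled code
returns_boxed(4.0f0)   # the result crosses the codegen/runtime boundary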

@BioTurboNick (Contributor, Author) commented:

Hey, look what I found in release-1.9!

Optional<bool> always_have_fp16() {
#if defined(_CPU_X86_) || defined(_CPU_X86_64_)
    // x86 doesn't support fp16
    // TODO: update for sapphire rapids when it comes out
    return false;
#else
    return {};
#endif
}

This section was revamped in Julia 1.10 so that it no longer hard-codes an always-false answer based on compile-time constants.

I replaced the function with the one from 1.10 and it seems to be working. At least, the simple examples here now work; CSV.jl still crashes during precompilation.
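
For anyone rebuilding 1.9 with that function swapped out, a minimal sanity check along these lines might help; it is only a sketch that exercises the conversions and the Ryu printing path from the examples above, nothing more.

using Test
@testset "Float16 on Sapphire Rapids" begin
    @test Float16(Float32(4.0)) == Float16(4.0)   # minimal reproducer from above
    @test Float16(BigInt(4)) == Float16(4.0)
    @test isinf(Inf16)
    @test string(Inf16) == "Inf"                  # goes through Base.Ryu, which threw the BoundsError
end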

@giordano (Contributor) commented Aug 21, 2024

https://discourse.julialang.org/t/julia-1-9-on-intel-sapphire-rapid-cpu-doesnt-work/118444 confirmed that everything works in v1.10 and v1.11. Since it's unlikely there will be a new release in the v1.9 series at this point, I'm going to close this ticket.
