# Rounding errors in `Normed` to `Float` conversions #129
I was underestimating this issue. It is not so easy to fix without a performance regression. WIP:

```julia
function (::Type{T})(x::Normed) where {T <: AbstractFloat}
    # The following optimization for constant division may cause rounding errors.
    # y = reinterpret(x)*(one(rawtype(x))/convert(T, rawone(x)))
    # Therefore, we use a simple form here.
    y = reinterpret(x) / convert(promote_type(T, floattype(x)), rawone(x))
    convert(T, y) # needed for types like Float16 which promote arithmetic to Float32
end

function (::Type{T})(x::Normed{Ti,f}) where {T <: Union{Float32, Float16}, Ti <: Union{UInt8, UInt16}, f}
    f == 1 && return convert(T, reinterpret(x))
    r = Int64(reinterpret(x))
    s = UInt64(1)<<48
    k = Int64(s ÷ rawone(Normed{Ti,f})) # since `rawone() >= 3 > 2`, `r*k` is always less than 2^63
    y = Float32(r * k) * (1.0f0 / s)
    convert(T, y)
end
```

The length of the mantissa part of `Float32` is 24 bits, so the wide product `r * k` is rounded when it is converted:

```julia
julia> Float32(0x01010101) == 0x1.010100p24 # I would like to round half up (`0x1.010102p24`), though.
true
```

Fortunately, all x86_64 CPUs have […]. Of course, only for `Normed` to `Float32` conversions.
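For reference, one way to check a candidate conversion for correct rounding is to compare it exhaustively against a `BigFloat` reference (my own sketch, not from the original thread; `rawone` and `rawtype` are the same internal helpers used in the code above):

```julia
using FixedPointNumbers
const FPN = FixedPointNumbers

# Return the first value whose conversion to T is not correctly rounded,
# or `nothing` if the conversion is correct for every raw value.
function findmismatch(::Type{T}, ::Type{N}) where {T <: AbstractFloat, N <: Normed}
    for i in typemin(FPN.rawtype(N)):typemax(FPN.rawtype(N))
        x = reinterpret(N, i)
        T(big(i) / big(FPN.rawone(N))) == T(x) || return x
    end
    return nothing
end

findmismatch(Float32, N0f8) # `nothing` means correctly rounded everywhere
```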
Although the conversions from integer […], the `Float64` conversion has rounding errors:

```julia
julia> Float64(7.4N4f4)
7.3999999999999995

julia> Float64(reinterpret(7.4N4f4) / 15)
7.4
```

The results of the following method agree with the ideal values, but I think it is not so good (see Edit2):

```julia
function Base.Float64(x::Normed{Ti,f}) where {Ti <: Union{UInt8, UInt16}, f}
    f == 1 && return Float64(reinterpret(x))
    r = Int64(reinterpret(x))
    s1 = UInt64(1)<<((53 - 16 + f - 1)÷f * f)
    s2 = UInt64(1)<<(((53 - 16)÷f + 1) * f)
    k = Int64(s1 ÷ rawone(Normed{Ti,f}))
    f64 = Float64(r * k)
    f64 * (1.0 / s1) + f64 * (1.0 / s1 / s2)
end
```

Edit:

```julia
function Base.Float64(x::Normed{UInt8,f}) where f
    f == 1 && return Float64(reinterpret(x))
    r = Int64(reinterpret(x))
    s = UInt64(1)<<(64 - 8 - 2 + f)
    k = Int64(s ÷ rawone(Normed{UInt8,f}))
    i64 = r * k
    i32 = unsafe_trunc(UInt32, i64) >> (f*2)
    Float64((i64>>12<<12) | i32) * (1.0 / s)
end
```

Edit2:

```julia
function Base.Float64(x::Normed{Ti,f}) where {Ti <: Union{UInt8, UInt16}, f}
    f == 1 && return Float64(reinterpret(x))
    r = Int64(reinterpret(x))
    s = UInt64(1)<<(((53 - 16)÷f + 1) * f)
    k = Int64(s ÷ rawone(Normed{Ti,f}))
    Float64(r * k) * (1.0 / s + 1.0 / s / s)
end
```
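With the Edit2 definition in place, the motivating example from above should round correctly (a hypothetical session; the claim that the results match the ideal values is from the text above):

```julia
julia> Float64(7.4N4f4) # with the Edit2 method defined
7.4
```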
In the case of `Normed{UInt8,2}`:

```julia
function Base.Float32(x::Normed{UInt8,2})
    r = Int32(reinterpret(x))
    s = 0b0100_0000_0000_0000
    k = Int32(s ÷ rawone(Normed{UInt8,2}))
    Float32(r * k) * (1.0f0 / s + 1.0f0 / s / s)
end
```

However, on x86_64 CPUs with AVX/AVX2 extensions, the […]

Edit: […]

Edit2:

```julia
function (::Type{T})(x::Normed{UInt8,f}) where {T <: Union{Float32, Float16}, f}
    f == 1 && return T(reinterpret(x))
    if f == 8
        k = Int32(0xF0F1) # 0xF0F1 * 0xFF < 2^24
        return T(Float32(Int32(reinterpret(x)) * k) * Float32(0x1.1111p-24))
    end
    s = UInt32(1)<<((f * 3÷2 + 11)÷f * f)
    k = Int32(s ÷ rawone(Normed{UInt8,f}))
    T(Float32(Int32(reinterpret(x)) * k) * (1.0f0 / s + 1.0f0 / s / s))
end
```
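The closing factor `(1.0f0 / s + 1.0f0 / s / s)` equals `(s + 1)/s^2` and compensates for the truncation in `k = s ÷ rawone`. Spelled out for the `Normed{UInt8,2}` method above (my own arithmetic, not from the thread):

```julia
s = 2^14            # the scale used above
k = s ÷ 3           # == 5461; rawone(Normed{UInt8,2}) == 3 truncates the quotient
k * (s + 1) / s^2   # ≈ 0.3333333321, the effective scale factor
1 / 3               # ≈ 0.3333333333; the gap (≈1.2e-9) is far below
                    # Float32 resolution at this magnitude (≈3e-8)
```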
**Implementation status**

FYI, the results on Julia v1.3.0-rc4 seem to be the same as on Julia v1.2.0.

Julia v1.2.0 x86_64

Edit:

**Tolerance**

The rounding mode is assumed to be […]. "1 lsb" means that […]
Do you have any ideas to simplify the following method, and to calculate it fast and perfectly?

```julia
function Base.Float64(x::Normed{UInt32,f}) where f
    f == 1 && return Float64(reinterpret(x))
    r = Int64(reinterpret(x))
    f == 2 && return Float64(r * 0x040001) * 0x15555000015555p-72
    f == 3 && return Float64(r * 0x108421) * 0x11b6db76924929p-75
    f == 4 && return Float64(r * 0x010101) * 0x11000011000011p-72
    f == 5 && return Float64(r * 0x108421) * 0x04000002000001p-75
    f == 6 && return Float64(r * 0x09dfb1) * 0x1a56b8e38e6d91p-78
    f == 7 && return Float64(r * 0x000899) * 0x0f01480001e029p-70
    f == 8 && return Float64(r * 0x0a5a5b) * 0x18d300000018d3p-80
    f == 9 && return Float64(r * 0x001001) * 0x080381c8e3f201p-72
    f == 10 && return Float64(r * 0x100001) * 0x04010000000401p-80
    f == 11 && return Float64(r * 0x000009) * 0x0e3aaae3955639p-66
    f == 12 && return Float64(r * 0x0a8055) * 0x186246e46e4cfdp-84
    f == 13 && return Float64(r * 0x002001) * 0x10000004000001p-78
    f == 14 && return Float64(r * 0x03400d) * 0x13b13b14ec4ec5p-84
    f == 15 && return Float64(r * 0x000259) * 0x06d0c5a4f3a5e9p-75
    f == 16 && return Float64(r * 0x011111) * 0x00f000ff00fff1p-80
    f == 18 && return Float64(r * 0x0b06d1) * 0x17377445dd1231p-90
    f == 19 && return Float64(r * 0x080001) * 0x00004000000001p-76
    f == 20 && return Float64(r * 0x000101) * 0x0ff010ef10ff01p-80
    f == 21 && return Float64(r * 0x004001) * 0x01fff8101fc001p-84
    f == 22 && return Float64(r * 0x002945) * 0x18d0000000018dp-88
    f == 23 && return Float64(r * 0x044819) * 0x07794a23729429p-92
    f == 27 && return Float64(r * 0x000a21) * 0x0006518c7df9e1p-81
    f == 28 && return Float64(r * 0x00000d) * 0x13b13b14ec4ec5p-84
    f == 30 && return Float64(r * 0x001041) * 0x00fc003f03ffc1p-90
    f == 32 && return Float64(r * 0x010101) * 0x00ff0000ffff01p-96
    f == 17 && return Float64(r) * 0x8000400020001p-68 # imperfect
    f == 24 && return Float64(r) * 0x1000001000001p-72 # imperfect
    f == 25 && return Float64(r) * 0x4000002000001p-75 # imperfect
    f == 26 && return Float64(r) * 0x10000004000001p-78 # imperfect
    f == 29 && return Float64(r) * 0x20000001p-58 # imperfect
    f == 31 && return Float64(r) * 0x80000001p-62 # imperfect
end
```

Edit: a script for searching for the factor pairs:

```julia
using Primes

b = ones(Int128, 32)
# _____________________________________________________--------------------
b[ 2] = 0b10101010101010101010101010101010101010101010101010101010101010101010101
b[ 3] = 0b1001001001001001001001001001001001001001001001001001001001001001001001001
b[ 4] = 0b100010001000100010001000100010001000100010001000100010001000100010001
b[ 5] = 0b10000100001000010000100001000010000100001000010000100001000010000100001
b[ 6] = 0b1000001000001000001000001000001000001000001000001000001000001000001000001
b[ 7] = 0b1000000100000010000001000000100000010000001000000100000010000001
b[ 8] = 0b1000000010000000100000001000000010000000100000001000000010000000100000001
b[ 9] = 0b1000000001000000001000000001000000001000000001000000001000000001
b[10] = 0b10000000001000000000100000000010000000001000000000100000000010000000001
b[11] = 0b10000000000100000000001000000000010000000000100000000001
b[12] = 0b1000000000001000000000001000000000001000000000001000000000001000000000001
b[13] = 0b100000000000010000000000001000000000000100000000000010000000000001
b[14] = 0b10000000000000100000000000001000000000000010000000000000100000000000001
b[15] = 0b1000000000000001000000000000001000000000000001000000000000001
b[16] = 0b10000000000000001000000000000000100000000000000010000000000000001
# _____________________________________________________--------------------
b[17] = 0b1000000000000000010000000000000000100000000000000001 # imperfect
b[18] = 0b1000000000000000001000000000000000001000000000000000001000000000000000001
b[19] = 0b1000000000000000000100000000000000000010000000000000000001
b[20] = 0b1000000000000000000010000000000000000000100000000000000000001
b[21] = 0b1000000000000000000001000000000000000000001000000000000000000001
b[22] = 0b1000000000000000000000100000000000000000000010000000000000000000001
b[23] = 0b1000000000000000000000010000000000000000000000100000000000000000000001
b[24] = 0b1000000000000000000000001000000000000000000000001 # imperfect
b[25] = 0b100000000000000000000000010000000000000000000000001 # imperfect
b[26] = 0b10000000000000000000000000100000000000000000000000001 # imperfect
b[27] = 0b1000000000000000000000000001000000000000000000000000001
b[28] = 0b100000000000000000000000000010000000000000000000000000001
b[29] = 0b100000000000000000000000000001 # imperfect
b[30] = 0b1000000000000000000000000000001000000000000000000000000000001
b[31] = 0b10000000000000000000000000000001 # imperfect
b[32] = 0b10000000000000000000000000000000100000000000000000000000000000001

factors = [factor(Vector, b[i]) for i = 1:32];
@show f = factors[31] # select
k1 = []
k2 = []
for i = 1:2^length(f)
    ka = UInt64(1)
    kb = UInt64(1)
    for k = 0:length(f)-1
        if (i >> k) & 1 != 0
            ka *= f[k+1]
        else
            kb *= f[k+1]
        end
    end
    ka * 0xFFFFFFFF >= 2^53 && continue
    kb >= 2^53 && continue
    if !(ka in k1)
        push!(k1, ka)
        push!(k2, kb)
    end
end
sort!(k1)
reverse!(sort!(k2))
for i = 1:length(k1)
    bc = length(replace(bitstring(k1[i]), "0" => ""))
    # bc > 3 && continue
    println(bc, ", ", string(k1[i], base=16), ", ", string(k2[i], base=16))
end
```

Edit2:

```julia
Base.Float64(x::Normed{UInt32, 2}) = (Float64(reinterpret(x)) * 0x040001) * 0x15555000015555p-72
Base.Float64(x::Normed{UInt32, 3}) = (Float64(reinterpret(x)) * 0x108421) * 0x11b6db76924929p-75
Base.Float64(x::Normed{UInt32, 4}) = (Float64(reinterpret(x)) * 0x010101) * 0x11000011000011p-72
...
```

FYI, to convert […]

Edit3:

```julia
function u32_to_f64(x::UInt32)
    i32 = unsafe_trunc(Int32, x)
    # equivalent to: Float64(i32) + (i32 < 0 ? 0x1p32 : 0.0), but branch-free
    Float64(i32) + reinterpret(Float64, Int64(i32>>31) & reinterpret(Int64, 0x1p32))
end
```
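A quick sanity check of the branch-free trick against the straightforward conversion (my own check, not from the thread):

```julia
# `u32_to_f64` is the Edit3 function above; Float64(x::UInt32) is exact,
# so the two must agree for every input.
for x in (0x00000000, 0x7fffffff, 0x80000000, 0xffffffff, rand(UInt32, 1000)...)
    @assert u32_to_f64(x) == Float64(x)
end
```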
**Benchmark**

**Script**

This script enables us to get a glimpse of the problems of #125. The core conversion code should be almost OS-independent, but broadcasting and memory allocation are OS-dependent. In particular, there seems to be a problem with broadcasting on Windows.

```julia
using BenchmarkTools
using FixedPointNumbers

struct Vec3{T <: Real}
    x::T; y::T; z::T
end
struct Vec4{T <: Real}
    x::T; y::T; z::T; w::T
end
Vec3{T}(v::Vec3{T}) where {T} = v
Vec3{T}(v::Vec3{U}) where {T, U} = Vec3{T}(v.x, v.y, v.z)
Vec4{T}(v::Vec4{T}) where {T} = v
Vec4{T}(v::Vec4{U}) where {T, U} = Vec4{T}(v.x, v.y, v.z, v.w)
Base.rand(::Type{Vec3{T}}) where {T} = Vec3{T}(rand(T), rand(T), rand(T))
Base.rand(::Type{Vec4{T}}) where {T} = Vec4{T}(rand(T), rand(T), rand(T), rand(T))
function Base.rand(::Type{T}, sz::Dims) where {T <: Union{Vec3, Vec4}}
    A = Array{T}(undef, sz)
    for i in eachindex(A); A[i] = rand(T); end
    return A
end

const N0f128 = Normed{UInt128,128}
const N125f3 = Normed{UInt128,3}
Ts = (N0f8, N5f3, N0f16, N13f3, N0f32, N8f24, N29f3, N0f64, N61f3, N0f128, N125f3)

# The impact of the difference in random numbers should be less than that of
# CPU throttling or caching.
vec1s = [rand(T, 64 * 64 * 4) for T in Ts];
mat3s = [rand(Vec3{T}, 64, 64) for T in Ts];
mat4s = [rand(Vec4{T}, 64, 64) for T in Ts];
# vec1s = [[rand(T) for i = 1:64 * 64 * 4] for T in Ts]; # workaround for #125

for vec in vec1s
    println(eltype(vec), "-> Float32")
    @btime Float32.(view($vec,:))
end
for vec in vec1s
    println(eltype(vec), "-> Float64")
    @btime Float64.(view($vec,:))
end
for mat in mat3s
    println(eltype(mat), "-> Float32")
    @btime Vec3{Float32}.(view($mat,:,:))
end
for mat in mat3s
    println(eltype(mat), "-> Float64")
    @btime Vec3{Float64}.(view($mat,:,:))
end
for mat in mat4s
    println(eltype(mat), "-> Float32")
    @btime Vec4{Float32}.(view($mat,:,:))
end
for mat in mat4s
    println(eltype(mat), "-> Float64")
    @btime Vec4{Float64}.(view($mat,:,:))
end
```

The values in the tables below follow the display of […].

**Julia v1.2.0 x86_64-w64-mingw32**

```
julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
```

Vector (unit: μs)

The current […]

Matrix of Vec3 (unit: μs)

Matrix of Vec4 (unit: μs)

**Julia v1.2.0 x86_64-pc-linux-gnu (Debian on WSL)**

```
julia> versioninfo()
Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
```

Vector (unit: μs)

In cases of […]

Matrix of Vec3 (unit: μs)

Matrix of Vec4 (unit: μs)

**Linux 32-bit (ARMv7)**: see #129 (comment)

**Windows 32-bit (i686) (WoW64)**: see #129 (comment)
Although the conversion from […]. So, I'll try to avoid […].

Edit:

```julia
function (::Type{T})(x::Normed{UInt16,f}) where {T <: Union{Float32, Float16}, f}
    f == 1 && return T(reinterpret(x))
    f32 = Float32(Int32(reinterpret(x)))
    f ==  2 && return T(f32 * Float32(0x55p-8)  + f32 * Float32(0x555555p-32))
    f ==  3 && return T(f32 * Float32(0x49p-9)  + f32 * Float32(0x249249p-33))
    f ==  4 && return T(f32 * Float32(0x11p-8)  + f32 * Float32(0x111111p-32))
    f ==  5 && return T(f32 * Float32(0x21p-10) + f32 * Float32(0x108421p-35))
    f ==  6 && return T(f32 * Float32(0x41p-12) + f32 * Float32(0x041041p-36))
    f ==  7 && return T(f32 * Float32(0x81p-14) + f32 * Float32(0x204081p-42))
    f == 16 && return T(f32 * Float32(0x1p-16)  + f32 * Float32(0x010001p-48))
    s = UInt64(1)<<(64 - 16)
    k = Int64(s ÷ rawone(Normed{UInt16,f})) # since `rawone() >= 3 > 2`, `x.i*k` is always less than 2^63
    T(Float32(Int64(reinterpret(x)) * k) * (1.0f0 / s))
end
```
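The paired constants split the reciprocal `1/rawone` into a high part and a low part, each short enough that the `Float32` products are exact or nearly exact. Spelled out for `f == 2` (`rawone == 3`; my own arithmetic, not from the thread):

```julia
big(0x55) / big(2)^8      # high part: 0x55p-8      ≈ 0.33203125
big(0x555555) / big(2)^32 # low part:  0x555555p-32 ≈ 0.0013020835
# high + low ≈ 0.3333333335, slightly above 1/3 == 1/rawone(Normed{UInt16,2}),
# mirroring the `1/s + 1/s/s` trick above.
```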
Wow, I've been focused on other tasks and missed all of this. Truly awesome analysis, @kimikage!
I may own one of those older machines myself, but I think we should target the future more than the present.
Not off the top of my head. Your lookup table idea is genius. I'm sure you know that the method will be statically optimized. What if we use a slow generic method for the troublesome cases and adopt your magic-number approach for the rest? At least for the common cases this should be a huge improvement. This actually seems worth writing up as a paper. Have you considered that? I am a fine one to talk because I never get around to doing that for any of my Julia work, though I am trying to change that now.
Thank you for your favorable feedback. My concern was (is) that the benefit is out of balance with the amount of code. (I see this as a puzzle game, so I think it was worth the time I spent.)

Yes, I know that, and I just learned that Julia may give up the optimization over quite small things. Although the compile times and the debugging of C++ templates really disgust me, I'd like something like the […]

Yes, that is the plan. I'm going to finish this, but I will do my best for […]

When I started this, I searched GitHub for some of the distinctive magic numbers, but I could not find a similar technique. (Well, the search engine of GitHub is somewhat poor, though.)
You could always write those methods as

```julia
i64(x) = Int64(reinterpret(x))
Base.Float64(x::Normed{UInt32,1}) = Float64(reinterpret(x))
Base.Float64(x::Normed{UInt32,2}) = Float64(i64(x) * 0x040001) * 0x15555000015555p-72
```

and then there's no risk they won't be optimized. Not really any longer, either.
Well, the optimization for eliminating dead branches is quite excellent. I have no complaints or inconvenience about it. My complaint is more about inlining and the compile-time calculation of constants. This is not the place to make a request, but I hope […]
Inlining is definitely a tricky & sensitive optimization. If you haven't found it, there's a bit of documentation about the internals: https://docs.julialang.org/en/latest/devdocs/inference/#The-inlining-algorithm-(inline_worthy)-1
I'm going to add a simple macro:

```julia
macro f32(x::Float64) # for hexadecimal floating-point literals
    :(Float32($x))
end
```

It would make the macro more convenient to accept expressions as well as literals, but I'm not going to write a compiler. :smile:
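For what it's worth, a quick look at what the macro produces (hypothetical session; `0x55p-8 == 85/256 == 0.33203125`):

```julia
julia> @macroexpand @f32(0x55p-8)
:(Float32(0.33203125))

julia> @f32(0x55p-8)
0.33203125f0
```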
For […]:

```julia
i32 = Int32(reinterpret(x))
f == 13 && return f32 * @f32(0x01p-13) + Float32(i32 * 0x07) * @f32(0x924db7p-52)
f == 14 && return f32 * @f32(0x01p-14) + Float32(i32 * 0x15) * @f32(0xc30f3dp-56)
f == 15 && return f32 * @f32(0x01p-15) + Float32(i32 * 0x49) * @f32(0xe071f9p-60)
```

Edit: […]
**Benchmark 2**

**Julia v1.0.3 arm-linux-gnueabihf (Raspbian Buster on RPi2 v1.2)**

```
julia> versioninfo()
Julia Version 1.0.3
Platform Info:
  OS: Linux (arm-linux-gnueabihf)
  CPU: ARMv7 Processor rev 4 (v7l)
  WORD_SIZE: 32
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, cortex-a53)
```

The NEON instruction set has native support for both […].

Matrix of Vec4 (unit: μs)

Julia 1.0.x doesn't seem to apply constant propagation before inlining.
Wow, those are all over the map. Is this mostly due to the architecture or the Julia version?
I think the inlining problem is mostly due to the Julia version. However, I don't know which commit changed the behavior. (JuliaLang/julia#24362 has also been merged into the Julia v1.0.x branch.)
Note to Self: Workarounds for the inlining problem
Where Do We Come From? What Are We? Where Are We Going? 😵

```julia
macro exp2(n)
    :(_exp2(Val($(esc(n)))))
end
@generated _exp2(::Val{N}) where {N} = (x = exp2(N); :($x))

# for Julia v1.0, which does not fold `div_float` before inlining
@generated inv_rawone(::T) where {T <: Normed} = (x = 1.0 / rawone(T); :($x))
```

Edit: see #138 (comment)
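The `@generated` function forces `exp2` to be evaluated at compile time, so the macro yields a constant even on Julia v1.0. A sketch of the intended usage (hypothetical session):

```julia
julia> h(x) = x * @exp2(8);  # the multiplier folds to the constant 256.0

julia> h(0.5)
128.0
```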
**Benchmark 3**

**Julia v1.0.5 i686-w64-mingw32 (WoW64)**

```
julia> versioninfo()
Julia Version 1.0.5
Commit 3af96bcefc (2019-09-09 19:06 UTC)
Platform Info:
  OS: Windows (i686-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 32
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
```

Vector (unit: μs)

Matrix of Vec4 (unit: μs)
BTW, I think there is a problem with interoperability with specializations that use integer literal type parameters:

```julia
julia> Normed{UInt32,Int64(1)} === Normed{UInt32,Int32(1)}
false

julia> Normed{UInt32,Int64(1)} == Normed{UInt32,Int32(1)}
false
```

I don't know how serializers (e.g. JLD2) handle (de-)serialization.
Well, you might wonder, "What about `Fixed`?"

FixedPointNumbers.jl/src/fixed.jl, lines 63 to 64 in ee5bd54

I could not believe my eyes. It far exceeds my understanding, so I'm not going to touch it. :see_no_evil:
Definitely not beyond your understanding, and indeed the `Fixed` cases are easier: they are scaled by 2^f rather than 2^f - 1, so the conversion operations are just bit-shifts. Here are the pieces: […]
I apologize; I probably misled you. I mean, I don't know the reason for not doing the following specializations:

```julia
Base.Float32(x::Fixed{T,f}) where {T <: Union{Int8, Int16}, f} = Float32(x.i) * (1.0f0 / (1<<f))
Base.Float64(x::Fixed{T,f}) where {T <: Union{Int8, Int16, Int32}, f} = Float64(x.i) * (1.0 / (Int64(1)<<f))
```

Oops! 🙊
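These specializations should be exact, since `x.i` always fits in the significand and the scale is a power of two. A quick exhaustive check for `Q0f7` (my own test, not from the thread):

```julia
using FixedPointNumbers

# Both the specialized form and the generic conversion are exact for Int8,
# so they must agree for every raw value.
for i in typemin(Int8):typemax(Int8)
    x = reinterpret(Q0f7, i)
    @assert Float32(x) == Float32(x.i) * (1.0f0 / (1 << 7))
end
```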
I also wondered if it was more complicated than it needed to be. Assuming it works, it seems worth changing. I've not really paid much attention (as you can tell) to `Fixed`.
FYI, the problem with rounding errors also occurs in `Normed{UInt64}` types such as `N32f32`:

```julia
julia> 5.0N32f32 == 5.0
false

julia> Float64(5.0N32f32) == 5.0
true

julia> promote_type(N32f32, Float64)
BigFloat

julia> BigFloat(5.0N32f32)
5.000000000000000000000000000000000000000000000000000000000000000000000000000069
```
While I was reviewing PR #123, I found a minor bug. I think this is a problem with the division optimization:

FixedPointNumbers.jl/src/normed.jl, lines 75 to 78 in da39318

The rounding errors can occur everywhere, but the errors at integer values are especially critical.
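The optimized conversion multiplies by a precomputed reciprocal of `rawone` instead of dividing, and that reciprocal is itself rounded. The effect in isolation (my own illustration; `reinterpret(7.4N4f4) == 111` and `rawone(N4f4) == 15`):

```julia
julia> 111 * (1.0 / 15) # multiply by the precomputed reciprocal: 1 ulp off
7.3999999999999995

julia> 111 / 15         # a single division rounds correctly
7.4
```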