[RFC] Acceptable input ranges #156

Closed
kimikage opened this issue Dec 26, 2019 · 2 comments

@kimikage
Collaborator

When fixing the issue #102 (PR #131), I changed the acceptable input range for Normed as follows:

function _convert(::Type{U}, x::Tf) where {T, f, U <: Normed{T,f}, Tf <: Union{Float32, Float64}}
    if T == UInt128 && f == 53
        0 <= x <= Tf(3.777893186295717e22) || throw_converterror(U, x)
    else
        0 <= x <= Tf((typemax(T)-rawone(U))/rawone(U)+1) || throw_converterror(U, x)
    end
    # ... (rest of the function omitted in this excerpt)
end

(typemax(T)-rawone(U))/rawone(U)+1 is algebraically equal to typemax(T)/rawone(U), i.e., the largest value representable by U.
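
The equivalence can be checked numerically with a minimal sketch that uses only Base. The helper `rawone_of` is a hypothetical stand-in for the package's `rawone`, assuming the raw representation of 1 for a Normed type with `f` fractional bits is `2^f - 1`:

```julia
# Hypothetical helper (not the FixedPointNumbers API): raw integer that
# represents 1.0 for a Normed type backed by T with f fractional bits,
# assumed here to be 2^f - 1.
rawone_of(::Type{T}, f) where {T<:Unsigned} = typemax(T) >> (8 * sizeof(T) - f)

for (T, f) in ((UInt8, 8), (UInt8, 7), (UInt16, 12))
    r = rawone_of(T, f)
    a = (typemax(T) - r) / r + 1   # form used in the bounds check
    b = typemax(T) / r             # simplified form
    println(T, ", f = ", f, ": ", a, " vs ", b)
    @assert a ≈ b                  # equal up to floating-point rounding
end
```

The comparison uses `≈` rather than `==` because the two expressions are evaluated in Float64 and need not agree to the last bit.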

The decision is based, for example, on the idea that the range of N0f8 should be [0, 1]. In particular, regarding the lower bound, I believe that Normed types should not accept negative inputs, since the current Normed types are unsigned.

However, the following is also true:

julia> round(UInt8, 1.0019607843137253 * 255) # typemax(UInt8)/rawone(N0f8) == 1
0xff

julia> round(UInt8, -0.00196078431372549 * 255)
0x00

So, I don't know whether the acceptable input range of Q0f7 should be [-1, 127/128] (the representable range) or [-128.5/128, 127.5/128) (every input that still rounds to a representable value).
Accepting inputs less than typemin or greater than typemax is a bit confusing, and it makes little sense, especially if the input type is not based on the binary numeral system. On the other hand, it seems desirable for the difference between the upper and lower bounds to be a power of two.
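
The two candidate ranges can be compared with a rough sketch that uses only Base. It assumes Q0f7 is modeled as an Int8 raw value scaled by 128 and that conversion rounds to nearest with ties to even (the default of `round`); the names `SCALE`, `in_strict`, and `in_rounding` are made up for illustration:

```julia
# Sketch only: Q0f7 modeled as an Int8 raw value with scale factor 2^7.
const SCALE = 128

# Candidate 1: the representable range [typemin, typemax] = [-1, 127/128].
in_strict(x) = -1 <= x <= 127 / SCALE

# Candidate 2: every input whose scaled value still rounds into Int8.
function in_rounding(x)
    r = round(x * SCALE)   # Base rounds ties to even
    return typemin(Int8) <= r <= typemax(Int8)
end

for x in (-128.4 / 128, -1.0, 127 / 128, 127.4 / 128, 127.5 / 128)
    println(x, ": strict = ", in_strict(x), ", rounding = ", in_rounding(x))
end
```

Under these assumptions the rounding-based range spans 256/128 = 2, a power of two, whereas the strict range spans 255/128.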

@kimikage
Collaborator Author

kimikage commented Dec 27, 2019

BTW, I do not fully understand the original intent of this test. Note that 0.498 = 127.488/256 is greater than typemax(F8) = 127/256 = 0.49609375.

@testset "reductions" begin
    F8 = Fixed{Int8,8}
    a = F8[0.498, 0.1]
    acmp = Float64(a[1]) + Float64(a[2])
    @test sum(a) == acmp
    @test sum(a, dims=1) == [acmp]
    # ... (rest of the testset omitted in this excerpt)
end

Edit:
The test code will be changed to a = Q0f7[0.75, 0.5] by PR #159.
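
To see why 0.498 is a borderline input, here is a small sketch using only Base. It assumes Fixed{Int8,8} stores an Int8 raw value scaled by 256 and that conversion rounds to nearest; it does not call the package itself:

```julia
# Sketch only: Fixed{Int8,8} modeled as an Int8 raw value scaled by 2^8.
raw = round(Int8, 0.498 * 256)   # 127.488 rounds to 127, which fits in Int8
println(raw, " -> ", raw / 256)

@assert raw == typemax(Int8)          # the input lands on the largest raw value
@assert 0.498 > typemax(Int8) / 256   # yet it exceeds typemax of the type
```

So the input lies slightly above typemax but still rounds to a representable raw value, which is exactly the case the range question is about.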

@kimikage
Collaborator Author

kimikage commented Feb 2, 2020

For the time being, I decided to allow a wider range than [typemin, typemax] for Fixed types.

kimikage closed this as completed Feb 2, 2020