Finally a number type that you can count with your fingers. Super Mario and Zelda would be proud.
Comes in two flavours: Float8 has 3 exponent bits and 4 fraction bits, Float8_4 has 4 exponent bits and 3 fraction bits.
Both rely on conversion to Float32 to perform any arithmetic operation, similar to Float16.
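To make the two bit layouts concrete, here is a minimal sketch of how an 8-bit pattern could be decoded into a Float32. It assumes an IEEE-754-style layout (1 sign bit, then exponent bits, then fraction bits) with bias 2^(ebits-1) - 1; the package's actual encoding may differ. The helper `decode8` is hypothetical and not part of Float8s.jl.

```julia
# Hypothetical helper, NOT part of Float8s.jl.
# Assumption: IEEE-754-style layout with bias 2^(ebits-1) - 1.
function decode8(x::UInt8, ebits::Int)
    fbits = 7 - ebits                                  # bits left after sign and exponent
    bias  = 2^(ebits - 1) - 1                          # assumed exponent bias
    sign  = (x >> 7) == 0x01 ? -1f0 : 1f0              # top bit is the sign
    e     = Int((x >> fbits) & (0xff >> (8 - ebits)))  # exponent field
    f     = Int(x & (0xff >> (8 - fbits)))             # fraction field
    if e == 2^ebits - 1                                # all-ones exponent: Inf or NaN
        return f == 0 ? sign * Inf32 : NaN32
    elseif e == 0                                      # subnormal: no implicit leading 1
        return sign * Float32(f) * 2f0^(1 - bias - fbits)
    else                                               # normal: implicit leading 1
        return sign * (1f0 + Float32(f) / 2^fbits) * 2f0^(e - bias)
    end
end

decode8(0x50, 3)   # 4.0f0   (3 exponent bits, 4 fraction bits)
decode8(0x49, 3)   # 3.125f0
decode8(0x6f, 3)   # 15.5f0, the largest finite value under this assumed layout
decode8(0x77, 4)   # 240.0f0 (4 exponent bits, 3 fraction bits)
```

Under these assumptions the 3-exponent-bit format tops out at 15.5, so a result of 16 overflows to infinity, while the nearest representable neighbour of 3.14159 is 3.125.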
julia> using Float8s
julia> a = Float8(4)
Float8(4.0)
julia> b = Float8(3.14159)
Float8(3.125)
julia> a+b
Float8(7.0)
julia> sqrt(a)
Float8(2.0)
julia> a^2
Inf8
Most arithmetic operations are implemented. If you would like to have an additional feature, raise an issue.
Float8s.jl is not yet registered. For the time being, do
(v1.3) pkg> add https://github.com/milankl/Float8s.jl
julia> using BenchmarkTools
julia> A = Float8.(randn(300,300));
julia> @btime Float32.($A);
413.303 μs (2 allocations: 351.64 KiB)
julia> 413.303/300^2*1000
4.592255555555555
Conversions from Float8 to Float32 take about 4.5 ns per element. Conversions in the other direction are about 2x slower, and slightly slower than for Float16.
julia> A = Float32.(randn(300,300));
julia> @btime Float16.($A);
674.123 μs (2 allocations: 175.89 KiB)
julia> @btime Float8.($A);
955.196 μs (2 allocations: 88.02 KiB)
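The per-element costs behind these comparisons can be worked out from the timings above. This is only a sketch: the numbers are copied from the `@btime` output shown here and will vary between machines, and `ns_per_element` is a hypothetical helper name.

```julia
# Timings in μs copied from the @btime output above; n is the number of elements.
n = 300^2
ns_per_element(t_us) = t_us / n * 1000   # hypothetical helper: μs per array -> ns per element

f8_to_f32  = ns_per_element(413.303)     # ≈ 4.6 ns
f32_to_f16 = ns_per_element(674.123)     # ≈ 7.5 ns
f32_to_f8  = ns_per_element(955.196)     # ≈ 10.6 ns

f32_to_f8 / f8_to_f32                    # ≈ 2.3, Float32 -> Float8 vs Float8 -> Float32
f32_to_f8 / f32_to_f16                   # ≈ 1.4, Float32 -> Float8 vs Float32 -> Float16
```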