Float8s.jl

Finally a number type that you can count with your fingers. Super Mario and Zelda would be proud.

Comes in two flavours: Float8 has 3 exponent bits and 4 fraction bits, Float8_4 has 4 exponent bits and 3 fraction bits. Both rely on conversion to Float32 to perform any arithmetic operation, similar to Float16.
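To make the two bit layouts concrete, here is a minimal sketch that decodes a raw 8-bit pattern into a Float32. It is not taken from Float8s.jl: `decode8` is a hypothetical helper, and it assumes an IEEE-754-style encoding (1 sign bit, exponent bias 2^(nexp-1)-1, all-ones exponent reserved for Inf/NaN, subnormals at exponent zero). Under that assumption the largest finite Float8 would be 15.5, which is at least consistent with `a^2` overflowing to `Inf8` in the example below.

```julia
# Hypothetical helper, NOT part of Float8s.jl. Decodes an 8-bit pattern into a
# Float32, ASSUMING an IEEE-754-style layout: 1 sign bit, `nexp` exponent bits,
# `nfrac` fraction bits, bias 2^(nexp-1)-1, all-ones exponent for Inf/NaN.
function decode8(bits::UInt8, nexp::Int, nfrac::Int)
    @assert nexp + nfrac == 7 "sign + exponent + fraction must fill 8 bits"
    bias = 2^(nexp - 1) - 1
    sign = (bits >> 7) == 0x01 ? -1f0 : 1f0
    e = Int((bits >> nfrac) & UInt8(2^nexp - 1))   # biased exponent field
    f = Int(bits & UInt8(2^nfrac - 1))             # fraction field
    if e == 2^nexp - 1                             # reserved exponent
        return f == 0 ? sign * Inf32 : NaN32
    elseif e == 0                                  # subnormal: no implicit 1
        return sign * Float32(f) * 2f0^(1 - bias - nfrac)
    else                                           # normal: implicit leading 1
        return sign * Float32(2^nfrac + f) * 2f0^(e - bias - nfrac)
    end
end

# Float8 layout (3 exponent bits, 4 fraction bits)
decode8(0x50, 3, 4)   # 4.0f0
decode8(0x49, 3, 4)   # 3.125f0
decode8(0x6f, 3, 4)   # 15.5f0, largest finite value under this assumption

# Float8_4 layout (4 exponent bits, 3 fraction bits)
decode8(0x38, 4, 3)   # 1.0f0
```

The trade-off between the two flavours shows up directly here: moving one bit from the fraction to the exponent widens the range and coarsens the spacing between neighbouring values.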

Example use

julia> using Float8s

julia> a = Float8(4)
Float8(4.0)

julia> b = Float8(3.14159)
Float8(3.125)

julia> a+b
Float8(7.0)

julia> sqrt(a)
Float8(2.0)

julia> a^2
Inf8

Most arithmetic operations are implemented. If you would like to have an additional feature, raise an issue.

Installation

Float8s.jl is not yet registered. For the time being, add it directly from the repository URL:

(v1.3) pkg> add https://github.com/milankl/Float8s.jl
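The same installation can be done outside the Pkg REPL mode, for example from a script. This is a sketch using the standard `Pkg` API with the repository URL given above; it needs network access.

```julia
using Pkg

# Add the unregistered package straight from its Git repository
Pkg.add(Pkg.PackageSpec(url="https://github.com/milankl/Float8s.jl"))
```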

Benchmarking

julia> using BenchmarkTools

julia> A = Float8.(randn(300,300));

julia> @btime Float32.($A);
  413.303 μs (2 allocations: 351.64 KiB)

julia> 413.303/300^2*1000
4.592255555555555

Conversions from Float8 to Float32 take about 4.6 ns per element. Conversions in the other direction are a little over 2x slower, and also slower than conversions from Float32 to Float16.

julia> A = Float32.(randn(300,300));

julia> @btime Float16.($A);
  674.123 μs (2 allocations: 175.89 KiB)

julia> @btime Float8.($A);
  955.196 μs (2 allocations: 88.02 KiB)
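The per-element cost of each conversion follows from the same arithmetic as above. The sketch below only reuses the three `@btime` timings quoted in this section; the helper name `ns_per_element` is made up for illustration.

```julia
# Timings copied from the @btime outputs in this section, in μs per 300×300 array
const N_ELEMENTS = 300^2
t_float8_to_float32  = 413.303
t_float32_to_float16 = 674.123
t_float32_to_float8  = 955.196

# Hypothetical helper: convert μs per array into ns per element
ns_per_element(t_us) = t_us / N_ELEMENTS * 1000

ns_per_element(t_float8_to_float32)    # about 4.6 ns
ns_per_element(t_float32_to_float16)   # about 7.5 ns
ns_per_element(t_float32_to_float8)    # about 10.6 ns

# Relative cost of Float32 -> Float8
t_float32_to_float8 / t_float8_to_float32    # about 2.3x the reverse direction
t_float32_to_float8 / t_float32_to_float16   # about 1.4x the Float16 conversion
```

These numbers come from a single machine and a single run, so they are best read as rough ratios rather than absolute costs.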
