Floating Point Precision Issues

Julia Sloan edited this page Oct 19, 2023 · 8 revisions

We aim to support both Float64 and Float32 precision in CliMA's ESM. This page summarizes some challenges and things to be aware of:

Basics (IEEE 754 standard)

  1. Float32 (single precision): about 7 significant decimal digits (log10(2^24) ≈ 7.2)
  • 1 sign bit, 8 exponent bits, 23 mantissa/fraction bits (plus an implicit leading bit, giving 24 bits of precision)
  2. Float64 (double precision): about 16 significant decimal digits (log10(2^53) ≈ 15.95)
  • 1 sign bit, 11 exponent bits, 52 mantissa/fraction bits (plus an implicit leading bit, giving 53 bits of precision)
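
The layout and digit counts above can be checked directly in Julia. This is a small illustrative sketch using only Base functions (`bitstring`, `precision`); the variable names are chosen here for the example.

```julia
# Split the bit pattern of a Float32 into its IEEE 754 fields.
bits = bitstring(1.0f0)          # 32-character string of 0s and 1s
sign_bit      = bits[1:1]        # 1 sign bit
exponent_bits = bits[2:9]        # 8 exponent bits (biased by 127)
fraction_bits = bits[10:32]      # 23 explicit fraction bits

# `precision` counts the implicit leading bit, so it returns 24 and 53.
digits32 = log10(2.0^precision(Float32))   # roughly 7.2 decimal digits
digits64 = log10(2.0^precision(Float64))   # roughly 15.95 decimal digits

println("sign=$sign_bit exponent=$exponent_bits fraction=$fraction_bits")
println("Float32 digits ≈ $digits32, Float64 digits ≈ $digits64")
```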

eps() in Julia

julia> eps(zero(Float64))
5.0e-324

julia> eps(one(Float64))
2.220446049250313e-16

julia> eps(zero(Float32))
1.0f-45

julia> eps(one(Float32))
1.1920929f-7
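
`eps(x)` returns the spacing between `x` and the next larger representable value, so the absolute resolution gets coarser as magnitudes grow. The sketch below illustrates this for a value of order 10^6; the specific numbers are chosen only for illustration.

```julia
x32 = 1.0f6

# Spacing between adjacent floats near 1e6.
spacing32 = eps(x32)       # 2^-4 for Float32
spacing64 = eps(1.0e6)     # 2^-33 for Float64

# eps(x) is the gap to the next representable value.
gap = nextfloat(x32) - x32

# An increment smaller than half the spacing is lost entirely.
increment_lost = (x32 + 0.01f0 == x32)

println("Float32 spacing near 1e6: $spacing32")
println("Float64 spacing near 1e6: $spacing64")
println("1f6 + 0.01f0 == 1f6 ? $increment_lost")
```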

Order of operations matters

https://github.com/CliMA/ClimaCoupler.jl/issues/271
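
As a generic illustration (not taken from the linked issue), floating-point addition is not associative, so regrouping the same terms can change the result. The values below are picked so that the effect is easy to see.

```julia
large = 1.0f8   # spacing between Float32 values near 1e8 is 8
small = 1.0f0

# `small` is absorbed when added to `large` first...
absorbed = (large + small) - large
# ...but survives if the large terms cancel first.
preserved = (large - large) + small

# The same effect appears in Float64, just at a smaller scale.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

println("(large + small) - large = $absorbed")
println("(large - large) + small = $preserved")
println("a == b ? $(a == b)")
```

When comparing results across code paths that may sum in a different order, `isapprox` (`≈`) with an explicit tolerance is usually a safer check than `==`.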

ClimaTimeSteppers

  • Even when all time-related quantities are set to Float32, integrator.t gets converted to Float64 somewhere during step! (discovered using the dss! callback).
  • For now, t always needs to be stored as a Float64, because Float32 does not have enough bits to track time accurately without roundoff error.
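
To see why Float32 is too coarse for tracking time, consider a minimal sketch that does not use ClimaTimeSteppers at all. The helper `advance` is hypothetical and simply accumulates `t += dt`. With a 1 s timestep, a Float32 clock stops advancing once t reaches 2^24 s (about 194 days), because the next integer is no longer representable.

```julia
# Hypothetical helper: accumulate time with a fixed timestep.
function advance(t0::FT, dt::FT, nsteps::Int) where {FT <: AbstractFloat}
    t = t0
    for _ in 1:nsteps
        t += dt
    end
    return t
end

# Start 10 s before 2^24 s and take 100 steps of 1 s.
t32 = advance(2.0f0^24 - 10.0f0, 1.0f0, 100)
t64 = advance(2.0^24 - 10.0, 1.0, 100)

println("Float32 time after 100 steps: $t32")   # stalls at 2^24
println("Float64 time after 100 steps: $t64")   # advances by the full 100 s
```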

Refs