Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero-derivative functions on dual numbers should only return the primal #167

Closed
adrhill opened this issue Aug 13, 2024 · 11 comments · Fixed by #169
Closed

Zero-derivative functions on dual numbers should only return the primal #167

adrhill opened this issue Aug 13, 2024 · 11 comments · Fixed by #169

Comments

@adrhill
Copy link
Owner

adrhill commented Aug 13, 2024

This issue was raised in the context of round(Int, x) on Dual numbers in #164.

@adrhill
Copy link
Owner Author

adrhill commented Aug 14, 2024

If we do this, then only on functions that have zero-derivatives globally. Otherwise, it would lead to type instabilities.

@gdalle
Copy link
Collaborator

gdalle commented Aug 14, 2024

I agree with that. Now that we got rid of the connectivity tracer, we can do this for:

  • convert(::Type{<:Integer}, x)
  • round, floor, ceil
  • all the euclidean division functions
  • what else?

@adrhill
Copy link
Owner Author

adrhill commented Aug 14, 2024

I'm actually not yet fully convinced about convert(::Type{<:Integer}, x).
I haven't come up with a counter example yet, but it essentially is the identity function. And identity(x::Dual{<:Integer}) also shouldn't just return the primal value. ForwardDiff also throws an InexactError instead of simply returning the primal.

@adrhill
Copy link
Owner Author

adrhill commented Aug 15, 2024

Viewing ForwardDiff and our Duals as dual numbers, it makes sense that $x + \varepsilon y$ should not simply be cast to $x$, independent of data type. Unless $y$ is zero (an empty tracer).

@adrhill
Copy link
Owner Author

adrhill commented Aug 15, 2024

To put this in terms of the Faà di Bruno based theory: the operator convert(T, x) is generally speaking once-differentiable.

If we start to make exceptions for argument types like convert(Int, x), then you could argue that *(a::Int, b::Int) also "isn't differentiable".

At that point you could even argue that SparseConnectivityTracer.Dual{<:Integer,T} and ForwardDiff.Dual{<:Integer,P} should be prohibited in the first place.

@gdalle
Copy link
Collaborator

gdalle commented Aug 15, 2024

I don't find the last statement completely absurd. If you construct a dual from an integer, you might as well keep that integer cause derivatives won't propagate at all from that moment on

@adrhill
Copy link
Owner Author

adrhill commented Aug 15, 2024

It is absurd, since it's totally viable to seed an dual number $x = a + \varepsilon b$ with Int primal and compute derivatives of $\sqrt{x}$.
It's correct to get and want the following:

julia> jacobian_sparsity(sqrt, 1, TracerLocalSparsityDetector())
1×1 SparseArrays.SparseMatrixCSC{Bool, Int64} with 1 stored entry:
 1

But we digress. My point is that classifying differentiability of operators on a per-method basis (i.e. looking at argument types) is a huge can of worms.

@gdalle
Copy link
Collaborator

gdalle commented Aug 15, 2024

Okay I can get behind that. But then doesn't round also have methods for Float types? Why should we add in a special integer case there?

@adrhill
Copy link
Owner Author

adrhill commented Aug 15, 2024

round is always an operator with zero-derivatives, so here it makes sense to only return the primal–regardless of type.
I think round(T, ...) is only defined on Integers and Missing:

julia> methods(round, [Type, Any])
# 12 methods for generic function "round" from Base:
  [1] round(::Type{Bool}, x::AbstractFloat)
     @ float.jl:391
  [2] round(::Type{>:Missing}, ::Missing)
     @ missing.jl:144
  [3] round(::Type{T}, x::Rational{Bool}) where T>:Missing
     @ missing.jl:150
  [4] round(::Type{T}, ::Missing) where T
     @ missing.jl:145
  [5] round(::Type{T}, x) where T>:Missing
     @ missing.jl:147
  [6] round(::Type{T}, x::Rational{Bool}) where T
     @ rational.jl:500
  [7] round(::Type{T}, x::Rational{Tr}) where {T>:Missing, Tr}
     @ missing.jl:149
  [8] round(::Type{T}, x::Rational{Tr}) where {T, Tr}
     @ rational.jl:493
  [9] round(::Type{T}, x::BigFloat) where T<:Union{Signed, Unsigned}
     @ Base.MPFR mpfr.jl:364
 [10] round(::Type{Integer}, x::BigFloat)
     @ Base.MPFR mpfr.jl:365
 [11] round(::Type{T}, x::Integer) where T<:Integer
     @ int.jl:691
 [12] round(::Type{T}, x::AbstractFloat) where T<:Integer
     @ float.jl:385

julia> methods(round, [Type, Any, Any])
# 14 methods for generic function "round" from Base:
  [1] round(::Type{BigInt}, x::BigFloat, r::RoundingMode)
     @ Base.MPFR mpfr.jl:353
  [2] round(::Type{BigInt}, x::BigFloat, r::Union{Base.MPFR.MPFRRoundingMode, RoundingMode})
     @ Base.MPFR mpfr.jl:344
  [3] round(::Type{T}, x::BigFloat, r::RoundingMode) where T<:Union{Signed, Unsigned}
     @ Base.MPFR mpfr.jl:351
  [4] round(::Type{<:Integer}, x::BigFloat, r::RoundingMode)
     @ Base.MPFR mpfr.jl:355
  [5] round(::Type{T}, x::BigFloat, r::Union{Base.MPFR.MPFRRoundingMode, RoundingMode}) where T<:Union{Signed, Unsigned}
     @ Base.MPFR mpfr.jl:335
  [6] round(::Type{T}, x::AbstractFloat, r::RoundingMode) where T<:Integer
     @ floatfuncs.jl:122
  [7] round(::Type{Dates.Date}, x::Union{Dates.Day, Dates.Week, Dates.TimePeriod, Dates.TimeType}, ::Type{P}) where P<:Dates.Period
     @ Dates ~/.julia/juliaup/julia-1.10.4+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Dates/src/rounding.jl:294
  [8] round(::Type{>:Missing}, ::Missing, ::RoundingMode)
     @ missing.jl:144
  [9] round(::Type{T}, x::Rational{Bool}, r::RoundingMode) where T>:Missing
     @ missing.jl:150
 [10] round(::Type{T}, ::Missing, ::RoundingMode) where T
     @ missing.jl:145
 [11] round(::Type{T}, x, r::RoundingMode) where T>:Missing
     @ missing.jl:147
 [12] round(::Type{T}, x::Rational{Bool}, ::RoundingMode) where T
     @ rational.jl:500
 [13] round(::Type{T}, x::Rational{Tr}, r::RoundingMode) where {T>:Missing, Tr}
     @ missing.jl:149
 [14] round(::Type{T}, x::Rational{Tr}, r::RoundingMode) where {T, Tr}
     @ rational.jl:493

@gdalle
Copy link
Collaborator

gdalle commented Aug 15, 2024

My reasoning was that if there's a round defined for the closest Float32 for instance (apparently there isn't?) then there are two ways to interpret this:

  • piecewise constant because the closest Float32 is almost surely not equal to the current Float64
  • linear equal to the identity because if we disregard finite precision, rounding to the closest float is basically the identity

See the duality?

@adrhill
Copy link
Owner Author

adrhill commented Aug 15, 2024

Yeah, a theoretical myround(Float32, x::Float64) would be a weird edge case. But even there, my point applies: we don't have the manpower to overload operators on a per-method basis. We should therefore usually just pick the most conservative type of differentiability over the set of all methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants