-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove pure function annotation #509
Conversation
Do you have an example of the stackoverflow problem? |
It's my fault. But the error looks interesting when using using ForwardDiff: ForwardDiff, Tag
struct MyTag end
ForwardDiff.:≺(::Type{Tag{F1,V1}}, ::Type{Tag{MyTag,V2}}) where {F1, V1, V2} = true
# Uncomment the next line to fix the error
#ForwardDiff.:≺(::Type{Tag{MyTag,V2}}, ::Type{Tag{F1,V1}}) where {F1, V1, V2} = false
dual1 = ForwardDiff.GradientConfig(MyTag(), [1.0]).duals[1]
dual2 = ForwardDiff.GradientConfig(sum, [1.0]).duals[1]
promote_type(typeof(dual1), typeof(dual2)) |
I guess the assumption here is that |
So is there really a problem here? If you overload an internal function like this, well, caveat emptor. |
There is a problem that sometimes you do need to overload this internal function to specify order between custom tags. So the purity assumption is violated in a valid use case. |
Are there any objections to this? |
I will wait 2 more days then merge and release if no one objects. |
Presumably the pure was there for a reason. Did you check that no longer applies? |
The pure seems to have been introduced in #247 back in 2017 as part of a big change to tackle perturbation confusion. It's hard to guess the intentions of @simonbyrne or @jrevels back then especially given that it was a very different Julia version and compiler back then. |
So you are saying that the compiler has improved enough that the |
I don't know if the compiler has improved enough that pure is not needed because I don't know why it was needed to begin with. There was no justification provided back then. Julia 0.6 was used back then. If you have a use case that could have benefited from |
it's been a very long time since I looked at a ForwardDiff PR 😁
I wouldn't be surprised if that were added back then to avoid potential perf issues w.r.t. calling https://github.com/JuliaDiff/ForwardDiff.jl/pull/247/files#diff-aeb036d3acc3ca0b52368fdcc3dbc8bca465b98bc269a1a5b4668b1dcb2e2f7eR12 and https://github.com/JuliaDiff/ForwardDiff.jl/pull/247/files#diff-aeb036d3acc3ca0b52368fdcc3dbc8bca465b98bc269a1a5b4668b1dcb2e2f7eR21 Given the compiler advances since then, it would be great (and unsurprising) if it's no longer useful, but as @KristofferC mentioned, this is probably worth benchmarking in a nested differentiation case that hits this codepath to confirm it won't induce regressions in common cases |
I too suspect it would now work without it. |
That's not really how things work. When you go in and remove a performance annotation it is on you to put at least some level of effort to show that the annotation is not needed anymore. For example, a benchmark where the function is used or looking at the generated code with the annotation there and removed. |
I used the following code to benchmark: using ForwardDiff, BenchmarkTools
f(x, y) = ForwardDiff.derivative(y -> ForwardDiff.derivative(x -> (x + y)^2, x), y)
g(x, y) = ForwardDiff.jacobian(y -> ForwardDiff.gradient(x -> sum(x + y)^2, x), y)
h(x, y, z) = ForwardDiff.derivative(z -> ForwardDiff.derivative(y -> ForwardDiff.derivative(x -> (x + y + z)^2, x), y), z)
@btime f(2.0, 2.0);
@btime g($(ones(10)), $(ones(10)));
@btime h(1.0, 2.0, 3.0);
julia> @btime f(2.0, 2.0);
0.001 ns (0 allocations: 0 bytes)
julia> @btime g($(ones(10)), $(ones(10)));
3.896 μs (15 allocations: 16.83 KiB)
julia> @btime h(1.0, 2.0, 3.0);
1.500 ns (0 allocations: 0 bytes)
julia> @btime f(2.0, 2.0);
0.001 ns (0 allocations: 0 bytes)
julia> @btime g($(ones(10)), $(ones(10)));
3.854 μs (15 allocations: 16.83 KiB)
julia> @btime h(1.0, 2.0, 3.0);
1.500 ns (0 allocations: 0 bytes) The second number changes a fair bit every time I run it with and without pure but there seems to be no difference. The code native generated in both cases:
julia> @code_native h(1.0, 2.0, 3.0)
.section __TEXT,__text,regular,pure_instructions
; ┌ @ REPL[61]:1 within `h'
subq $120, %rsp
; │┌ @ derivative.jl:14 within `derivative'
; ││┌ @ REPL[61]:1 within `#137'
; │││┌ @ derivative.jl:14 within `derivative'
; ││││┌ @ REPL[61]:1 within `#138'
movsd %xmm1, 24(%rsp)
movabsq $4607182418800017408, %rax ## imm = 0x3FF0000000000000
movq %rax, 32(%rsp)
movsd %xmm2, 40(%rsp)
movq %rax, 48(%rsp)
; │││││┌ @ derivative.jl:14 within `derivative'
; ││││││┌ @ dual.jl:66 within `Dual' @ dual.jl:62 @ dual.jl:55 @ dual.jl:19
movsd %xmm0, 8(%rsp)
movq %rax, 16(%rsp)
; ││││││└
movabsq $"#139", %rax
leaq 56(%rsp), %rdi
leaq 24(%rsp), %rsi
leaq 8(%rsp), %rdx
callq *%rax
; │└└└└└
movsd 112(%rsp), %xmm0 ## xmm0 = mem[0],zero
addq $120, %rsp
retq
nopw %cs:(%rax,%rax)
nop
; └
julia> @code_native h(1.0, 2.0, 3.0)
.section __TEXT,__text,regular,pure_instructions
; ┌ @ REPL[61]:1 within `h'
subq $120, %rsp
; │┌ @ derivative.jl:14 within `derivative'
; ││┌ @ REPL[61]:1 within `#137'
; │││┌ @ derivative.jl:14 within `derivative'
; ││││┌ @ REPL[61]:1 within `#138'
movsd %xmm1, 24(%rsp)
movabsq $4607182418800017408, %rax ## imm = 0x3FF0000000000000
movq %rax, 32(%rsp)
movsd %xmm2, 40(%rsp)
movq %rax, 48(%rsp)
; │││││┌ @ derivative.jl:14 within `derivative'
; ││││││┌ @ dual.jl:66 within `Dual' @ dual.jl:62 @ dual.jl:55 @ dual.jl:19
movsd %xmm0, 8(%rsp)
movq %rax, 16(%rsp)
; ││││││└
movabsq $"#139", %rax
leaq 56(%rsp), %rdi
leaq 24(%rsp), %rsi
leaq 8(%rsp), %rdx
callq *%rax
; │└└└└└
movsd 112(%rsp), %xmm0 ## xmm0 = mem[0],zero
addq $120, %rsp
retq
nopw %cs:(%rax,%rax)
nop
; └ Looks identical to me. |
Can I please get an approval here to merge? |
Ok let me request the opposite! If you have any objections, please object in the next 24 hours. If there are no objections, I will merge. |
According to conventional Julia wisdom, the use of
@pure
is a reserved right to Jameson and Jameson only. According to the docs:This assumption seems to be violated here. The use of
@pure
can also lead to peculiar compiler failures when normal runtime errors such as stack overflow are encountered. This bit me recently pretty hard.