AD with ForwardDiff is not thread-safe #6

Closed
sethaxen opened this issue Jan 31, 2023 · 10 comments · Fixed by #8

Comments

@sethaxen

When using multiple threads, the gradient computations disagree with the gradients one gets when using a single thread. Here's an MWE:

using ForwardDiff, LinearAlgebra, LogDensityProblems, LogDensityProblemsAD

struct LogDensityFunction{F}
    logp::F
    dim::Int
end
function LogDensityProblems.capabilities(::Type{<:LogDensityFunction})
    return LogDensityProblems.LogDensityOrder{0}()
end
LogDensityProblems.dimension(ℓ::LogDensityFunction) = ℓ.dim
LogDensityProblems.logdensity(ℓ::LogDensityFunction, x) = ℓ.logp(x)

dim = 10
n = 10^6
P = LogDensityFunction(x -> -sum(abs2, x) / 2, dim)
ℓ = ADgradient(:ForwardDiff, P)
x = randn(dim, n)

results_single = zeros(dim, n)
results_threaded = zeros(dim, n)

for j in 1:n
    results_single[:, j] = LogDensityProblems.logdensity_and_gradient(ℓ, x[:, j])[2]
end

Threads.@threads for j in 1:n
    results_threaded[:, j] = LogDensityProblems.logdensity_and_gradient(ℓ, x[:, j])[2]
end

On my machine (using 4 threads), I get the following result:

julia> norm(results_single .- results_threaded)
2588.220046954836

This appears to be the same issue as JuliaDiff/ForwardDiff.jl#573, where the signature ForwardDiff.gradient(f, x, cfg), which is used here, is shown to be not thread-safe. By comparison, ReverseDiff seems to be fine.

@sethaxen
Author

I suspect the issue is with a shared GradientConfig. AbstractDifferentiation creates a GradientConfig in its gradient function, and I don't think it suffers from this issue: https://github.com/JuliaDiff/AbstractDifferentiation.jl/blob/eb5d913b9e4cbd31465af4ee2a75acb7f69ded91/src/forwarddiff.jl#LL47C57-L47C57
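
For reference, here is one way to check the shared-config hypothesis against the MWE above; this is only a sketch (the name results_threaded_safe is illustrative), reusing the variables defined there. Each iteration builds its own ℓ, so no GradientConfig buffers are shared across threads.

results_threaded_safe = zeros(dim, n)

Threads.@threads for j in 1:n
    ℓ_local = ADgradient(:ForwardDiff, P)  # fresh GradientConfig, never shared
    results_threaded_safe[:, j] = LogDensityProblems.logdensity_and_gradient(ℓ_local, x[:, j])[2]
end

norm(results_single .- results_threaded_safe)  # expected to be ≈ 0 if the shared config is the culprit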

@devmotion
Collaborator

IMO that seems to be a ForwardDiff issue? At least it seems somewhat surprising that gradient evaluation mutates the gradient config, but that seems to be what happens here: https://github.com/JuliaDiff/ForwardDiff.jl/blob/4b143a199f541e7c1248dc5bc29b397be938e81d/src/apiutils.jl#L34-L46

@devmotion
Collaborator

Regarding AbstractDifferentiation: if I remember correctly, the only reason it creates a config at all is to set the chunk size. Otherwise it would not be needed.

@sethaxen
Author

At least it seems somewhat surprising that gradient evaluation mutates the gradient config but that seems what happens here: https://github.com/JuliaDiff/ForwardDiff.jl/blob/4b143a199f541e7c1248dc5bc29b397be938e81d/src/apiutils.jl#L34-L46

This is the documented behavior. It seems the main purpose of the config is to allocate work buffers: https://juliadiff.org/ForwardDiff.jl/stable/user/api/#Preallocating/Configuring-Work-Buffers. See also this comment: JuliaDiff/ForwardDiff.jl#573 (comment)
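
For context, a minimal standalone sketch of that documented pattern (names here are illustrative, not from this package): the config preallocates the dual-number work buffers once, and every gradient call that is passed the config reuses, and therefore mutates, those buffers.

using ForwardDiff

f(x) = -sum(abs2, x) / 2
x = randn(10)
cfg = ForwardDiff.GradientConfig(f, x)  # preallocates the dual-number work buffers
ForwardDiff.gradient(f, x, cfg)         # reuses (and mutates) cfg's buffers on every call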

@devmotion
Collaborator

I see. Then we should probably add a note in LogDensityProblemsAD as well that it is not thread-safe.

@devmotion
Collaborator

Since it is much more efficient to preallocate a config, as stated in the ForwardDiff docs

ForwardDiff's basic API methods will allocate these types automatically by default, but you can drastically reduce memory usage if you preallocate them yourself.

I think generally one wants to create and use the config object, also within LogDensityProblemsAD.
To me, the lack of thread safety seems to be a design decision (issue?) in ForwardDiff, so users should be aware of it; but making it inefficient for everyone, similar to the code in AbstractDifferentiation, seems undesirable to me, especially since here, in contrast to AbstractDifferentiation, we know both the function AND the size of the input.
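
As an illustration of what knowing the function and the input size buys (a sketch only; the chunk size 4 is arbitrary), the config and its chunk size can be fixed once, up front:

using ForwardDiff

f(x) = -sum(abs2, x) / 2
x = zeros(10)                                                   # dimension known ahead of time
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{4}())  # chunk size chosen up front
ForwardDiff.gradient(f, x, cfg)                                 # reuses cfg on every call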

@tpapp
Owner

tpapp commented Jan 31, 2023

I can think of the following (not mutually exclusive) solutions:

  1. warn users about this, suggesting that they create an AD'd object for each thread, but that only covers simple use cases like MCMC (however, that may just work for 99% of our users),
  2. provide an option w/o preallocated buffers, slower but safe

Or perhaps allocate a buffer for each thread, and have the call write to it based on Threads.threadid(), but I have to think through the invariants.

@devmotion
Collaborator

devmotion commented Jan 31, 2023

Or perhaps allocate a buffer for each thread, and have the call write to it based on Threads.threadid(), but I have to think through the invariants.

That is not guaranteed to work in newer Julia versions; generally it is not safe to use Threads.threadid() in such cases. See e.g. https://juliafolds.github.io/FLoops.jl/stable/explanation/faq/#faq-state-threadid and the discussion in https://discourse.julialang.org/t/behavior-of-threads-threads-for-loop/76042?u=devmotion.
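
As a hedged sketch of the task-local pattern recommended there (not package code; it reuses the variables from the MWE above): partition the work and give each spawned task its own ℓ, so buffers belong to a task rather than to a thread id.

chunks = Iterators.partition(1:n, cld(n, Threads.nthreads()))
tasks = map(chunks) do js
    Threads.@spawn begin
        ℓ_local = ADgradient(:ForwardDiff, P)  # task-local config, never shared
        for j in js
            results_threaded[:, j] = LogDensityProblems.logdensity_and_gradient(ℓ_local, x[:, j])[2]
        end
    end
end
foreach(wait, tasks)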

@devmotion
Collaborator

provide an option w/o preallocated buffers, slower but safe

Maybe the easiest option here would be to support passing gradientconfig = nothing to ADgradient, and handling it by constructing a new config object (so that the chunk size can still be set) when calling logdensity_and_gradient.
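
From the user's side that might look like this (purely hypothetical; this keyword does not exist at the time of writing):

ℓ_safe = ADgradient(:ForwardDiff, P; gradientconfig = nothing)  # no shared buffers; a fresh config per call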

@tpapp
Owner

tpapp commented Feb 12, 2023

I would suggest the following solution:

  1. make the default thread-safe, if potentially inefficient, for all backends, and document this,
  2. offer the faster solution in ADgradient for the relevant backends, explicitly mentioning that it is not thread-safe,
  3. implement a method for Base.copy that creates independently usable copies where applicable (e.g. gradients created with the ForwardDiff backend), and is otherwise a no-op.

This would allow users to take care of threading as they wish: most MCMC applications would just create as many copies as they need and sample in threads.
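
A sketch of how users might consume such a copy method (hypothetical until it is implemented), with the gradient call standing in for running a chain:

ℓ = ADgradient(:ForwardDiff, P)
tasks = map(1:Threads.nthreads()) do _
    Threads.@spawn begin
        ℓ_chain = copy(ℓ)  # independent work buffers for this chain
        LogDensityProblems.logdensity_and_gradient(ℓ_chain, randn(dim))
    end
end
fetch.(tasks)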

Incidentally, I do not understand how ReverseDiff is not affected. It uses the same kind of buffers.
