AD with ForwardMode is not thread-safe #6
Comments
I suspect the issue is with a shared gradient config.
IMO that seems to be a ForwardDiff issue? At least it seems somewhat surprising that gradient evaluation mutates the gradient config, but that seems to be what happens here: https://github.com/JuliaDiff/ForwardDiff.jl/blob/4b143a199f541e7c1248dc5bc29b397be938e81d/src/apiutils.jl#L34-L46
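That mutation is easy to observe directly; a minimal sketch, assuming a toy function and poking at the config's internal `duals` field (not public API):

```julia
using ForwardDiff

f(x) = sum(abs2, x)
cfg = ForwardDiff.GradientConfig(f, randn(3))
ForwardDiff.gradient(f, randn(3), cfg)   # populate the work buffer once

# apiutils.jl reseeds cfg.duals with the input on every call, so each
# gradient evaluation overwrites the config's work buffer in place.
before = copy(cfg.duals)
ForwardDiff.gradient(f, randn(3), cfg)
cfg.duals == before   # false: the buffer now holds the new seeds
```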
Regarding AbstractDifferentiation, if I remember correctly the only reason it creates a config at all is to set the chunk size. Otherwise it would not be needed.
This is the documented behavior. It seems the main purpose of the config is to allocate work buffers: https://juliadiff.org/ForwardDiff.jl/stable/user/api/#Preallocating/Configuring-Work-Buffers. See also this comment: JuliaDiff/ForwardDiff.jl#573 (comment)
I see. Then probably we should add a note in LogDensityProblemsAD as well that it is not thread-safe.
Since it is much more efficient to preallocate a config, as stated in the ForwardDiff docs, I think generally one wants to create and use the config object, also within LogDensityProblemsAD.
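Concretely, the serial pattern from the docs looks like this (a sketch with a toy function; the explicit `Chunk` is what setting the chunk size amounts to):

```julia
using ForwardDiff

f(x) = sum(abs2, x)
x = randn(100)

# Allocate the Dual work buffers once, with an explicit chunk size,
# and reuse them across calls; this is the efficiency being discussed.
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{10}())
out = similar(x)
ForwardDiff.gradient!(out, f, x, cfg)  # fine serially, racy if cfg is shared across threads
```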
I can think of the following (not mutually exclusive) solutions:
Or perhaps allocate a buffer for each thread, and have the call write to it based on `Threads.threadid()`?
That is not guaranteed to work anymore in newer Julia versions; generally it is not safe to index preallocated buffers by `Threads.threadid()`, since tasks can migrate between threads.
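A sketch of a pattern that stays correct under task migration: give each spawned task its own config rather than indexing a shared pool by thread id (toy objective, illustrative names):

```julia
using ForwardDiff

f(x) = sum(abs2, x)
xs = [randn(5) for _ in 1:1000]

# Partition the work and create one config per task; no state is shared,
# so it does not matter which OS thread a task ends up running on.
chunks = Iterators.partition(eachindex(xs), cld(length(xs), Threads.nthreads()))
tasks = map(collect(chunks)) do idxs
    Threads.@spawn begin
        cfg = ForwardDiff.GradientConfig(f, xs[first(idxs)])  # task-local buffers
        [ForwardDiff.gradient(f, xs[i], cfg) for i in idxs]
    end
end
grads = reduce(vcat, fetch.(tasks))
```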
Maybe the easiest option here would be to support passing the config explicitly, so each caller can construct its own.
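A hypothetical sketch of what that could look like (the `GradEvaluator` and `grad` names are invented for illustration, not the package's API):

```julia
using ForwardDiff

# Hypothetical: store only the function, and let the caller supply the
# config, defaulting to a freshly allocated one when none is passed.
struct GradEvaluator{F}
    f::F
end

function grad(ev::GradEvaluator, x,
              cfg = ForwardDiff.GradientConfig(ev.f, x))
    return ForwardDiff.gradient(ev.f, x, cfg)
end

ev = GradEvaluator(x -> sum(abs2, x))
x = randn(4)
cfg = ForwardDiff.GradientConfig(ev.f, x)  # one per task in threaded code
grad(ev, x, cfg)
```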
I would suggest the following solution: provide a way to make independent copies of the gradient evaluator, each with its own config and buffers.
This would allow users to take care of threading as they wish: most MCMC applications would just create as many copies as they need, and sample in threads. Incidentally, I do not understand how ReverseDiff is not affected. It uses the same kind of buffers.
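For illustration, a sketch of that usage with a toy log density (the `ToyNormal` problem is invented; constructing one `ADgradient` per task stands in for whatever copy mechanism gets added):

```julia
using LogDensityProblems, LogDensityProblemsAD, ForwardDiff

# Toy standard-normal log density implementing the LogDensityProblems interface.
struct ToyNormal end
LogDensityProblems.logdensity(::ToyNormal, x) = -sum(abs2, x) / 2
LogDensityProblems.dimension(::ToyNormal) = 5
LogDensityProblems.capabilities(::Type{ToyNormal}) =
    LogDensityProblems.LogDensityOrder{0}()

xs = [randn(5) for _ in 1:1000]
chunks = Iterators.partition(eachindex(xs), cld(length(xs), Threads.nthreads()))
tasks = map(collect(chunks)) do idxs
    Threads.@spawn begin
        ∇ℓ = ADgradient(:ForwardDiff, ToyNormal())  # independent evaluator per task
        [LogDensityProblems.logdensity_and_gradient(∇ℓ, xs[i]) for i in idxs]
    end
end
results = reduce(vcat, fetch.(tasks))
```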
When using multiple threads, the gradient computations disagree with the gradients one gets when using a single thread. Here's an MWE (a minimal sketch, assuming a toy objective and a single shared `GradientConfig`):
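```julia
using ForwardDiff

f(x) = sum(abs2, x)
xs = [randn(5) for _ in 1:1000]
cfg = ForwardDiff.GradientConfig(f, xs[1])  # one config shared by all threads

# Reference: serial evaluation with the same config.
serial = [ForwardDiff.gradient(f, x, cfg) for x in xs]

# Racy: all threads mutate cfg's work buffers concurrently.
threaded = Vector{Vector{Float64}}(undef, length(xs))
Threads.@threads for i in eachindex(xs)
    threaded[i] = ForwardDiff.gradient(f, xs[i], cfg)
end

# Maximum elementwise deviation between the two runs.
maximum(maximum(abs, s .- t) for (s, t) in zip(serial, threaded))
```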
On my machine (using 4 threads), the threaded gradients disagree with the serial ones.
This appears to be the same issue as JuliaDiff/ForwardDiff.jl#573, where the signature `ForwardDiff.gradient(f, x, cfg)`, which is used here, is shown to be not thread-safe. By comparison, ReverseDiff seems to be fine.