
Fix perturbation confusion #247

Merged 9 commits into JuliaDiff:master on Sep 26, 2017

Conversation

@simonbyrne (Contributor)

This is an initial stab at fixing perturbation confusion. It relies on a global counter that is incremented for each (function, signature) pair: since higher-order derivatives require earlier derivatives to be defined first, these should appear on the "outside" of any nested `Dual` objects.

As I note in the comments, this could cause problems when using multiple processes, e.g.

```
ForwardDiff.gradient(x) do x
    @parallel for i = 1:n
        ...
        ForwardDiff.gradient(y) do y
            # something involving x
        end
    end
end
```
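To make the ordering invariant concrete, here is a minimal sketch (illustrative names like `fresh_tag`, not ForwardDiff's actual code): because an outer differential operator begins executing before any nested one, it always draws the smaller id, so the "older" tag can be kept on the outside when duals are combined.

```julia
# Illustrative sketch only -- not ForwardDiff's implementation.
# Each differential operator draws a fresh id from a global counter.
const TAG_COUNTER = Ref(0)
fresh_tag() = (TAG_COUNTER[] += 1)

outer = fresh_tag()   # the outer derivative starts first...
inner = fresh_tag()   # ...so any nested derivative draws a larger id

# The promotion rule can therefore always nest the newer (larger-id)
# perturbation inside the older one:
outer < inner  # true
```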

@ChrisRackauckas (Member)

> As I note in the comments, this could cause problems when using multiple processes, e.g.

👎 I think this may be too common a use case. Is there an easy way to opt out for parallel usage?

@simonbyrne (Contributor, Author)

Well, it's not a common use case at the moment, because it won't work!

There are a couple of options:

  1. Do nothing, and just tell users not to do it.
  2. Stick the process id in the tag info, and throw an error if Duals come from different processes.
  3. Always fetch new tag ids from the master process (this only happens at compilation time, so it shouldn't be too burdensome).
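Option 2 could look something like the following sketch (names such as `TagInfo` and `isouter` are hypothetical, not part of this PR):

```julia
# Hypothetical sketch of option 2: record which process created a tag,
# and refuse to compare tags from different processes.
using Distributed

struct TagInfo
    count::Int   # position in the per-process tag sequence
    pid::Int     # id of the process that created the tag
end

const COUNTER = Ref(0)
fresh_taginfo() = TagInfo(COUNTER[] += 1, myid())

# Ordering is only meaningful for tags minted by the same process:
function isouter(a::TagInfo, b::TagInfo)
    a.pid == b.pid || error("cannot mix Duals from different processes")
    return a.count < b.count
end
```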

@ChrisRackauckas (Member)

> Well, it's not a common use case at the moment, because it won't work!

Oh wait, it only breaks when you take a gradient in parallel inside a function you're already taking the gradient of? I read it as though all uses inside of `@parallel` would break. My bad.

@simonbyrne (Contributor, Author)

It occurs when `Dual`s are generated by different processes. The example I gave was the most likely one I could think of.

@jrevels (Member) commented Aug 7, 2017

It's pretty crazy/cool that this works. When I tried implementing the global counter approach back in the days of #83, I got hung up on the insane metaprogramming/compile time regressions induced by the static tag selection logic; I never thought to use ForwardDiff's existing conversion methods to handle this! Good idea.

We can merge this PR if you make the following tweaks:

  1. Fix the parallel safety problem by adding the process ID to the tag. If two tags share the same count but have different process IDs, then throw a `TagMismatchError`. Without this fix, this tagging system is strictly less safe than our current tagging system (since our current tagging system doesn't allow such calculations at all).

  2. As this PR is now, the application of the promotion rule is only implicitly enforced by an assumption that it will be encountered during computation/construction. If that assumption is violated, a differential operator could erroneously extract partials that it doesn't own, causing silently incorrect answers. We should thus enforce this rule explicitly by using tag-checked partial extraction. You can do this by adding the following methods:

```
# You'll have to move the Tag definition into dual.jl and adjust
# the TagMismatchError code to accommodate these methods.

@inline value(::Tag, x) = value(x)
@inline value(t::Tag, d::Dual) = throw(TagMismatchError(t, d))
@inline value(::T, d::Dual{T}) where {T<:Tag} = value(d)

@inline partials(::Tag, x, i...) = partials(x, i...)
@inline partials(t::Tag, d::Dual, i...) = throw(TagMismatchError(t, d))
@inline partials(::T, d::Dual{T}, i...) where {T<:Tag} = partials(d, i...)
```

...and then use those methods in the API implementation, instead of the unchecked versions.
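As a toy illustration of the point (stand-in names `MiniTag`/`MiniDual`/`minivalue`, not ForwardDiff's `Tag`/`Dual`/`TagMismatchError`): extraction with the wrong tag should fail loudly rather than silently hand back partials the operator doesn't own.

```julia
# Stand-in types for illustration only.
struct MiniTag{T} end

struct MiniDual{T}
    val::Float64
    partial::Float64
end

# Checked extraction: a plain number passes through, a matching tag
# extracts, and a mismatched tag is a loud error.
minivalue(::MiniTag, x::Real) = x
minivalue(::MiniTag{T}, d::MiniDual{T}) where {T} = d.val
minivalue(::MiniTag, d::MiniDual) = error("tag mismatch")

d = MiniDual{:outer}(1.0, 2.0)
minivalue(MiniTag{:outer}(), d)   # matching tag: extracts the value
# minivalue(MiniTag{:inner}(), d) would throw "tag mismatch" instead
# of silently returning partials owned by a different operator.
```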

@simonbyrne (Contributor, Author)

Sorry it's taken me a while to get around to this. Out of curiosity, why the `for R in REAL_TYPES` thing, rather than simply `::Real`?

@jrevels (Member) commented Sep 20, 2017

> Out of curiosity, why the `for R in REAL_TYPES` thing, rather than simply `::Real`?

Because otherwise, we'd introduce a large number of method ambiguities. Traditionally, this multiple-dispatch-centric problem is solved by promoting to single dispatch, but this promotion is too costly for the general case (which is one of the motivations behind Cassette, where the problem is solved via contextual dispatch).
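A minimal illustration of the ambiguity problem (toy type `D`, not ForwardDiff's `Dual`): two symmetric `::Real` fallbacks are equally specific for `D + D`, so that call is ambiguous unless a more specific method is defined, and enumerating concrete types avoids the fallbacks entirely.

```julia
# Toy illustration of why `::Real` fallbacks breed method ambiguities.
struct D <: Real
    val::Float64
end

Base.:+(x::D, y::Real) = D(x.val + y)   # dual + real
Base.:+(x::Real, y::D) = D(x + y.val)   # real + dual

# D(1.0) + D(2.0) would now be ambiguous: both methods above match
# and neither is more specific. Defining the same-type case resolves
# it:
Base.:+(x::D, y::D) = D(x.val + y.val)

# ForwardDiff instead sidesteps the `::Real` fallbacks by generating
# one method per concrete type, roughly:
# for R in REAL_TYPES
#     @eval Base.:+(x::D, y::$R) = D(x.val + y)
# end
```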

@simonbyrne (Contributor, Author)

Tags are process-local (an error will be thrown if you attempt to mix tags from different processes). I also changed how tags are generated: the function is no longer part of the signature.
test/DualTest.jl (outdated)

```
# @test Dual{1}(FDNUM) / FDNUM2 === Dual{1}(FDNUM / FDNUM2)
# @test FDNUM / Dual{1}(FDNUM2) === Dual{1}(FDNUM / FDNUM2)
# @test Dual{1}(FDNUM / PRIMAL, FDNUM2 / PRIMAL) === Dual{1}(FDNUM, FDNUM2) / PRIMAL
# end
```
@simonbyrne (Contributor, Author)

I had to disable these tests since I got rid of some of the binary methods (the dispatch is now handled by the promotion machinery).
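For illustration, here is roughly what "dispatch handled by the promotion machinery" means, using a toy `PD{T}` type rather than ForwardDiff's `Dual`: the mixed-tag binary methods disappear, and a promotion rule plus a single same-type method take their place.

```julia
# Toy illustration: binary ops on mixed-tag duals handled by promotion.
struct PD{T} <: Real
    val::Float64
end

# The "outer" tag (1) wins promotion; the other operand is converted.
Base.promote_rule(::Type{PD{1}}, ::Type{PD{2}}) = PD{1}
Base.convert(::Type{PD{1}}, x::PD{2}) = PD{1}(x.val)
Base.convert(::Type{PD{T}}, x::PD{T}) where {T} = x

# One same-type method; mixed-type calls go through promote first.
Base.:+(a::PD{T}, b::PD{T}) where {T} = PD{T}(a.val + b.val)
Base.:+(a::PD, b::PD) = +(promote(a, b)...)

PD{1}(1.0) + PD{2}(2.0)   # promoted to PD{1}, then added
```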

@jrevels (Member)

Those tests are pretty important to keep working - they're one of the few places where we stress nested semantics directly. Assuming this PR is correct, it should be possible to modify these tests to pass without violating their original intent.

@simonbyrne (Contributor, Author)

Okay, hopefully this should now work. I removed the function type from the `Tag`, so I had to modify some of the methods, but it does make the code a bit cleaner.

@simonbyrne (Contributor, Author)

Any suggestions about the AppVeyor failure?

@jrevels (Member) commented Sep 21, 2017

> I removed the function type from the Tag, so had to modify some of the methods, but it does make the code a bit cleaner.

We still actually need the function type in the tag: it prevents people from accidentally reusing tagged `<:AbstractConfig` objects with the wrong functions. Relatedly, it seems like you have removed the explicit opt-out mechanism (where I was using `Void` as the tag before). After adding back the function type, you'll need to restore the explicit opt-out mechanism so that we can support "unsafe" `<:AbstractConfig` preallocation for reuse across multiple target functions.

@simonbyrne (Contributor, Author)

Okay, this has been a bit of a reworking. Now the tag is parametrised by `(function, eltype)`, and a generated function gives each tag a unique sequence number which is used for comparison.

This has the side-benefit that it should now work across processes (since each process should safely generate its own sequence).

Thoughts?
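The scheme described above can be sketched as follows (illustrative names, not the PR's actual code): a `@generated` function's body runs once per distinct concrete signature, so the counter increment happens once per tag type, at compile time, on whichever process compiles it.

```julia
# Illustrative sketch of a per-(function, eltype) tag with a
# compile-time-assigned sequence number.
const NEXT_ID = Ref(0)

struct Tag{F,V} end

@generated function tagid(::Type{Tag{F,V}}) where {F,V}
    id = (NEXT_ID[] += 1)   # runs once per distinct Tag type
    return :($id)
end

a = tagid(Tag{typeof(sin),Float64})
b = tagid(Tag{typeof(cos),Float64})
a != b                                  # distinct tags, distinct ids
tagid(Tag{typeof(sin),Float64}) == a    # stable across repeated calls
```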

@tkoolen commented Sep 25, 2017

We just ran into another perturbation confusion case (one part of JuliaRobotics/RigidBodyDynamics.jl#347) which is fixed by this PR (thanks!).

I submitted a PR against sb/confused that adds a test case distilled from the RigidBodyDynamics issue: simonbyrne#1.

@jrevels (Member) commented Sep 25, 2017

> We just ran into another perturbation confusion case

I'm not sure what the bug actually is (I only skimmed the RigidBodyDynamics issue), but the test you filed doesn't involve any perturbation confusion AFAICT. The error message ForwardDiff is giving shows that it's an API problem; a config was "incorrectly" constructed/applied:

```
The provided configuration (of type [...omitted this for brevity...]) was constructed for a function other than the current target function.
ForwardDiff cannot safely perform differentiation in this context; see the following issue for details: https://github.com/JuliaDiff/ForwardDiff.jl/issues/83.
You can resolve this problem by constructing and using a configuration with the appropriate target function, e.g. `ForwardDiff.GradientConfig(#4, x)`
```

ForwardDiff should've worked on that case without this PR. In fact, it does work if you get rid of the `let` block, which makes me think it's an inference problem of some sort (or possibly that ForwardDiff is incorrectly relying on inference behavior in some way).

I'm going to file a separate issue for figuring out what that bug actually is. Thanks for the test!

@jrevels (Member) commented Sep 26, 2017

@simonbyrne Is this ready to merge? It LGTM - awesome work!

EDIT: Ah, I see, there's still the issue of how custom/opt-out tags promote...

@simonbyrne (Contributor, Author)

Yeah, basically if you use opt-out tags you have to specify your own promotion. Not sure if there's much we can do about that.

@jrevels (Member) commented Sep 26, 2017

Fair enough; the easy way to see whether or not this breaks downstream code will be to merge it 😛

@jrevels jrevels merged commit 292bfdc into JuliaDiff:master Sep 26, 2017
@simonbyrne simonbyrne deleted the sb/confused branch September 26, 2017 19:56