Ensure pullback of exp works for immutable arrays #381
yes! thanks!
src/rulesets/LinearAlgebra/matfun.jl
Outdated
∂A = _matfun_frechet_adjoint!(exp, ΔX, A, X, intermediates)
# Ensures ∂X is mutable. The outer `adjoint` is unwrapped without copy by
# the default _matfun_frechet_adjoint!
∂X = Matrix(ΔX')'
Weirdly, `copy` promises to return something mutable and is probably cleaner than this?
I guess `copy` would always allocate, where a type conversion won't? Or should there be some dispatch using a mutability trait?
On checking, I was apparently wrong about `copy`.
It doesn't necessarily return something mutable.
I was confused by the fact that it turns some view-like things (including `Adjoint` and `SubArray`, but not `Diagonal`) into arrays.
`convert` vs. constructors is a thing, though.
https://docs.julialang.org/en/v1/manual/conversion-and-promotion/#Mutable-collections
"`convert(T, x)` is expected to return the original `x` if `x` is already of type `T`. In contrast, if `T` is a mutable collection type then `T(x)` should always make a new collection (copying elements from `x`)."
So perhaps this should be a `convert` if we want to avoid allocating unnecessarily?
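The difference between `copy`, the `Matrix` constructor, and `convert` discussed here can be checked with a short sketch. It uses only Base and the LinearAlgebra standard library; the variable names are illustrative and do not come from the PR.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]

# `copy` materializes some lazy wrappers into plain arrays...
@assert copy(A') isa Matrix{Float64}
@assert copy(view(A, :, 1)) isa Vector{Float64}
# ...but leaves other structured types as they are.
@assert copy(Diagonal(A)) isa Diagonal

# `convert` returns the original object when it already has the target type,
@assert convert(Matrix, A) === A
# whereas the constructor always builds a new collection.
B = Matrix(A)
@assert B == A && B !== A

# An `Adjoint` is not a `Matrix`, so here `convert` has to allocate.
C = convert(Matrix, A')
@assert C isa Matrix{Float64} && C == [1.0 3.0; 2.0 4.0]
```

So `convert` only avoids the allocation when the input is already a plain `Matrix`; for wrapper types it allocates just like the constructor.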
ChainRulesCore does actually have a trait that might be suitable for this, if we really wanted.
It's part of the mechanics for doing in-place gradient accumulation:
`is_inplaceable_destination(x) -> Bool`
Returns true if `x` is suitable for storing in-place accumulation of gradients.
For arrays this boils down to whether `x .= y` will work to mutate `x`, if `y` is an appropriate differential.
Wrapper array types do not need to overload this if they overload `Base.parent`, and are `is_inplaceable_destination` if and only if their parent array is.
Other types should overload this, as it defaults to `false`.
I ended up going with a mixture of both: using the trait to decide whether we should do anything, and then using `convert`, which in this case will always allocate a copy. (`convert(Matrix, X')` will allocate unless `X` is an `Adjoint{T,Matrix{T}}`, but since that type is inplaceable, the trait will bypass the `convert` anyways.)
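A minimal sketch of this trait-then-`convert` pattern follows. To keep it runnable with the standard library alone, `can_write_inplace` is a hypothetical, simplified stand-in for `ChainRulesCore.is_inplaceable_destination` (it only recurses through `parent`), and `ensure_mutable` is a hypothetical helper name; neither is part of ChainRules.

```julia
using LinearAlgebra

# Hypothetical stand-in for `ChainRulesCore.is_inplaceable_destination`.
can_write_inplace(::Array) = true
function can_write_inplace(x::AbstractArray)
    p = parent(x)
    # Wrappers defer to their parent; anything else is assumed read-only.
    return p !== x && can_write_inplace(p)
end

# Same shape as the one-liner in the PR, using the stand-in trait.
ensure_mutable(ΔX) = can_write_inplace(ΔX) ? ΔX : convert(Matrix, ΔX')'

M = [1.0 2.0; 3.0 4.0]
@assert ensure_mutable(M) === M            # already mutable: returned as-is
@assert parent(ensure_mutable(M')) === M   # mutable wrapper: no copy either

R = reshape(1.0:4.0, 2, 2)                 # backed by a range, so read-only
∂X = ensure_mutable(R)
@assert ∂X isa Adjoint{Float64,Matrix{Float64}}
@assert ∂X == R
# Taking the adjoint again just unwraps the freshly allocated Matrix.
@assert ∂X' === parent(∂X)
∂X[1, 2] = 0.0                             # in-place writes now succeed
```

The result is an `Adjoint` around a new `Matrix`, so a later `adjoint` call can unwrap it without a second allocation.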
Codecov Report
@@ Coverage Diff @@
## master #381 +/- ##
===========================================
- Coverage 97.72% 87.29% -10.43%
===========================================
Files 19 19
Lines 1495 1244 -251
===========================================
- Hits 1461 1086 -375
- Misses 34 158 +124
@Roger-luo can you confirm that the latest version still works for you?
∂A = _matfun_frechet_adjoint!(exp, ΔX, A, X, intermediates)
# Ensures ∂X is mutable. The outer `adjoint` is unwrapped without copy by
# the default _matfun_frechet_adjoint!
∂X = ChainRulesCore.is_inplaceable_destination(ΔX) ? ΔX : convert(Matrix, ΔX')'
Why not just:
- ∂X = ChainRulesCore.is_inplaceable_destination(ΔX) ? ΔX : convert(Matrix, ΔX')'
+ ∂X = ChainRulesCore.is_inplaceable_destination(ΔX) ? ΔX : convert(Matrix, ΔX)
Because if `∂X'` is an `Adjoint`, then `_matfun_frechet_adjoint!` will `copy` it to make it non-`Adjoint`. This way, we do only one allocation instead of 2. An alternative is to bypass `_matfun_frechet_adjoint!` to call `_matfun_frechet!` directly, but to me this seems cleaner. What do you think?
But if it is an `Adjoint`, then isn't it going to be mutable?
Or not, because it might be an `Adjoint{FillArray}`?
I guess this makes sense, leave a comment to that effect?
Can you suggest a change that would make this comment clearer?
ChainRules.jl/src/rulesets/LinearAlgebra/matfun.jl
Lines 132 to 133 in 608f3af
# Ensures ∂X is mutable. The outer `adjoint` is unwrapped without copy by
# the default _matfun_frechet_adjoint!
Actually I and Roger discussed a lot about whether it is an issue of
I'm not sure I understand the question. This will fix for the case where a user passes an immutable array to the pullback for
Does this mutate what's received by the pullback? Is that the cause of this:
Fixes #380 by ensuring that the cotangent used by the pullback of `exp` is mutable.
@Roger-luo can you confirm that this fixes the bug?