Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_vec doesn't preserve equality #132

Open
willtebbutt opened this issue Jan 3, 2021 · 7 comments
Open

to_vec doesn't preserve equality #132

willtebbutt opened this issue Jan 3, 2021 · 7 comments

Comments

@willtebbutt
Copy link
Member

willtebbutt commented Jan 3, 2021

Firstly, I'm very fond of to_vec -- it's served us well over the last couple of years. We know it has some shortcomings though, that we will need to address at some point.

Presently we have implementations of to_vec like this instead of implementations that look like this.

IIRC the reason for this is that if you define things in the latter fashion then you'll find if you do something like

A = randn(5, 4)
A_vec, _ = to_vec(A)
At_vec, _ = to_vec(transpose(A))

then A_vec == At_vec. My suspicion is that at some point in time, we were trying to compare the equality of two things based on their to_vec representation and it came up that things weren't being compared properly. We do a similar thing with Diagonal etc.

Unfortunately, it seems quite straightforward to construct examples which don't admit such a straightforward resolution as we employed for Adjoint, Transpose, and Diagonal. For a trivial example, consider that

A = randn(5)
A == (A, ) # is false -- a `Tuple` is never equal to a `Vector`

However, since to_vec(::Tuple) simply concatenates to_vec of each of the fields, we'll find that

to_vec(A)[1] == to_vec((A, ))[1]

This leads me to believe that it's likely not possible to define to_vec such that equality is preserved.

As such, I propose that we

  1. embrace, and explicitly document, that knowing whether to_vec(x)[1] == to_vec(y)[1] in general tells you nothing about whether x == y, and vice versa. We should point this out to explicitly warn people away from trying to check whether two things are equal based on their to_vec representation.
  2. refactor some of the things (like Tranpose and Adjoint) to use the struct representation, which is more computationally efficient.
  3. to_vec should purely be thought of as a way to make FiniteDifferences' life easier. We might even consider ceasing to export it.

Now that @oxinabox has added some new approximate equality checking to ChainRulesTestUtils, this ought to be less of a problem anyway, because it's now more straightforward to check whether things are approximately equal.

Of course, if anyone can see a different path forward, I'm all ears.

@willtebbutt willtebbutt changed the title Some ambiguity in to_vec to_vec doesn't preserve equality Jan 3, 2021
@willtebbutt
Copy link
Member Author

willtebbutt commented Jan 3, 2021

I also suspect that this change would make it much safer to rely on recursive definitions of to_vec. For example, we ought to be able to define a single (maybe @generated) fallback version of to_vec for struct-types.

@oxinabox
Copy link
Member

oxinabox commented Jan 3, 2021

Would be solved if we just entirely stopped using to_vec by doing #97
but til then your plan seems good.

Lets keep exporting it though, it is part of the public API, in that users are expected to extend it.

@willtebbutt
Copy link
Member Author

The other thing to say here is that the present design means that to_vec of a structured array can be of a different dimensionality from the tangent generated by rand_tangent which is, by design, an appropriate tangent in the ChainRulesCore-sense.

In particular, to_vec(Diagonal(randn(3)))[1] will have length 9, while to_vec(rand_tangent(Diagonal(3)))[1] will have length 3. The causes problems for our implementation of jvp, (presumably) among other things.

Again, this isn't to say that what we're presently doing in to_vec is necessarily wrong, just that it's inconsistent with what we're doing in other places.

@willtebbutt
Copy link
Member Author

willtebbutt commented Jan 7, 2021

Also, regarding the above plan, @sethaxen what are your thoughts on it in light of our adjoint-adventures the other day? I'm nervous that if I start changing things, things will start breaking in strange ways.

@sethaxen
Copy link
Member

Also, regarding the above plan, @sethaxen what are your thoughts on it in light of our adjoint-adventures the other day?

I agree in general with your comments. to_vec's current implementation worries me after the adventure in #121.

I need to formalize my thinking here, but I think what we have been doing so far is using to_vec to embed in some real space, and then (co)tangent vectors are elements of a (co)tangent subspace (e.g. mapping to points on a sphere with tangent vectors in a plane). For primals, I think we should prefer to_vec implementations that are charts, that is, maps to a real Euclidean space whose dimension is equivalent to the dimension of the point. So to_vec(::Diagonal) should extract the diagonal, and to_vec(::Hermitian) should extract a strict triangle and the real part of the diagonal. It might require more overloads, but this will at least ensure the right thing is being done, and the decreased dimensionality should make it faster. And of course, it can be recursive.

We need a different function to handle (co)tangents. Because we randomize the (co)tangents, the map here doesn't really matter all that much (though maybe it should). The important requirements are 1) it produces a real vector with the same dimension as the primal and 2) all valid equivalent representations for the same (co)tangent vector should map to the same real vector. So e.g. to_vec(primal::Adjoint, vector::Adjoint), to_vec(primal::Adjoint, vector::Composite{Adjoint}), to_vec(primal::Adjoint, vector::Matrix) all should produce the same output. I think this latter requirement is the harder one to fulfill, for all the same reasons that it's hard to support arbitrary cotangent vectors in pullbacks. That's why it's been easier so far to embed everything in the Matrixes through calling Matrix and then vectorize.

I'm nervous that if I start changing things, things will start breaking in strange ways.

We could always set up a ChainRules integration test, since arguably ChainRules's test suite puts this package through its paces more than our test suite can.

@willtebbutt
Copy link
Member Author

We could always set up a ChainRules integration test, since arguably ChainRules's test suite puts this package through its paces more than our test suite can.

I'm also still trying to figure out what I think about this whole thing, but I agree that we should really do this at the very least.

@sethaxen
Copy link
Member

Also, regarding the above plan, @sethaxen what are your thoughts on it in light of our adjoint-adventures the other day? I'm nervous that if I start changing things, things will start breaking in strange ways.

I'll get on it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants