fix some issues with equality of factorizations #41228
lq,
lu,
qr,
x -> qr(x, ColumnNorm()),
This needs to be manually changed to `qr(x, Val(true))` when backporting this to 1.6. @KristofferC what is the best way to ensure this? Should I maybe remove the `backport-1.6` label here and open a new PR with that change against the `release-1.6` branch?
- `hash` did not respect the type of a factorization, so completely different factorizations with the same underlying data would result in the same `hash`, leading to inconsistencies with `isequal`. This likely doesn't occur very often in practice, but definitely seems worth fixing.
- `==` and `isequal` only returned true if two factorizations are of exactly the same type, which is inconsistent with their implementation for other objects and with the definition of `hash` for factorizations.
- Equality for `QRCompactWY` did not ignore the subdiagonal entries of `T`, leading to nondeterministic behavior. Perhaps `T` should be directly stored as `UpperTriangular` in `QRCompactWY`, but that seems potentially breaking.

Relying on implementation details of `DataType` here is certainly less than ideal, but I could not come up with a nicer solution.
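The first point can be illustrated with a minimal sketch. `ToyA`, `ToyB`, `blind_hash`, and `typed_hash` are hypothetical names invented for this example and are not part of `LinearAlgebra`. The sketch also seeds the hash with `typeof(F)` for simplicity, whereas the diff in this PR uses `typeof(F).name.wrapper` so that type parameters are ignored.

```julia
# Hypothetical toy types standing in for two unrelated factorizations
# that happen to store identical data; they are not LinearAlgebra types.
abstract type ToyFactorization end

struct ToyA <: ToyFactorization
    factors::Matrix{Float64}
end

struct ToyB <: ToyFactorization
    factors::Matrix{Float64}
end

# Type-blind hash: only the stored data contributes.
blind_hash(F::ToyFactorization, h::UInt=zero(UInt)) = hash(F.factors, h)

# Type-aware hash: seed with the type, then mix in the data.
typed_hash(F::ToyFactorization, h::UInt=zero(UInt)) =
    hash(F.factors, hash(typeof(F), h))

M = [1.0 2.0; 3.0 4.0]
a, b = ToyA(M), ToyB(M)

println(blind_hash(a) == blind_hash(b))  # same data and same seed, so these collide
println(typed_hash(a) == typed_hash(b))  # seeds differ, so a collision is not expected
```

Seeding with the type keeps two objects that can never be `isequal` from sharing a hash bucket merely because their fields match.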
Base.:(==)( F::T, G::T) where {T<:Factorization} = all(f -> getfield(F, f) == getfield(G, f), 1:nfields(F))
Base.isequal(F::T, G::T) where {T<:Factorization} = all(f -> isequal(getfield(F, f), getfield(G, f)), 1:nfields(F))::Bool
function Base.hash(F::Factorization, h::UInt)
return mapreduce(f -> hash(getfield(F, f)), hash, 1:nfields(F); init=hash(typeof(F).name.wrapper, h))
I've been told that you should never fetch the internal fields like this. Maybe you can hash `propertynames` instead?
More generally, should the `hash` traverse `propertynames` instead of the fields? That should avoid the issue with non-active memory affecting the `hash` and maybe make the specialized methods redundant.
> I've been told that you should never fetch the internal fields like this. Maybe you can hash propertynames instead?

I agree it's a bit ugly, but I am unsure just using `propertynames` for the hash is quite enough, since a lot of them like `vectors` and `values` for `Eigen` are quite generic and not necessarily unique. Perhaps that is being a bit too paranoid, but hashing by type identity seems more formally correct, at least to me. For a stdlib, relying on internals is probably also not quite as bad, since it will always get tested and updated alongside other Base changes.

> More generally, should the hash traverse propertynames instead of the fields?

Yeah, maybe? I think that would mean hashing the same data twice for some factorizations though, right?
> I agree it's a bit ugly

I don't think the reason to avoid it was aesthetics, but it's really not my department. Others should comment on whether it should be avoided. Maybe it used to be a problem but no longer is. Otherwise, let's just keep it.

> I think that would mean hashing the same data twice for some factorizations though, right?

Yes. For e.g. `Cholesky` it would probably end up hashing the same data three times so we should probably have specialized versions. However, hashing the fields seems wrong since the inactive memory will affect the hash, so I think using the properties will be the correct, although slow, fallback. E.g.
julia> A = Symmetric(randn(3,3) + 10I, :U)
3×3 Symmetric{Float64, Matrix{Float64}}:
8.52303 0.491492 0.311278
0.491492 9.43415 -1.63795
0.311278 -1.63795 9.80197
julia> Ac = copy(A);
julia> Ac.data[3,1] += 1
1.9147549806916584
julia> hash(cholesky(A)) == hash(cholesky(Ac))
false
julia> cholesky(A).U == cholesky(Ac).U
true
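For comparison with the session above, here is a rough sketch of what a property-based fallback might look like. `property_hash` is a hypothetical helper written for this illustration, not an existing `LinearAlgebra` function, and seeding with `nameof(typeof(F))` is an assumption made here to avoid touching `DataType` internals. It relies on the triangular wrappers returned by `getproperty` hashing only their visible entries.

```julia
using LinearAlgebra

# Hypothetical helper (not part of LinearAlgebra): hash a factorization
# through its public properties instead of its raw fields.
function property_hash(F::Factorization, h::UInt=zero(UInt))
    # Assumption: seed with the type's name so different factorization
    # kinds start from different seeds.
    h = hash(nameof(typeof(F)), h)
    for p in propertynames(F)
        h = hash(getproperty(F, p), h)
    end
    return h
end

A = Symmetric([4.0 1.0; 1.0 3.0], :U)
Ac = copy(A)
Ac.data[2, 1] += 1  # modifies only the unused lower triangle of the parent

# For Cholesky the properties are masked triangular views, so the
# inactive entry should not influence the result.
println(property_hash(cholesky(A)) == property_hash(cholesky(Ac)))
```

As expected from the discussion, this visits the same factor data once per property (`U`, `L`, and `UL` for `Cholesky`), so it trades speed for ignoring inactive memory.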
I think the main reason we discourage use of internal APIs like this is because they might change at any time, which will break such code. I don't think there are any other glaring problems with this approach.
Using `propertynames` for the fallback seems reasonable, I will change that.
Hmm, trying this locally, using `propertynames` fails for `bunchkaufman`.

> For e.g. Cholesky it would probably end up hashing the same data three times so we should probably have specialized versions.

It seems like this is an issue for quite a lot of factorizations, so if we want this to be efficient, we would end up having to define this manually for a lot of cases, which kind of defeats the purpose of having this definition in the first place.
The more I think about it, the more I am convinced that we should just remove these definitions for the abstract type `Factorization`, since they really don't make much sense in terms of the abstract `Factorization` interface, but instead have a macro similar to https://github.com/andrewcooke/AutoHashEquals.jl, which defines these for each factorization separately. That would also avoid having to rely on `F.name.wrapper`. What do you think? It might be somewhat breaking if users define their own factorization types though.
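To make the macro proposal concrete, a minimal sketch of such a macro could look like the following. The macro name `@field_hash_equals` and the type `ToyLU` are hypothetical and exist only for this example; this is not the AutoHashEquals.jl implementation, just one way a per-type definition might be generated. Note that it still walks the fields with `getfield`.

```julia
# Hypothetical macro: defines ==, isequal and hash for one concrete type
# by comparing and hashing its fields in order.
macro field_hash_equals(T)
    T = esc(T)
    quote
        Base.:(==)(a::$T, b::$T) =
            all(f -> getfield(a, f) == getfield(b, f), 1:nfields(a))
        Base.isequal(a::$T, b::$T) =
            all(f -> isequal(getfield(a, f), getfield(b, f)), 1:nfields(a))
        function Base.hash(a::$T, h::UInt)
            h = hash($T, h)  # seed with the type the macro was applied to
            for f in 1:nfields(a)
                h = hash(getfield(a, f), h)
            end
            return h
        end
        nothing
    end
end

# Hypothetical stand-in for a factorization type.
struct ToyLU
    factors::Matrix{Float64}
    ipiv::Vector{Int}
end

@field_hash_equals ToyLU

x = ToyLU([1.0 2.0; 3.0 4.0], [1, 2])
y = ToyLU([1.0 2.0; 3.0 4.0], [1, 2])
println(x == y)
println(hash(x) == hash(y))
```

Because the type is named at the call site, no access to `DataType` internals is needed, but any inactive memory stored in the fields would still affect the result.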
Bump. How do you think we should proceed here?
Wouldn't the macro-based version end up inspecting the fields, which was what I argued against in #41228 (comment)?
Equality for `QRCompactWY` did not ignore the subdiagonal entries of `T` leading to nondeterministic behavior. Perhaps `T` should be directly stored as `UpperTriangular` in `QRCompactWY`, but that seems potentially breaking. This is pulled out from #41228, since this change should be less controversial than the other changes there and this particular bug just came up in ChainRules again.
* fix equality of QRCompactWY (#41363) Equality for `QRCompactWY` did not ignore the subdiagonal entries of `T` leading to nondeterministic behavior. This is pulled out from #41228, since this change should be less controversial than the other changes there and this particular bug just came up in ChainRules again.
Equality for `QRCompactWY` did not ignore the subdiagonal entries of `T` leading to nondeterministic behavior. This is pulled out from #41228, since this change should be less controversial than the other changes there and this particular bug just came up in ChainRules again. (cherry picked from commit 74fab49)