-
-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make AffineScalarFunc hashable for Pandas, Pint and Pint-Pandas #170
Make AffineScalarFunc hashable for Pandas, Pint and Pint-Pandas #170
Conversation
Add __hash__ so it works with Pandas, Pint, and Pint-Pandas Signed-off-by: Michael Tiemann <[email protected]>
uncertainties/core.py
Outdated
@@ -1849,6 +1849,10 @@ def std_dev(self): | |||
# Abbreviation (for formulas, etc.): | |||
s = std_dev | |||
|
|||
def __hash__(self): | |||
# Placeholder until we figure out how to really make these hashable | |||
return id(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this satisfy the invariance x == y \Rightarrow hash(x) == hash(y)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran some tests:
u = ufloat(1, 2)
v = ufloat(1, 2)
w = u+u
z = w / 2
# u != v, expected
# u == z, expected
# id(u) != id(v) != id(z)
a = float(1+2)
b = float(2+1)
c = a+a
d = c/2
# a == b, expected
# a == d, expected
# b == d, expected
# id(a) != id(b) != id(d)
I'll see what I can come up with to satisfy the invariant. Thanks!
Tweak hash function so that when x==y, hash(x)==hash(y) Test case: ``` import uncertainties from uncertainties import ufloat u = ufloat(1.23, 2.34) v = ufloat(1.23, 2.34) print(f"u{u} == v{v}: {u==v}") print(f"hash(u){hash(u)} == hash(v){hash(v)}: {hash(u)==hash(v)}") print(f"u{u} == (u+u)/2{(u+u)/2}: {u==(u+u)/2}") print(f"hash(u){hash(u)} == hash((u+u)/2){hash((u+u)/2)}: {hash(u)==hash((u+u)/2)}") ``` Signed-off-by: Michael Tiemann <[email protected]>
While I solved the invariant problem, when I ran pytest the pickle test failed. All others passed. I'll see what I can do about that... |
Ensure that we have a linear_combo attribute before trying to look up its keys. Also re-indent so our conditional test is not too long, and add test case to test_uncertainties. Now test_uncertainties passes all 31 tests original tests plus the new one, and test_umath passes all 9 tests. Signed-off-by: Michael Tiemann <[email protected]>
@@ -2404,3 +2404,21 @@ def test_correlated_values_correlation_mat(): | |||
assert arrays_close( | |||
numpy.array(cov_mat), | |||
numpy.array(uncert_core.covariance_matrix([x2, y2, z2]))) | |||
|
|||
def test_hash(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any NaN-likes that will have x!=x? if you want those to be considered "matching" for e.g. dict lookups (and pandas Index lookups) those will need matching hashes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that there is no canonical NaN in the uncertainties package:
import numpy as np
import uncertainties
from uncertainties import ufloat
xx = ufloat(np.nan, 0)
yy = ufloat(np.nan, 1)
# nan+/-0
print(xx + xx)
# nan+/-2
print(yy + yy)
# False, because nan != nan
print(xx == xx)
The uncertainties package has unumpy.isnan which works for both scalar and arrays. There's no dictionary compare against a nan. The PintArray type uses unp.isnan to check whether something is a nan or not. The pandas changes all use PintArray.isna to test NA-ness (including NaN-ness). But if I'm missing something, please let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now changed the Pint-Pandas changes to use np.nan instead of pd.NA. While I was able to get everything to work with pd.NA, things were cleaner using np.nan as the NA value for uncertain values.
Call `expand` directly if self._linear_part is not yet expanded. Signed-off-by: Michael Tiemann <[email protected]>
Attempt to work around old version of nose that doesn't work with later versions of Python. Will it work? Signed-off-by: Michael Tiemann <[email protected]>
This reverts commit 7cca18c.
Attempt replacing `nose` (which has not been regularly supported for years) with `pytest` for better CI/CD experience. Signed-off-by: Michael Tiemann <[email protected]>
This reverts commit f3cb615.
Try running nosetests directly, rather than via setuptools, which trips up nose-1.3.7 for python >= 3.6. See nose-devs/nose#873 Signed-off-by: Michael Tiemann <[email protected]>
Install `future` to satisfy `builtins` for Python 2.7. Signed-off-by: Michael Tiemann <[email protected]>
All changes needed for Pandas 2.1 are now present and Pint/Pint-Pandas changes are ready for review. Could we get a fresh release with hashable AffineScalarFunc? |
I made similar changes in #189, just because I haven't seeen this MR before. I do not have any clue about uncertainties, but wouldn't it be necessary, to check the correlations as well? You may check my MR, I tried to do it there. |
@NelDav nice to meet you! The correlations are checked by expanding out the |
Hi @MichaelTiemannOSC, thank you for your answer. I merged your branch into mine an checked my hash function with your tests. Indeed, a few problems occoured. I then modified my hash calculation and now your tests are pasing. I think the new version is still simpler than your solution. However, I am far from an expert on how to calculate with uncertainties and how identity verification should be carried out. It would be very nice if you could look at my new proposal and share your impressions. |
See my comment on the other PR: #219 (comment) Closing this PR in favor of #219. Though that one will likely be closed as well since I think we discovered implementing a hash requires a bit of a larger refactor of the |
Add hash to AffineScalarFunc so it works with Pandas, Pint, and Pint-Pandas.
Part of pandas-dev/pandas#53970
and hgrecco/pint#1615
and hgrecco/pint-pandas#140
@jbrockmendel @hgrecco @andrewgsavage