-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize array_equivalent for NDFrame.equals #35328
Optimize array_equivalent for NDFrame.equals #35328
Conversation
restarting failed travis job
|
@@ -391,43 +399,28 @@ def array_equivalent(left, right, strict_nan: bool = False) -> bool: | |||
if left.shape != right.shape: | |||
return False | |||
|
|||
if dtype_equal: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure i understand why you actually need this
the dtype check is cheap compared to the actual check no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- They add up, especially when doing things columnwise for small, wide dataframes
- We get to skip a few other checks, like swapping
np.isnan
forpd.isna
for float, and we can skip the empty check at https://github.com/pandas-dev/pandas/pull/35328/files#diff-ff8364cee9a3e1ef3a3825cb2cdd26d8L431.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok that's fine
pandas/core/dtypes/missing.py
Outdated
@@ -452,6 +445,45 @@ def array_equivalent(left, right, strict_nan: bool = False) -> bool: | |||
return np.array_equal(left, right) | |||
|
|||
|
|||
def array_equivalent_float(left, right): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make all of these helpers private functions
@@ -452,6 +445,45 @@ def array_equivalent(left, right, strict_nan: bool = False) -> bool: | |||
return np.array_equal(left, right) | |||
|
|||
|
|||
def _array_equivalent_float(left, right): | |||
return ((left == right) | (np.isnan(left) & np.isnan(right))).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about use_inf_as_na?
Can this be re-used on L426? (also the prod(shape) check on L424-425 i think is redundant with earlier shape check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like 1.0.5 doesn't care about that
In [15]: df1 = pd.DataFrame({"A": np.array([np.nan, 1, np.inf])})
In [16]: df2 = pd.DataFrame({"A": np.array([np.nan, 1, np.nan])})
In [17]: with pd.option_context('mode.use_inf_as_na', True):
...: print(df1.equals(df2))
...:
...:
False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks for checking
Travis finished. I'm going to selfishly ignore #35328 (comment) so that I can get the RC process rolling, but happy to engage more on followups if I missed something. |
Closes #35249
And proof of this being better for short-circuiting:
Discussion: the key thing here is that
BlockManager.equals
requires that the dtypes ofleft
andright
match. If they don't then we've already returned False. That lets us do a few thingsis_foo_dtype
checks.np.isnan
np.array([1, 2])
equivalent tonp.array([1.0, 2.0])
.