Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frame ops prelims - de-duplicate, remove unused kwargs #19522

Merged
merged 8 commits into from
Feb 7, 2018

Conversation

jbrockmendel
Copy link
Member

Starting in on making DataFrame ops consistent with Series and Index ops. As a first pass, this goes through and removes some duplicate code and unused arguments.

Changes logic in exactly two corner cases: arithmetic operations frame.op(series, axis=None, fill_value=not_none) where one of series or frame has length zero. axis needs to be specifically passed as None to hit the changed case.

ser = pd.Series([])
df = pd.DataFrame([[1, 2], [3, 4]])

>>> df.add(ser, axis=None, fill_value='E')
>>> ser.to_frame().sub(df[0], axis=None, fill_value=3)

will now raise instead of returning

>>> df.add(ser, axis=None, fill_value='E')
    0   1
0 NaN NaN
1 NaN NaN

>>> ser.to_frame().sub(df[0], axis=None, fill_value=3)
Empty DataFrame
Columns: [0]
Index: []

@codecov
Copy link

codecov bot commented Feb 3, 2018

Codecov Report

Merging #19522 into master will decrease coverage by <.01%.
The diff coverage is 90.9%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19522      +/-   ##
==========================================
- Coverage    91.6%    91.6%   -0.01%     
==========================================
  Files         150      150              
  Lines       48750    48736      -14     
==========================================
- Hits        44657    44644      -13     
+ Misses       4093     4092       -1
Flag Coverage Δ
#multiple 89.97% <90.9%> (-0.01%) ⬇️
#single 41.74% <50%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/sparse/frame.py 94.83% <100%> (-0.02%) ⬇️
pandas/core/indexes/base.py 96.43% <100%> (-0.02%) ⬇️
pandas/core/ops.py 95.22% <100%> (ø) ⬆️
pandas/core/frame.py 97.42% <80%> (-0.01%) ⬇️
pandas/core/reshape/merge.py 94.22% <0%> (ø) ⬆️
pandas/io/parsers.py 95.56% <0%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ed10bf6...85d501c. Read the comment docs.

@gfyoung gfyoung added Internals Related to non-user accessible pandas implementation Clean labels Feb 3, 2018
fill_value=fill_value,
try_cast=try_cast)
else:
if len(other) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not len

fill_value=None, try_cast=True):
if len(other) == 0:
return self * np.nan
if len(self) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

if len(other) == 0:
return self * np.nan
if len(self) == 0:
# Ambiguous case, use _series so works with DataFrame
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not totally sure. Possibly a holdover from higher-dim implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blame shows that this comment was introduced in c35a39d in 2011. Looks like it replaced a comment # Ambiguous case, use _series so works with DataMatrix. @wesm is this still relevant? Suggestions for fleshing this out?

@@ -3960,52 +3959,40 @@ def f(i):

def _combine_series(self, other, func, fill_value=None, axis=None,
level=None, try_cast=True):
if fill_value is not None:
raise NotImplementedError("fill_value %r not supported." %
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this hit in tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use .format() here instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes to both. tests.frame.test_operators.TestDataFrameOperators.test_arith_flex_frame:

        with tm.assert_raises_regex(NotImplementedError, 'fill_value'):
           self.frame.add(self.frame.iloc[0], fill_value=3)

@jreback
Copy link
Contributor

jreback commented Feb 6, 2018

so you need some tests to validate your claims at the top of the PR

Changes logic in exactly two corner cases: arithmetic operations frame.op(series, axis=None,

and a release note.

@@ -581,6 +581,7 @@ Numeric
- Bug in :class:`Index` multiplication and division methods where operating with a ``Series`` would return an ``Index`` object instead of a ``Series`` object (:issue:`19042`)
- Bug in the :class:`DataFrame` constructor in which data containing very large positive or very large negative numbers was causing ``OverflowError`` (:issue:`18584`)
- Bug in :class:`Index` constructor with ``dtype='uint64'`` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
- Bug in :class:`DataFrame` flex arithmetic methods that failed to raise ``NotImplementedError` in cases where either argument has length zero (:issue:`19522`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need double backticks on NotImplementedError. can you be slightly more descriptive here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a user is reading this and would want to know what if anything they need to change.

@@ -80,6 +80,26 @@ def _try_get_item(x):
return x


def _make_invalid_op(name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider moving this function to core/ops (and parameterizing so you can use it for index/series)

@@ -451,6 +451,18 @@ def test_arith_flex_frame(self):
with tm.assert_raises_regex(NotImplementedError, 'fill_value'):
self.frame.add(self.frame.iloc[0], axis='index', fill_value=3)

def test_arith_flex_zero_len_raises(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't these be in test_arithmetic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually yes. The more important grouping for now is that this belongs with the test above it, which I plan to split/parametrize before moving to test_arithmetic

df.add(ser, fill_value='E')

with tm.assert_raises_regex(NotImplementedError, 'fill_value'):
ser.to_frame().sub(df[0], axis=None, fill_value=3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just create the required frame here rather than using ser.to_frame()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

df = pd.DataFrame([[1, 2], [3, 4]])

with tm.assert_raises_regex(NotImplementedError, 'fill_value'):
df.add(ser, fill_value='E')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should raise ValueError, not NIE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing implementation raises NIE outside of these corner cases. For the time being I'm assuming there's a reason for that.

@jreback jreback added this to the 0.23.0 milestone Feb 7, 2018
@jreback jreback merged commit a7d1103 into pandas-dev:master Feb 7, 2018
@jreback
Copy link
Contributor

jreback commented Feb 7, 2018

thanks

@jbrockmendel jbrockmendel deleted the ops-frame3 branch February 11, 2018 21:38
harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants