-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Clarify and add fill_value example in arithmetic ops #19675
Conversation
Hello @HagaiHargil! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on February 21, 2018 at 06:55 Hours UTC |
pandas/core/ops.py
Outdated
missing, the result will be missing | ||
Fill existing missing (NaN) values, and any new element needed for | ||
successful array alignment, with this value before computation. | ||
If data in both corresponding DataFrame locations is missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for Series
, so change DataFrame
to Series
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would maybe rephrase it as
Fill missing values with 'fill_value'. There are two sources of missing values
* Missing values present either Series before the operation
* Newly created missing values as a result of an alignment
In either case, missing values are not filled if both Series are missing after alignment.
At least one value from either Series must be not NA for 'fill_value' to be used.
Though I'm sure if this is any clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a problem with the first line - "Fill missing values with 'fill_value'". This is basically what bothered me in the first place, it doesn't exactly fill missing value, in the naive way you would expect it to.
If you insist I'll gladly rephrase my current wording. In the mean time I'll fix the rest of your remarks.
pandas/core/ops.py
Outdated
|
||
Examples | ||
-------- | ||
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use a DataFrame for this example? It can just be a single-column DataFrame with these same values and index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want to make a common shared doc-string would be nice as well (could be followon PR)
@@ -255,8 +255,10 @@ def _get_frame_op_default_axis(name): | |||
---------- | |||
other : Series or scalar value | |||
fill_value : None or float value, default None (NaN) | |||
Fill missing (NaN) values with this value. If both Series are | |||
missing, the result will be missing | |||
Fill existing missing (NaN) values, and any new element needed for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove the 'and any new elemnt needed for successful array alignment', this is redundant with your last sentence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure which sentence are you referring to. The "and any new element needed" sentence refers to the alignment process. The last sentence ("If data in both corresponding...") only deals with a "corner case" of two NaNs, without any direct reference to the data alignment process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use the word 'array' as these are not arrays. this is still not very clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this phrasing much better. Could you change "new element" to "new missing values"
You can remove the commas around the "and any new element... " clause.
@@ -265,6 +267,18 @@ def _get_frame_op_default_axis(name): | |||
------- | |||
result : Series | |||
|
|||
Examples | |||
-------- | |||
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
show a and b as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to have a line with just >>> a
on it before showing the Series. Likewise for b
.
@@ -280,8 +294,10 @@ def _get_frame_op_default_axis(name): | |||
axis : {0, 1, 'index', 'columns'} | |||
For Series input, axis to match Series index on | |||
fill_value : None or float value, default None | |||
Fill missing (NaN) values with this value. If both DataFrame locations are | |||
missing, the result will be missing | |||
Fill existing missing (NaN) values, and any new element needed for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
pandas/core/ops.py
Outdated
|
||
Examples | ||
-------- | ||
>>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
show a and b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob want to show say a with 1 column and b with 2
pandas/core/ops.py
Outdated
@@ -321,6 +351,18 @@ def _get_frame_op_default_axis(name): | |||
------- | |||
result : DataFrame | |||
|
|||
Examples | |||
-------- | |||
>>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
pandas/core/ops.py
Outdated
|
||
Examples | ||
-------- | ||
>>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob want to show say a with 1 column and b with 2
Codecov Report
@@ Coverage Diff @@
## master #19675 +/- ##
==========================================
+ Coverage 91.57% 91.61% +0.03%
==========================================
Files 150 150
Lines 48817 48882 +65
==========================================
+ Hits 44704 44782 +78
+ Misses 4113 4100 -13
Continue to review full report at Codecov.
|
I'm also not sure what a shared docstring really is in this context :) |
pandas/core/ops.py
Outdated
b 1.0 | ||
c 1.0 | ||
d NaN | ||
>>> b = pd.DataFrame([[1, np.nan], [np.nan, 2], [1, np.nan], [np.nan, 2]], index=['a', 'b', 'c_', 'd']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will not pass the linter, need to format this, prob easier to use a dict to construct as well.
@@ -255,8 +255,10 @@ def _get_frame_op_default_axis(name): | |||
---------- | |||
other : Series or scalar value | |||
fill_value : None or float value, default None (NaN) | |||
Fill missing (NaN) values with this value. If both Series are | |||
missing, the result will be missing | |||
Fill existing missing (NaN) values, and any new element needed for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use the word 'array' as these are not arrays. this is still not very clear.
I changed the confusing "array" into series\dataframe. Regarding the "this is still not very clear" comment - First, I guess you agree with me that the previous text was plain wrong. Now I'm trying to find a clear enough sentence that fits the actual behavior of this keyword, but inconsistent behavior leads to badly-phrased docs. I'm really not sure what to say anymore. |
Looks like there are some linting errors failing the build: #19675 (comment) This also seems to cause issues with our |
It was an interesting bug with the |
pandas/core/ops.py
Outdated
c 1.0 | ||
d NaN | ||
dtype: float64 | ||
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'c_', 'd']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would prefer just using abde or something, the c_ is confusing
thanks @HagaiHargil |
git diff upstream/master -u -- "*.py" | flake8 --diff