Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Clarify and add fill_value example in arithmetic ops #19675

Merged
merged 9 commits into from
Feb 22, 2018

Conversation

HagaiHargil
Copy link
Contributor

@pep8speaks
Copy link

pep8speaks commented Feb 13, 2018

Hello @HagaiHargil! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 21, 2018 at 06:55 Hours UTC

missing, the result will be missing
Fill existing missing (NaN) values, and any new element needed for
successful array alignment, with this value before computation.
If data in both corresponding DataFrame locations is missing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for Series, so change DataFrame to Series.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe rephrase it as

Fill missing values with 'fill_value'. There are two sources of missing values

    * Missing values present either Series before the operation
    * Newly created missing values as a result of an alignment    

In either case, missing values are not filled if both Series are missing after alignment.
At least one value from either Series must be not NA for 'fill_value' to be used.

Though I'm sure if this is any clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a problem with the first line - "Fill missing values with 'fill_value'". This is basically what bothered me in the first place, it doesn't exactly fill missing value, in the naive way you would expect it to.

If you insist I'll gladly rephrase my current wording. In the mean time I'll fix the rest of your remarks.


Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use a DataFrame for this example? It can just be a single-column DataFrame with these same values and index.

@gfyoung gfyoung added Docs Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Numeric Operations Arithmetic, Comparison, and Logical operations labels Feb 14, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to make a common shared doc-string would be nice as well (could be followon PR)

@@ -255,8 +255,10 @@ def _get_frame_op_default_axis(name):
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill missing (NaN) values with this value. If both Series are
missing, the result will be missing
Fill existing missing (NaN) values, and any new element needed for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the 'and any new elemnt needed for successful array alignment', this is redundant with your last sentence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure which sentence are you referring to. The "and any new element needed" sentence refers to the alignment process. The last sentence ("If data in both corresponding...") only deals with a "corner case" of two NaNs, without any direct reference to the data alignment process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use the word 'array' as these are not arrays. this is still not very clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this phrasing much better. Could you change "new element" to "new missing values"

You can remove the commas around the "and any new element... " clause.

@@ -265,6 +267,18 @@ def _get_frame_op_default_axis(name):
-------
result : Series

Examples
--------
>>> a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show a and b as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to have a line with just >>> a on it before showing the Series. Likewise for b.

@@ -280,8 +294,10 @@ def _get_frame_op_default_axis(name):
axis : {0, 1, 'index', 'columns'}
For Series input, axis to match Series index on
fill_value : None or float value, default None
Fill missing (NaN) values with this value. If both DataFrame locations are
missing, the result will be missing
Fill existing missing (NaN) values, and any new element needed for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


Examples
--------
>>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show a and b.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob want to show say a with 1 column and b with 2

@@ -321,6 +351,18 @@ def _get_frame_op_default_axis(name):
-------
result : DataFrame

Examples
--------
>>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same


Examples
--------
>>> a = pd.DataFrame([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob want to show say a with 1 column and b with 2

@jreback jreback changed the title Clarify and add fill_value example in arithmetic ops DOC: Clarify and add fill_value example in arithmetic ops Feb 15, 2018
@codecov
Copy link

codecov bot commented Feb 18, 2018

Codecov Report

Merging #19675 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19675      +/-   ##
==========================================
+ Coverage   91.57%   91.61%   +0.03%     
==========================================
  Files         150      150              
  Lines       48817    48882      +65     
==========================================
+ Hits        44704    44782      +78     
+ Misses       4113     4100      -13
Flag Coverage Δ
#multiple 89.98% <100%> (+0.03%) ⬆️
#single 41.79% <100%> (+0.06%) ⬆️
Impacted Files Coverage Δ
pandas/core/ops.py 96.86% <100%> (+0.03%) ⬆️
pandas/core/arrays/base.py 60% <0%> (-0.61%) ⬇️
pandas/core/series.py 94.46% <0%> (-0.11%) ⬇️
pandas/core/sparse/series.py 95.25% <0%> (-0.02%) ⬇️
pandas/core/indexes/multi.py 95.06% <0%> (-0.01%) ⬇️
pandas/core/sparse/frame.py 94.81% <0%> (ø) ⬆️
pandas/core/groupby.py 92.2% <0%> (ø) ⬆️
pandas/core/indexes/api.py 98.78% <0%> (ø) ⬆️
pandas/io/parquet.py 71.79% <0%> (ø) ⬆️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d9551c8...3192301. Read the comment docs.

@HagaiHargil
Copy link
Contributor Author

I'm also not sure what a shared docstring really is in this context :)

b 1.0
c 1.0
d NaN
>>> b = pd.DataFrame([[1, np.nan], [np.nan, 2], [1, np.nan], [np.nan, 2]], index=['a', 'b', 'c_', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will not pass the linter, need to format this, prob easier to use a dict to construct as well.

@@ -255,8 +255,10 @@ def _get_frame_op_default_axis(name):
----------
other : Series or scalar value
fill_value : None or float value, default None (NaN)
Fill missing (NaN) values with this value. If both Series are
missing, the result will be missing
Fill existing missing (NaN) values, and any new element needed for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use the word 'array' as these are not arrays. this is still not very clear.

@HagaiHargil
Copy link
Contributor Author

I changed the confusing "array" into series\dataframe.

Regarding the "this is still not very clear" comment -

First, I guess you agree with me that the previous text was plain wrong. Now I'm trying to find a clear enough sentence that fits the actual behavior of this keyword, but inconsistent behavior leads to badly-phrased docs. I'm really not sure what to say anymore.

@TomAugspurger
Copy link
Contributor

Looks like there are some linting errors failing the build: #19675 (comment)

This also seems to cause issues with our _make_flex_doc helper. Are you able to import pandas after making your changes locally?

@HagaiHargil
Copy link
Contributor Author

It was an interesting bug with the {} dict constructor.

c 1.0
d NaN
dtype: float64
>>> b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'c_', 'd'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would prefer just using abde or something, the c_ is confusing

@jreback jreback added this to the 0.23.0 milestone Feb 22, 2018
@jreback jreback merged commit 4ed8313 into pandas-dev:master Feb 22, 2018
@jreback
Copy link
Contributor

jreback commented Feb 22, 2018

thanks @HagaiHargil

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: Clarifiy fill_value behavior in arithmetic ops
5 participants