Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation of Dataset.count() is ambiguous #8055

Closed
Articoking opened this issue Aug 8, 2023 · 3 comments · Fixed by #8057
Closed

Documentation of Dataset.count() is ambiguous #8055

Articoking opened this issue Aug 8, 2023 · 3 comments · Fixed by #8057

Comments

@Articoking
Copy link
Contributor

What is your issue?

Hello everyone!

I've noticed that the documentation for Dataset.count and DataArray.count is a little unclear with respect to what it actually does. In my experience, it's not immediately obvious when looking at this site whether zero values are counted or not.

The given example does not contain a 0 value that would clarify this, and what is worse is that the "see also" section points to np.count and to dask.array.count which do not exist. The closest NumPy equivalent is np.count_nonzero, which has a different behavior to the xarray count method. The xarray count is actually equivalent to Pandas DataFrame.count, which count all non-NA values including 0.

I would take a shot at writing a clearer docstring but I'm not sure how to do that using the generate_aggregations.py file...

@Articoking Articoking added the needs triage Issue that has not been reviewed by xarray team member label Aug 8, 2023
@headtr1ck headtr1ck added contrib-help-wanted topic-documentation and removed needs triage Issue that has not been reviewed by xarray team member labels Aug 8, 2023
@headtr1ck
Copy link
Collaborator

Thanks for the issue, many aggregations could indeed use some extra explanation.

Concerning the dead links, that's really not good. Probably we should look into letting sphinx report them.

@keewis
Copy link
Collaborator

keewis commented Aug 8, 2023

that's what nitpicky is for, but we had a lot of these warnings the last time I checked, so we'd need to clean them up first (or mark them as ignored, some of them are descriptions where there should be type descriptions, which we mostly inherit from pandas / matplotlib / numpy).

@dcherian
Copy link
Contributor

dcherian commented Aug 8, 2023

See also #7378

The xarray count is actually equivalent to Pandas DataFrame.count, which count all non-NA values including 0.

This is correct.

I'm not sure how to do that using the generate_aggregations.py file

I would start by adding a long_name and/or description property on Method that lets you add customizable text for each method.

what is worse is that the "see also" section points to np.count and to dask.array.count which do not exist.

The "See Also" piece is harder. Potentially instead of the current template we generate that too by adding a see_also_modules to Method. By default this would be ['numpy', 'dask.array', 'pandas.DataFrame'] but for count we can just have ['pandas.DataFrame'].


An alternate solution would be to manually fix the docstring for count and copy it to all the necessary classes. It is a bit of an outlier in that it's not a "standard" array-api type reduction.

@Articoking Articoking mentioned this issue Aug 8, 2023
2 tasks
dcherian pushed a commit that referenced this issue Aug 9, 2023
* Changed aggregation example to include 0 value

* Fixed broken links in count docs (#8055)

* Added entry to whats-new.rst

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Guillermo Cossio <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants