DOC: fix docstring validation errors for pandas.Series #59592

Open
natmokval opened this issue Aug 24, 2024 · 36 comments
Labels
Code Style Code style, linting, code_checks Docs good first issue

Comments

@natmokval
Contributor

natmokval commented Aug 24, 2024

Follow-up to issues #56804, #59458 and #58063.
pandas has a script for validating docstrings:

pandas/ci/code_checks.sh

Lines 155 to 187 in 0cdc6a4

-i "pandas.Series.sparse.fill_value SA01" \
-i "pandas.Series.sparse.from_coo PR07,SA01" \
-i "pandas.Series.sparse.npoints SA01" \
-i "pandas.Series.sparse.sp_values SA01" \
-i "pandas.Series.sparse.to_coo PR07,RT03,SA01" \
-i "pandas.Series.std PR01,RT03,SA01" \
-i "pandas.Series.str.capitalize RT03" \
-i "pandas.Series.str.casefold RT03" \
-i "pandas.Series.str.center RT03,SA01" \
-i "pandas.Series.str.decode PR07,RT03,SA01" \
-i "pandas.Series.str.encode PR07,RT03,SA01" \
-i "pandas.Series.str.index RT03" \
-i "pandas.Series.str.ljust RT03,SA01" \
-i "pandas.Series.str.lower RT03" \
-i "pandas.Series.str.lstrip RT03" \
-i "pandas.Series.str.match RT03" \
-i "pandas.Series.str.normalize RT03,SA01" \
-i "pandas.Series.str.partition RT03" \
-i "pandas.Series.str.repeat SA01" \
-i "pandas.Series.str.replace SA01" \
-i "pandas.Series.str.rindex RT03" \
-i "pandas.Series.str.rjust RT03,SA01" \
-i "pandas.Series.str.rpartition RT03" \
-i "pandas.Series.str.rstrip RT03" \
-i "pandas.Series.str.strip RT03" \
-i "pandas.Series.str.swapcase RT03" \
-i "pandas.Series.str.title RT03" \
-i "pandas.Series.str.upper RT03" \
-i "pandas.Series.str.wrap RT03,SA01" \
-i "pandas.Series.str.zfill RT03" \
-i "pandas.Series.struct.dtypes SA01" \
-i "pandas.Series.to_markdown SA01" \
-i "pandas.Series.update PR07,SA01" \
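Each entry in this list has the form `-i "<qualified name> <comma-separated codes>" \`. As a rough illustration (not part of the pandas tooling; `parse_ignore_entries` is a hypothetical helper name), the entries can be parsed into a mapping from method name to error codes, which may make it easier to pick a small batch to work on:

```python
import re

# Matches lines such as:  -i "pandas.Series.std PR01,RT03,SA01" \
IGNORE_LINE = re.compile(r'-i\s+"(?P<name>\S+)\s+(?P<codes>[A-Z0-9,]+)"')


def parse_ignore_entries(text):
    """Return {qualified name: [error codes]} for every -i entry in text."""
    entries = {}
    for line in text.splitlines():
        match = IGNORE_LINE.search(line)
        if match:
            entries[match.group("name")] = match.group("codes").split(",")
    return entries


# A few entries copied from the list above (hypothetical sample input).
sample = r'''
-i "pandas.Series.std PR01,RT03,SA01" \
-i "pandas.Series.str.capitalize RT03" \
-i "pandas.Series.update PR07,SA01" \
'''
entries = parse_ignore_entries(sample)
print(entries["pandas.Series.std"])
```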

Currently, some methods fail the docstring validation check.
The task here is:

  • take 2-4 methods
  • run: scripts/validate_docstrings.py <method-name>
  • fix the docstrings according to whatever error is reported
  • remove those methods from the code_checks.sh script
  • commit, push, open pull request
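The validation step above can be sketched in Python for a small batch of methods. This is only a convenience wrapper around the command shown in this issue; `validation_command` and `validate_batch` are hypothetical helper names, and the sketch assumes it is run from the root of a pandas checkout with a working development build.

```python
import subprocess
import sys

SCRIPT = "scripts/validate_docstrings.py"  # path relative to the pandas repo root


def validation_command(method):
    """Build the command line that validates one docstring."""
    return [sys.executable, SCRIPT, method]


def validate_batch(methods):
    """Run the validation script for each method and collect its output."""
    results = {}
    for method in methods:
        proc = subprocess.run(
            validation_command(method), capture_output=True, text=True
        )
        results[method] = proc.stdout
    return results


batch = ["pandas.Series.str.wrap", "pandas.Series.str.zfill"]
for method in batch:
    # Only print the commands here; call validate_batch(batch) inside a checkout.
    print(" ".join(validation_command(method)))
```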

Example:

scripts/validate_docstrings.py pandas.Series.prod

pandas.Series.prod fails with the ES01 and RT03 errors

################################################################################
################################## Validation ##################################
################################################################################

2 Errors found for `pandas.Series.prod`:
        ES01    No extended summary found
        RT03    Return value has no description
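When checking several methods, it can help to pull the error codes out of this report programmatically. The following sketch assumes the output layout shown above (one indented `CODE    message` pair per line); `extract_errors` is a hypothetical helper, not part of the pandas scripts.

```python
import re

# An error line looks like: "        RT03    Return value has no description"
ERROR_LINE = re.compile(r"^\s+([A-Z]{2}\d{2})\s+(.+)$")


def extract_errors(report):
    """Return a list of (code, message) pairs found in a validation report."""
    errors = []
    for line in report.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            errors.append((match.group(1), match.group(2).strip()))
    return errors


report = """\
2 Errors found for `pandas.Series.prod`:
        ES01    No extended summary found
        RT03    Return value has no description
"""
for code, message in extract_errors(report):
    print(code, "-", message)
```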

Please don't comment "take", as multiple people can work on this issue. You also don't need to ask for permission to work on this; just comment with the methods you are going to work on.

If you're a new contributor, please check the contributing guide.

@natmokval natmokval added Docs Code Style Code style, linting, code_checks good first issue labels Aug 24, 2024
@ivonastojanovic
Contributor

ivonastojanovic commented Aug 24, 2024

I'll take these:

 -i "pandas.Series.sparse.fill_value SA01" \ 
 -i "pandas.Series.sparse.from_coo PR07,SA01" \ 
 -i "pandas.Series.sparse.npoints SA01" \ 
 -i "pandas.Series.sparse.sp_values SA01" \ 
 -i "pandas.Series.sparse.to_coo PR07,RT03,SA01" \ 

@wenchen-cai
Contributor

wenchen-cai commented Aug 24, 2024

I'll take these:

 -i "pandas.Series.str.wrap RT03,SA01" \ 
 -i "pandas.Series.str.zfill RT03" \ 

@ivonastojanovic
Contributor

Working on these:

 -i "pandas.Series.str.match RT03" \ 
 -i "pandas.Series.str.normalize RT03,SA01" \ 
 -i "pandas.Series.str.repeat SA01" \ 
 -i "pandas.Series.str.replace SA01" \ 

@githubalexliu
Contributor

githubalexliu commented Aug 25, 2024

I'll take these:

-i "pandas.Series.struct.dtypes SA01" \ 
-i "pandas.Series.to_markdown SA01" \ 

@hlakams
Contributor

hlakams commented Aug 25, 2024

Here's a filtered list of pandas.Series docstring issues that still need to be addressed:

        ...
        -i "pandas.Series.dt.as_unit PR01,PR02" \
        ...
        -i "pandas.Series.dt.round PR01,PR02" \
        ...
        -i "pandas.Series.dt.unit GL08" \
        ...
        -i "pandas.Series.pad PR01,SA01" \
        ...

I went ahead and removed methods that were already claimed/addressed by open + merged PRs.
(Last updated 9/2/2024)

@hlakams
Contributor

hlakams commented Aug 25, 2024

I'll take these:

 -i "pandas.Series.pop SA01" \
 -i "pandas.Series.list.__getitem__ SA01" \
 -i "pandas.Series.list.flatten SA01" \
 -i "pandas.Series.list.len SA01" \
 -i "pandas.Series.reorder_levels RT03,SA01" \
 -i "pandas.Series.sparse.density SA01" \
 -i "pandas.Series.gt SA01" \
 -i "pandas.Series.lt SA01" \
 -i "pandas.Series.ne SA01" \
 -i "pandas.Series.prod RT03" \
 -i "pandas.Series.product RT03" \

@Pranav-Wadhwa
Contributor

I will take

-i "pandas.Series.dt.strftime PR01,PR02" \
-i "pandas.Series.dt.to_period PR01,PR02" \
-i "pandas.Series.dt.total_seconds PR01" \
-i "pandas.Series.dt.tz_convert PR01,PR02" \
-i "pandas.Series.dt.tz_localize PR01,PR02" \
-i "pandas.Series.dt.unit GL08" \

@james-magee
Contributor

I'll take

 -i "pandas.Series.std PR01,RT03,SA01" \ 
 -i "pandas.Series.sem PR01,RT03,SA01" \

@Tmthang1601

Tmthang1601 commented Aug 27, 2024

I followed the instructions and encountered this issue: I added 'See Also' to the function fill_value(self) in ./pandas/core/arrays/sparse/array.py. After running the command python3 scripts/validate_docstrings.py pandas.Series.sparse.fill_value, I received the message:

thang123456@MSI:/mnt/c/Users/ADMIN/Desktop/pandas/pandas$ python3 scripts/validate_docstrings.py pandas.Series.sparse.fill_value

################################################################################
################# Docstring (pandas.Series.sparse.fill_value) #################
################################################################################

Elements in data that are fill_value are not stored.

For memory savings, this should be the most common value in the array.

Examples
--------
>>> ser = pd.Series([0, 0, 2, 2, 2], dtype="Sparse[int]")
>>> ser.sparse.fill_value
0
>>> spa_dtype = pd.SparseDtype(dtype=np.int32, fill_value=2)
>>> ser = pd.Series([0, 0, 2, 2, 2], dtype=spa_dtype)
>>> ser.sparse.fill_value
2

################################################################################
################################## Validation ##################################
################################################################################

1 Errors found for `pandas.Series.sparse.fill_value`:
        SA01    See Also section not found

I checked very carefully but still couldn't fix the error. Can someone help me understand what is going wrong?


@Gesare5

Gesare5 commented Aug 27, 2024

I will take:

-i "pandas.Series.dt.floor PR01,PR02" \
-i "pandas.Series.dt.ceil PR01,PR02" \

@pol-rius
Contributor

I'll take these:

-i "pandas.Series.sparse PR01,SA01" \
-i "pandas.Series.sparse.to_coo PR07,RT03,SA01" \

@githubalexliu
Contributor

I'll take these:

-i "pandas.Series.dt.normalize PR01" \
-i "pandas.Series.dt.qyear GL08" \

@hlakams
Contributor

hlakams commented Aug 28, 2024

@Tmthang1601 The pandas prefix is not needed for SparseDtype and SparseArray. Remove that prefix and the validation command should pass.

See Also
--------
SparseDtype : Dtype for sparse array.
SparseArray : Array of sparse data.

@Tmthang1601

Tmthang1601 commented Aug 28, 2024

@Tmthang1601 The pandas prefix is not needed for SparseDtype and SparseArray. Remove that prefix and the validation command should pass.

See Also
--------
SparseDtype : Dtype for sparse array.
SparseArray : Array of sparse data.

@hlakams
Originally there was no "See Also" section (the SparseDtype and SparseArray lines) in the docstring of the fill_value function. I added it to try to clear the error, but the error did not go away. I did not expect removing it again to help, and when I tried, of course the error was still there.

@hlakams
Contributor

hlakams commented Aug 28, 2024

@Tmthang1601 Can you push up your changes in a new PR?

@Tmthang1601

@hlakams According to the instructions, you need to complete 2 to 4 methods and run the script successfully before pushing to a new PR, but I'm having trouble.

@hlakams
Contributor

hlakams commented Aug 28, 2024

@Tmthang1601 I'm not sure what the issue is, but try replacing lines 620:639 from #59592 (comment) with the following docstring:

        """
        Elements in `data` that are `fill_value` are not stored.

        For memory savings, this should be the most common value in the array.

        See Also
        --------
        SparseDtype : Dtype for sparse array.
        SparseArray : Array of sparse data.

        Examples
        --------
        >>> ser = pd.Series([0, 0, 2, 2, 2], dtype="Sparse[int]")
        >>> ser.sparse.fill_value
        0
        >>> spa_dtype = pd.SparseDtype(dtype=np.int32, fill_value=2)
        >>> ser = pd.Series([0, 0, 2, 2, 2], dtype=spa_dtype)
        >>> ser.sparse.fill_value
        2
        """

Run pre-commit once this change from #59592 (comment) is committed (assuming it was configured correctly), address any lint errors, and you should be able to push to your fork.
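One thing that may be worth checking in a situation like this is whether the docstring Python actually loads contains the new section, since the validation output posted earlier shows no "See Also" block at all. A minimal sketch follows; `has_section` and `fill_value_example` are hypothetical names, and the check only looks for a numpydoc-style header followed by a dashed underline, so it is far less thorough than the real validator.

```python
import inspect


def has_section(doc, title):
    """Return True if doc has a numpydoc header `title` underlined with dashes."""
    lines = inspect.cleandoc(doc or "").splitlines()
    for header, underline in zip(lines, lines[1:]):
        if header.strip() == title and set(underline.strip()) == {"-"}:
            return True
    return False


def fill_value_example():
    """
    Elements in `data` that are `fill_value` are not stored.

    See Also
    --------
    SparseDtype : Dtype for sparse array.
    SparseArray : Array of sparse data.
    """


print(has_section(fill_value_example.__doc__, "See Also"))
print(has_section(fill_value_example.__doc__, "Returns"))
```

Inside a pandas checkout, printing `pd.__file__` also shows which copy of pandas is being imported, which may help if the edited file is not the one being validated.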

@yinglyu

yinglyu commented Sep 3, 2024

I will take these:

        -i "pandas.Series.dt.day_name PR01,PR02" \
        -i "pandas.Series.dt.month_name PR01,PR02" \

@doshi-kevin
Contributor

I will take this -
-i "pandas.Series.update PR07,SA01" \

@blackhole-hoop

I'll work on this:

-i "pandas.Series.str.swapcase RT03" \

@chalky25

chalky25 commented Sep 6, 2024

I'll work on these:
-i "pandas.Series.dt.nanoseconds SA01" \
-i "pandas.Series.dt.seconds SA01" \

@chalky25

chalky25 commented Sep 6, 2024

I'll work on this:

-i "pandas.Series.str.swapcase RT03" \

it seems that pandas.Series.str.swapcase has already been done.

@blackhole-hoop

Sorry, I am a first-time contributor. May I know how to check whether something is done or not? I searched for the keyword "swapcase" on this page and didn't see that anyone was working on this. @chalky25

@ammar-qazi
Contributor

Welcome to contributing, @blackhole-hoop. I also started three days ago.

I'm also not sure who fixed it or how it got fixed, because none of the merged commits mention it.

That said, I just tested it using the following command:

scripts/validate_docstrings.py pandas.Series.str.swapcase

And I got the following result:


################################################################################
#################### Docstring (pandas.Series.str.swapcase) ####################
################################################################################

Convert strings in the Series/Index to be swapcased.

Equivalent to :meth:`str.swapcase`.

Returns
-------
Series or Index of objects
    A Series or Index where the strings are modified by :meth:`str.swapcase`.

See Also
--------
Series.str.lower : Converts all characters to lowercase.
Series.str.upper : Converts all characters to uppercase.
Series.str.title : Converts first character of each word to uppercase and
    remaining to lowercase.
Series.str.capitalize : Converts first character to uppercase and
    remaining to lowercase.
Series.str.swapcase : Converts uppercase to lowercase and lowercase to
    uppercase.
Series.str.casefold: Removes all case distinctions in the string.

Examples
--------
>>> s = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
>>> s
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object

>>> s.str.lower()
0                 lower
1              capitals
2    this is a sentence
3              swapcase
dtype: object

>>> s.str.upper()
0                 LOWER
1              CAPITALS
2    THIS IS A SENTENCE
3              SWAPCASE
dtype: object

>>> s.str.title()
0                 Lower
1              Capitals
2    This Is A Sentence
3              Swapcase
dtype: object

>>> s.str.capitalize()
0                 Lower
1              Capitals
2    This is a sentence
3              Swapcase
dtype: object

>>> s.str.swapcase()
0                 LOWER
1              capitals
2    THIS IS A SENTENCE
3              sWaPcAsE
dtype: object

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Series.str.swapcase" correct. :)

In other words, you use the script given by the original poster to check the docstring.

@pratik305

I will work on these:

  • "pandas.Series.str.lower RT03" \
  • "pandas.Series.str.center RT03,SA01" \
  • "pandas.Series.str.title RT03" \
  • "pandas.Series.str.lstrip RT03" \

@pratik305

I started contributing and found that some methods are already fixed without anyone mentioning it here.
I also ran the script on some methods whose fixes are already merged, and it still reports errors, for example:
python scripts/validate_docstrings.py pandas.Series.str.swapcase

Result

################################################################################
#################### Docstring (pandas.Series.str.swapcase) ####################
################################################################################

Convert strings in the Series/Index to be swapcased.

Equivalent to :meth:`str.swapcase`.

Returns
-------
Series or Index of object

See Also
--------
Series.str.lower : Converts all characters to lowercase.
Series.str.upper : Converts all characters to uppercase.
Series.str.title : Converts first character of each word to uppercase and
    remaining to lowercase.
Series.str.capitalize : Converts first character to uppercase and
    remaining to lowercase.
Series.str.swapcase : Converts uppercase to lowercase and lowercase to
    uppercase.
Series.str.casefold: Removes all case distinctions in the string.

Examples
--------
>>> s = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
>>> s
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object

>>> s.str.lower()
0                 lower
1              capitals
2    this is a sentence
3              swapcase
dtype: object

>>> s.str.upper()
0                 LOWER
1              CAPITALS
2    THIS IS A SENTENCE
3              SWAPCASE
dtype: object

>>> s.str.title()
0                 Lower
1              Capitals
2    This Is A Sentence
3              Swapcase
dtype: object

>>> s.str.capitalize()
0                 Lower
1              Capitals
2    This is a sentence
3              Swapcase
dtype: object

>>> s.str.swapcase()
0                 LOWER
1              capitals
2    THIS IS A SENTENCE
3              sWaPcAsE
dtype: object

################################################################################
################################## Validation ##################################
################################################################################

1 Errors found for `pandas.Series.str.swapcase`:
        RT03    Return value has no description

Also, the code_checks.sh file no longer has any pandas.Series.str entries.
Are all the str-related docstrings fixed?
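The two swapcase reports in this thread differ only in the Returns section: the passing one has an indented description under the type line, while the failing one stops at the type. That missing description is what RT03 flags, and the difference may mean the two runs validated different versions of the source. A simplified approximation of the check (not the actual validator logic; `returns_has_description` is a hypothetical helper) could look like this:

```python
import inspect


def returns_has_description(doc):
    """Roughly check that the Returns section has an indented description line."""
    lines = inspect.cleandoc(doc).splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip() == "Returns" and set(lines[i + 1].strip()) == {"-"}:
            body = lines[i + 2 : i + 4]  # type line, then first description line
            if len(body) < 2:
                return False
            return body[1].startswith(" ") and bool(body[1].strip())
    return False


fixed = """
Convert strings in the Series/Index to be swapcased.

Returns
-------
Series or Index of objects
    A Series or Index where the strings are modified by :meth:`str.swapcase`.
"""

unfixed = """
Convert strings in the Series/Index to be swapcased.

Returns
-------
Series or Index of object

See Also
--------
Series.str.lower : Converts all characters to lowercase.
"""

print(returns_has_description(fixed))
print(returns_has_description(unfixed))
```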

@Pekka20123

Pekka20123 commented Sep 17, 2024

I'll take these:

 -i "pandas.Series.str.rjust RT03,SA01" \ 
 -i "pandas.Series.str.rpartition RT03" \ 
 -i "pandas.Series.str.rstrip RT03" \ 

jyotirjoshi added a commit to jyotirjoshi/pandas that referenced this issue Sep 21, 2024
I don't know exactly whether this works; this is the first time I am contributing.
Sorry for the mistake.
@syeda-fajar

I want to work on these issues:

-i "pandas.Series.sparse.sp_values SA01,ES01" \
-i "pandas.Series.str.match ES01" \

@dhelms33

Hello! I am new to the pandas community. It seems like most of these are already taken. Is there any way to filter which methods have been run already?
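No built-in filter is mentioned in this thread, but one rough approach is to compare the `-i` entries still present in `ci/code_checks.sh` with the methods already claimed in comments. The sketch below assumes both lists have been copied into strings by hand; `names` and `unclaimed` are hypothetical helpers. Per the task description above, a method that no longer appears in `code_checks.sh` has presumably been fixed already.

```python
import re

# Captures the qualified name from an entry such as "pandas.Series.pad PR01,SA01"
ENTRY = re.compile(r'"(pandas\.\S+)\s+[A-Z0-9,]+"')


def names(text):
    """Collect the qualified names from -i entries in text."""
    return set(ENTRY.findall(text))


def unclaimed(remaining_text, claimed_text):
    """Return entries still listed in code_checks.sh that nobody has claimed."""
    return sorted(names(remaining_text) - names(claimed_text))


# Hypothetical sample input, copied from comments in this thread.
remaining = r'''
-i "pandas.Series.dt.as_unit PR01,PR02" \
-i "pandas.Series.dt.round PR01,PR02" \
-i "pandas.Series.dt.unit GL08" \
-i "pandas.Series.pad PR01,SA01" \
'''
claimed = r'''
-i "pandas.Series.dt.unit GL08" \
-i "pandas.Series.pad PR01,SA01" \
'''
print(unclaimed(remaining, claimed))
```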

@techie505

/Assign

@dylanpanton

dylanpanton commented Nov 10, 2024

Hi! I will take these:
-i "pandas.core.groupby.DataFrameGroupBy.nth PR02" \
-i "pandas.core.groupby.SeriesGroupBy.nth PR02" \

@Ivruix
Contributor

Ivruix commented Nov 20, 2024

Seems that most of these were fixed. Working on this one:

-i "pandas.Series.dt.freq GL08" \

@OscarGB
Contributor

OscarGB commented Dec 3, 2024

I'll take:

    -i "pandas.Series.dt.unit GL08" \
    -i "pandas.Series.pad PR01,SA01" \

#60481

@karnbirrandhawa

I'll take

-i "pandas.arrays.NumpyExtensionArray SA01" \

@Ishancorp

I'll take

-i "pandas.Series.str.rindex RT03" \

@dajale423

I'll take

-i "pandas.Series.str.strip RT03" \
