CI: Better error control in the validation of docstrings #57879

datapythonista · 2024-03-18T01:30:35Z

This makes the validation of docstrings more robust. Main changes:

If we ignore an error that no longer occurs, the CI will now report it and fail (30 ignored errors had already been fixed, and they are removed from the list of ignores here).
Instead of specifying all the errors to validate (all but one by now), we can specify the errors to skip (there is only one: the lack of an extended summary in a docstring). This simplifies the script a bit.
If an unknown error code is used when ignoring errors, the message is now more descriptive.
I created -i as an alias for --ignore_errors, so the list of errors to ignore is a bit easier to read.
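
As a rough illustration of the first and third points, the check could look something like the sketch below. This is not the code in scripts/validate_docstrings.py: the function name `check_ignores`, the `KNOWN_CODES` set, and the sample data are all hypothetical and only show the idea of flagging stale ignores and unknown codes.

```python
# Hypothetical sketch; names and data are illustrative, not the script's API.
KNOWN_CODES = {"ES01", "GL08", "PR01", "PR02", "PR07", "RT03", "SA01"}


def check_ignores(found, ignored):
    """Return (reported, stale) sets of (object name, error code) pairs.

    found:   {object name: set of error codes the validator raised}
    ignored: {object name: set of error codes listed as ignored}
    """
    unknown = {code for codes in ignored.values() for code in codes} - KNOWN_CODES
    if unknown:
        # A descriptive message instead of a bare failure.
        raise ValueError(
            f"Unknown error code(s) in the ignore list: {sorted(unknown)}. "
            f"Known codes: {sorted(KNOWN_CODES)}"
        )

    # Errors that were raised and are not covered by an ignore entry.
    reported = {
        (name, code)
        for name, codes in found.items()
        for code in codes - ignored.get(name, set())
    }
    # Ignore entries whose error no longer occurs; these should fail CI too.
    stale = {
        (name, code)
        for name, codes in ignored.items()
        for code in codes - found.get(name, set())
    }
    return reported, stale


found = {"pandas.Interval": {"PR02"}, "pandas.Series.sum": {"SA01"}}
ignored = {"pandas.Interval": {"PR02"}, "pandas.Interval.mid": {"SA01"}}
reported, stale = check_ignores(found, ignored)
print("reported:", reported)
print("stale ignores:", stale)
```

In this sketch a CI run would fail if either set is non-empty, so an ignore entry has to be deleted as soon as the corresponding docstring is fixed.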

jordan-d-murphy · 2024-03-18T01:43:24Z

jordan-d-murphy · 2024-03-18T01:52:36Z

With this ticket (#57879), and now that CI: speedup docstring check consecutive runs #57826 is merged in, I'm closing the following issues and opening a new issue to address these based on the new approach we've implemented.

DOC: fix GL08 errors in docstrings
DOC: fix PR01 errors in docstrings
DOC: fix PR07 errors in docstrings
DOC: fix SA01 errors in docstrings
DOC: fix RT03 errors in docstrings
DOC: fix PR02 errors in docstrings

Thanks for all the work that's gone into this! This is a much cleaner approach, and fixing these will now be more straightforward. Big win in my opinion!

datapythonista · 2024-03-18T02:09:05Z

And opening a new issue to address these based on the new approach we've implemented.

What I'd do is create a master issue if there is not one already to fix the docstrings, and then create smaller issues labelled as "good first issue". For example:

Issue 1 to address:

        -i pandas.Categorical.__array__ SA01\
        -i pandas.Categorical.codes SA01\
        -i pandas.Categorical.dtype SA01\
        -i pandas.Categorical.from_codes SA01\
        -i pandas.Categorical.ordered SA01\
        -i pandas.CategoricalDtype.categories SA01\
        -i pandas.CategoricalDtype.ordered SA01\
        -i pandas.CategoricalIndex.codes SA01\
        -i pandas.CategoricalIndex.ordered SA01\

Issue 2 to address:

        -i pandas.HDFStore.append PR01,SA01\
        -i pandas.HDFStore.get SA01\
        -i pandas.HDFStore.groups SA01\
        -i pandas.HDFStore.info RT03,SA01\
        -i pandas.HDFStore.keys SA01\
        -i pandas.HDFStore.put PR01,SA01\
        -i pandas.HDFStore.select SA01\
        -i pandas.HDFStore.walk SA01\

Issue 3 to address:

        -i pandas.Int16Dtype SA01\
        -i pandas.Int32Dtype SA01\
        -i pandas.Int64Dtype SA01\
        -i pandas.Int8Dtype SA01\

Issue 4 to address:

        -i pandas.Interval PR02\
        -i pandas.Interval.closed SA01\
        -i pandas.Interval.left SA01\
        -i pandas.Interval.mid SA01\
        -i pandas.Interval.right SA01\

...

I think addressing those in groups will make the work of contributors easier. In particular, the See Also section of many of those would be quite easy to write, since the docstrings will be cross-referencing each other in many cases.

If you don't have triage permissions in this repo, please let me know and I'll give them to you, so you can label the issues as "good first issue" and do anything else needed.
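
A grouping like the one above could be approximated with a few lines of Python. The sketch below is only an assumption about how one might do it: `group_by_class` is a hypothetical helper, and it groups purely by the first two dotted components of each name, which is cruder than the hand-picked groups in the comment.

```python
from collections import defaultdict

# A few entries in the same "-i OBJECT CODES\" form as shown in this thread.
SAMPLE = r"""
        -i pandas.Int16Dtype SA01\
        -i pandas.Int32Dtype SA01\
        -i pandas.Interval PR02\
        -i pandas.Interval.closed SA01\
        -i pandas.Interval.left SA01\
"""


def group_by_class(lines):
    """Group ignore entries by their owning class, e.g. "pandas.Interval"."""
    groups = defaultdict(list)
    for line in lines:
        # Drop indentation and the trailing shell line continuation.
        parts = line.strip().rstrip("\\").split()
        if len(parts) != 3 or parts[0] != "-i":
            continue  # not an ignore entry
        _, obj_name, codes = parts
        # "pandas.Interval.closed" -> "pandas.Interval"
        owner = ".".join(obj_name.split(".")[:2])
        groups[owner].append((obj_name, codes.split(",")))
    return dict(groups)


groups = group_by_class(SAMPLE.splitlines())
for owner, entries in groups.items():
    print(owner, entries)
```

Note that this puts each of the Int*Dtype classes in its own group, whereas the suggestion above combines them into a single issue, so the output would still need some manual merging.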

jordan-d-murphy · 2024-03-18T02:15:44Z

Thanks for the guidance, @datapythonista !

I agree, that sounds like a great approach. I'll set it up once this gets merged in so I can grab the updated code snippets from main.

I don't have those permissions, it would be helpful if you can grant them to me. Thank you!

dontgoto · 2024-03-18T13:42:08Z

Those are some great simplifications of the error handling logic!

datapythonista

@mroeschke if you have time, do you mind having a look at this? We changed how we ignore the pending docstring errors, both in #57826 and here again. PRs fixing docstrings are conflicting, and they'll conflict again after this one. So it'd be good to merge this as soon as reasonable, so contributors only need to fix the conflicts once.

datapythonista · 2024-03-18T22:28:04Z

scripts/tests/test_validate_docstrings.py

            ignore_deprecated=False,
            ignore_errors=None,
        )
-        assert exit_status == 0
+        assert exit_status == 3


When calling the script for a single function, until now it always returned an exit status of 0 even when there were errors. We don't really check this status anywhere right now, but I think it makes more sense that it also returns the number of errors, as we do when we call the script for all functions.

This is why the exit status needs to be changed here.
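
To make the convention concrete, here is a minimal sketch of an entry point whose exit status is the number of errors. `report_errors` and the sample error tuples are hypothetical and are not taken from the script.

```python
def report_errors(errors):
    """Print each (code, message) pair and return the error count.

    The return value is meant to be used as the process exit status,
    so 0 means "no errors" and anything else means failure.
    """
    for code, message in errors:
        print(f"{code}: {message}")
    return len(errors)


# Three made-up errors for a single function.
errors = [
    ("PR01", "Parameters {'kind'} not documented"),
    ("RT03", "Return value has no description"),
    ("SA01", "See Also section not found"),
]
exit_status = report_errors(errors)
# A real script would finish with: sys.exit(exit_status)
```

With three errors the status is 3, which matches the updated assertion in the test above.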

datapythonista · 2024-03-18T22:30:16Z

scripts/tests/test_validate_docstrings.py

-        assert exit_status == 2*2
-        assert exit_status_ignore_func == exit_status - 1
+        # two functions * two not global ignored errors - one function ignored error
+        assert exit_status == 2 * 2 - 1


The diff of this whole part seems a bit complex, but I just reordered the two calls, since the test was a bit difficult to read before: it called the two functions first, and then asserted the exit codes in the reverse order of the calls. There is no change in logic other than replacing the error parameter with ignore_errors, as in the rest.
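
The arithmetic in the new assertion can be reproduced with a small sketch. Everything here is hypothetical (the function names, the ER01 to ER03 codes, and the use of None as the key for globally ignored codes); it only shows how "two functions * two not globally ignored errors - one function ignored error" comes out as 3.

```python
def count_errors(errors_by_func, ignore_errors):
    """Count errors left after applying global and per-function ignores.

    ignore_errors maps a function name to a set of codes; the key None
    (an assumption of this sketch) holds codes ignored for every function.
    """
    global_ignores = ignore_errors.get(None, set())
    total = 0
    for func, codes in errors_by_func.items():
        skipped = global_ignores | ignore_errors.get(func, set())
        total += len(set(codes) - skipped)
    return total


# Two functions, each raising the same three made-up errors.
errors_by_func = {
    "pandas.func_a": ["ER01", "ER02", "ER03"],
    "pandas.func_b": ["ER01", "ER02", "ER03"],
}
# ER03 is ignored everywhere; ER01 is ignored for func_a only.
ignore_errors = {None: {"ER03"}, "pandas.func_a": {"ER01"}}

remaining = count_errors(errors_by_func, ignore_errors)
print(remaining)
```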

mroeschke · 2024-03-19T00:07:27Z

scripts/validate_docstrings.py

+    if raw_ignore_errors:
+        for obj_name, error_codes in raw_ignore_errors:
+            # function errors "pandas.Series PR01,SA01"
+            if obj_name != "*":


Could we just use a separate flag for ignoring all errors for a specific code?

I think it's simpler this way. In a follow-up PR I'll try to remove the star, so we'll be able to simply use --ignore_errors PR01. I guess what you don't like is the star?

And I may use None for the key when the error should always be ignored. But since this PR has already become too big, I preferred not to also edit the argparse here.

What do you think?

Yeah, the *, while commonplace in other tools, is still a little more opaque to me than --ignore-all CODE or similar.

Not a blocker for me, but it would be nice to consider in a follow-up.
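
For readers following the thread, the shape being discussed could be sketched roughly as below. This is an assumption-laden illustration and not the merged code: `parse_ignore_errors` is a hypothetical helper, the argparse setup is a guess at how a two-value -i option might be declared, and mapping "*" to a None key follows the idea floated in the comment above.

```python
import argparse
import collections


def parse_ignore_errors(raw_ignore_errors):
    """Turn [[object, "CODE1,CODE2"], ...] into {object or None: {codes}}.

    "*" means "ignore these codes for every object" and is stored under
    the key None (an assumption of this sketch).
    """
    ignore_errors = collections.defaultdict(set)
    for obj_name, error_codes in raw_ignore_errors or []:
        key = None if obj_name == "*" else obj_name
        ignore_errors[key].update(error_codes.split(","))
    return dict(ignore_errors)


parser = argparse.ArgumentParser()
parser.add_argument(
    "--ignore_errors",
    "-i",
    nargs=2,  # object name followed by comma-separated codes
    action="append",  # the option can be repeated
    default=None,
    metavar=("OBJECT", "CODES"),
)

args = parser.parse_args(
    ["-i", "*", "ES01", "-i", "pandas.HDFStore.append", "PR01,SA01"]
)
parsed = parse_ignore_errors(args.ignore_errors)
print(parsed)
```

A dedicated flag such as the suggested --ignore-all CODE would remove the need for the "*" special case in the loop, at the cost of a second option to document.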

mroeschke

Generally I like this approach as well. Just a few comments

mroeschke · 2024-03-19T01:05:44Z

Thanks @datapythonista

jordan-d-murphy · 2024-03-29T06:48:03Z

Opened DOC: Enforce Numpy Docstring Validation (Parent Issue) #58063 as a parent issue for fixing docstrings based on the refactoring in code_checks.sh

Feel free to swing by and help out! 🙂

CI: Better error control in the validation of docstrings

b128b94

datapythonista added Docs CI Continuous Integration labels Mar 18, 2024

datapythonista requested a review from mroeschke as a code owner March 18, 2024 01:30

Fix CI errors

66ae136

dontgoto mentioned this pull request Mar 18, 2024

CI: make docstring checks "instantaneous" #57878

Closed

3 tasks

datapythonista added 2 commits March 18, 2024 23:25

Fixing tests

fe2353f

Merge remote-tracking branch 'upstream/main' into validate_docstrings…

65a7fa6

…_error_control

datapythonista commented Mar 18, 2024

mroeschke reviewed Mar 19, 2024

mroeschke reviewed Mar 19, 2024

Update scripts/validate_docstrings.py

f5fcd4e

Co-authored-by: Matthew Roeschke <[email protected]>

mroeschke added this to the 3.0 milestone Mar 19, 2024

mroeschke approved these changes Mar 19, 2024

mroeschke merged commit 37b9303 into pandas-dev:main Mar 19, 2024
46 of 47 checks passed

datapythonista mentioned this pull request Mar 19, 2024

CI: Improve API of --ignore_errors in validate_docstrings.py #57908

Merged

jordan-d-murphy mentioned this pull request Mar 29, 2024

DOC: Enforce Numpy Docstring Validation (Parent Issue) #58063

Open

ThomasBur mentioned this pull request Mar 29, 2024

DOC: Enforce Numpy Docstring Validation | pandas.Datetime #58066

Closed
