Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Fix docstring of read_csv and related methods #23496

Closed
datapythonista opened this issue Nov 4, 2018 · 7 comments · Fixed by #23517
Closed

DOC: Fix docstring of read_csv and related methods #23496

datapythonista opened this issue Nov 4, 2018 · 7 comments · Fixed by #23517
Labels
Milestone

Comments

@datapythonista
Copy link
Member

The next Excel related functions contain many errors as reported by scripts/validate_docstrings.py:

  • pandas.read_table
  • pandas.read_csv
  • pandas.read_fwf

We should fix them, following our standards: https://pandas.pydata.org/pandas-docs/stable/contributing_docstring.html

The validation script should not report errors after they are fixed.

@thoo
Copy link
Contributor

thoo commented Nov 4, 2018

@datapythonista I am going to work on them.

@datapythonista
Copy link
Member Author

Thanks @thoo

I think it makes sense that the same person does the 3, as they are very similar. But feel free to send separate PRs. That probably will makes things simpler. But up to you.

@thoo
Copy link
Contributor

thoo commented Nov 4, 2018

Sure I will work on all three.

@thoo
Copy link
Contributor

thoo commented Nov 5, 2018

@datapythonista
When I run validate_docstrings.py, I got a few errors which I think validate_docstrings.py should not raise. For instance,

Parameter "tupleize_cols" has no description

but I have the following description.

tupleize_cols : bool, default False
    .. deprecated:: 0.21.0
       This argument will be removed and will always convert to MultiIndex

    Leave a list of tuples on columns as is (default is to convert to
    a MultiIndex on the columns).

Even though the docstring output from python validate_docstrings.py pandas.read_csv doesn't contain more than one blank line , I still keep getting:

Use only one blank line to separate sections or paragraphs

Let me know if you want to see the whole output.

@datapythonista
Copy link
Member Author

For 1, I think there is a missing blank line between the name/type line, and the directive. Not sure if that's the cause. But I guess it's related to the .. deprecated:: directive. You can also do a grep of that directive, see if docstrings containing it for a parameter have the same validation error, and if you think it's a problem in the validation script, ignore the error and create an issue (mention #20298 and myself in the issue, so I add it to the list).

For 2, you can add a couple of prints in the property double_blank_lines for debugging (don't commit that).One that prints every line, and one that prints when the two blank lines are found. That should give you a better idea of what's the exact problem.

Let me know if you need help.

@thoo
Copy link
Contributor

thoo commented Nov 5, 2018

@datapythonista I still have one issue. When I run scripts/validate_docstrings.py pandas.read_fwf , I got this error.

Parameters {**kwds} not documented
                Unknown parameters {mangle_dupe_cols, compression, doublequote, warn_bad_lines, quotechar, usecols, na_values, converters, chunksize, skiprows, na_filter, true_values, escapechar, comment, memory_map, delim_whitespace, squeeze, low_memory, index_col, parse_dates, lineterminator, float_precision, iterator, dtype, keep_default_na, dialect, infer_datetime_format, encoding, dayfirst, decimal, verbose, delimiter, skip_blank_lines, quoting, names, tupleize_cols, error_bad_lines, header, nrows, keep_date_col, thousands, false_values, skipfooter, date_parser, prefix, skipinitialspace}

This one should have been fixed by #20061. I will leave a comment there.

@datapythonista
Copy link
Member Author

This one is a bit tricky. We validate that the signature and the documented parameters match. So:

def foo(arg1, **kwargs):
    """
    Parameters
    -----------
    arg1 : str
    arg2 : str
    """

Will generate an error that arg2 is unknown, and kwargs is not documented. Which makes sense.

What happen is that read_fwf was implemented, so most of the parameters are not explicit in the signature, but kwargs is used instead.

Personally, I think the right solution is to change the signature, and add all them. But that's trickier than just changing the docstrings. Shouldn't break anything, but it's more risky than just touching docs.

@jreback any reason why we should not add these parameters to the signature?

@jreback jreback added this to the 0.24.0 milestone Nov 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants