[QST] Use Numpydoc for docstrings? #652

mrocklin · 2019-01-09T18:44:04Z

It looks like we currently use straight sphinx for docstrings. This looks like the following:

    .. code-block:: python

          from cudf.dataframe import DataFrame
          df = DataFrame()
          df['key'] = [0, 1, 2, 3, 4]
          df['val'] = [float(i + 10) for i in range(5)]  # insert column
          print(df)

I'd like to raise the possibility that we move to numpydoc. There are a few reasons for this:

1. It seems to be the standard in PyData projects today.

2. It includes results in the docstrings, which is nice for people reading them and also gives us something to test if we like.

Here is what the example above would look like:

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> df
<cudf.DataFrame ncols=2 nrows=5 >
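
To illustrate the "something to test" point, here is a minimal sketch of a numpydoc-style docstring whose Examples section is checked with the standard-library doctest module. The function `add_offset` and the helper `check_docstring_examples` are hypothetical names invented for this illustration; neither is part of cudf.

```python
import doctest


def add_offset(values, offset=10.0):
    """Add a constant offset to every value.

    Parameters
    ----------
    values : iterable of numbers
        Input values.
    offset : float, default 10.0
        Amount added to each value.

    Returns
    -------
    list of float
        The shifted values, in input order.

    Examples
    --------
    >>> add_offset(range(5))
    [10.0, 11.0, 12.0, 13.0, 14.0]
    >>> add_offset([1, 2], offset=0.5)
    [1.5, 2.5]
    """
    return [float(v + offset) for v in values]


def check_docstring_examples(func):
    """Run the ``>>>`` examples in ``func``'s docstring.

    Returns a ``(failed, attempted)`` tuple. Hypothetical helper for this
    sketch only.
    """
    parser = doctest.DocTestParser()
    # Build a DocTest from the docstring; the function is made available
    # to the examples under its own name.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

Calling `check_docstring_examples(add_offset)` runs both examples and reports how many failed, so a stale example shows up as a test failure rather than as misleading documentation.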


kkraus14 · 2019-01-09T21:37:35Z

Agreed this is the right direction and we should do this.

mrocklin · 2019-02-19T20:41:47Z

@kkraus14 is there someone we know that can do this work? I wonder if @taureandyernv is around and has time?

mrocklin · 2019-02-19T20:50:52Z

I've also changed the example above to prefer global namespaced imports like

import cudf
df = cudf.DataFrame(...)

rather than highly specific imports

from cudf.dataframe.dataframe import DataFrame
df = DataFrame(...)

This is because:

1. Namespaces are good so that users can tell if a DataFrame call is from pandas or cudf or other

2. This allows us to move around modules in the future without breaking user code
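
The second point can be sketched with a throwaway in-memory package. Everything here is hypothetical (the package name `demo_pkg`, the helpers `make_package` and `deep_import_works`); it only mimics a library that re-exports a class at the top level and later moves it to a different submodule.

```python
import importlib
import sys
import types


def make_package(class_location):
    """Register an in-memory package ``demo_pkg`` whose ``DataFrame`` class
    lives in the submodule named ``class_location`` and is re-exported at
    the top level."""
    # Drop any earlier incarnation of the demo package.
    stale = [n for n in sys.modules if n == "demo_pkg" or n.startswith("demo_pkg.")]
    for name in stale:
        del sys.modules[name]

    pkg = types.ModuleType("demo_pkg")
    pkg.__path__ = []  # an empty __path__ marks the module as a package
    sub = types.ModuleType(f"demo_pkg.{class_location}")

    class DataFrame:
        pass

    sub.DataFrame = DataFrame
    setattr(pkg, class_location, sub)
    pkg.DataFrame = DataFrame  # top-level re-export: the stable public name

    sys.modules["demo_pkg"] = pkg
    sys.modules[sub.__name__] = sub
    return pkg


def deep_import_works(submodule):
    """Return True if ``demo_pkg.<submodule>`` can still be imported."""
    try:
        importlib.import_module(f"demo_pkg.{submodule}")
    except ImportError:
        return False
    return True


# Version 1: the class lives in demo_pkg.dataframe.
make_package("dataframe")
deep_import_before = deep_import_works("dataframe")

# Version 2: the maintainers move the class to demo_pkg.core.
pkg = make_package("core")
deep_import_after = deep_import_works("dataframe")
top_level_after = hasattr(pkg, "DataFrame")
```

After the move, code that reached into `demo_pkg.dataframe` breaks, while code that used the top-level name `demo_pkg.DataFrame` keeps working.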

mrocklin · 2019-02-19T20:58:51Z

Also cc @randerzander

randerzander · 2019-02-21T22:22:12Z

@mrocklin we previously used numpy example formatting, including the ">>>" ipython style prompts, but actually moved away from it to the .. code-block:: python delineation for two reasons:

Users complained about difficulty copy/pasting example snippets into their own code.

This is entirely subjective and based on individual preference. I searched for an extension for the front-end that makes copy/paste exclude interactive prompt characters but didn't find anything. If you and @kkraus14 feel strongly about using numpydoc standards anyway, I wouldn't oppose.

That said, doing so probably would not happen before 0.6.

Sphinx rendering of examples with >>> often broke; sometimes with unintelligible indentation warnings, and sometimes silently, resulting in a "successful" publish with a broken render.

To be fair to numpydoc, Sphinx's .. code-block:: python rendering has enough issues of its own that I tried using literalinclude and writing a pytest wrapper to test the snippets and their expected output, mostly because setting up doctest with DataFrame output was yet another finicky process that produced unhelpful warnings from hard-to-find gremlin characters.

This could & should be fixed on an example by example basis, and CI should fail a build if doc builds result in warnings.
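
On the copy/paste complaint, one possible workaround is to recover the plain source from a prompt-style example with the standard-library doctest parser. This is only a sketch; `strip_prompts` is a hypothetical helper, not something cudf or numpydoc provides. The parser does not execute anything, so cudf does not need to be installed to run it.

```python
import doctest


def strip_prompts(example_text):
    """Return only the source lines of a ``>>>`` example, dropping the
    prompts and the expected output. Hypothetical helper for illustration."""
    examples = doctest.DocTestParser().get_examples(example_text)
    # Each Example.source already ends with a newline.
    return "".join(example.source for example in examples)


session = """\
>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df
<cudf.DataFrame ncols=2 nrows=5 >
"""

pasteable = strip_prompts(session)
```

Here `pasteable` holds the four source lines without prompts or output, ready to paste into a script.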

I've also changed the example above to prefer global namespaced imports like

import cudf
df = cudf.DataFrame(...)

rather than highly specific imports

from cudf.dataframe.dataframe import DataFrame
df = DataFrame(...)

This is because:

1. Namespaces are good so that users can tell if a DataFrame call is from pandas or cudf or other

2. This allows us to move around modules in the future without breaking user code

I believe we fixed all of these in the 0.5+ docs. If we missed any of them, let's get issues opened for the specific examples.

mrocklin · 2019-02-21T22:30:00Z

This is entirely subjective and based on individual preference. I searched for an extension for the front-end that makes copy/paste exclude interactive prompt characters but didn't find anything. If you and @kkraus14 feel strongly about using numpydoc standards anyway, I wouldn't oppose

I feel somewhat strongly about this. Almost all PyData projects use numpydoc standard today.

That said, doing so probably would not happen before 0.6.

OK, maybe I just go ahead and do it? I think that this is a couple hours of work.

Sphinx rendering of examples with >>> often broke; sometimes with unintelligible indentation warnings, and sometimes silently resulting with "successful" publishing with a broken render.

I'm surprised to hear this. Were we using the numpydoc sphinx extension?

I believe we fixed all of these in the 0.5+ docs. If we missed any of them, let's get issues opened for the specific examples.

I think that we generally do the convention above today. Here is one example:

cudf/python/cudf/dataframe/dataframe.py

Lines 361 to 363 in 866bded

    
    from cudf.dataframe import DataFrame

    df = DataFrame()

To be clear, I'm not saying "imports are wrong today"; I'm saying "we're encouraging users into behaviors such that us changing module structure in the future will make imports wrong."

randerzander · 2019-02-21T23:20:26Z

Ah. Tail is new in branch-0.6.

I did a search for "cudf.dataframe" in the API docs page and didn't find any.

Yes, we're using the numpydoc extension

For example, if you build docs from branch-0.6 right now, there are quite a few warnings and rendering issues with the new file reader/writer docstrings copied straight from Pandas. I spent a while fighting with them, but there's more to do.
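
One way to keep warnings like these from slipping through is to run the docs build with Sphinx's `-W` flag, which turns warnings into errors so the build exits with a non-zero status. The sketch below wraps that in Python; the function names and the `docs/source` and `docs/build/html` paths are assumptions for illustration, not the project's actual layout or CI setup.

```python
import subprocess
import sys


def strict_sphinx_command(source_dir, output_dir):
    """Return a Sphinx invocation that treats warnings as errors."""
    return [
        sys.executable, "-m", "sphinx",
        "-W",          # turn warnings into errors
        "-b", "html",  # use the HTML builder
        source_dir,
        output_dir,
    ]


def build_docs_strict(source_dir="docs/source", output_dir="docs/build/html"):
    """Run the build; return True only if it exited with status 0.

    The default paths are placeholders and would need to match the
    real repository layout.
    """
    completed = subprocess.run(strict_sphinx_command(source_dir, output_dir))
    return completed.returncode == 0
```

A CI job could call `build_docs_strict()` and fail when it returns False.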

kkraus14 · 2019-03-05T17:35:39Z

With #1036 merged I'm closing this as it's resolved 😄

mrocklin added the question and Needs Triage labels Jan 9, 2019

kkraus14 added the doc, Python, and proposal labels and removed the Needs Triage and question labels Jan 9, 2019

randerzander self-assigned this Feb 21, 2019

mrocklin mentioned this issue Feb 23, 2019

[REVIEW] Switch to Numpydoc for docstrings #1036

Merged

kkraus14 closed this as completed Mar 5, 2019

benfred mentioned this issue Nov 2, 2022

[FEA] Use doctest for testing python docstrings rapidsai/raft#981

Closed
