[QST] Use Numpydoc for docstrings? #652

Closed
mrocklin opened this issue Jan 9, 2019 · 8 comments
Labels: doc (Documentation), proposal (Change current process or code), Python (Affects Python cuDF API)

Comments

@mrocklin
Collaborator

mrocklin commented Jan 9, 2019

It looks like we currently use plain Sphinx directives for examples in docstrings. They look like the following:

    .. code-block:: python

          from cudf.dataframe import DataFrame
          df = DataFrame()
          df['key'] = [0, 1, 2, 3, 4]
          df['val'] = [float(i + 10) for i in range(5)]  # insert column
          print(df)

I'd like to raise the possibility that we move to numpydoc. There are a few reasons for this:

  1. It seems to be the standard in PyData projects today
  2. It includes results in the docstrings, which is nice for people reading them, and also provides us something to test if we like

Here is what the example above would look like:

>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df['val'] = [float(i + 10) for i in range(5)]  # insert column
>>> df
<cudf.DataFrame ncols=2 nrows=5 >
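
As a minimal sketch of the "something to test" point, the example inside a numpydoc-style docstring can be run with the standard-library doctest module, so the documented output is compared against the real output. The function add_offset and the helper check_docstring are hypothetical names made up for this illustration, and a plain list stands in for a cuDF DataFrame so the sketch runs without a GPU.

```python
import doctest


def add_offset(values, offset=10):
    """Add a constant offset to every value.

    Parameters
    ----------
    values : list of int
        Input values.
    offset : int, optional
        Amount added to each value. Default is 10.

    Returns
    -------
    list of float
        The shifted values.

    Examples
    --------
    >>> add_offset([0, 1, 2])
    [10.0, 11.0, 12.0]
    """
    return [float(v + offset) for v in values]


def check_docstring(func):
    """Run the interactive examples in func's docstring (hypothetical helper)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__,
        globs={func.__name__: func},  # names visible to the examples
        name=func.__name__,
        filename=None,
        lineno=0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults with .failed and .attempted


results = check_docstring(add_offset)
print(results.failed, results.attempted)
```

If the implementation changes and the documented output goes stale, results.failed becomes non-zero, which a test suite could assert on.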

mrocklin added the question and Needs Triage labels on Jan 9, 2019
@kkraus14
Collaborator

kkraus14 commented Jan 9, 2019

Agreed this is the right direction and we should do this.

kkraus14 added the doc, Python, and proposal labels and removed the Needs Triage and question labels on Jan 9, 2019
@mrocklin
Collaborator Author

@kkraus14 is there someone we know who can do this work? I wonder if @taureandyernv is around and has time?

@mrocklin
Collaborator Author

I've also changed the example above to prefer global namespaced imports like

import cudf
df = cudf.DataFrame(...)

rather than highly specific imports

from cudf.dataframe.dataframe import DataFrame
df = DataFrame(...)

This is because:

  1. Namespaces are good so that users can tell if a DataFrame call is from pandas or cudf or other
  2. This allows us to move around modules in the future without breaking user code
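
The second point can be sketched with a toy package built in memory. Everything here is an assumption made for illustration: toylib, install_toylib, can_import, and the module paths are hypothetical and are not cuDF APIs. The sketch defines a class in a deep internal module, re-exports it at the top level, then simulates a refactor that moves the internal module.

```python
import importlib
import sys
import types


def install_toylib(internal_path):
    """Register a toy package 'toylib' whose DataFrame lives at internal_path."""
    # Drop any previous layout so a refactor can be simulated.
    for name in [n for n in sys.modules if n == "toylib" or n.startswith("toylib.")]:
        del sys.modules[name]

    class DataFrame:
        pass

    top = types.ModuleType("toylib")
    top.__path__ = []  # an (empty) __path__ marks the module as a package
    sys.modules["toylib"] = top

    parent, dotted = top, "toylib"
    for part in internal_path.split("."):
        dotted = f"{dotted}.{part}"
        module = types.ModuleType(dotted)
        module.__path__ = []
        sys.modules[dotted] = module
        setattr(parent, part, module)
        parent = module

    parent.DataFrame = DataFrame  # where the class is actually defined
    top.DataFrame = DataFrame     # public re-export at the top level
    return top


def can_import(module_name):
    """Return True if module_name can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# Original layout: the class lives in toylib.dataframe.dataframe.
install_toylib("dataframe.dataframe")
before = (can_import("toylib.dataframe.dataframe"),
          hasattr(sys.modules["toylib"], "DataFrame"))

# After a refactor: the class moves to toylib.core.frame.
install_toylib("core.frame")
after = (can_import("toylib.dataframe.dataframe"),
         hasattr(sys.modules["toylib"], "DataFrame"))

print(before, after)
```

In this sketch the deep import path stops working after the move, while code that only touches toylib.DataFrame is unaffected.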

@mrocklin
Collaborator Author

Also cc @randerzander

randerzander self-assigned this on Feb 21, 2019
@randerzander
Contributor

randerzander commented Feb 21, 2019

@mrocklin we previously used numpy example formatting, including the ">>>" interactive-prompt style, but actually moved away from it to the ".. code-block:: python" directive for two reasons:

  1. Users complained about difficulty copy/pasting example snippets into their own code.

This is entirely subjective and based on individual preference. I searched for an extension for the front-end that makes copy/paste exclude interactive prompt characters but didn't find anything. If you and @kkraus14 feel strongly about using numpydoc standards anyway, I wouldn't oppose.

That said, doing so probably would not happen before 0.6.

  2. Sphinx rendering of examples with >>> often broke; sometimes with unintelligible indentation warnings, and sometimes silently, with a "successful" publish but a broken render.

To be fair to numpydoc, Sphinx's ".. code-block:: python" rendering has enough issues of its own that I tried using literalinclude and writing a pytest wrapper to test the snippets and their expected output. I did that mostly because setting up doctest with DataFrame output was yet another finicky process, and it produced unhelpful warnings from hard-to-find gremlin characters.

This could & should be fixed on an example by example basis, and CI should fail a build if doc builds result in warnings.
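
One possible way to make CI fail on doc warnings, sketched under assumptions: Sphinx's -W option turns warnings into errors, and --keep-going (available in Sphinx 1.8 and later) still reports every warning before exiting non-zero. The helper names and the docs/source and docs/build/html paths below are hypothetical and are not taken from the cuDF repository.

```python
import subprocess
import sys


def sphinx_build_command(source_dir, build_dir, builder="html"):
    """Return a Sphinx invocation that fails on any warning."""
    return [
        sys.executable, "-m", "sphinx",
        "-b", builder,
        "-W",            # turn warnings into errors
        "--keep-going",  # with -W, report all warnings before failing
        source_dir,
        build_dir,
    ]


def build_docs(source_dir, build_dir):
    """Run the build; raises CalledProcessError if Sphinx exits non-zero."""
    subprocess.run(sphinx_build_command(source_dir, build_dir), check=True)


# Hypothetical paths; adjust to the real docs layout.
command = sphinx_build_command("docs/source", "docs/build/html")
print(" ".join(command[1:]))
```

A CI job would call build_docs (or run the equivalent command directly) and rely on the non-zero exit status to fail the build.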

> I've also changed the example above to prefer global namespaced imports like
>
> import cudf
> df = cudf.DataFrame(...)
>
> rather than highly specific imports
>
> from cudf.dataframe.dataframe import DataFrame
> df = DataFrame(...)
>
> This is because:
>
> 1. Namespaces are good so that users can tell if a DataFrame call is from pandas or cudf or other
> 2. This allows us to move around modules in the future without breaking user code

I believe we fixed all of these in the 0.5+ docs. If we missed any of them, let's get issues opened for the specific examples.

@mrocklin
Collaborator Author

> This is entirely subjective and based on individual preference. I searched for an extension for the front-end that makes copy/paste exclude interactive prompt characters but didn't find anything. If you and @kkraus14 feel strongly about using numpydoc standards anyway, I wouldn't oppose

I feel somewhat strongly about this. Almost all PyData projects use numpydoc standard today.
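
On the copy/paste concern raised above, a small workaround can be sketched with the standard library: doctest's parser already knows how to separate prompt-prefixed source from expected output, so it can recover pasteable code from a ">>>" example. The helper name strip_prompts is hypothetical; the snippet is parsed only, never executed, so cudf does not need to be installed.

```python
import doctest


def strip_prompts(snippet):
    """Return only the source lines of a doctest-style snippet.

    Prompts (">>> " and "... ") and expected-output lines are dropped.
    """
    parser = doctest.DocTestParser()
    return "".join(example.source for example in parser.get_examples(snippet))


SNIPPET = """
>>> import cudf
>>> df = cudf.DataFrame()
>>> df['key'] = [0, 1, 2, 3, 4]
>>> df
<cudf.DataFrame ncols=2 nrows=5 >
"""

pasteable = strip_prompts(SNIPPET)
print(pasteable)
```

This does not change how the rendered docs behave in the browser; it only shows that the prompts can be removed mechanically.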

> That said, doing so probably would not happen before 0.6.

OK, maybe I just go ahead and do it? I think that this is a couple hours of work.

> 2. Sphinx rendering of examples with >>> often broke; sometimes with unintelligible indentation warnings, and sometimes silently, with a "successful" publish but a broken render.

I'm surprised to hear this. Were we using the numpydoc sphinx extension?

> I believe we fixed all of these in the 0.5+ docs. If we missed any of them, let's get issues opened for the specific examples.

I think that we generally do the convention above today. Here is one example:

from cudf.dataframe import DataFrame
df = DataFrame()

To be clear I'm not saying "imports are wrong today" I'm saying "we're encouraging users into behaviors such that us changing module structure in the future will make imports wrong."

@randerzander
Contributor

Ah. Tail is new in branch-0.6.

I did a search for "cudf.dataframe" in the API docs page and didn't find any.

Yes, we're using the numpydoc extension.

For example, if you build docs from branch-0.6 right now, there are quite a few warnings and rendering issues with the new file reader/writer docstrings copied straight from pandas. I spent a while fighting with them, but there's more to do.

@kkraus14
Collaborator

kkraus14 commented Mar 5, 2019

With #1036 merged I'm closing this as it's resolved 😄

3 participants