[REVIEW] Decorator to generate docstrings with autodetection of parameters #2635

Merged
merged 36 commits into from
Aug 19, 2020

Conversation

dantegd
Member

@dantegd dantegd commented Aug 3, 2020

This PR will close the following issues:

closes #1674
closes #2481
closes #2243

This is done via two decorators that generate common docstrings in the codebase. They will greatly reduce the maintenance cost of the most common docstrings and reduce the number of discrepancies, typos and outdated types across the relatively high number of estimators and methods we have now.

The docstrings are generated at import time, and the impact on import time is in the hundredths of a second as far as I've measured, which is well within the margin of error of cuml module import times. The two decorators are:

  • generate_docstring: Meant to be used by fit/predict/etc. methods that have the typical signatures (e.g. fit(x,y) or predict(x)). It detects the parameters and default values and generates the appropriate docstring, with some configurability for shapes and formats.

  • insert_into_docstring: More flexible but less automatic method, meant to be used by functions that use our common dense or sparse datatypes but have many more custom parameters that are particular to the class(es) as opposed to being common in the codebase. It lets us keep our documentation up to date and correct with minimal changes by keeping our common datatypes concentrated here. NearestNeighbors is a good example of this use case.
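As a rough illustration of the generate_docstring idea described above, the sketch below inspects a function's signature when the decorator is applied and appends a Parameters section. The names generate_docstring_sketch, _COMMON_PARAMS and ToyEstimator are hypothetical and the description strings are made up for the example; this is not the code in python/cuml/common/doc_utils.py.

```python
from inspect import Parameter, signature

# Hypothetical descriptions keyed by parameter name (not cuML's real table).
_COMMON_PARAMS = {
    "X": "X : array-like of shape {shape}\n    Dense input data.",
    "y": "y : array-like of shape {shape}\n    Dense target values.",
}


def generate_docstring_sketch(X_shape="(n_samples, n_features)",
                              y_shape="(n_samples, 1)"):
    """Append a Parameters section built from the decorated signature."""
    shapes = {"X": X_shape, "y": y_shape}

    def deco(func):
        entries = []
        for name, param in signature(func).parameters.items():
            if name in _COMMON_PARAMS:
                # Common datatype: use the shared description and shape.
                entries.append(_COMMON_PARAMS[name].format(shape=shapes[name]))
            elif name != "self" and param.default is not Parameter.empty:
                # Other parameters: record the detected default value.
                entries.append(f"{name} : default={param.default!r}")
        if entries:
            func.__doc__ = ((func.__doc__ or "").rstrip()
                            + "\n\nParameters\n----------\n"
                            + "\n".join(entries) + "\n")
        return func  # same function object; only __doc__ changed

    return deco


class ToyEstimator:
    @generate_docstring_sketch()
    def fit(self, X, y, convert_dtype=True):
        """Fit the model."""
        return self
```

Because the decorator body runs when the class is defined, the docstring is built once at import time and calls to fit pay no extra cost.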

Docstrings for the functions look like this:

image

cuml.dask datatype version of the docstrings will come in a future update.

@dantegd dantegd added 2 - In Progress Currenty a work in progress doc Documentation Cython / Python Cython or Python issue labels Aug 3, 2020
@dantegd dantegd requested a review from a team as a code owner August 3, 2020 15:58
@GPUtester
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

docs/source/index.rst (outdated, resolved)
@dantegd dantegd requested a review from a team as a code owner August 5, 2020 22:58
@dantegd dantegd changed the title [WIP] Decorator to generate docstrings with autodetection of parameters [REVIEW] Decorator to generate docstrings with autodetection of parameters Aug 6, 2020
@dantegd dantegd added the 3 - Ready for Review Ready for review by team label Aug 6, 2020
Contributor

@JohnZed JohnZed left a comment

I like the approach! Haven't reviewed every line in detail, but just a few high level comments - simplifying that decorator would help, and I think the insert_into_docstring could be a little more explicit (see below). generate_docstring already seems to work great.

python/cuml/cluster/kmeans.pyx (resolved)
python/cuml/common/doc_utils.py (outdated, resolved)
python/cuml/common/doc_utils.py (resolved)
                       parameters=False,
                       return_values=False):
    def deco(func):
        @wraps(func)
Contributor

phew, this function got a little complex... maybe there is room to break it up?

Member Author

I added comments to make it more clear. Didn't want to break it into further functions to avoid any potential overheads (even if small)

python/cuml/svm/svr.pyx (resolved)
…ltinomialNB before (which needs to be skipped)
Member

@ajschmidt8 ajschmidt8 left a comment

I didn't see any ops-codeowner files that were changed in this PR, so I'm assuming we were tagged for a review for awareness for doc builds. is that right, @dantegd? if so, this looks good. let me know if there's anything in particular you want us to look at here.

@dantegd dantegd requested a review from JohnZed August 12, 2020 15:52
@dantegd dantegd added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 3 - Ready for Review Ready for review by team labels Aug 12, 2020
@mdemoret-nv mdemoret-nv self-requested a review August 12, 2020 17:07
Contributor

@JohnZed JohnZed left a comment

Michael covered the more substantive stuff... Just some small changes in there. Additionally, this could use some tests (which will be kind of interesting to write!). Otherwise, looks very far along.

python/cuml/svm/svr.pyx (resolved)
python/cuml/common/doc_utils.py (resolved)
Comment on lines 246 to 249
@wraps(func)
def docstring_wrapper(*args, **kwargs):
    return func(*args, **kwargs)

Contributor

I'm on board with this suggestion too. Function wrapping adds more runtime complexity, so the __doc__ modification seems desirable.
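To make the trade-off in this thread concrete, here is a minimal sketch of the two variants. Both decorator names and the toy functions are hypothetical; the point is only that editing `__doc__` lets the decorator return the original function, while the wrapper version adds one extra Python call per invocation.

```python
from functools import wraps


def doc_via_wrapper(extra):
    """Variant that wraps: every call passes through docstring_wrapper."""
    def deco(func):
        @wraps(func)
        def docstring_wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        docstring_wrapper.__doc__ = (func.__doc__ or "") + extra
        return docstring_wrapper
    return deco


def doc_in_place(extra):
    """Variant that edits __doc__ and returns the original function."""
    def deco(func):
        func.__doc__ = (func.__doc__ or "") + extra
        return func
    return deco


def predict_a(X):
    """Predict labels."""
    return X


def predict_b(X):
    """Predict labels."""
    return X


wrapped = doc_via_wrapper(" (generated)")(predict_a)
in_place = doc_in_place(" (generated)")(predict_b)
```

Both results carry the same docstring, but only `in_place` is still the very same function object that was decorated.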

)

if(len(to_add) > 0):
    docstring_wrapper.__doc__ = str(func.__doc__).format(*to_add)
Contributor

Good catch.
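The `format(*to_add)` call in the snippet above fills positional `{}` placeholders in a hand-written docstring. A minimal sketch of that mechanism follows, assuming a hypothetical insert_into_docstring_sketch decorator and a made-up shared description string; the real decorator's arguments may differ.

```python
# Hypothetical shared description; the real ones live in cuML's doc_utils.
_DENSE = "array-like of shape {shape}"


def insert_into_docstring_sketch(parameters=()):
    """Fill positional ``{}`` placeholders in the docstring, in order."""
    def deco(func):
        to_add = [_DENSE.format(shape=shape) for shape in parameters]
        # Skip functions without a docstring: str(None) would otherwise
        # turn into the literal text "None".
        if to_add and func.__doc__ is not None:
            func.__doc__ = func.__doc__.format(*to_add)
        return func
    return deco


@insert_into_docstring_sketch(parameters=["(n_samples, n_features)",
                                          "(n_samples, n_neighbors)"])
def kneighbors(X, n_neighbors=5):
    """
    Query the nearest neighbors.

    Parameters
    ----------
    X : {}
        Query points.
    n_neighbors : int, default=5

    Returns
    -------
    distances : {}
    """
    return None
```

One consequence of relying on `str.format` is that any literal braces in such a docstring have to be written as `{{` and `}}`, otherwise formatting raises an error at import time.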

@dantegd dantegd added 4 - Waiting on Author Waiting for author to respond to review and removed 4 - Waiting on Reviewer Waiting for reviewer to review or respond labels Aug 12, 2020
@dantegd dantegd changed the title [REVIEW] Decorator to generate docstrings with autodetection of parameters [skip ci] [REVIEW] Decorator to generate docstrings with autodetection of parameters Aug 19, 2020
@dantegd
Member Author

dantegd commented Aug 19, 2020

rerun tests

@dantegd dantegd added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 2 - In Progress Currenty a work in progress 4 - Waiting on Author Waiting for author to respond to review labels Aug 19, 2020
@dantegd
Member Author

dantegd commented Aug 19, 2020

@JohnZed @mdemoret PR is ready for re-review, hopefully merge before code freeze. I opened issue #2714 for the 2 features I haven't had time to implement: automatic indentation detection and named parameters for insert_into_docstring.

The autodetection of indentation is not particularly important at this moment since this PR has 0 sphinx warnings in its current state (except for /home/galahad/miniconda3/envs/ns0812/lib/python3.8/site-packages/numpydoc/docscrape.py:418: UserWarning: Unknown section Output in the docstring of <functools._lru_cache_wrapper object at 0x7f2d7d758c10> in None. which is outside our code). That said, I think it's a good idea for a future improvement that'll make adding new docstrings more robust and easier.

Contributor

@JohnZed JohnZed left a comment

LGTM except for those two small vestiges of conditional generation that could be removed

from inspect import signature

# if docs need to be autogenerated in other environments, add checks here
try:
    from IPython import get_ipython
Contributor

What about interactive python? Or Colab?

Contributor

Ok, in Colab, get_ipython actually works fine... still I'm not sure about making this conditional. Seems less error-prone to just generate all the time to me.
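For context on why the reviewer finds the conditional error-prone, here is a sketch of what an environment check of this kind tends to look like. The helper name in_interactive_session is hypothetical; `get_ipython` is the real IPython function and returns None outside an IPython session, and `sys.ps1` is only defined when the plain interpreter runs interactively.

```python
import sys


def in_interactive_session():
    """Best-effort check for the plain REPL or an IPython/Jupyter session."""
    if hasattr(sys, "ps1"):
        # The plain `python` REPL defines sys.ps1; scripts do not.
        return True
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    # get_ipython() returns None when no IPython shell is active.
    return get_ipython() is not None


# Conditional generation would branch on this flag; generating the
# docstrings unconditionally removes the need for the check entirely.
GENERATE_DOCSTRINGS = in_interactive_session()
```

Each new environment (plain REPL, Colab, doc builds) needs its own branch in a check like this, which is the maintenance risk that always generating avoids.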

@@ -20,6 +20,8 @@
import sys
sys.path.insert(0, os.path.abspath('../../python'))

os.environ["BUILD_CUML_DOCSTRINGS"] = "True"
Contributor

Maybe no longer necessary?

@@ -85,6 +87,10 @@
from cuml.common.memory_utils import set_global_output_type, using_output_type


# docstring generation

_generate_pydocstrings = False
Contributor

Is this still used?

Contributor

@JohnZed JohnZed left a comment

Looks great! I'd like to wait on another look from MD, but good to go from my perspective.

Contributor

@mdemoret-nv mdemoret-nv left a comment

Everything looks good to me minus a couple small comments and suggestions. For the sorting and prepending suggestions, I can add them to #2714 and push from 0.15.

Looks good.

Comment on lines 159 to 167
def generate_docstring(X='dense',
                       X_shape='(n_samples, n_features)',
                       y='dense',
                       y_shape='(n_samples, 1)',
                       convert_dtype_cast=False,
                       skip_parameters=[],
                       skip_parameters_heading=False,
                       parameters=False,
                       return_values=False):
Contributor

I still really wish there was a way to sort the parameters by the order they are listed in the function signature. However, since most of the auto generated parameters will be at the front, maybe we can add a prepend_to_parameters=True argument which would insert items at the front of the parameter list instead of the back? Seems like a quick compromise and will work for 90% of cases.

Member Author

That's a great idea indeed, and if I'm not mistaken it would work for all instances we currently have, so I will quickly add it.
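The two options discussed in this thread can be sketched as follows. Both helpers are hypothetical illustrations, not part of the PR: the first orders parameter entries by their position in the function signature, the second is the simpler prepend/append switch proposed as a compromise.

```python
from inspect import signature


def order_by_signature(func, entries):
    """Return (name, text) pairs sorted by position in func's signature."""
    position = {name: i for i, name in enumerate(signature(func).parameters)}
    # Names missing from the signature sort last; sorted() is stable, so
    # they keep their original relative order.
    return sorted(entries.items(),
                  key=lambda item: position.get(item[0], len(position)))


def combine_entries(generated, handwritten, prepend_to_parameters=True):
    """Place auto-generated entries before or after the hand-written ones."""
    if prepend_to_parameters:
        return generated + handwritten
    return handwritten + generated


def fit(self, X, y, sample_weight=None, convert_dtype=True):
    return self


ordered = order_by_signature(fit, {
    "convert_dtype": "convert_dtype : bool",
    "y": "y : array-like",
    "X": "X : array-like",
})
```

Here `ordered` lists X, then y, then convert_dtype, regardless of the order in which the entries were supplied.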

Comment on lines +246 to +250
if(('X' in params or 'y' in params or parameters) and not
        skip_parameters_heading):

    func.__doc__ += \
        '\nParameters\n----------\n'
Contributor

Is it possible to just auto-detect the parameters heading and only insert it if it's not found? Should always be:

Parameters
----------

Member Author

@dantegd dantegd Aug 19, 2020

Was thinking about it and wanted to avoid having to look for the heading in the string for every occurrence of the decorator in the __doc__ string. So I left it to dev control with the skip_parameters_heading, particularly since the (pretty big) majority of use cases don't skip generating the heading, what do you think?
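For reference, the auto-detection suggested by the reviewer could look roughly like the sketch below. The helper ensure_parameters_heading and its regular expression are assumptions for illustration; the PR itself keeps the explicit skip_parameters_heading flag.

```python
import re

# A numpydoc-style heading: a "Parameters" line followed by a dashed line.
_PARAMS_HEADING = re.compile(
    r"^[ \t]*Parameters[ \t]*\n[ \t]*-{3,}[ \t]*$", re.MULTILINE)


def ensure_parameters_heading(doc):
    """Append the Parameters heading only if the docstring lacks one."""
    doc = doc or ""
    if _PARAMS_HEADING.search(doc):
        return doc
    return doc.rstrip() + "\n\nParameters\n----------\n"
```

The search runs once per decorated function when the module is imported, so the cost is a single regex scan of each docstring rather than anything paid per call.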

python/cuml/test/test_fit_function.py (outdated, resolved)
Labels
4 - Waiting on Reviewer Waiting for reviewer to review or respond Cython / Python Cython or Python issue doc Documentation
5 participants