-
Notifications
You must be signed in to change notification settings - Fork 540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Decorator to generate docstrings with autodetection of parameters #2635
Conversation
Please update the changelog in order to start CI tests. View the gpuCI docs here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach! Haven't reviewed every line in detail, but just a few high level comments - simplifying that decorator would help, and I think the insert_into_docstring could be a little more explicit (see below). generate_docstring already seems to work great.
python/cuml/common/doc_utils.py
Outdated
parameters=False, | ||
return_values=False): | ||
def deco(func): | ||
@wraps(func) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
phew, this function got a little complex... maybe there is room to break it up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added comments to make it more clear. Didn't want to break it into further functions to avoid any potential overheads (even if small)
…ltinomialNB before (which needs to be skipped)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see any ops-codeowner
files that were changed in this PR, so I'm assuming we were tagged for a review for awareness for doc builds. is that right, @dantegd? if so, this looks good. let me know if there's anything in particular you want us to look at here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Michael covered the more substantive stuff... Just some small changes in there. Additionally, this could use some tests (which will be kind of interesting to write!). Otherwise, looks very far along.
python/cuml/common/doc_utils.py
Outdated
@wraps(func) | ||
def docstring_wrapper(*args, **kwargs): | ||
return func(*args, **kwargs) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on board with this suggestion too. Function wrapping adds more runtime complexity, so the __doc__
modification seems desirable.
python/cuml/common/doc_utils.py
Outdated
) | ||
|
||
if(len(to_add) > 0): | ||
docstring_wrapper.__doc__ = str(func.__doc__).format(*to_add) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
rerun tests |
@JohnZed @mdemoret PR is ready for re-review, hopefully merge before code freeze. I opened issue #2714 for the 2 features I haven't had time to implement: automatic indentation detection and named parameters for insert_into_docstring. The autodetection of indentation is not particularly important at this moment since this PR has 0 sphinx warnings at its current status (except for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for those two small vestiges of conditional generation that could be removed
python/cuml/common/doc_utils.py
Outdated
from inspect import signature | ||
|
||
# if docs need to be autogeneretad in other environments, add checks here | ||
try: | ||
from IPython import get_ipython |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about interactive python? Or Colab?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, in Colab, get_ipython actually works fine... still I'm not sure about making this conditional. Seems less error-prone to just generate all the time to me.
docs/source/conf.py
Outdated
@@ -20,6 +20,8 @@ | |||
import sys | |||
sys.path.insert(0, os.path.abspath('../../python')) | |||
|
|||
os.environ["BUILD_CUML_DOCSTRINGS"] = "True" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe no longer necessary?
python/cuml/__init__.py
Outdated
@@ -85,6 +87,10 @@ | |||
from cuml.common.memory_utils import set_global_output_type, using_output_type | |||
|
|||
|
|||
# docstring generation | |||
|
|||
_generate_pydocstrings = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I'd like to wait on another look from MD, but good to go from my perspectvie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good to me minus a couple small comments and suggestions. For the sorting and prepending suggestions, I can add them to #2714 and push from 0.15.
Looks good.
python/cuml/common/doc_utils.py
Outdated
def generate_docstring(X='dense', | ||
X_shape='(n_samples, n_features)', | ||
y='dense', | ||
y_shape='(n_samples, 1)', | ||
convert_dtype_cast=False, | ||
skip_parameters=[], | ||
skip_parameters_heading=False, | ||
parameters=False, | ||
return_values=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still really wish there was a way to sort the parameters by the order they are listed in the function signature. However, since most of the auto generated parameters will be at the front, maybe we can add a prepend_to_parameters=True
argument which would insert items at the front of the parameter list instead of the back? Seems like a quick compromise and will work for 90% of cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great idea indeed and if I'm not mistaken it would work for all current instances we have currently, so will quickly add it
if(('X' in params or 'y' in params or parameters) and not | ||
skip_parameters_heading): | ||
|
||
func.__doc__ += \ | ||
'\nParameters\n----------\n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to just auto detect the parameters heading and only insert it if its not found? Should always be:
Parameters
----------
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was thinking about it and wanted to avoid having to look for the heading in the string for every occurrence of the decorator in the __doc__
string. So I left it to dev control with the skip_parameters_heading
, particularly since the (pretty big) majority of use cases don't skip generating the heading, what do you think?
This PR will close the following issues:
closes #1674
closes #2481
closes #2243
It is done via 2 decorators to generate common docstrings in the codebase. It'll greatly reduce the maintenance cost of the most common docstrings and reduce the amount of discrepancies, typos and outdated types in the relatively high number of estimators and methods we have now.
The docstrings are generated at import time, and the impact in import time is in the hundredths of seconds as far as I've measured, which is very much within the margin of error of cuml module import times. For functions that use either of the decorators:
generate_docstring
: Meant to be used by fit/predict/et.al methods that have the typical signatures (i.e. fit(x,y) or predict(x)). It detects the parameters and default values and generates the appropriate docstring, with som econfigurable for shapes and formats.insert_into_docstring
: More flexible but less automatic method, meant to be used by functions that use our common dense or sparse datatypes, but have many more custom parameters that are particular to the class(es) as opposed to being common in the codebase. Allows to keep our documentation up to date and correct with minimal changes by keeping our common datatypes concentrated here. NearestNeigbors is a good example of this use case.Docstrings for the functions look like this:
cuml.dask datatype version of the docstrings will come in a future update.