Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC] Convert all remaining Python docstrings to pydoc and examples to doctest #2415

Closed
cjnolet opened this issue Jun 12, 2020 · 9 comments
Closed
Assignees
Labels
doc Documentation good first issue Good for newcomers

Comments

@cjnolet
Copy link
Member

cjnolet commented Jun 12, 2020

There are several places in the codebase currently that are not using the proper docstrings format [1].

It would be worth scraping through the codebase and updating these.

[1] https://numpydoc.readthedocs.io/en/latest/format.html

@cjnolet cjnolet added ? - Needs Triage Need team to review and classify doc Documentation labels Jun 12, 2020
@dantegd dantegd added good first issue Good for newcomers and removed ? - Needs Triage Need team to review and classify labels Jun 13, 2020
@yuqli
Copy link

yuqli commented Aug 13, 2020

This seems something I can handle. If nobody else is working on it, can I claim this issue?

@dantegd
Copy link
Member

dantegd commented Aug 14, 2020

@yuqli I think some of these might have been fixed by #2649 and others will soon be automatically handled by a decorator #2635 (automatically generating docstrings for fit/predict/transform/etc methods). Though that decorator doesn't touch the cuml.dask docstrings, so overall checking the status around using 2635 as a base might be a very good idea and very much welcomed!

@mdemoret-nv
Copy link
Contributor

It would be great to update some of the python docstrings to use the doctest format in this issue. Right now most of our examples look like:

Examples
--------
.. code-block:: python
    import cupy as cp
    from cuml.metrics import pairwise_distances

    X = cp.array([[2.0, 3.0], [3.0, 5.0], [5.0, 8.0]])
    Y = cp.array([[1.0, 0.0], [2.0, 1.0]])

    # Euclidean Pairwise Distance, Single Input:
    pairwise_distances(X, metric='euclidean')

    # Cosine Pairwise Distance, Multi-Input:
    pairwise_distances(X, Y, metric='cosine')

    # Manhattan Pairwise Distance, Multi-Input:
    pairwise_distances(X, Y, metric='manhattan')

Output:

.. code-block:: python

    array([[0.        , 2.23606798, 5.83095189],
        [2.23606798, 0.        , 3.60555128],
        [5.83095189, 3.60555128, 0.        ]])

    array([[0.4452998 , 0.13175686],
        [0.48550424, 0.15633851],
        [0.47000106, 0.14671817]])

    array([[ 4.,  2.],
        [ 7.,  5.],
        [12., 10.]])

Instead, the doctest format would be (generated by literally copying the example section into a python interactive session):

Examples
--------

>>> import cupy as cp
>>> from cuml.metrics import pairwise_distances
>>>
>>> X = cp.array([[2.0, 3.0], [3.0, 5.0], [5.0, 8.0]])
>>> Y = cp.array([[1.0, 0.0], [2.0, 1.0]])
>>>
>>> # Euclidean Pairwise Distance, Single Input:
>>> pairwise_distances(X, metric='euclidean')
array([[0.        , 2.23606798, 5.83095189],
    [2.23606798, 0.        , 3.60555128],
    [5.83095189, 3.60555128, 0.        ]])
>>>
>>> # Cosine Pairwise Distance, Multi-Input:
>>> pairwise_distances(X, Y, metric='cosine')
array([[0.4452998 , 0.13175686],
    [0.48550424, 0.15633851],
    [0.47000106, 0.14671817]])
>>>
>>> # Manhattan Pairwise Distance, Multi-Input:
>>> pairwise_distances(X, Y, metric='manhattan')
array([[ 4.,  2.],
    [ 7.,  5.],
    [12., 10.]])

Which doesnt look better in Github markdown, but is a big improvement in Sphinx. This allows the user to see the output inline with the code that generated it and the >>> and output can be removed for easy Copy/Paste using the toggle button. See one of Scikit-Learns examples here: https://scikit-learn.org/stable/modules/metrics.html#metrics

@yuqli
Copy link

yuqli commented Aug 19, 2020

Thanks for the reply and the instruction. Sure I will take care of them. Thanks.

@dantegd dantegd changed the title [DOC] Convert all remaining Python docstrings to pydoc [DOC] Convert all remaining Python docstrings to pydoc and examples to doctest Aug 27, 2020
@yuqli
Copy link

yuqli commented Oct 14, 2020

Sorry for the delay. I have converted the "Example" section to doctest for some modules. Just wondering how big should a pull request be? Should I submit a PR after finishing all the changes, or should I submit a PR for say every ~200 lines of code change?

Thanks!

@github-actions
Copy link

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions
Copy link

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@beckernick
Copy link
Member

We should revive this issue. cuGraph has an open issue for this and cuDF just did this and found quite a few issues.

It will fix broken documentation examples like the following, which will definitely fail as written (gpu_model is an invalid parameter): https://docs.rapids.ai/api/cuml/nightly/api.html?highlight=kernelexplainer#cuml.explainer.KernelExplainer

from cuml import SVR
from cuml import make_regression
from cuml import train_test_split

from cuml.explainer import KernelExplainer

X, y = make_regression(
    n_samples=102,
    n_features=10,
    noise=0.1,
    random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=2,
    random_state=42)

model = SVR().fit(X_train, y_train)

cu_explainer = KernelExplainer(
    model=model.predict,
    data=X_train,
    gpu_model=True)

cu_shap_values = cu_explainer.shap_values(X_test)
cu_shap_values

@dantegd
Copy link
Member

dantegd commented Mar 7, 2022

Linking PR #4618

@lowener lowener closed this as completed Apr 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Documentation good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

7 participants